From the course: PySpark Essential Training: Introduction to Building Data Pipelines


Downloading a dataset


- [Instructor] Now let's download a publicly available dataset that we can use throughout this course. I live in New York City, home of the iconic yellow taxis. Let's say a data team wants insights into those rides, like pickup spots, ride length, or fare amounts. New York City offers a free official dataset for this. It's a popular choice for learning data analytics: it's big but manageable. To download the New York City taxi data, go to the New York City data site. We want to download three files from here. First, under the heading Taxi Zone Maps and Lookup Tables, download the Taxi Zone Lookup Table CSV file. Then scroll down a little to the heading that says 2025, and click to expand it. Click on the links for Yellow Taxi Trip Records for both January 2025 and February 2025, and the downloads will start automatically. The file sizes are just over 50 megabytes, so these files should download pretty quickly. Make sure to store all three files in a location where they can easily…
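If you'd rather script the downloads than click through the site, here's a minimal Python sketch. It assumes the links on the page point to the CloudFront host in `BASE_URL` and that the monthly trip records are Parquet files named `yellow_tripdata_YYYY-MM.parquet`. Check both against the actual links before running it. The helper names `taxi_file_urls` and `download_all` are hypothetical, made up for this example.

```python
from pathlib import Path
from urllib.request import urlretrieve

# ASSUMPTION: host and paths the download links resolve to; verify them
# against the links on the data site before relying on them.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net"


def taxi_file_urls(year: int, months: list[int]) -> list[str]:
    """Hypothetical helper: zone lookup CSV URL plus one yellow-taxi URL per month."""
    urls = [f"{BASE_URL}/misc/taxi_zone_lookup.csv"]
    for month in months:
        # One trip-record file per month, zero-padded month in the name.
        urls.append(f"{BASE_URL}/trip-data/yellow_tripdata_{year}-{month:02d}.parquet")
    return urls


def download_all(urls: list[str], dest: Path) -> list[Path]:
    """Hypothetical helper: download each URL into dest, skipping files already there."""
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        # Keep the original file name from the end of the URL.
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():
            urlretrieve(url, target)
        paths.append(target)
    return paths
```

Calling `download_all(taxi_file_urls(2025, [1, 2]), Path("data"))` would put all three files in a single `data/` folder. Because files that already exist are skipped, rerunning the script won't download them again.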
