From the course: PySpark Essential Training: Introduction to Building Data Pipelines
Using SQL queries
- [Instructor] Now that we have a temporary view called taxi in our Spark session, let's write some SQL to query it. We'll start with a simple SELECT statement to find all rides where the total amount is more than 50 US dollars. Notice a few things here. First, instead of calling a method on a DataFrame, we call the sql method directly on the SparkSession instance, called spark, that we created at the very beginning of this notebook. Second, the SQL query is just a regular string in quotes. Third, we can access the taxi view simply by its name; no quotation marks are needed. And fourth, the sql method returns a new DataFrame, which is why we need to call the show method again to display the result. Because the method returns a DataFrame, we can now start chaining DataFrame methods onto the SQL statement like this. This code snippet uses a SQL query to find all rows in the data where the total amount is more than $50. It then uses the filter method of the DataFrame API to filter the…