From the course: Complete Guide to Databricks for Data Engineering

What is Spark SQL?

- [Instructor] Spark SQL is a favorite module of many data engineers in the PySpark world. Let's understand what Spark SQL is. Spark SQL is a module in Apache Spark that helps you process, clean, and analyze data through a SQL-like interface. The idea behind it is that a lot of data engineers come from a SQL background, so being able to use SQL itself makes it much easier for them to work with Spark, and that is why PySpark and the wider Spark world have Spark SQL. Spark SQL is built on top of Spark's core distributed computing engine, so it takes advantage of Spark's distributed processing to make your queries highly scalable and high-performing. It feels like you are writing SQL, but under the hood, it runs as a distributed Spark job. If we talk about some of the key features of Spark SQL, the first is the query interface…
