From the course: Complete Guide to Databricks for Data Engineering
What is Spark SQL? - Databricks Tutorial
What is Spark SQL?
- [Instructor] Spark SQL is one of the favorite modules of any data engineer in the PySpark world. Let's understand what Spark SQL is. Spark SQL is a module in Apache Spark that helps you process, clean, and analyze data using a SQL-like interface. The idea behind it is that a lot of data engineers have come from a SQL background, so if we have something where we can use SQL itself, it will be very easy for these data engineers to work with, and that is why the PySpark and Spark world has Spark SQL. Spark SQL is built on top of Spark's core distributed computing engine, and the idea is that Spark SQL takes advantage of Spark's distributed processing to make all your queries highly scalable and high-performing. It feels like you are writing SQL, but under the hood it runs as a distributed Spark job. If we talk about some of the key features of Spark SQL, the first is query interface…
Contents
- What is Spark SQL? (5m 47s)
- Create temporary views in Databricks (10m 17s)
- Create global temp views in Databricks (7m 25s)
- Use Spark SQL transformations (7m)
- Write DataFrames as managed tables in PySpark (9m 26s)
- Write a DataFrame as external table in PySpark (8m 31s)