From the course: Cloud Hadoop: Scaling Apache Spark
Caching and the DAG - Apache Spark Tutorial
Caching and the DAG
- [Instructor] We're going to open our next notebook to understand more about how transformations and actions are executed in Spark. So in the workspace, we're going to bring in our Caching notebook and import it, and now we're ready to work. We're going to look at transformations, actions, and a little bit of visualization too. First, we're going to load some sample data into a DataFrame. This is sample data that comes with Databricks, and it's really easy to work with. So let's scroll down and look at what this code does. The datapath here points to the Databricks sample datasets. We're going to take the diamonds dataset, which describes characteristics of diamonds, and create a variable called diamonds. We're going to use the SQL context to read the file, with the format set to CSV. And then of course, we've got the continuation character at the end.…
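To make the idea behind the notebook concrete, here is a minimal pure-Python sketch of lazy transformations, actions, and caching. It is a toy model, not the Spark API: the class `LazyDataset`, the loader `load_diamonds`, and the three sample rows are all hypothetical and invented for illustration. The point it tries to show is that transformations only record work, actions trigger it, and caching lets a second action reuse the first result.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark)."""

    def __init__(self, compute):
        self._compute = compute        # zero-argument callable that produces the rows
        self._cache_enabled = False
        self._cached = None

    # Transformations: return a new dataset and do no work yet.
    def filter(self, predicate):
        return LazyDataset(lambda: [r for r in self._rows() if predicate(r)])

    def map(self, fn):
        return LazyDataset(lambda: [fn(r) for r in self._rows()])

    def cache(self):
        # Mark this dataset so the first action's result is kept for reuse.
        self._cache_enabled = True
        return self

    def _rows(self):
        if self._cache_enabled:
            if self._cached is None:
                self._cached = self._compute()
            return self._cached
        return self._compute()

    # Actions: force the recorded chain of transformations to run.
    def count(self):
        return len(self._rows())

    def collect(self):
        return list(self._rows())


calls = {"load": 0}  # counts how often the source is actually read


def load_diamonds():
    # Invented sample rows; a real notebook would read the CSV file instead.
    calls["load"] += 1
    return [
        {"carat": 0.3, "price": 500},
        {"carat": 1.0, "price": 4000},
        {"carat": 1.5, "price": 9000},
    ]


diamonds = LazyDataset(load_diamonds)
expensive = diamonds.filter(lambda r: r["price"] > 1000)
loads_before_action = calls["load"]   # the filter alone has read nothing

n_first = expensive.count()           # action: reads the source
n_second = expensive.count()          # no cache: reads the source again
loads_without_cache = calls["load"]

cached = diamonds.filter(lambda r: r["price"] > 1000).cache()
cached.count()                        # first action fills the cache
cached.count()                        # second action reuses the cached rows
loads_with_cache = calls["load"] - loads_without_cache
```

In this sketch the uncached dataset reads its source once per action, while the cached one reads it only once in total. Real Spark plans the work as a DAG of stages across a cluster, so treat this only as a mental model for why caching matters when several actions touch the same data.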
Contents
- Tour the Databricks Environment (4m 36s)
- Tour the notebook (5m 29s)
- Import and export notebooks (2m 56s)
- Calculate Pi on Spark (8m 30s)
- Run WordCount of Spark with Scala (4m 59s)
- Import data (2m)
- Transformations and actions (3m 21s)
- Caching and the DAG (6m 49s)
- Architecture: Streaming for prediction (3m 51s)