From the course: Cloud Hadoop: Scaling Apache Spark

Caching and the DAG

- [Instructor] We're going to open our next notebook to understand even more about how transformations and actions are executed in Spark. So in the workspace, we're going to bring in our Caching notebook and import it, and now we're ready to work. We're going to look at transformations, actions, and a little bit about visualization too. First, we're going to load some sample data into a DataFrame. This sample data comes with Databricks, so it's really easy to work with. Let's scroll down and look at what this code does. The data path here points into the Databricks sample datasets. We're going to take the diamonds dataset, which describes characteristics of diamonds, and create a variable called Diamonds. We use the SQL context to read the file, specifying CSV as the format. And then of course, we've got the continuation character at the end.…

Contents