From the course: Cloud Hadoop: Scaling Apache Spark
Serverless Spark with Dataproc Notebook
- [Instructor] In the Spark ecosystem, there are a number of execution environments. As we've seen in other movies in this course, we can use GCP Dataproc for a managed Spark environment. A relatively new capability, one that many of my customers have found super useful, is the Dataproc JupyterLab plugin for serverless batch and interactive notebook sessions, and I wanted to share a preview of it with you here. That's a lot of words, so what does it mean? It means being able, from a Jupyter notebook within GCP, to scale out a workload when the analysis needs more than one computer. To give you an introduction, I've shortened this rather long tutorial so that you can see what it looks like and, hopefully, be compelled to try the full tutorial yourself. The first step is to set up a Vertex AI Workbench instance in a Google Cloud demonstration project. Once that's set up, you access JupyterLab by clicking the link…
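To make the scale-out step concrete, here is a minimal PySpark sketch of the kind of cell you might run once the plugin has started a Dataproc Serverless interactive session. The bucket path and column name are hypothetical placeholders, and in these sessions a SparkSession is typically provisioned for you, so getOrCreate() simply picks it up.

# Minimal sketch of a notebook cell in a Dataproc Serverless
# interactive session. Assumption: the JupyterLab plugin has already
# created the session, which usually injects a ready-made `spark`.
from pyspark.sql import SparkSession

# getOrCreate() reuses the session the serverless runtime started.
spark = SparkSession.builder.appName("serverless-preview").getOrCreate()

# Hypothetical input path; replace with a bucket you own.
df = spark.read.csv("gs://your-demo-bucket/sales.csv",
                    header=True, inferSchema=True)

# The aggregation fans out across the executors that Dataproc
# provisions automatically; there is no cluster to size or manage.
df.groupBy("region").count().show()

For batch rather than interactive work, a script like this can also be submitted as a serverless batch, for example with gcloud dataproc batches submit pyspark your_script.py --region=REGION (script name here is illustrative).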
Contents
- Scale Spark on the cloud by example (5m 11s)
- Build a quick start with Databricks AWS (6m 50s)
- Scale Spark cloud compute with VMs (6m 16s)
- Optimize cloud Spark virtual machines (6m 5s)
- Use AWS EKS containers and data lake (7m 8s)
- Optimize Spark cloud data tiers on Kubernetes (4m 17s)
- Build reproducible cloud infrastructure (8m 37s)
- Scale on GCP Dataproc or on Terra.bio (8m 34s)
- Serverless Spark with Dataproc Notebook (5m 25s)