From the course: Cloud Hadoop: Scaling Apache Spark
Serverless Spark with Dataproc Notebook
- [Instructor] In the Spark ecosystem, there are a number of execution environments. As we've seen in other movies in this course, we can use GCP Dataproc for a managed Spark environment. A relatively new capability, one that many of my customers have found super useful, is the Dataproc JupyterLab plugin for serverless batch and interactive notebook sessions, and I wanted to share a preview of it with you here. That's a lot of words, so what does it mean? It means being able, from a Jupyter notebook within GCP, to scale out a workload when the analysis needs more than one computer. To give you an introduction, I've shortened this rather long tutorial so that you can see what it looks like and, hopefully, be compelled to try the full tutorial yourself. The first step is to set up a Vertex AI Workbench instance in a Google Cloud demonstration project. Once that's set up, you access JupyterLab by clicking the link…
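To make the scale-out step concrete, here is a minimal PySpark sketch of the kind of cell you might run once the plugin has started a Dataproc Serverless interactive session. The bucket path and column name are hypothetical placeholders, and in these sessions a SparkSession is typically provisioned for you, so getOrCreate() simply picks it up.

# Minimal sketch of a notebook cell in a Dataproc Serverless
# interactive session. Assumption: the JupyterLab plugin has already
# created the session, which usually injects a ready-made `spark`.
from pyspark.sql import SparkSession

# getOrCreate() reuses the session the serverless runtime started.
spark = SparkSession.builder.appName("serverless-preview").getOrCreate()

# Hypothetical input path; replace with a bucket you own.
df = spark.read.csv("gs://your-demo-bucket/sales.csv",
                    header=True, inferSchema=True)

# The aggregation fans out across the executors that Dataproc
# provisions automatically; there is no cluster to size or manage.
df.groupBy("region").count().show()

For batch rather than interactive work, a script like this can also be submitted as a serverless batch, for example with gcloud dataproc batches submit pyspark your_script.py --region=REGION (script name here is illustrative).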
Contents
- Scale Spark on the cloud by example (5m 11s)
- Build a quick start with Databricks AWS (6m 50s)
- Scale Spark cloud compute with VMs (6m 16s)
- Optimize cloud Spark virtual machines (6m 5s)
- Use AWS EKS containers and data lake (7m 8s)
- Optimize Spark cloud data tiers on Kubernetes (4m 17s)
- Build reproducible cloud infrastructure (8m 37s)
- Scale on GCP Dataproc or on Terra.bio (8m 34s)
- Serverless Spark with Dataproc Notebook (5m 25s)