Optimize cloud Spark virtual machines - Apache Spark Tutorial
From the course: Cloud Hadoop: Scaling Apache Spark
- [Instructor] So here we are in the AWS EMR console, and I have a cluster spun up from an earlier movie. Your console might look different; I have a couple of other clusters that I was working with. And inside of the cluster, if you look at it, you can see that we have one master node and two worker nodes. Notice you have the ability to resize, which is something that, of course, we did quite a lot of when we were performance tuning. Basically, you end up with a grid of parameters at the different levels, whether it's the VMs, the Spark settings, or the VariantSpark settings. And you go through all those different parameters to figure out what is going to be the most effective for your workload. So how do you do that? Well, first of all, you have to have an easy way to run the workload over and over, and I really do like Jupyter Notebooks. The reason is, they're more visual than Bash scripts. So I found…
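The transcript cuts off here, but the parameter-grid idea is concrete enough to sketch. Below is a minimal PySpark example of the kind of sweep you might run from a notebook cell: time one workload across a small grid of Spark settings and rank the results. The grid values and the placeholder job are illustrative assumptions, not the instructor's actual VariantSpark workload.

```python
# A minimal sketch of a parameter-grid sweep from a Jupyter Notebook,
# assuming a PySpark workload. Grid values and the job are placeholders.
import itertools
import time

from pyspark.sql import SparkSession

# Hypothetical grid of Spark settings to try; values are illustrative only.
param_grid = {
    "spark.executor.memory": ["4g", "8g"],
    "spark.executor.cores": ["2", "4"],
    "spark.sql.shuffle.partitions": ["64", "200"],
}

results = []
keys = list(param_grid.keys())

for combo in itertools.product(*param_grid.values()):
    settings = dict(zip(keys, combo))

    # Build a fresh session with this combination of settings.
    builder = SparkSession.builder.appName("tuning-sweep")
    for k, v in settings.items():
        builder = builder.config(k, v)
    spark = builder.getOrCreate()

    start = time.time()
    # Placeholder workload: swap in the job you are actually tuning.
    spark.range(10_000_000).selectExpr("sum(id)").collect()
    elapsed = time.time() - start

    results.append((settings, elapsed))
    # Stop the session so the next combination gets fresh executors.
    # Note: some JVM-level options (e.g. driver memory) only apply
    # when the driver process itself is restarted.
    spark.stop()

# Report configurations from fastest to slowest.
for settings, elapsed in sorted(results, key=lambda r: r[1]):
    print(f"{elapsed:6.1f}s  {settings}")
```

Running this cell repeatedly, and plotting the timings, is what makes a notebook more convenient than a Bash script for this kind of tuning loop: the grid, the run, and the comparison all live in one place.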
Contents
- (Locked) Scale Spark on the cloud by example (5m 11s)
- (Locked) Build a quick start with Databricks AWS (6m 50s)
- (Locked) Scale Spark cloud compute with VMs (6m 16s)
- (Locked) Optimize cloud Spark virtual machines (6m 5s)
- (Locked) Use AWS EKS containers and data lake (7m 8s)
- (Locked) Optimize Spark cloud data tiers on Kubernetes (4m 17s)
- (Locked) Build reproducible cloud infrastructure (8m 37s)
- (Locked) Scale on GCP Dataproc or on Terra.bio (8m 34s)
- (Locked) Serverless Spark with Dataproc Notebook (5m 25s)