From the course: Cloud Hadoop: Scaling Apache Spark


Optimize cloud Spark virtual machines

- [Instructor] So here we are in the AWS EMR console, and I have a cluster spun up from an earlier movie. Your console might look different; I have a couple of other clusters that I was working with. Inside the cluster, you can see that we have one master node and two worker nodes. Notice that you have the ability to resize, which is something that, of course, we did quite a lot of when we were performance tuning. Basically, you end up with a grid of parameters at the different levels, whether it's the VMs, the Spark settings, or the VariantSpark settings. And you go through all those parameters to figure out what is going to be the most effective for your workload. So how do you do that? Well, first of all, you have to have an easy way to run the workload over and over, and I really do like Jupyter Notebooks. The reason is, they're more visual than Bash scripts. So I found…
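The grid-of-parameters sweep described above can be sketched as a small driver script. This is a minimal illustration, not from the course: the configuration keys are real Spark settings, but the specific values, and the `workload.py` application name, are hypothetical placeholders you would replace with your own workload.

```python
import itertools

# Hypothetical tuning grid: Spark settings to sweep at the Spark level.
# The keys are real Spark configuration properties; the values are
# illustrative, not recommendations from the course.
grid = {
    "spark.executor.memory": ["4g", "8g"],
    "spark.executor.cores": ["2", "4"],
    "spark.sql.shuffle.partitions": ["100", "200"],
}

def spark_submit_commands(grid, app="workload.py"):
    """Yield one spark-submit command line per combination in the grid."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        conf_flags = " ".join(
            f"--conf {k}={v}" for k, v in zip(keys, values)
        )
        yield f"spark-submit {conf_flags} {app}"

# 2 x 2 x 2 values -> 8 runs to compare for this workload
commands = list(spark_submit_commands(grid))
for cmd in commands:
    print(cmd)
```

Running each generated command (from a notebook cell or a scheduler) and recording the job's runtime gives you one measurement per grid point, which is the comparison the instructor describes doing repeatedly while tuning.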