Hadoop in Virtual Machines
Richard McDougall, VMware
Sanjay Radia, Hortonworks

Hadoop Summit, 2012
Part 1
Say What?
•   VMs will just add overhead, due to I/O virt
•   VMs run on SAN, we’re all about local disks
•   Hadoop does its own cluster management
•   It’ll do resource management in 2.0
•   And even HA is coming to Hadoop

• And… what is the point, anyway?
But you’ve been asking…
• Can I virtualize my Hadoop, so that it is easier
  and quicker to get a cluster up and running?
• Is it possible to run Hadoop on those spare
  machine cycles I have on hundreds/thousands
  of nodes?
• Can I make my system more available by using
  some of the standard HA features?
And the savvy are asking…
• Can I avoid having to install special hardware
  for the master services, like name-node and
  job-tracker?
• Can I dynamically change the size of the
  cluster to use more resources?
• Can I use VM isolation to increase security or
  guard against resource-intensive neighbors?
• Is it feasible to provision virtual-clusters, giving
  out one each to a business unit?
OK, so first: what about the concerns?
• Use your SAN? … if you want to.




• SAN Storage:    $2 - $10/Gigabyte.  $1M gets 0.5 Petabytes, 1,000,000 IOPS, 1 GByte/sec
• NAS Filers:     $1 - $5/Gigabyte.   $1M gets 1 Petabyte, 400,000 IOPS, 2 GBytes/sec
• Local Storage:  $0.05/Gigabyte.     $1M gets 20 Petabytes, 10,000,000 IOPS, 800 GBytes/sec
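A quick sanity check on the capacity-per-dollar figures (using the low end of the quoted SAN price range as an illustrative assumption):

```latex
\frac{\$1{,}000{,}000}{\$2/\text{GB}} = 500{,}000~\text{GB} \approx 0.5~\text{PB},
\qquad
\frac{\$1{,}000{,}000}{\$0.05/\text{GB}} = 20{,}000{,}000~\text{GB} = 20~\text{PB}
```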
Hadoop Using Local Disks

[Diagram: a virtualization host runs an "Other Workload" VM alongside a Hadoop VM containing the Task Tracker and Datanode. The VM's OS image VMDK sits on shared storage, while the data disks are separate VMDKs on local disks formatted with ext4.]
Hadoop Perf in a VM
(Ratio is elapsed time to physical, lower is better)

[Chart: ratio to native elapsed time, y-axis from 0 to 1.2, for 1-VM and 2-VM configurations.]
Evolution of Hadoop on VMs

• Hadoop in VM (current Hadoop: combined storage/compute)
  - VM lifecycle determined by the Datanode
  - NOT elastic
  - Limited to Hadoop multi-tenancy
• Separate Storage
  - Separate compute from data
  - Elastic compute
  - Enable shared workloads
  - Raise utilization
• Separate Compute Clusters
  - Separate virtual clusters per tenant
  - Stronger VM-grade security and resource isolation
  - Enable deployment of multiple Hadoop runtime versions
1. Hadoop Task Tracker and Data Node in a VM

[Diagram: a virtualization host runs an "Other Workload" VM next to a virtual Hadoop node containing the Task Tracker (with its task slots) and the Datanode, backed by a VMDK. Callouts ask: add/remove slots? Grow/shrink by tens of GB?]

Grow/shrink of a VM is one approach.
2. Add/Remove Virtual Nodes

[Diagram: the host now runs two virtual Hadoop nodes, each with a Task Tracker (and its slots), a Datanode, and its own VMDK, alongside the other workload.]

Just add/remove more virtual nodes?
But state makes it hard to power off a node

[Diagram: a combined virtual Hadoop node (Task Tracker, Datanode, and its VMDK) on the virtualization host.]

Powering off the Hadoop VM would in effect fail the Datanode.
Adding a node needs data…

[Diagram: a second virtual Hadoop node (Task Tracker, Datanode, VMDK) is added next to the first on the same host.]

Adding a node would require TBs of data replication.
3. Separated Compute and Data

[Diagram: on each virtualization host, a single Datanode VM (with its VMDK) serves several compute-only virtual Hadoop nodes, each running just a Task Tracker and its slots, alongside the other workload.]

Truly elastic Hadoop: scalable through virtual nodes.
Dataflow with Separated Compute/Data

[Diagram: the compute VM's NodeManager sends HDFS reads and writes through its virtual NIC, across the host's virtual switch, to the Datanode VM and its VMDK on the same host.]
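From the application's point of view nothing changes when compute and data are split into separate VMs: tasks still go through the ordinary HDFS client, and the block traffic simply crosses the virtual switch to the Datanode VM instead of going to a co-located process. A minimal sketch of that read path (the sample path and default URI are made up for illustration):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitClusterRead {
  public static void main(String[] args) throws Exception {
    // The client only talks to the NameNode for metadata; block reads are
    // served by whichever Datanode holds a replica. In the split layout that
    // is a Datanode VM reached over the host's virtual switch.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path path = new Path(args.length > 0 ? args[0] : "/demo/sample.txt");
    FSDataInputStream in = fs.open(path);
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    } finally {
      reader.close();
    }
  }
}
```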
Performance Analysis of Split

[Diagram: two configurations compared across two hosts: one combined NodeManager + Datanode VM per host, versus one Datanode VM plus one compute-only NodeManager VM per host.]

Workload: Teragen, Terasort, Teravalidate
HW configuration: 8 cores, 96 GB RAM, 16 disks per host x 2 nodes
Performance Analysis of Split
(Elapsed time: ratio to combined)

[Chart: elapsed-time ratio of the split configuration to the combined configuration, y-axis from 0 to 1.2, for Teragen, Terasort, and Teravalidate.]
Tying it together: Elastic Hadoop

[Diagram: a runtime layer of per-tenant virtual Hadoop clusters (e.g. "Coke" and "Pepsi"), each with its own queue, runs on top of a data layer of namespaces over a distributed file system (HDFS, KFS, GPFS, MapR, Isilon, …) that spans the physical hosts.]
Demo: Shrink/Expand Cluster
Setup: 1 Datanode, 2 NodeManagers, and 2 web servers on
each physical host

[Diagram: four physical hosts, each running two web server VMs, two NodeManager VMs, and one Datanode VM.]
Demo: Shrink/Expand Cluster
When web load is high in the daytime, we can suspend some NodeManagers
and power on more web servers.

[Diagram: the same four hosts, with some NodeManager VMs suspended and additional web server VMs powered on.]
Demo
Part 2
Expand Hadoop Ecosystem
• Hortonworks goal
  – Expand Hadoop ecosystem
  – Provide first-class support for various platforms
• Hadoop should run well on VMs
     • VMs offer several advantages as presented earlier
• Take advantage of vSphere for HA



VMware-Hortonworks Joint Engineering
• First class support for VMs
  – Topology plugins (HADOOP-8468); see the sketch after this list
     • 2 VMs can be on the same host
        – Pick closer data
        – Schedule tasks closer
        – Don't put two replicas on the same host
  – MR-tmp on HDFS using block pools
     • Elastic compute VMs will not need local disk
  – Fast communications within VMs
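To make the topology idea concrete, here is a minimal sketch of what a virtualization-aware topology plugin could look like. This is not the HADOOP-8468 code; the class name, the hard-coded VM-to-host table, and the /rack/physical-host path convention are assumptions for illustration. Hadoop resolves node locations through a pluggable org.apache.hadoop.net.DNSToSwitchMapping (wired in via topology.node.switch.mapping.impl in the Hadoop 1 configuration), and the returned paths drive replica placement and task scheduling:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.net.DNSToSwitchMapping;

/**
 * Sketch of a virtualization-aware topology plugin. Instead of the usual
 * /rack/host path, each VM is mapped to /rack/physical-host, so HDFS can
 * avoid placing two replicas on VMs that share a hypervisor and the
 * scheduler can prefer data that is on the same physical host.
 * The vmToLocation table stands in for a real inventory source (vCenter,
 * a mapping file, ...). Newer Hadoop releases add further methods to the
 * interface (e.g. reloadCachedMappings); extend as needed.
 */
public class VirtualNodeTopology implements DNSToSwitchMapping {

  private final Map<String, String> vmToLocation = new HashMap<String, String>();

  public VirtualNodeTopology() {
    // Hypothetical example data: hadoop-vm-01 and -02 share a physical host.
    vmToLocation.put("hadoop-vm-01", "/rack1/esx-host-a");
    vmToLocation.put("hadoop-vm-02", "/rack1/esx-host-a");
    vmToLocation.put("hadoop-vm-03", "/rack1/esx-host-b");
  }

  public List<String> resolve(List<String> names) {
    List<String> paths = new ArrayList<String>(names.size());
    for (String name : names) {
      String location = vmToLocation.get(name);
      // Fall back to a default location for nodes we know nothing about.
      paths.add(location != null ? location : "/default-rack/unknown-host");
    }
    return paths;
  }
}
```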
Hadoop Total System Availability Architecture

[Diagram: slave nodes of the Hadoop cluster run jobs, with other apps running outside the cluster. The master daemons (NameNode, JobTracker) run on an HA cluster of servers with N+K failover; during a NameNode failover, the JobTracker goes into safemode.]
HA is coming in 1.0
Using Total System Availability Architecture




HA in Hadoop 1 with HDP1
• Total System Availability Architecture
   – Namenode
      • Clients pause automatically
      • JobTracker pauses automatically
   – Other Hadoop master services (JT, …) coming

• Use an industry-proven HA framework
   – VMware vSphere HA
      • Failover, fencing, …
      • Corner cases are tricky; if not addressed, they can cause corruption
   – Additional benefits:
      • N-N and N+K failover
      • Migration for maintenance
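The "clients pause automatically" behavior comes from the HDFS client's retry logic; the HA framework itself mainly needs a way to judge whether the NameNode is healthy. As a hedged illustration (not the vSphere HA agent or any shipped monitor), a minimal external liveness probe could issue a cheap metadata call and report the result to whatever monitoring hook the framework provides; the default URI and exit-code convention below are assumptions:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Minimal NameNode liveness probe: exits 0 if a trivial metadata operation
 * succeeds, non-zero otherwise. An external HA/monitoring framework could
 * run something like this periodically.
 */
public class NameNodeProbe {
  public static void main(String[] args) {
    String uri = args.length > 0 ? args[0] : "hdfs://namenode:8020";
    try {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(URI.create(uri), conf);
      // getFileStatus on "/" is served entirely from NameNode memory;
      // it fails fast if the NameNode is down or unreachable.
      fs.getFileStatus(new Path("/"));
      System.out.println("NameNode OK: " + uri);
      System.exit(0);
    } catch (Exception e) {
      System.err.println("NameNode probe failed: " + e.getMessage());
      System.exit(1);
    }
  }
}
```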
Hadoop NN/JT HA with vSphere




NameNode HA – Failover Times

• NameNode Failover times with vSphere and LinuxHA
   – Failure detection + Failover – 0.5 to 2 minutes
   – OS bootup needed for vSphere – 10-20 seconds
   – Namenode Startup (exit safemode)
       • Small/Medium clusters – 1 to 2 minutes
       • Large cluster – 5 to 15 minutes

• NameNode startup time measurements
   – 60 Nodes, 60K files, 6 million blocks, 300 TB raw storage – 40 sec
   – 180 Nodes, 200K files, 18 million blocks, 900TB raw storage – 120 sec

  Cold Failover is good enough for small/medium clusters
       Failure Detection and Automatic Failover Dominates
Demo
Summary
• Advantages of Hadoop on VMs
  – Cluster Management
  – Cluster consolidation
  – Greater Elasticity in mixed environment
  – Multi-tenancy alternative to the capacity
    scheduler's offerings
• HA for Hadoop Master Daemons
  – vSphere based HA for NN, JT, … in Hadoop 1
  – Total System Availability Architecture


Editor's Notes

  • #9: Hybrid storage. Local disks retain the fault domains of individual disks.
  • #32: Data: can I read what I wrote, and is the service available? When I asked one of the original authors of GFS whether there were any decisions they would revisit, the answer was random writers. Simplicity is key. Raw disk: file systems take time to stabilize, so we can take advantage of ext4, xfs, or zfs.