Hands-on Hadoop
Daniel Templeton & Inyoung Cho 
Cloudera, Inc.
Your Hosts 
Daniel Templeton 
• Certification Developer 
• Crusty, old HPC guy 
• Likes Perl 
Inyoung Cho 
• Certification Developer 
• Recovering Java Evangelist
• Invented JavaOne Hands-on Labs
©2014 Cloudera, Inc. All rights reserved.
What is “Big Data”? 
• Super-cool marketing buzzword
• “Come see our new line of BIG DATA toasters…” 
• “The Five V’s” 
• Any data that is difficult to store in a traditional RDBMS
• Too big, changes schemas too often, unstructured, … 
What is Hadoop? 
HDFS in a Nutshell 
• Distributed “file system” service 
• Highly scalable and fault-resilient
• Chunks files into “blocks” that are replicated and distributed across the cluster
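To make the block idea concrete, here is a small Java sketch that computes how a file would be cut into blocks and where the copies could go. The class and method names are hypothetical and the rotating placement is a simplification (real HDFS placement is chosen by the NameNode and is rack-aware); the 128 MB block size and replication factor of 3 match the Hadoop 2 defaults for dfs.blocksize and dfs.replication.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: splits a file into fixed-size blocks and assigns each
// block to several distinct nodes. Real HDFS placement is decided by the NameNode
// and is rack-aware; the simple rotation used here is an assumption for clarity.
public class BlockLayout {

    // One block of the file: its position, its size in bytes, and the nodes holding copies.
    public static final class Block {
        public final int index;
        public final long length;
        public final List<String> replicas;

        Block(int index, long length, List<String> replicas) {
            this.index = index;
            this.length = length;
            this.replicas = replicas;
        }
    }

    public static List<Block> layout(long fileSize, long blockSize, int replication, List<String> nodes) {
        if (replication > nodes.size()) {
            throw new IllegalArgumentException("replication factor exceeds node count");
        }
        List<Block> blocks = new ArrayList<>();
        int index = 0;
        for (long offset = 0; offset < fileSize; offset += blockSize, index++) {
            // Every block is blockSize bytes except possibly the last one.
            long length = Math.min(blockSize, fileSize - offset);
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // Consecutive nodes (wrapping around) give each replica a different node.
                replicas.add(nodes.get((index + r) % nodes.size()));
            }
            blocks.add(new Block(index, length, replicas));
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4");
        // A 300 MB file with 128 MB blocks and 3 replicas yields blocks of 128, 128 and 44 MB.
        for (Block b : layout(300 * mb, 128 * mb, 3, nodes)) {
            System.out.println("block " + b.index + ": " + b.length / mb + " MB on " + b.replicas);
        }
    }
}
```

Because every block has copies on three different nodes, losing any single node still leaves two copies of each block, which is where the fault resilience comes from.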
MapReduce in a Nutshell 
• Embarrassingly parallel batch execution engine 
• Two phases: map and reduce 
• https://www.youtube.com/watch?v=bcjSe0xCHbE 
• Tasks are scheduled to run on the nodes where their data is stored
• Jobs are written against the Java API
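The map and reduce phases can be sketched with the classic word-count example. This is a plain-Java, in-memory simulation of the data flow (map, shuffle/sort, reduce) so that it runs without a cluster; it does not use Hadoop's API, and the class name is hypothetical. In a real job the two methods would live in subclasses of org.apache.hadoop.mapreduce.Mapper and Reducer, and the framework would do the grouping across machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the MapReduce model (map, shuffle/sort, reduce).
// This does NOT use Hadoop's Mapper/Reducer API; it only mirrors the data flow.
public class WordCountSketch {

    // Map phase: each input line becomes a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: all values for one key are combined into a single result.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static Map<String, Integer> run(List<String> lines) {
        // Shuffle/sort: group intermediate values by key. The TreeMap keeps keys
        // sorted, as the framework sorts keys before they reach the reducers.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "The end");
        System.out.println(run(input)); // {brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

Each map call looks at one line in isolation and each reduce call at one key in isolation, which is what makes the work embarrassingly parallel.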
Hive in a Nutshell 
• SQL engine for Hadoop 
• Translates HiveQL into MapReduce jobs 
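As a rough illustration of that translation, the sketch below shows how a GROUP BY query could be split into map and reduce steps. The emp table, its columns and the class names are hypothetical, and Hive's actual plan comes from its own compiler; the map-side {sum, count} partials mirror the kind of partial aggregation Hive can perform before the shuffle.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch of how a query such as
//   SELECT dept, AVG(salary) FROM emp GROUP BY dept;
// can be split into map and reduce steps. The table "emp" and its columns are
// hypothetical; Hive's actual query plan is generated by its compiler.
public class GroupByAvgSketch {

    // A row of the hypothetical emp table.
    public static final class Row {
        final String dept;
        final double salary;

        public Row(String dept, double salary) {
            this.dept = dept;
            this.salary = salary;
        }
    }

    // Map side: each mapper pre-aggregates its split into per-key {sum, count}
    // partials, keyed by the GROUP BY column.
    static Map<String, double[]> mapSplit(List<Row> split) {
        Map<String, double[]> partial = new TreeMap<>();
        for (Row r : split) {
            double[] acc = partial.computeIfAbsent(r.dept, k -> new double[2]);
            acc[0] += r.salary; // running sum
            acc[1] += 1;        // running count
        }
        return partial;
    }

    // Reduce side: merge the partials for each key, then compute the average.
    // Averages cannot be merged directly, so sum and count travel separately.
    public static Map<String, Double> run(List<List<Row>> splits) {
        Map<String, double[]> merged = new TreeMap<>();
        for (List<Row> split : splits) {
            mapSplit(split).forEach((dept, acc) -> {
                double[] m = merged.computeIfAbsent(dept, k -> new double[2]);
                m[0] += acc[0];
                m[1] += acc[1];
            });
        }
        Map<String, Double> result = new TreeMap<>();
        merged.forEach((dept, m) -> result.put(dept, m[0] / m[1]));
        return result;
    }

    public static void main(String[] args) {
        List<List<Row>> splits = List.of(
            List.of(new Row("eng", 100), new Row("ops", 60)),
            List.of(new Row("eng", 80), new Row("eng", 90), new Row("ops", 70)));
        System.out.println(run(splits)); // {eng=90.0, ops=65.0}
    }
}
```

The GROUP BY column becomes the shuffle key, so all rows for one department meet at the same reducer no matter which split they started in.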
Impala in a Nutshell 
• Hive without the MapReduce
Pig in a Nutshell 
• Scripting language (Pig Latin) for data-flow operations
• Translates into MapReduce jobs 
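Pig scripts read as a chain of data operations. The sketch pairs a short Pig Latin script (in the comments) with an equivalent Java streams pipeline; the file name 'access_log', its fields and the class names are hypothetical, and unlike this single-JVM version, Pig would compile the script into MapReduce jobs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch of a Pig-style data flow written with Java streams. The Pig Latin
// statements in the comments are an illustrative script over a hypothetical
// file 'access_log' with (user, bytes) fields; Pig would compile such a script
// into MapReduce jobs rather than run it in one JVM like this.
public class PigFlowSketch {

    public static final class LogRecord {
        final String user;
        final int bytes;

        public LogRecord(String user, int bytes) {
            this.user = user;
            this.bytes = bytes;
        }
    }

    public static Map<String, Integer> totals(List<LogRecord> logs) {
        return logs.stream()
            // big = FILTER logs BY bytes > 1000;
            .filter(r -> r.bytes > 1000)
            // grouped = GROUP big BY user;
            // totals  = FOREACH grouped GENERATE group, SUM(big.bytes);
            .collect(Collectors.groupingBy((LogRecord r) -> r.user, TreeMap::new,
                     Collectors.summingInt((LogRecord r) -> r.bytes)));
    }

    public static void main(String[] args) {
        // logs = LOAD 'access_log' AS (user:chararray, bytes:int);
        List<LogRecord> logs = List.of(
            new LogRecord("alice", 1500), new LogRecord("bob", 200),
            new LogRecord("alice", 2500), new LogRecord("bob", 3000));
        System.out.println(totals(logs)); // {alice=4000, bob=3000}
    }
}
```

The FILTER step maps naturally onto the map phase and the GROUP/SUM step onto the reduce phase, which is roughly how such a script ends up as MapReduce work.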
The Lab 
• Self-paced 
• Should take about 2 hours
• “Additional Exercises” if you finish early 
• Inyoung and I are here to answer questions 
• Have fun! 
Aaron Myers & Daniel Templeton

JavaOne 2014: Hands-on Hadoop
