Big Data and Hadoop
Module 1: Introduction to Big Data and Hadoop
Slide 2
Session Objectives
This Session will help you to:
ᗍ Understand what Big Data is
ᗍ List the challenges associated with Big Data
ᗍ Understand the difference between Real-time and Batch Processing
ᗍ Understand Hadoop capabilities
ᗍ Understand Hadoop ecosystem
Slide 3
Definition of Big Data
Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big, or it moves too fast, or it exceeds current processing capacity. Big data has the potential to help companies improve operations and make faster, more intelligent decisions.
Big Data is the term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process.
Slide 4
Walmart
ᗍ A US retailer with annual revenue of US$ 485.651 billion
ᗍ Handles more than a million transactions every day and produces more than 2.5 petabytes of data on a daily basis
ᗍ Has dedicated data centers across the world to handle this data (including one in Bangalore)
Slide 5
Facebook
ᗍ Has about a billion users as we speak
ᗍ Generates close to 500 TB of data per day
ᗍ Runs about 70 thousand queries on that data every day
ᗍ Inventor, and one of the biggest users, of Hive
Slide 6
Big Data Context with Case Studies
Cricket Telecast on Star Sports
ᗍ Keys to success for a team
ᗍ Batsman's strong and weak zones; run-scoring graph
ᗍ Bowler's speed, swing, and wicket-taking delivery graph
Slide 7
What is Big Data?
ᗍ Huge amounts of data (terabytes or petabytes)
ᗍ Big data is the term for a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications
ᗍ The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
Three types of data can be identified:
ᗍ Unstructured Data
• Data that does not have a pre-defined data model
• E.g. text files, log files
ᗍ Semi-structured Data
• Data that does not have a formal data model, but carries some organizing structure
• E.g. XML files
ᗍ Structured Data
• Data that is represented in a tabular format
• E.g. databases
Slide 8
Types of Data
Slide 9
Characteristics of Big Data – 4 V’s
[Diagram: the V's of Big Data, e.g. Volume (MB, GB, TB, PB) and Variety (audio, photo, web, video)]
Slide 10
The V’s of Big Data
ᗍ Volume: 12 terabytes of Tweets created each day
ᗍ Velocity: Scrutinize 5 million trade events created each day to identify potential fraud
ᗍ Variety: Trade data, Sensor data, Audio, Video, Flight Tracking, R&D, Log files, Social media and more
ᗍ Veracity: The quality of the data being captured can vary greatly. Accuracy of analysis depends on the
veracity of the source data
Slide 11
Limitations of Existing DWH Solutions for Big Data
ᗍ Two aspects: storage of data and analysis of data
ᗍ Limitations of existing IT infrastructure and resources
ᗍ Vertical scalability is not always a solution: upgrading servers and storage has limits
ᗍ RDBMS is not designed to scale out
ᗍ Cannot handle unstructured data
ᗍ Cost of commercially available solutions is significantly high
Slide 12
Need for New Approach
ᗍ A new approach to the problem is required, one that can:
ᗍ Process all types of data: structured, semi-structured, and unstructured
ᗍ Store and process massive amounts of data easily
ᗍ Keep the cost of the system low: process and manage data economically
ᗍ Deliver speed of processing
Slide 13
What is Hadoop?
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management framework with scale-out storage and distributed processing.
Slide 14
What is Hadoop? (Cont’d)
ᗍ Apache Hadoop is a framework that allows for distributed processing of large data sets stored across clusters of commodity computers using a simple programming model
ᗍ A free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment
ᗍ Based on the Google File System (GFS)
ᗍ Runs applications on distributed systems with thousands of nodes
Slide 15
Hadoop Key Features
ᗍ Simple architecture
ᗍ Scalability; designed for massive scale
ᗍ Availability; high degree of fault tolerance; designed to recover from failures; robust
ᗍ Low cost; low software and hardware costs; designed to run on commodity servers
ᗍ Speed of operations; the distributed file system provides fast data transfers among nodes
ᗍ Parallel programming model; an easy-to-use programming paradigm that scales through thousands of nodes and petabytes of data
ᗍ Allows data analysis without the data first being modeled, cleansed, and loaded
Slide 16
Hadoop Key Characteristics
The key characteristics of Hadoop are: Reliable, Economical, Flexible, and Scalable.
Slide 17
Hadoop Core Components
HDFS: Data storage framework
MapReduce: Data processing framework
Slide 18
Hadoop Ecosystem
[Ecosystem diagram: Pig Latin (data analysis) and Hive (DW system) on top of the MapReduce framework and HBase; other YARN frameworks such as Spark and Giraph; YARN for cluster resource management; HDFS (Hadoop Distributed File System) for storage; Apache Oozie for workflow; Flume for importing unstructured or semi-structured data and Sqoop for importing/exporting structured data]
Slide 19
Hadoop Services
The core services of Hadoop are:
ᗍ NameNode
ᗍ DataNode
ᗍ ResourceManager [JobTracker in Hadoop 1.x]
ᗍ NodeManager [TaskTracker in Hadoop 1.x]
ᗍ Secondary NameNode
You can use Hadoop in the following modes:
ᗍ Standalone (or Local) Mode
• No Hadoop daemons, entire process runs in a single JVM
• Suitable for running Hadoop programs during initial installation and for Hadoop software testing
• It doesn’t have any DFS available
ᗍ Pseudo-Distributed Mode
• Hadoop daemons up, but on a single machine
• Best suited for development
ᗍ Fully-Distributed/Clustered/Prod Mode
• Hadoop daemons run on a cluster of machines
• Best suited for production environments
Slide 20
Different Hadoop Modes
Slide 21
Hadoop Deployment Modes
ᗍ Standalone or Local Mode
• Everything runs in a single JVM
• Good for development
ᗍ Pseudo-Distributed Mode
• All services running on a single machine; a cluster simulation on one machine
• Good for a test environment
ᗍ Fully Distributed Mode
• Hadoop services running on multiple machines in a cluster
• Production environment
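A quick way to see which mode an installation is effectively running in is to check which daemons are alive. The commands below are a minimal sketch for a pseudo-distributed setup; script names and locations can vary between Hadoop distributions.

# Start the HDFS and YARN daemons (pseudo-distributed or fully distributed mode)
start-dfs.sh
start-yarn.sh

# List the running Java daemons; in pseudo-distributed mode you should see
# NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.
# In standalone mode none of these daemons appear.
jps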
Slide 22
Blocks
ᗍ A block is the physical division of a data file done by HDFS while storing it
ᗍ Block size is 128 MB by default in Hadoop 2.0
ᗍ Example: a 256 MB file (File 1) is split into two blocks, A and B, of 128 MB each
Slide 23
Blocks (Cont’d)
ᗍ A 250 MB file (File 2) is split into block A of 128 MB and block B of 122 MB
ᗍ A 300 MB file (File 3) is split into blocks A and B of 128 MB each and block C of 44 MB
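The block size that HDFS will use for new files can be checked from the shell. This is a minimal sketch: the dfs.blocksize key is standard, the -stat format options assume a recent Hadoop 2.x release, and /root/training/Employee.csv is simply the example file used earlier in this module.

# Show the configured default block size in bytes (134217728 bytes = 128 MB)
hdfs getconf -confKey dfs.blocksize

# Show size, block size and replication factor of a file already in HDFS
hadoop fs -stat "size=%b blocksize=%o replication=%r" /root/training/Employee.csv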
Slide 24
Computer Racks & Block Replication
ᗍ Computer Racks
• A computer rack is a physical chassis that can house multiple computers or servers simultaneously. It is a mounting rack that has the ability to hold more than one computer
ᗍ Block Replication in HDFS
• Provides redundancy and fault tolerance for the data saved
• The default replication factor is 3
Slide 25
HDFS Rack Awareness
HDFS stores blocks on the cluster in a rack-aware fashion, i.e. one replica of a block is kept on one rack and the other two replicas are kept on a different rack.
[Diagram: three racks (Rack 1, Rack 2, Rack 3) of four DataNodes each, with the three replicas of Blocks A, B, and C spread across two racks]
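As a hedged sketch of how replication and rack placement can be inspected on a running cluster (the path is the example file used in this module; output details vary by Hadoop version):

# Report each block of a file, where its replicas live and on which racks
hdfs fsck /root/training/Employee.csv -files -blocks -locations -racks

# Change the replication factor of a file to 2 and wait for it to take effect
hadoop fs -setrep -w 2 /root/training/Employee.csv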
Slide 26
Hadoop Distributed File System (HDFS)
The key features of Hadoop HDFS are:
ᗍ Storing large sets of data files (in TB/PB)
ᗍ Distributed across multiple machines
ᗍ Inbuilt fault tolerance and reliability through data replication: multiple replicas of each data block are created and distributed on computers throughout the cluster to enable reliable and rapid data access
ᗍ Providing high-throughput access to data blocks; low-latency data access
ᗍ Write once, read many concept
Slide 27
Hadoop Distributed File System (HDFS)
(Cont’d)
 Master/slave architecture
 An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients
 There are a number of DataNodes, usually one per node in the cluster
 The DataNodes manage storage attached to the nodes that they run on
 HDFS exposes a file system namespace and allows user data to be stored in files
 A file is split into one or more blocks, and the set of blocks is stored in DataNodes
 DataNodes serve read and write requests and perform block creation, deletion, and replication upon instruction from the NameNode
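To see this master/slave layout on a live cluster, the sketch below lists the DataNodes known to the NameNode and browses the namespace; it assumes HDFS is running and the current user has permission to read it.

# Summary of the cluster: capacity, live/dead DataNodes, per-node usage
hdfs dfsadmin -report

# Browse the file system namespace exposed by the NameNode
hadoop fs -ls -R /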
Slide 28
File System Namespace
HDFS Read Anatomy
HDFS Write Anatomy
JobTracker and TaskTracker
The primary function of the JobTracker is resource management (managing the TaskTrackers), tracking resource availability, and task life-cycle management (tracking task progress, fault tolerance, etc.)
The TaskTracker has a simple function: following the orders of the JobTracker and updating the JobTracker with its progress status periodically
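JobTracker and TaskTracker are the Hadoop 1.x daemon names; in Hadoop 2.x these roles are taken over by the ResourceManager and the NodeManagers, as noted earlier. As a small, hedged sketch of how running jobs can be observed from the shell (command availability depends on the installed version, and the job ID shown is only a placeholder):

# List MapReduce jobs known to the cluster and their states
mapred job -list

# Show the status of a specific job (placeholder job ID)
mapred job -status job_1234567890123_0001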
Rack Awareness
HDFS is rack aware in the sense that the NameNode and the JobTracker obtain a list of rack IDs corresponding to each of the slave nodes (DataNodes) and create a mapping between the IP address and the rack ID. HDFS uses this knowledge to replicate data across different racks so that data is not lost in the event of a complete rack power outage or switch failure.
Slide 34
Mapper & Reducer – Basic Concepts
ᗍ Mappers:
Mappers are Java programs conforming to Google's MapReduce algorithm framework. These programs run on each of the blocks of the big data file saved on the cluster
ᗍ Reducers:
Similar to Mappers, Reducers are also Java programs conforming to Google's MapReduce algorithm framework. They are aggregate functions that run on the outputs coming from the Mappers
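To see Mappers and Reducers in action without writing any Java, the bundled example job can be run against a file already in HDFS. This is a minimal sketch: the location of the examples JAR differs between distributions (the path below is typical of Apache Hadoop 2.x tarball installs), and the input and output paths are only illustrations.

# Run the built-in WordCount job: mappers read blocks of the input file,
# reducers aggregate the per-word counts
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /root/training/Employee.csv /root/training/wordcount_out

# Inspect the reducer output
hadoop fs -cat /root/training/wordcount_out/part-r-00000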
Slide 35
Hadoop Configuration Files
Configuration Filename: Description
hadoop-env.sh: Environment variables that are used in the scripts that run Hadoop
core-site.xml: Core Hadoop configuration settings that are common to HDFS and MapReduce
hdfs-site.xml: Configuration settings for the HDFS daemons (the NameNode, the Secondary NameNode, and the DataNodes)
mapred-site.xml: MapReduce-specific configuration settings, such as the Job History Server
yarn-site.xml: Configuration settings for the YARN daemons (ResourceManager and NodeManager), including the auxiliary shuffle service used by MapReduce
masters: A list of machines (one per line) that each run a Secondary NameNode
slaves: A list of machines (one per line) that each run DataNode and NodeManager daemons
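Effective values from these files can be checked from the shell without opening them. A minimal sketch, assuming a configured Hadoop 2.x client; the property names shown are standard Hadoop keys.

# Value of fs.defaultFS from core-site.xml (the HDFS NameNode URI)
hdfs getconf -confKey fs.defaultFS

# Values of dfs.replication and dfs.blocksize from hdfs-site.xml
hdfs getconf -confKey dfs.replication
hdfs getconf -confKey dfs.blocksize

# Hostnames of the configured NameNode(s) and Secondary NameNode(s)
hdfs getconf -namenodes
hdfs getconf -secondaryNameNodes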
Slide 36
Hadoop Ecosystem
CREATE A DIRECTORY IN HDFS
 Usage: hadoop fs -mkdir <paths>
 Example: hadoop fs -mkdir /root
 Example: hadoop fs -mkdir /root/training
PUT Command
 Copies a single src file from the local file system to the Hadoop Distributed File System
 Usage: hadoop fs -put <local-src> ... <HDFS_dest_path>
 Example: hadoop fs -put /home/cloudera/Desktop/Employee.csv /root/training
ls Command
 Lists the contents of a directory
 Usage: hadoop fs -ls <args>
 Example: hadoop fs -ls /root/
 Try yourself: hadoop fs -lsr /root/
 The -lsr option is a recursive listing (equivalent to hadoop fs -ls -R in newer releases)
Get Command
 Copies/downloads files from HDFS to the local file system
 Usage: hadoop fs -get <hdfs_src> <localdst>
 Example: hadoop fs -get /user/root/employee.csv /home/cloudera/desktop/employee.csv
 If the local destination file already exists, the command fails; to avoid this error give the file a new name or a different local path
 Example: hadoop fs -get /user/root/employee.csv /home/cloudera/desktop/employee123.csv
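A small, hedged sketch of that failure mode (the exact error text varies slightly between Hadoop versions, and the paths are the example files above):

# First download succeeds
hadoop fs -get /user/root/employee.csv /home/cloudera/desktop/employee.csv

# Running the same command again fails because the local target already exists,
# typically with a "File exists" error; use a new local name instead
hadoop fs -get /user/root/employee.csv /home/cloudera/desktop/employee_copy.csv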
cp Command
 Copies a file from one HDFS location to another
 Usage: hadoop fs -cp <source> <dest>
 Example: hadoop fs -cp /root/Employee.csv /root/training/Employee.csv
 Afterwards, try hadoop fs -lsr /root/ again to see the copy
copyFromLocal
 Same purpose as the put command
 Usage: hadoop fs -copyFromLocal <localsrc> URI
 Example: hadoop fs -copyFromLocal /home/cloudera/Desktop/Student.csv /root/training
copyToLocal
 Same purpose as the get command
 Usage: hadoop fs -copyToLocal <hdfs_src> <localdst>
 Example: hadoop fs -copyToLocal /user/root/employee.csv /home/cloudera/desktop/employee321.csv
Tail Command
 Displays the last few lines (the last kilobyte) of a file
 Usage: hadoop fs -tail <path[filename]>
 Example: hadoop fs -tail /user/root/employee.csv
cat Command
 Displays the complete file
 Usage: hadoop fs -cat <filename>
 Example: hadoop fs -cat /user/root/employee.csv
rm Command
 hadoop fs -rm removes the specified list of files and empty directories. An example is shown below:
 hadoop fs -rm /root/employee.csv
 Try the -rm -r option as well and see the difference for /root/
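A short, hedged sketch of that difference (the paths are the example directories used earlier in this module):

# -rm deletes files (and, on some versions, empty directories),
# but refuses to delete a non-empty directory
hadoop fs -rm /root/employee.csv

# -rm -r deletes a directory and everything under it, recursively
hadoop fs -rm -r /root/training

# -skipTrash bypasses the HDFS trash (if trash is enabled) and frees space immediately
hadoop fs -rm -r -skipTrash /root/training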
Slide 47
Any Questions