Data Analytics Master Program www.edureka.co/masters-program/data-analyst-certification
Topics For Today’s Session
Introduction To Data Analytics
Data Cleaning and Manipulation
Statistics
Data Visualization
Machine Learning
Roles, Responsibilities & Salary
Hands-On
Copyright © 2018, edureka and/or its affiliates. All rights reserved.
Introduction To Data Analytics
Why Data Analytics?
01 Gather Hidden Insights
02 Generate Reports
03 Perform Market Analysis
04 Improve Business Requirement
What is Data Analytics?
Data Analytics refers to the techniques used to analyse data in order to enhance productivity and business gain.
Business Administration
Exploratory Data Analysis
Growth in Business
Who is a Data Analyst?
Collect Data, Analyse Data, Create Reports
Data Analyst Skills
Statistics
Data Cleaning
EDA (Exploratory Data Analysis)
Data Visualization
Machine Learning
Statistics
Statistics
Statistics is a branch of mathematics dealing with data collection and organization, analysis, interpretation and presentation.
Analyse Data, Build a Model, Infer Result
Descriptive
Inferential
Categories in Statistics – Descriptive Statistics
Descriptive statistics uses the data to describe the population through numerical calculations, graphs or tables.
Characteristics of Data
Descriptive Statistics
There are two main kinds of measure you need to understand in Descriptive Statistics.
01 Measures of Centre
02 Measures of Spread
Descriptive Statistics – Measures of Centre
There are three terms you need to understand in Measures of Centre: Mean, Median and Mode.
Mean
The mean is the average of all the values in a sample.
(110 + 110 + 93 + 96 + 90 + 110 + 110 + 110) / 8 = 103.625
Median
The median is the central value of the sorted sample set. With an even number of values, it is the average of the two middle values.
21, 21, 21.3, 22.8, 23, 23, 23, 23
(22.8 + 23) / 2 = 22.9
Mode
The mode is the value that occurs most often in the sample set.
21, 21, 22, 23, 24, 25, 25, 25, 26
Mode = 25
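The three measures of centre can be checked with a few lines of code. The following is a minimal sketch in Python using the standard library's statistics module; the variable names are illustrative, and the sample values are the ones shown on the slides.

```python
from statistics import mean, median, mode

# Sample values copied from the Mean, Median and Mode slides.
mean_sample = [110, 110, 93, 96, 90, 110, 110, 110]
median_sample = [21, 21, 21.3, 22.8, 23, 23, 23, 23]
mode_sample = [21, 21, 22, 23, 24, 25, 25, 25, 26]

sample_mean = mean(mean_sample)        # sum of the values divided by their count
sample_median = median(median_sample)  # average of the two middle values (even count)
sample_mode = mode(mode_sample)        # most frequently occurring value

print(sample_mean, sample_median, sample_mode)
```

Each measure uses its own sample here only because the slides do; in practice all three would usually be computed on the same data.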
Descriptive Statistics – Measures of Spread
Range
Range is a measure of how far apart the values in a dataset are.
Range = \max(x_i) - \min(x_i)
Inter Quartile Range
Inter Quartile Range (IQR) is a measure of variability based on dividing a dataset into quartiles: IQR = Q3 - Q1.
[Figure: the values 1 to 8 on a line, with Q1, Q2 and Q3 marking the quartile boundaries]
Variance
Variance describes how much a random variable differs from its expected value. It is computed from the squares of the deviations from the mean.
Standard Deviation
Standard Deviation is a measure of the dispersion of a set of data from its mean.
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
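The four measures of spread can be sketched in Python as follows. This is an illustration under assumptions, not the course's own code: it uses the eight values from the quartile figure, treats them as a whole population (so it divides by N, matching the formula on the slide), and relies on the default "exclusive" method of statistics.quantiles. Other quartile conventions exist and can give slightly different Q1 and Q3 values.

```python
from statistics import pstdev, pvariance, quantiles

# The eight values shown in the quartile figure.
values = [1, 2, 3, 4, 5, 6, 7, 8]

# Range: distance between the largest and smallest value.
value_range = max(values) - min(values)

# Quartiles: three cut points that split the sorted data into four parts.
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1

# Population variance: mean of the squared deviations from the mean.
variance = pvariance(values)

# Population standard deviation: square root of the population variance.
std_dev = pstdev(values)

print(value_range, iqr, variance, std_dev)
```

For a sample rather than a full population, statistics.variance and statistics.stdev divide by N - 1 instead of N.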
Categories in Statistics – Inferential Statistics
Inferential statistics generalizes from a sample to a larger population and applies probability to draw conclusions. It allows us to infer population parameters based on a statistical model using sample data.
[Figure: flowchart linking Statistical Model and Inferential Statistics, with the stages Start, Process Step, Decision and Answer]
Inferential Statistics – Hypothesis Testing
Statisticians use hypothesis testing to formally decide whether a hypothesis should be rejected.
Hypothesis testing is conducted in the following manner:
1. State the Hypotheses – This stage involves stating the null and alternative hypotheses.
2. Formulate an Analysis Plan – This stage involves the construction of an analysis plan.
3. Analyse Sample Data – This stage involves the calculation and interpretation of the test statistic as described in the analysis plan.
4. Interpret Results – This stage involves the application of the decision rule described in the analysis plan.
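The four stages of hypothesis testing can be walked through with a small example. The sketch below is in Python and uses a two-sided one-sample z-test, which is one possible analysis plan and is not prescribed by the slides. The hypothesised mean (100), the known population standard deviation (8) and the significance level (0.05) are all assumptions made for illustration; the sample reuses the values from the Mean slide.

```python
from math import sqrt
from statistics import NormalDist, mean

# Stage 1: state the hypotheses (hypothetical values).
# H0: the population mean equals 100.  H1: the population mean differs from 100.
mu_0 = 100.0

# Stage 2: formulate the analysis plan.
# Two-sided one-sample z-test at significance level alpha,
# assuming the population standard deviation sigma is known.
alpha = 0.05
sigma = 8.0

# Stage 3: analyse the sample data by computing the test statistic and p-value.
sample = [110, 110, 93, 96, 90, 110, 110, 110]
z = (mean(sample) - mu_0) / (sigma / sqrt(len(sample)))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Stage 4: interpret the result using the decision rule from the plan.
reject_null = p_value < alpha

print(f"z = {z:.3f}, p = {p_value:.3f}, reject H0: {reject_null}")
```

With these assumed numbers the p-value is well above 0.05, so the null hypothesis is not rejected. Note that failing to reject the null hypothesis is not the same as proving it true.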
Descriptive vs Inferential Statistics
Descriptive Statistics | Inferential Statistics
Concerned with Properties of Population | Makes inferences from the sample
Presents data in a meaningful manner | Compares and predicts the future outcomes
Outcomes are shown in form of charts, tables and graphs | Outcomes are in the form of probability scores
Describes the known data | Tries to make conclusions beyond the data available
Measures of central tendency and spread of data | Hypothesis Testing and Analysis of variance
Data Cleaning and Manipulation
Data Cleaning and Manipulation
Data Cleaning
Data Cleaning is the process of detecting and correcting corrupt or inaccurate records in a database.
Data Manipulation
Data Manipulation is the process of changing data to make it more organized and easier to read.
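To make the two definitions concrete, here is a minimal sketch in plain Python. The records, field names and cleaning rules are all hypothetical and chosen only for illustration: names are trimmed and normalised (a correction), records with a missing name or a non-numeric age are dropped (a detection), and the cleaned data is then sorted (a manipulation).

```python
# Hypothetical raw records containing typical data-quality problems.
raw_records = [
    {"name": " alice ", "age": "34"},   # stray whitespace and inconsistent case
    {"name": "BOB", "age": "n/a"},      # corrupt age value
    {"name": "", "age": "29"},          # missing name
    {"name": "carol", "age": "41"},
]


def clean(records):
    """Normalise names, convert ages to integers, and drop unusable records."""
    cleaned = []
    for record in records:
        name = record["name"].strip().title()
        age = record["age"].strip()
        if not name or not age.isdigit():
            continue  # skip records that cannot be repaired
        cleaned.append({"name": name, "age": int(age)})
    return cleaned


# Data cleaning: detect and correct (or discard) bad records.
cleaned_records = clean(raw_records)

# Data manipulation: reorder the cleaned data so it is easier to read.
sorted_records = sorted(cleaned_records, key=lambda r: r["age"], reverse=True)

for record in sorted_records:
    print(record["name"], record["age"])
```

Whether a bad record should be dropped, repaired or flagged depends on the dataset; dropping is used here only to keep the example short.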
Data Visualization
Data Visualization
Data Visualization is the representation of data in the form of charts, diagrams, etc.
Bar Graph, Scatter Plot, Pie Chart, Box Plot, Line Graph
Bonus: Machine Learning
Machine Learning
Machine Learning allows a machine to learn from examples and experience without being explicitly programmed.
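As a small illustration of learning from examples, the sketch below implements a 1-nearest-neighbour classifier in plain Python. This technique is chosen here only because it is short; the slides do not name a specific algorithm. The features, labels and values are made up. No pass/fail rule is written by hand: the prediction comes entirely from the labelled examples.

```python
# Hypothetical labelled examples: (hours_studied, hours_slept) -> outcome.
examples = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]


def predict(point):
    """Return the label of the training example closest to the given point."""
    def squared_distance(example):
        features, _label = example
        return sum((a - b) ** 2 for a, b in zip(features, point))

    _features, label = min(examples, key=squared_distance)
    return label


print(predict((7.0, 7.5)))
print(predict((1.5, 4.5)))
```

Adding more labelled examples changes the predictions without changing the code, which is the sense in which the program learns from experience.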
Data Analyst: Roles and Responsibilities
Determining Organizational Goals
Mining Data
Cleaning Data
Analyzing Data
Pinpointing Trends and Patterns
Creating Reports with Visualizations
Salary of Data Analyst
Average Salary (US): $83,878
Average Salary (IND): ₹404,660
Need of R
Need of R
R is open-source and freely available.
R is cross-platform compatible.
R is a powerful scripting language.
R is highly flexible and evolved.
Hands-On
Hands-On
Perform data analysis on the data set below and gather some insights.
Data Analytics @edureka
Timeline: Program Starts, 2nd Week, 7th Week, 11th Week, 15th Week, Graduated as Data Analyst
01 Statistics Essentials: Probability, Bayesian Inference, Regression, Making Statistics
02 Data Analytics with R: Data Manipulation, Exploratory Analysis, Regression, Data Visualization, Data Mining, Sentiment Analysis
03 SAS Training: Advanced Statistical Techniques, SAS Macros, PROC SQL, SAS ODS, Advanced SAS Procedures
04 Tableau Training: LOD Expressions, Tableau Desktop, Tableau Public, Data Visualization, Integration with R
Delivery: Self-Paced, Instructor-Led
Data Analytics @edureka
QlikView Certification Training
Advanced MS Excel 2010
R Programming Certification Training
Analytics for Retail Banks
Decision Tree Modelling Using R Certification Training
Machine Learning with Mahout Certification Training
Advanced Predictive Modelling in R Certification Training
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytics Using R | Edureka
