What is Data Analytics?
• Analytics is the use of:
• Data
• Information technology
• Statistical analysis
• Quantitative methods
• Mathematical or computer-based
models
• To help managers:
• Gain improved insight about their
business operations
• Make better, fact-based decisions.
1. Goal setting
• Vital, understandable, simple, short, and measurable goals
2. Setting priorities for measurements
• Decide what to measure and which methods to use to measure it
3. Data gathering
• Available datasets, recording/generating data
4. Data cleansing
• Outlier rejection, missing-value interpolation, data structuring
5. Data analysis
• Data mining, business intelligence, data visualization, exploratory data analysis
6. Precise interpretation of results
• Checking whether the results help meet the initial objectives, are limiting, or are inconclusive
1. Goal Setting
• The business unit has to decide on the objectives for the data analytics.
• These objectives can be framed as questions.
• For example, if a business is struggling to sell its
products, some relevant questions may be:
• Are we overpricing our goods?
• How is the competition’s product different to ours?
• To answer the question “Are we overpricing our goods?”, the business has to gather data on:
• Production costs
• Details about the price of similar goods on the market.
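As a rough illustration of how the two kinds of data above could be combined to approach the overpricing question, here is a minimal Python sketch. The function name `pricing_summary` and all figures are hypothetical; a positive premium over the market average is only a hint, not proof, that goods are overpriced.

```python
from statistics import mean

def pricing_summary(our_price, production_cost, competitor_prices):
    """Compare our price with our own cost and with the market average."""
    market_average = mean(competitor_prices)
    return {
        # Share of the selling price that is not production cost.
        "margin": (our_price - production_cost) / our_price,
        # Average price of similar goods on the market.
        "market_average": market_average,
        # How far our price sits above (+) or below (-) the market average.
        "premium_over_market": (our_price - market_average) / market_average,
    }

# Hypothetical figures, for illustration only.
summary = pricing_summary(
    our_price=7.50,
    production_cost=4.00,
    competitor_prices=[5.90, 6.40, 6.10],
)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```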
2. Setting Priorities for Measurements
• Determine what type of data is needed to answer the questions behind the objectives.
• Decide how much time to allow for the analysis.
• Decide which units of measurement to use.
3. Data Gathering
• Data can come from already available datasets
• Data can be generated by:
• The direct or interview method
• For example, a company interviews shoppers about their favorite brand of toothpaste.
• The indirect or questionnaire method
• Questionnaires are distributed to the respondents either by personal delivery or by mail/email.
• The registration method
• Uses registration records kept by government organizations, e.g., NADRA.
• The experimental method
• Experimentation, simulation.
4. Data Cleansing
• Data cleansing is the process of identifying:
• Incomplete
• Incorrect
• Inaccurate
• Irrelevant parts of the data
• The dirty or coarse data is:
• Replaced
• Modified
• Or deleted.
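The replace / modify / delete options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names, the 0 to 120 age range, and the choice of the median as a replacement value are all hypothetical, not a prescribed cleansing policy.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw_records = [
    {"id": 1, "age": 34, "city": "Lahore", "note": "n/a"},
    {"id": 2, "age": None, "city": "Karachi", "note": "n/a"},   # incomplete
    {"id": 3, "age": -5, "city": "Karachi", "note": "n/a"},     # incorrect
    {"id": 4, "age": 41, "city": " lahore ", "note": "n/a"},    # inaccurate formatting
]

def cleanse(records, relevant_fields=("id", "age", "city")):
    """Return cleaned copies of the records (assumed rules, see comments)."""
    valid_ages = [r["age"] for r in records
                  if r["age"] is not None and 0 <= r["age"] <= 120]
    fallback_age = median(valid_ages)
    cleaned = []
    for record in records:
        # Irrelevant parts: keep only the fields the analysis needs.
        row = {key: record[key] for key in relevant_fields}
        if row["age"] is None:
            row["age"] = fallback_age          # replaced
        elif not 0 <= row["age"] <= 120:
            continue                           # deleted
        row["city"] = row["city"].strip().title()  # modified
        cleaned.append(row)
    return cleaned

cleaned_records = cleanse(raw_records)
for row in cleaned_records:
    print(row)
```

Whether a bad value should be replaced, modified, or deleted depends on the analysis goal; the rules here were picked only to show one instance of each.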
5. Data Analysis
• Data analysis is the process of:
• Evaluating data using:
• Analytical reasoning
• Logical reasoning
• To examine each component of the data provided.
I Preprocessing
• Data cleaning
• Fill in missing values, smooth noisy data, identify or remove outliers,
and resolve inconsistencies
• Data integration
• Integration of multiple databases, data cubes, or files
• Data transformation
• Normalization/scaling and aggregation
• Data reduction
• Obtains a representation that is reduced in volume but produces the same or similar analytical results
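To make the integration and aggregation steps above concrete, here is a small Python sketch. The two sources, their field names, and the helper names `integrate` and `aggregate_units` are hypothetical. Aggregating per product also acts as a very simple form of data reduction, since the totals take less space than the individual rows.

```python
from collections import defaultdict

# Two hypothetical sources that describe the same kind of event.
store_sales = [{"product": "A", "units": 3}, {"product": "B", "units": 5}]
online_sales = [{"product": "A", "units": 2}, {"product": "C", "units": 4}]

def integrate(**sources):
    """Data integration: merge several sources into one list, tagging the origin."""
    combined = []
    for source_name, rows in sources.items():
        for row in rows:
            combined.append({**row, "source": source_name})
    return combined

def aggregate_units(rows):
    """Aggregation: total units per product across all sources."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["product"]] += row["units"]
    return dict(totals)

combined = integrate(store=store_sales, online=online_sales)
totals = aggregate_units(combined)
print(totals)
```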
Data Normalization
• Min-max normalization
$$v' = \frac{v - \mathit{min}_A}{\mathit{max}_A - \mathit{min}_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A$$
• Z-score normalization
$$v' = \frac{v - \mathit{mean}_A}{\mathit{stand\_dev}_A}$$
• Normalization by decimal scaling
$$v' = \frac{v}{10^{j}}$$
where j is the smallest integer such that Max(|v'|) < 1
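The three normalization methods listed on this slide (min-max, z-score, and decimal scaling) can be sketched in plain Python as follows. The function names and sample values are hypothetical. Two assumptions are made that the slide does not state: the z-score uses the population standard deviation, and decimal scaling only searches j ≥ 0.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so that they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Center on the mean and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("z-score normalization needs a non-zero standard deviation")
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """Divide by 10**j, with j the smallest j >= 0 such that max(|v'|) < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

# Hypothetical attribute values, for illustration only.
sample = [-986, 120, 917]
scaled_min_max = min_max(sample)
scaled_z = z_score(sample)
scaled_decimal, j = decimal_scaling(sample)
print(scaled_min_max, scaled_z, scaled_decimal, j)
```

Min-max keeps the shape of the distribution but is sensitive to extreme values, since they define the range; z-score is often preferred when the minimum and maximum are unknown or dominated by outliers.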
II Feature Engineering (FE)
• “Feature engineering is the process of transforming
raw data into features that better represent the
underlying problem to the predictive models,
resulting in improved accuracy on unseen data.”
Jason Brownlee, Machine Learning Mastery.
• As the models are getting better and better, the
focus shifts to what is put into them.
• Transforming data to create the model’s inputs.
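As a small illustration of turning raw data into model inputs, the Python sketch below derives three features from raw order records. The records, the field names, and the chosen features are hypothetical; which features actually help depends on the problem and the model.

```python
from datetime import date

# Hypothetical raw order records, for illustration only.
raw_orders = [
    {"order_date": "2024-03-02", "total": 120.0, "items": 4},
    {"order_date": "2024-03-05", "total": 45.0, "items": 1},
]

def build_features(order):
    """Turn one raw order into features a predictive model could consume."""
    day = date.fromisoformat(order["order_date"])
    return {
        "day_of_week": day.weekday(),           # 0 = Monday ... 6 = Sunday
        "is_weekend": day.weekday() >= 5,       # Saturday or Sunday
        "avg_item_price": order["total"] / order["items"],
    }

features = [build_features(order) for order in raw_orders]
for row in features:
    print(row)
```

The raw date string carries little signal for most models, whereas the derived day of week and weekend flag expose a pattern the model can use directly.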