Data Visualization
by
Dr. Jitender Kumar
Assistant Professor
Department of Computer Science & Engineering
DEENBANDHU CHHOTU RAM UNIVERSITY OF SCIENCE
& TECHNOLOGY
MURTHAL, SONEPAT
DATA VISUALIZATION
 Data Visualization is the graphical
representation of information and data. It is a
particularly efficient way of communicating when
the data is enormous e.g. time series data.
 Data Visualization aids in analyzing voluminous
data in a simpler way, i.e. it helps in
understanding trends, outliers, and patterns, and
in making data-driven decisions.
DATA VISUALIZATION
 Data visualization is the process of acquiring,
interpreting and comparing data in order to
clearly communicate complex ideas, thereby
facilitating the identification and analysis of
meaningful patterns.
 Data visualization without a message behind it is
not data visualization at all. It’s just data.
DATA VISUALIZATION PROCESS
 Filtering & processing: Refining and cleaning
data to convert it into information through
analysis, interpretation, contextualization,
comparison, and research.
 Translation & visual representation: Shaping
the visual representation by defining graphic
resources, language, context, and the tone of the
representation, all of which are adapted for the
recipient.
 Perception & interpretation: Finally, the
visualization becomes effective when it has a
perceptive impact on the construction of
knowledge.
PROS OF DATA VISUALIZATION
 Reaches a wider audience quickly
 Conveys a lot of information in a compact way (i.e. the
time needed to understand complex data is
greatly reduced when the same data is
presented in pictorial form)
 Makes reports more visually appealing
 Easy to remember
CONS OF DATA VISUALIZATION
 Can misrepresent information if an unsuitable visual
representation is used. The result may be one-sided,
e.g. the person preparing the visualization may
consider only the part of the information that
seems significant or that needs attention, and
disregard the rest, which may lead to biased
conclusions.
 It gives an estimate, not exact values.
SIGNIFICANCE OF DATA
VISUALIZATION
 Changes over time
This is perhaps the most basic and common use
of data visualization. The reason it is the most
common is because most data has an element of
time involved. Therefore, the first step in a lot of
data analyses is to see how the data trends over
time.
 Determining frequency
If time is involved, it is logical that you should
determine how often the relevant events happen
over time.
SIGNIFICANCE OF DATA
VISUALIZATION
 Determining relationships (correlations)
It is difficult to determine the relationship between
two variables without a visualization, yet it is
important to be aware of relationships in data. This
is a great example of the value of data visualization
in data analysis e.g. Scatter plot.
 Examining a network
E.g.: In market research, marketing professionals
need to know which audiences to target with their
message, so they analyze the entire market to
identify audience clusters, bridges between the
clusters, influencers within clusters, and outliers.
SIGNIFICANCE OF DATA
VISUALIZATION
 Scheduling
When planning out a schedule or timeline for
a complex project, things can get confusing.
A Gantt chart solves that issue by clearly
illustrating each task within the project and how
long it will take to complete.
 Provides greater chances of identifying new
business opportunities as they arise, e.g. in the
stock market, it helps anticipate upcoming trends
or sales volumes and the revenue they would
generate.
 Aids in identification of areas that need
attention.
HISTORY OF DATA VISUALIZATION
Although data visualization has only recently
been recognized as a distinct discipline, it has
deep roots, dating back to 2nd-century
cartographers and surveyors. The early origins of
data visualization can be traced to the ancient
Egyptian surveyors who organized celestial
bodies into tables to assist with the laying out of
towns and the creation of navigational maps to
aid exploration.
HISTORY OF DATA VISUALIZATION
 It was only during the 17th century, when the French
philosopher and mathematician René
Descartes developed a two-dimensional
coordinate system for displaying values along
horizontal and vertical axes, that graphing began
to take shape.
 During the late 18th century, Scottish social
scientist William Playfair changed the field of
visualization by pioneering many of today’s
widely used visualizations, including the line
graph and bar chart, and later the pie chart
(circle graph).
HISTORY OF DATA VISUALIZATION
 During the 19th century there was a radical increase in the
use of statistical graphics and thematic mapping, which,
according to Michael Friendly, occurred at a rate that has
not been matched until modern times. It was during this
period that all modern forms of statistical graphs were
invented, including pie charts, histograms, time-series plots,
contour plots, scatter plots, and many more.
 Part of the increasing push for data visualization was
caused by the establishment of state offices throughout
Europe, which were using numbers in social planning,
commerce, and transportation. The popularity of and support for
visualizations during the 19th century was regarded as an Age
of Enthusiasm, and was quickly followed by what Friendly
calls the Golden Age, with “unparalleled beauty and
many innovations in graphics and thematic cartography”.
HISTORY OF DATA VISUALIZATION
 In the second half of the 20th century, Jacques Bertin
used quantitative graphs to represent information
“intuitively, clearly, accurately, and efficiently”.
 John Tukey and Edward Tufte pushed the bounds of
data visualization: Tukey with his new statistical
approach of exploratory data analysis, and Tufte with
his book “The Visual Display of Quantitative
Information”, which paved the way for refining data
visualization techniques for more than statisticians.
With the progression of technology came the
progression of data visualization, starting with
hand-drawn visualizations and evolving into more technical
applications, including interactive designs leading to
software visualization.
HISTORY OF DATA VISUALIZATION (21ST CENTURY)
 Programs like SAS (a statistical software suite),
SOFA (Statistics Open For All), R, Minitab,
Cornerstone and more allow for data visualization
in the field of statistics.
 Other data visualization tools, more focused and
tailored to individual needs, include libraries and
programming languages such as D3 (a JavaScript
library for producing data visualizations in web
browsers), Python, and JavaScript, which help make
the visualization of quantitative data possible.
TRAITS OF MEANINGFUL DATA
Meaningful data is high-quality information
that can be used to evaluate the efficacy and
effectiveness of a program.
 Accuracy
 Completeness
 Reliability
 Relevance
 Timeliness
TRAITS OF MEANINGFUL DATA
Characteristic: Accuracy (whatever the data is, it should be error-free)
How it is measured: Is the information correct in every detail?
Accuracy is a crucial data quality characteristic because inaccurate
information can cause significant problems with severe consequences,
e.g. does a customer really have $1 million in his bank account?

Characteristic: Completeness
How it is measured: How comprehensive is the information?
e.g. Let’s say you’re sending a mailing out. You need a customer’s
last name to ensure the mail goes to the right address; without it,
the data is incomplete.
TRAITS OF MEANINGFUL DATA
Characteristic: Timeliness
How it is measured: How up-to-date is the information? Can it
be used for real-time reporting?
The timeliness of information is an important data quality
characteristic, because information that isn’t timely can
lead to people making the wrong decisions. In turn, that costs
organizations time, money, and reputational damage.
TRAITS OF MEANINGFUL DATA
Characteristic: Reliability (the source of data should not be biased)
How it is measured: Does the information contradict other
trusted resources?
e.g. if a person’s birthday is January 1, 1970 in one system, yet
it’s June 13, 1973 in another, the information is unreliable.

Characteristic: Relevance
How it is measured: Do you really need this information?
If you’re gathering irrelevant information, you’re wasting
time as well as money.
POWER OF VISUAL PERCEPTION
 A human can readily distinguish differences in
line length, shape, orientation, distance, and
color without significant processing effort.
 For example, it may require significant time and
effort (“attentive processing”) to identify the
number of times the digit “5” appears in a series
of numbers; but if that digit is different in size,
orientation, or color, instances of the digit can be
noted quickly through pre-attentive processing.
POWER OF VISUAL PERCEPTION
 One study reports that we typically process
images 60,000 times faster than a table or text,
and that our brains typically do a better job of
remembering them in the long term.
 The same research found that after three
days, the subjects studied retained between 10%
and 20% of written or spoken information,
compared with 65% of visual information.
POWER OF VISUAL PERCEPTION
Rationale behind the Power of Visuals
 The human mind can process an image seen for
just 13 milliseconds and store the information,
provided that it is associated with a concept. Our
eyes can register 36,000 visual messages per hour.
 40% of nerve fibers are connected to the retina.
POWER OF VISUAL PERCEPTION
 All of these indicate that human beings are
better at processing visual information, which is
lodged in our long-term memory.
 Consequently, for reports and statements, a
visual representation that uses images is a much
more effective way to communicate information
than text or a table; it also takes up much less
space. This is the real strength of data
visualization.
POWER OF VISUAL PERCEPTION
 Explaining (Why and How)
(Helps in answering questions, supporting decisions, and communicating
information, e.g. finding the country with the greatest demand for a
product on a demand map.)
 Exploring (inquiring into a subject in detail)
(When the goal of a visual is to explore, the viewers start by
familiarizing themselves with the dataset, then identify an area of
interest, ask questions, and find several solutions or
answers.)
 Analyzing
(Visuals prompt viewers to inspect, distill, and transform the most
significant information in a data set so that they can discover
something new, predict upcoming situations, or uncover the
nature of the relationships in the data.)
MAKING ABSTRACT DATA VISIBLE
Fundamental Choices in Data Visualization
 What to represent
 How to represent
MAKING ABSTRACT DATA VISIBLE
Abstract Data
 Numeric (Quantitative) data
 Discrete: Data that consists of whole numbers, e.g. the
number of children in a family.
 Continuous: Data that can take any value within an
interval, e.g. the weight of a person between 70 and 80 kg.
 Non-numeric (Qualitative) data
 Ordinal: Data that follows an order or sequence, e.g. in
job scheduling, first job x, then…..
 Categorical: Data that follows no fixed order, e.g. the
varieties of product sold.
MAKING ABSTRACT DATA VISIBLE
The most fundamental data visualization approaches are :
 Line charts
 Bar chart
 Histogram
 Pie charts
 Scatter plots
 Heat maps
MAKING ABSTRACT DATA VISIBLE
The most fundamental data visualization
approaches are:
 Bubble charts
 Radar charts
 Waterfall charts
 Tree maps
 Area charts
LINE GRAPH
 A line graph (or
linear graph) is used to
display continuous
data, and it is useful
for predicting future
events over time.
BAR GRAPH
 A bar graph is used
to display categorical
data. It compares
the categories using solid
bars whose lengths
represent the quantities.
Grade   Students
A       4
B       12
C       10
D       2
[Figure: bar chart of the number of students per grade. Horizontal axis: Grade (A, B, C, D). Vertical axis: Students (0 to 15).]
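The grade table above can be turned into a rough text bar graph. This is only a minimal sketch in plain Python; the function name `text_bar_chart` and the `#` symbol are assumptions made for illustration, not part of the original slides.

```python
# Grade counts taken from the table on this slide.
grade_counts = {"A": 4, "B": 12, "C": 10, "D": 2}


def text_bar_chart(counts, symbol="#"):
    """Return one line per category; the bar length equals the count."""
    lines = []
    for category, count in counts.items():
        lines.append(f"{category} | {symbol * count} {count}")
    return lines


for line in text_bar_chart(grade_counts):
    print(line)
```

Because each bar's length is the count itself, the longest bar (grade B) is immediately visible without reading the numbers.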
HISTOGRAM
 A histogram is used to represent data
that is grouped into continuous ranges, where
each range corresponds to a vertical bar.
 The horizontal axis displays the number ranges
and the vertical axis (frequency) represents the
number of data points that fall in each range.
 The number ranges depend upon the data that is
being used.
HOW DO YOU CONSTRUCT A
HISTOGRAM?
The steps to construct a histogram are as follows:
 Step 1:
Place the intervals on the horizontal axis by
choosing a suitable scale.
 Step 2:
Place frequencies on the vertical axis by
choosing a suitable scale.
 Step 3:
Construct vertical bars according to the given
frequencies.
HISTOGRAM
 The heights of the trees (in feet) are given below:
61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 73, 73.5, 74, 74.5, 76,
76.2, 76.5, 77, 77.5, 78, 78.5, 79, 79.2, 80, 81, 82, 83, 84, 85, 87

Height Range (feet)   Number of Trees (Frequency)
60 - 65               3
66 - 70               3
71 - 75               8
76 - 80               10
81 - 85               5
86 - 90               1
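The frequency column can be reproduced by counting how many heights fall into each class. The sketch below assumes that each class is read as a half-open interval (lower, upper], so that values such as 71.5 have an unambiguous home; the edge list and the function name `frequency_table` are assumptions made for illustration.

```python
heights = [61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 73, 73.5, 74, 74.5, 76,
           76.2, 76.5, 77, 77.5, 78, 78.5, 79, 79.2, 80, 81, 82, 83, 84, 85, 87]

# Assumed class boundaries: a height h belongs to a class when lower < h <= upper.
edges = [60, 65, 70, 75, 80, 85, 90]


def frequency_table(values, edges):
    """Count the values that fall into each (lower, upper] interval."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(edges) - 1):
            if edges[i] < value <= edges[i + 1]:
                counts[i] += 1
                break
    return counts


counts = frequency_table(heights, edges)
for i, count in enumerate(counts):
    # One text bar per class: intervals on one axis, frequencies as bar length.
    print(f"{edges[i]:>2} - {edges[i + 1]:<2} {'#' * count} {count}")
```

Under that assumption the counts agree with the table, and the printed bars follow the three construction steps: intervals first, frequencies second, bars last.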
DIFFERENCE BETWEEN
HISTOGRAM AND BAR
GRAPH/CHART
 Histograms are used to show distributions of variables
while bar charts are used to compare variables.
 Histograms plot quantitative data with ranges of the
data that is grouped into bins or intervals while bar
charts plot categorical data.
 The bars of a bar chart typically have the same width. The
widths of the bars in a histogram need not be the same, as
long as the total area is one hundred percent if percentages
are used, or the total count if counts are used.
 Overall: The values in bar charts are given by the length
of the bar while values in histograms are given by areas.
In a histogram, there shouldn’t be any gaps between the
bars.
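The point that histogram values are given by areas matters most when the bins have unequal widths. The sketch below uses hypothetical bins (the tree-height classes with the outer classes merged, an assumption made purely for illustration) and sets each bar's height to count divided by width, so that area equals count.

```python
# Hypothetical unequal-width bins: ((lower, upper), count).
bins = [((60, 70), 6), ((70, 75), 8), ((75, 80), 10), ((80, 90), 6)]


def bar_heights(bins):
    """Height of each bar such that height * width equals the bin count."""
    return [count / (upper - lower) for (lower, upper), count in bins]


heights_per_bin = bar_heights(bins)
areas = [
    height * (upper - lower)
    for height, ((lower, upper), _count) in zip(heights_per_bin, bins)
]

for ((lower, upper), count), height in zip(bins, heights_per_bin):
    print(f"{lower}-{upper}: count={count}, bar height={height:.2f}")
print("total area:", sum(areas))
```

The two outer bins hold six trees each but are twice as wide as the inner bins, so their bars are drawn lower; the total area still equals the total count.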
PIE CHART (CIRCLE GRAPH)
 It shows the
relationship of the
parts to the whole.
The full circle
represents 100%,
and each category
occupies a slice
proportional to its
percentage, such as
15% or 45%.
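As a small worked example, the share and the slice angle of each category follow directly from the counts. The sketch reuses the grade counts from the bar graph slide; the function name `pie_slices` is an assumption made for illustration.

```python
# Grade counts reused from the bar graph slide.
grade_counts = {"A": 4, "B": 12, "C": 10, "D": 2}


def pie_slices(counts):
    """Map each category to (percentage of the whole, slice angle in degrees)."""
    total = sum(counts.values())
    return {
        category: (100 * count / total, 360 * count / total)
        for category, count in counts.items()
    }


slices = pie_slices(grade_counts)
for category, (percent, angle) in slices.items():
    print(f"{category}: {percent:.1f}% of the circle, {angle:.1f} degrees")
```

The percentages always sum to 100 and the angles to 360, which is what makes the pie chart a part-to-whole display.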
SCATTER PLOT/CHART
 It uses dots to
represent values for
two different numeric
variables. The position
of each dot on the
horizontal and vertical
axes indicates the values
for an individual data
point. Scatter plots are
used to observe
relationships between
variables.
HEAT MAP
 It depicts values of a main variable of interest across
two axes as a grid of colored squares. The axis
variables are divided into ranges, as in a bar chart or
histogram, and each cell's color indicates the value of
the main variable in the corresponding cell range.
 A heatmap adds a third variable to a two-dimensional
plot by using color to show its magnitude and changes.
 Heatmaps help reveal patterns and changes. While they
can be used to show changes over time, they are not
designed for detailed analysis.
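The idea of mapping a cell value to a color can be sketched in plain Python by using characters of increasing darkness instead of colors. The data, the shade string, and the function name `to_shades` are all hypothetical and chosen only for illustration.

```python
# Hypothetical data: number of visits per (day, time of day) cell.
rows = ["Mon", "Tue", "Wed"]
cols = ["morning", "afternoon", "evening"]
values = [
    [5, 20, 35],
    [10, 25, 40],
    [0, 15, 30],
]

SHADES = " .:*#"  # from light (low value) to dark (high value)


def to_shades(grid, shades=SHADES):
    """Replace each value by a shade character based on its relative magnitude."""
    flat = [value for row in grid for value in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    shaded = []
    for row in grid:
        line = ""
        for value in row:
            index = min(int((value - lo) / span * len(shades)), len(shades) - 1)
            line += shades[index]
        shaded.append(line)
    return shaded


for label, line in zip(rows, to_shades(values)):
    print(f"{label} |{line}|")
```

A plotting library would replace the shade characters with a color scale, but the binning of magnitudes into a small number of levels is the same step.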
BUILDING BLOCKS OF
INFORMATION VISUALIZATION
ANALYTICAL TECHNIQUES
The most fundamental data analysis approaches are:
 Visualization (histograms, scatter plots, surface plots, tree maps,
parallel coordinate plots, etc.)
 Statistics (hypothesis testing, regression, PCA i.e. Principal
Component Analysis, etc.)
 Data Mining (association mining etc.)
 Machine Learning (clustering, classification, decision trees, etc.)
Among these approaches, information visualization, or visual
data analysis, is the most reliant on the cognitive skills of human
analysts, and allows the discovery of unstructured actionable
insights that are limited only by human imagination and
creativity. The analyst does not have to learn any sophisticated
methods to be able to interpret the visualizations of the data.
VISUAL ANALYTIC PROCESS
 Sometimes, data can be overwhelming. There’s too much
of it, too little time to comprehend it, or you simply can’t
see the data you have available at your disposal. If so,
visual data analysis can help you make sense of it all, by
combining data analytics and data visualization
techniques.
 Today, data is produced at an incredible rate and the
ability to collect and store the data is increasing at a
faster rate than the ability to analyze it.
 The Visual Analytics Process combines automatic and
visual analysis methods with a tight coupling through
human interaction in order to gain knowledge from data.
VISUAL ANALYTIC PROCESS
The first step is often to
preprocess and transform
the data to derive different
representations for further
exploration (as indicated by
the Transformation arrow
in the figure). Other typical
preprocessing tasks include
data cleaning,
normalization, grouping, or
integration of
heterogeneous data
sources.
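Two of the preprocessing tasks named above, data cleaning and normalization, can be sketched as follows. The readings are hypothetical, missing values are assumed to be marked with `None`, and min-max scaling is only one of several possible normalization choices.

```python
def clean(values):
    """Data cleaning: drop missing entries (assumed to be marked as None)."""
    return [value for value in values if value is not None]


def min_max_normalize(values):
    """Normalization: rescale the values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal; nothing to spread out
    return [(value - lo) / (hi - lo) for value in values]


raw = [12, None, 18, 30, None, 24]  # hypothetical readings
normalized = min_max_normalize(clean(raw))
print(normalized)
```

After this transformation, variables measured on different scales can share one axis or one color scale in a later visualization step.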
VISUAL ANALYTIC PROCESS
 After the transformation, the
analyst may choose between
applying visual or automatic
analysis methods.
 If an automated analysis is used
first, data mining methods are
applied to generate models of the
original data. Once a model is
created the analyst has to evaluate
and refine the models, which can
best be done by interacting with the
data.
 Visualizations allow the analysts to
interact with the automatic methods
by modifying parameters or
selecting other analysis algorithms.
VISUAL ANALYTIC PROCESS
 Alternating between visual and
automatic methods is
characteristic of the Visual
Analytics process and leads to a
continuous refinement and
verification of preliminary results.
 Misleading results in an
intermediate step can thus be
discovered at an early stage,
leading to better results and a
higher confidence.
 If a visual data exploration is
performed first, the user has to
confirm the generated hypotheses
by an automated analysis.
VISUAL ANALYTIC PROCESS
 User interaction with the
visualization is needed to reveal
insightful information, for instance by
zooming in on different data areas or
by considering different visual views
on the data.
 Findings in the visualizations can be
used to steer model building in the
automatic analysis.
 In summary, in the Visual Analytics
Process knowledge can be gained from
visualization, automatic analysis, as
well as the preceding interactions
between visualizations, models, and
the human analysts.
VISUAL ANALYTIC TECHNIQUES
 Time-series: A single variable is captured over a
period of time, such as the unemployment rate
over a 10-year period. A line chart, area chart, or
stock chart may be used to demonstrate the trend.
 Ranking: Categorical subdivisions are ranked in
ascending or descending order, such as a ranking
of sales performance (the measure) by
salespersons (the category, with each salesperson
a categorical subdivision) during a single period. A
bar chart may be used to show the comparison
across the salespersons.
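A ranking is obtained by sorting the categories by their measure before drawing the bars. The salesperson names and figures below are hypothetical, as is the function name `rank_descending`.

```python
# Hypothetical sales (the measure) per salesperson (the category) for one period.
sales = {"Asha": 120, "Ben": 95, "Chen": 143, "Dev": 110}


def rank_descending(measure_by_category):
    """Return (category, measure) pairs ordered from highest to lowest measure."""
    return sorted(measure_by_category.items(), key=lambda item: item[1], reverse=True)


ranking = rank_descending(sales)
for position, (name, amount) in enumerate(ranking, start=1):
    print(f"{position}. {name:<5} {'#' * (amount // 10)} {amount}")
```

Sorting first is what turns an ordinary bar chart into a ranking: the order of the bars carries the message.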
VISUAL ANALYTIC TECHNIQUES
 Part-to-whole: Categorical subdivisions are
measured as a ratio to the whole (i.e., a percentage
out of 100%). A pie chart, pyramid chart, treemap
chart or bar chart can show the comparison of
ratios, such as the market share represented by
competitors in a market.
 Deviation: Categorical subdivisions are compared
against a reference, such as a comparison of actual
vs. budget expenses for several departments of a
business for a given time period. A bar chart with
errors can show the comparison of the actual versus
the reference amount.
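For the deviation technique, the quantity that gets plotted is the difference between the actual value and the reference. The department names and amounts below are hypothetical, as is the function name `deviations`.

```python
# Hypothetical budget (the reference) and actual expenses per department.
budget = {"HR": 50, "IT": 120, "Sales": 90}
actual = {"HR": 55, "IT": 110, "Sales": 90}


def deviations(actual_values, reference_values):
    """Actual minus reference for every category present in the reference."""
    return {
        category: actual_values[category] - reference_values[category]
        for category in reference_values
    }


for department, difference in deviations(actual, budget).items():
    status = "over" if difference > 0 else "under" if difference < 0 else "on"
    print(f"{department:<5} {difference:+d} ({status} budget)")
```

Plotting these signed differences around a zero line makes overspending and underspending visible at a glance.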
VISUAL ANALYTIC TECHNIQUES
 Frequency distribution: Shows the number of
observations of a particular variable for a given
interval, such as the number of years in which the
stock market return is between intervals such as
0–10%, 11–20%, etc. A histogram, a type of bar
chart, may be used for this analysis.
 Correlation: Comparison between observations
represented by two variables (X,Y) to determine if
they tend to move in the same or opposite
directions. For example, plotting unemployment
(X) and inflation (Y) for a sample of months. A
scatter plot is typically used for this message.
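Whether two variables tend to move in the same or opposite directions can also be summarized numerically. The sketch below computes the Pearson correlation coefficient, one common measure that the slides do not name explicitly; the unemployment and inflation figures are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical monthly observations.
unemployment = [4.0, 4.5, 5.0, 5.5, 6.0]  # X
inflation = [3.0, 2.8, 2.5, 2.1, 1.9]     # Y

r = pearson(unemployment, inflation)
print(f"r = {r:.3f}")  # a negative r means the variables move in opposite directions
```

A value near +1 or -1 corresponds to points lying close to a straight line on the scatter plot, while a value near 0 corresponds to no clear linear trend.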
VISUAL ANALYTIC TECHNIQUES
 Nominal comparison: Comparing categorical
subdivisions in no particular order, such as the
sales volume by product code. A bar chart may be
used for this comparison.
 Geographic or geospatial: Comparison of a
variable across a map or layout, such as the
unemployment rate by state or the number of
persons on the various floors of a building. A
cartogram is a typical graphic used.
Thank you

Data visualisation in data analytics with python

  • 1.
    Data Visualization by Dr. JitenderKumar Assistant Professor Department of Computer Science & Engineering DEENBANDHU CHHOTU RAM UNIVERSITY OF SCIENCE & TECHNOLOGY MURTHAL, SONEPAT
  • 2.
    DATA VISUALIZATION  DataVisualization is the graphical representation of information and data. It is a particularly efficient way of communicating when the data is enormous e.g. time series data.  Data Visualization aids in analyzing voluminous data in a simpler way i.e. it helps in understanding trends, outliers, patterns and taking data-driven decisions.
  • 3.
    DATA VISUALIZATION  Datavisualization is the process of acquiring, interpreting and comparing data in order to clearly communicate complex ideas, thereby facilitating the identification and analysis of meaningful patterns.  Data visualization without a message behind is not data visualization at all. It’s just data.
  • 4.
    DATA VISUALIZATION PROCESS Filtering & processing: Refining and cleaning data to convert it into information through analysis, interpretation, contextualization, comparison, and research.  Translation & visual representation: Shaping the visual representation by defining graphic resources, language, context, and the tone of the representation, all of which are adapted for the recipient.  Perception & interpretation: Finally, the visualization becomes effective when it has a perceptive impact on the construction of knowledge.
  • 5.
    PROS OF DATAVISUALIZATION  Quick access to a wider range of audience  Conveys a lot of information in a precise way (i.e. The amount of time one takes to understand the complex data is reduced to a great extent when the same data is present in the pictorial form.)  Makes report more visually appealing  Easy to remember
  • 6.
    CONS OF DATAVISUALIZATION  Misrepresent information, if an incorrect visual representation is used i.e. one-sided e.g. the individual bringing the information for the equivalent may just think about the significant part of the information or the information that requirements center and may reject the remainder of the information which may prompt one-sided results.  It gives assessment not exactness.
  • 7.
    SIGNIFICANCE OF DATA VISUALIZATION Changes over time This is perhaps the most basic and common use of data visualization. The reason it is the most common is because most data has an element of time involved. Therefore, the first step in a lot of data analyses is to see how the data trends over time.  Determining frequency If time is involved, it is logical that you should determine how often the relevant events happen over time.
  • 8.
    SIGNIFICANCE OF DATA VISUALIZATION Determining relationships (correlations) It is difficult to determine the relationship between two variables without a visualization, yet it is important to be aware of relationships in data. This is a great example of the value of data visualization in data analysis e.g. Scatter plot.  Examining a network E.g.: In market research, marketing professionals need to know which audiences to target with their message, so they analyze the entire market to identify audience clusters, bridges between the clusters, influencers within clusters, and outliers.
  • 9.
    SIGNIFICANCE OF DATA VISUALIZATION Scheduling When planning out a schedule or timeline for a complex project, things can get confusing. A Gantt chart solves that issue by clearly illustrating each task within the project and how long it will take to complete.  Provides greater chances to identify new business opportunities where they arise e.g. in stock market, it predicts upcoming trends or sales volumes and the revenue they would generate.  Aids in identification of areas that need attention.
  • 10.
    HISTORY OF DATAVISUALIZATION Although data visualization has only recently been recognized as a distinct discipline; it has deep roots, dating back to the 2nd century cartographers and surveyors. The early origins of data visualization can be traced to the ancient Egyptians surveyors who organized celestial bodies into tables to assist with the laying out of towns and the creation of navigational maps to aid exploration.
  • 11.
    HISTORY OF DATAVISUALIZATION  It was only during the 17th century, when French philosopher and mathematician, Rene Descartes developed a two-dimensional coordinate system for displaying values along horizontal and vertical axis, that graphing began to take shape.  During the late 18th century, Scottish social scientist William Playfair changed the field of visualization, by pioneering many of today’s widely used visualizations – including the line graph and bar chart, and then later the pie chart and circle graph.
  • 12.
    HISTORY OF DATAVISUALIZATION  During the 19th century there was a radical increase in the use of statistical graphics and thematic mapping, which according to Michael Friendly, occurred at a rate which has not been matched until modern times. It was during this period that all modern forms of statistical graphs were invented including: pie charts, histograms, time-series plots, contour plots, scatter plots, and any more.  Part of the increasing push for data visualizations, was caused by the establishment of state offices throughout Europe, which were utilizing numbers in social planning, commerce and transportation. The popularity and support for visualizations during the 19th century was regarded as an Age of Enthusiasm, but was quickly followed by what Friendly claims to be the Golden Age, with “unparalleled beauty and many innovations in graphics and thematic cartography”.
  • 13.
    HISTORY OF DATAVISUALIZATION  In the second half of the 20th century, Jacques Bertin used quantitative graphs to represent information “intuitively, clearly, accurately, and efficiently”.  John Tukey and Edward Tufte pushed the bounds of data visualization; Tukey with his new statistical approach of exploratory data analysis and Tufte with his book “The Visual Display of Quantitative Information” paved the way for refining data visualization techniques for more than statisticians. With the progression of technology came the progression of data visualization; starting with hand- drawn visualizations and evolving into more technical applications– including interactive designs leading to software visualization.
  • 14.
    HISTORY OF DATAVISUALIZATION (21ST CENTURY)  Programs like SAS (statistical software suite), SOFA (statistical open for all), R, Minitab, cornerstone and more allow for data visualization in the field of statistics.  Other data visualization applications, more focused and unique to individuals, programming languages such as D3 (a JavaScript library for producing data visualization in web browsers), Python, and JavaScript help to make the visualization of quantitative data a possibility.
  • 15.
    TRAITS OF MEANINGFULDATA Meaningful data is a high-quality information that can be used to evaluate the efficacy and effectiveness of a program.  Accuracy  Completeness  Reliability  Relevance  Timeliness
  • 16.
    TRAITS OF MEANINGFULDATA Characteristic How it is measured Accuracy What ever the data is, it should be error free Is the information correct in every detail? Accuracy is a crucial data quality characteristic because inaccurate information can cause significant problems with severe consequences. e.g. does a customer really have $1 million in his bank account? Completeness How comprehensive is the information? e.g. Let’s say you’re sending a mailing out. You need a customer’s last name to ensure the mail goes to the right address – without it, the data is incomplete.
  • 17.
    TRAITS OF MEANINGFULDATA Characteristic How it is measured Timeliness How up- to-date is information? Can it be used for real-time reporting? The timeliness of information is an important data quality characteristic, because information that isn’t timely can lead to people making the wrong decisions. In turn, that costs organizations time, money, and reputational damage.
  • 18.
    TRAITS OF MEANINGFULDATA Characteristic How it is measured Reliability The source of data should not be biased Does the information contradict other trusted resources? e.g. if a person’s birthday is January 1, 1970 in one system, yet it’s June 13, 1973 in another, the information is unreliable. Relevance Do you really need this information? If you’re gathering irrelevant information, you’re wasting time as well as money.
  • 19.
    POWER OF VISUALPERCEPTION  A human can easily distinguish differences in line, length, shape, orientation, distances, and color readily without significant processing effort.  For example, it may require significant time and effort (“attentive processing”) to identify the number of times the digit “5” appears in a series of numbers; but if that digit is different in size, orientation, or color, instances of the digit can be noted quickly through pre-attentive processing.
  • 20.
    POWER OF VISUALPERCEPTION  It has been observed by a study that we typically process images 60,000 times faster than a table or a text, and that our brains typically do a better job in remembering them for the long term.  That same research detected that after three days, analyzed subjects retained between 10% and 20% of written or spoken information, compared with 65% of visual information.
  • 21.
    POWER OF VISUALPERCEPTION Rationale behind the Power of Visuals  The human mind can see an image for just 13 milliseconds and store the information, provided that it is associated with a concept. Our eyes can attend 36,000 visual messages per hour.  40% of nerve fibers are connected to the retina.
  • 22.
    POWER OF VISUALPERCEPTION  All of these indicate that human beings are better at processing visual information, which is lodged in our long-term memory.  Consequently, for reports and statements, a visual representation that uses images is a much more effective way to communicate information than text or a table; it also takes up much less space. This is the real strength of data visualization.
  • 23.
    POWER OF VISUALPERCEPTION  Explaining (Why and How) (Helps in answering questions, support decisions, communicate information e.g. find a country with the greatest demand for a product on a demand map)  Exploring (inquire into a subject in detail) (When the goal of a visual is to explore, the viewers start by familiarizing themselves with the dataset, then identify an area of interest, asking questions, and finding several solutions or answers.)  Analyzing (Visuals prompt viewers to inspect, distill, and transform the most significant information in a data set so that they can discover something new or predict upcoming situations or discover the nature and relationship.)
  • 24.
    MAKING ABSTRACT DATAVISIBLE Fundamental Choices in Data Visualization  What to represent  How to represent
  • 25.
    MAKING ABSTRACT DATAVISIBLE Abstract Data  Numeric (Quantitative) data  Discrete: Data that consists of whole numbers e.g. number of children in a family etc.  Continuous: Data that can take any value within an interval e.g. people between 70-80 Kgs.  Non-numeric (Qualitative) data  Ordinal: Data that follows an order or sequence e.g. in Jobs scheduling first x Job, then…..  Categorical Data: It follows no fixed order e.g. varieties of product sold.
  • 26.
    MAKING ABSTRACT DATAVISIBLE The most fundamental data visualization approaches are :  Line charts  Bar chart  Histogram  Pie charts  Scatter plots  Heat maps
  • 27.
    MAKING ABSTRACT DATAVISIBLE The most fundamental data visualization approaches are:  Bubble charts  Radar charts  Waterfall charts  Tree maps  Area charts
  • 28.
    LINE GRAPH  Linegraph or the linear graph is used to display the continuous data and it is useful for predicting future events over time.
  • 29.
    HISTOGRAM  Bar Graph: It is used to display categorical data and compares the categories using solid bars whose lengths represent the quantities.  Grade (number of students): A (4), B (12), C (10), D (2)  [Figure: bar chart of Students (0 to 15) against Grade (A to D)]
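As a rough illustration of the bar graph idea, the grade table from the slide can be drawn as a text chart in which the bar length equals the number of students. This is only a sketch: the function name `render_bar_chart` and the `#` symbol are choices made for this example, not part of the original slides.

```python
# Grade table from the slide: grade -> number of students.
students_per_grade = {"A": 4, "B": 12, "C": 10, "D": 2}


def render_bar_chart(counts, symbol="#"):
    """Return one text line per category; the bar length equals the count."""
    width = max(len(label) for label in counts)
    return [
        f"{label:<{width}} | {symbol * value} {value}"
        for label, value in counts.items()
    ]


bar_lines = render_bar_chart(students_per_grade)
for line in bar_lines:
    print(line)
```

Because the categories are not numeric ranges, the bars can be listed in any order and are drawn separately from one another.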
  • 30.
    HISTOGRAM  A histogram is used to represent data that is grouped into continuous ranges, where each range corresponds to a vertical bar.  The horizontal axis displays the number ranges and the vertical axis (frequency) represents the amount of data present in each range.  The number ranges depend upon the data that is being used.
  • 31.
    HOW DO YOU CONSTRUCT A HISTOGRAM?  The steps to construct a histogram are as follows:  Step 1: Place the intervals on the horizontal axis by choosing a suitable scale.  Step 2: Place frequencies on the vertical axis by choosing a suitable scale.  Step 3: Construct vertical bars according to the given frequencies.
  • 32.
    HISTOGRAM  The heights of the trees (in feet) are given below: 61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 73, 73.5, 74, 74.5, 76, 76.2, 76.5, 77, 77.5, 78, 78.5, 79, 79.2, 80, 81, 82, 83, 84, 85, 87  Height range (number of trees, i.e. frequency): 60 - 65 (3), 66 - 70 (3), 71 - 75 (8), 76 - 80 (10), 81 - 85 (5), 86 - 90 (1)
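The construction steps can be sketched on the tree-height data: count how many heights fall in each interval, then draw one bar per interval. The intervals below are copied from the slide and treated as inclusive at both ends, which is an assumption of this sketch; they leave small gaps (a height of 65.5 would fall in no interval), which the listed data happens to avoid. The names `count_frequencies` and `histogram_lines` are illustrative.

```python
# Tree heights (in feet) as listed on the slide.
heights = [61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 73, 73.5, 74, 74.5,
           76, 76.2, 76.5, 77, 77.5, 78, 78.5, 79, 79.2, 80,
           81, 82, 83, 84, 85, 87]

# Step 1: intervals for the horizontal axis (assumed inclusive at both ends).
bins = [(60, 65), (66, 70), (71, 75), (76, 80), (81, 85), (86, 90)]


def count_frequencies(values, intervals):
    """Step 2: frequency of values inside each inclusive interval."""
    return [sum(1 for v in values if low <= v <= high) for low, high in intervals]


frequencies = count_frequencies(heights, bins)

# Step 3: one bar per interval; the bar length equals the frequency.
histogram_lines = [
    f"{low} - {high} | {'#' * freq} {freq}"
    for (low, high), freq in zip(bins, frequencies)
]
for line in histogram_lines:
    print(line)
```

The computed frequencies reproduce the table on the slide, and they add up to the 30 listed heights.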
  • 33.
    DIFFERENCE BETWEEN HISTOGRAM AND BAR GRAPH/CHART  Histograms are used to show distributions of variables while bar charts are used to compare variables.  Histograms plot quantitative data with ranges of the data grouped into bins or intervals, while bar charts plot categorical data.  The bars of a bar chart typically have the same width. The widths of the bars in a histogram need not be the same, as long as the total area is one hundred percent if percents are used, or the total count if counts are used.  Overall: The values in bar charts are given by the length of the bar while values in histograms are given by areas. In a histogram, there shouldn't be any gaps between the bars.
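The point that histogram values are given by areas can be checked with unequal bin widths. In the sketch below the tree heights are regrouped into hypothetical bins of width 10, 5, 5 and 10 (an assumption for this example, with each bin including its lower edge and excluding its upper edge). Each bar's height is its share of the data divided by its width, so the areas add up to 1, i.e. one hundred percent.

```python
# Tree heights (in feet) from the earlier histogram slide.
heights = [61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 73, 73.5, 74, 74.5,
           76, 76.2, 76.5, 77, 77.5, 78, 78.5, 79, 79.2, 80,
           81, 82, 83, 84, 85, 87]

# Hypothetical unequal-width bin edges; each bin is [low, high).
edges = [60, 70, 75, 80, 90]


def density_histogram(values, bin_edges):
    """Return one bar per bin with height = count / (n * width)."""
    n = len(values)
    bars = []
    for low, high in zip(bin_edges, bin_edges[1:]):
        count = sum(1 for v in values if low <= v < high)
        width = high - low
        bars.append({"low": low, "high": high, "count": count,
                     "height": count / (n * width)})
    return bars


bars = density_histogram(heights, edges)
total_area = sum(b["height"] * (b["high"] - b["low"]) for b in bars)

for b in bars:
    print(f"[{b['low']}, {b['high']}): count={b['count']}, height={b['height']:.4f}")
print(f"total area = {total_area:.2f}")
```

Note that the widest bins are not the tallest bars: a bar's area, not its height, carries the share of the data.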
  • 34.
    PIE CHART (CIRCLE GRAPH)  It shows how the parts relate to the whole. The full circle represents 100%, and each category occupies a slice proportional to its percentage, e.g. 15%, 45%, etc.
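A short sketch of the arithmetic behind a pie chart: each category's share of the total gives its percentage, and the same share of 360 degrees gives the angle of its slice. The grade counts are reused from the bar graph slide purely as sample data; the function name `pie_slices` is illustrative.

```python
# Sample data reused from the bar graph slide: grade -> number of students.
students_per_grade = {"A": 4, "B": 12, "C": 10, "D": 2}


def pie_slices(counts):
    """Map each category to (percentage of the whole, slice angle in degrees)."""
    total = sum(counts.values())
    return {
        label: (100 * value / total, 360 * value / total)
        for label, value in counts.items()
    }


slices = pie_slices(students_per_grade)
for label, (percent, angle) in slices.items():
    print(f"{label}: {percent:.1f}% of the circle, {angle:.1f} degrees")
```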
  • 35.
    SCATTER PLOT/CHART  It uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.
  • 36.
    HEAT MAP  It depicts values for a main variable of interest across two axes as a grid of colored squares. The axis variables are divided into ranges like a bar chart or histogram, and each cell's color indicates the value of the main variable in the corresponding cell range.  A heatmap uses color to add the magnitude of a third variable to a two-dimensional plot.  Heatmaps are used to help show patterns and changes. While they can be used to show changes over time, they are not designed for detailed analysis.
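The grid idea can be sketched as follows: divide both axis variables into ranges, average the main variable inside each cell, and map each cell's value to a shade. The data points, the cell width of 5, and the character ramp standing in for colors are all hypothetical choices for this example.

```python
from collections import defaultdict

# Hypothetical records: (x, y, value of the main variable).
points = [(1, 1, 2.0), (2, 3, 4.0), (6, 2, 10.0),
          (7, 7, 8.0), (8, 9, 6.0), (3, 8, 1.0)]


def heatmap_cells(records, x_width, y_width):
    """Average the value of all records that fall into the same grid cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in records:
        cell = (int(x // x_width), int(y // y_width))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def shade(value, low, high, ramp=" .:*#"):
    """Map a value to a character; later characters mean larger values."""
    if high == low:
        return ramp[-1]
    position = (value - low) / (high - low)
    return ramp[min(int(position * len(ramp)), len(ramp) - 1)]


cells = heatmap_cells(points, x_width=5, y_width=5)
low, high = min(cells.values()), max(cells.values())

# Print the grid with the largest y range on top, as on a plot.
for row in (1, 0):
    print("".join(shade(cells[(col, row)], low, high) for col in (0, 1)))
```

As the slide notes, the shading shows where values are high or low at a glance, but the exact cell values cannot be read off the picture.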
  • 40.
    ANALYTICAL TECHNIQUES  The most fundamental data analysis approaches are:  Visualization (histograms, scatter plots, surface plots, tree maps, parallel coordinate plots, etc.)  Statistics (hypothesis testing, regression, PCA i.e. principal component analysis, etc.)  Data Mining (association mining, etc.)  Machine Learning (clustering, classification, decision trees, etc.)  Among these approaches, information visualization, or visual data analysis, is the most reliant on the cognitive skills of human analysts, and allows the discovery of unstructured actionable insights that are limited only by human imagination and creativity. The analyst does not have to learn any sophisticated methods to be able to interpret the visualizations of the data.
  • 41.
    VISUAL ANALYTIC PROCESS Sometimes, data can be overwhelming. There’s too much of it, too little time to comprehend it, or you simply can’t see the data you have available at your disposal. If so, visual data analysis can help you make sense of it all, by combining data analytics and data visualization techniques.  Today, data is produced at an incredible rate and the ability to collect and store the data is increasing at a faster rate than the ability to analyze it.  The Visual Analytics Process combines automatic and visual analysis methods with a tight coupling through human interaction in order to gain knowledge from data.
  • 42.
  • 43.
    VISUAL ANALYTIC PROCESS  The first step is often to preprocess and transform the data to derive different representations for further exploration (as indicated by the Transformation arrow in the figure). Other typical preprocessing tasks include data cleaning, normalization, grouping, or integration of heterogeneous data sources.
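Two of the preprocessing tasks named above, data cleaning and normalization, can be sketched in a few lines. Dropping missing entries and min-max scaling to the range 0 to 1 are just one possible choice for each task; the sample values and function names are hypothetical.

```python
def clean(values):
    """Data cleaning (one simple form): drop missing entries."""
    return [v for v in values if v is not None]


def min_max_normalize(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


raw = [61, None, 87, 74]  # hypothetical measurements with one missing value
normalized = min_max_normalize(clean(raw))
print(normalized)
```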
  • 44.
    VISUAL ANALYTIC PROCESS After the transformation, the analyst may choose between applying visual or automatic analysis methods.  If an automated analysis is used first, data mining methods are applied to generate models of the original data. Once a model is created the analyst has to evaluate and refine the models, which can best be done by interacting with the data.  Visualizations allow the analysts to interact with the automatic methods by modifying parameters or selecting other analysis algorithms.
  • 45.
    VISUAL ANALYTIC PROCESS Alternating between visual and automatic methods is characteristic for the Visual Analytics process and leads to a continuous refinement and verification of preliminary results.  Misleading results in an intermediate step can thus be discovered at an early stage, leading to better results and a higher confidence.  If a visual data exploration is performed first, the user has to confirm the generated hypotheses by an automated analysis.
  • 46.
    VISUAL ANALYTIC PROCESS User interaction with the visualization is needed to reveal insightful information, for instance by zooming in on different data areas or by considering different visual views on the data.  Findings in the visualizations can be used to steer model building in the automatic analysis.  In summary, in the Visual Analytics Process knowledge can be gained from visualization, automatic analysis, as well as the preceding interactions between visualizations, models, and the human analysts.
  • 47.
    VISUAL ANALYTIC TECHNIQUES Time-series: A single variable is captured over a period of time, such as the unemployment rate over a 10-year period. A line chart, area chart, stock chart may be used to demonstrate the trend.  Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance (the measure) by salespersons (the category, with each salesperson a categorical subdivision) during a single period. A bar chart may be used to show the comparison across the salespersons.
  • 48.
    VISUAL ANALYTIC TECHNIQUES Part-to-whole: Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%). A pie chart, pyramid chart, treemap chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market.  Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period. A bar chart with errors can show the comparison of the actual versus the reference amount.
  • 49.
    VISUAL ANALYTIC TECHNIQUES Frequency distribution: Shows the number of observations of a particular variable for a given interval, such as the number of years in which the stock market return is between intervals such as 0–10%, 11–20%, etc. A histogram, a type of bar chart, may be used for this analysis.  Correlation: Comparison between observations represented by two variables (X,Y) to determine if they tend to move in the same or opposite directions. For example, plotting unemployment (X) and inflation (Y) for a sample of months. A scatter plot is typically used for this message.
  • 50.
    VISUAL ANALYTIC TECHNIQUES Nominal comparison: Comparing categorical subdivisions in no particular order, such as the sales volume by product code. A bar chart may be used for this comparison.  Geographic or geospatial: Comparison of a variable across a map or layout, such as the unemployment rate by state or the number of persons on the various floors of a building. A cartogram is a typical graphic used.
  • 51.
    REFERENCES S. C. Guptaand V. K. Kapoor, “Mathematical Statistics,” Sultan Chand & Sons, publication. https://www.import.io/post/what-is-data- visualization/ Prem S. Mann, “Introductory Statistics,” Wiley.

Editor's Notes

  • #21 if you close your eyes in this situation, however, you will quickly start wobbling. This simple experiment shows that it is visual feedback that enables you to remain balanced.
  • #41 https://www.visual-analytics.eu/faq/#:~:text=The%20Visual%20Analytics%20Process%20combines,in%20the%20Visual%20Analytics%20Process.