DATA VISUALIZATION
GIUSEPPE MASETTI
ESCI 872 – APPLIED TOOLS FOR OCEAN MAPPING – INTRODUCTION TO OCEAN DATA SCIENCE
Durham, NH – September 3, 2019
V0
WHY DO WE NEED DATA VISUALIZATION?
“Computer scientists are going to have to realize that
primary memory is the human brain, not RAM”
(Buxton, 2001)
[Figure: amount of available data vs. human cognitive abilities, both plotted against time]
WHY DO WE NEED DATA VISUALIZATION?
“We are all cognitive cyborgs in this Internet age
in the sense that we rely heavily on cognitive
tools to amplify our mental abilities.”
(Ware, 2010)
“Often the most effective way to describe, explore,
and summarize a set of numbers – even a very large
set – is to look at pictures of those numbers.”
(Tufte, 2001)
CRITERIA FOR DATA VISUALIZATION
Perceptual hierarchy of visual cues
(Cleveland and McGill, 1985)
[Figure: visual cues arranged along an accuracy axis: length (aligned), length, slope, angle, area, color intensity, color hue, volume]
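As a rough illustration of the hierarchy, the sketch below encodes the same values once as aligned lengths (bar chart) and once as angles and areas (pie chart). It assumes Matplotlib is installed; the labels, values, and output file name are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to file, no display needed
import matplotlib.pyplot as plt

# Made-up values that are deliberately close to each other.
labels = ["A", "B", "C", "D"]
values = [23, 25, 27, 25]

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(8, 3))

# Aligned lengths: all bars share a common baseline.
ax_bar.bar(labels, values)
ax_bar.set_title("Length (aligned)")

# Angles and areas: the same values encoded as wedges.
ax_pie.pie(values, labels=labels)
ax_pie.set_title("Angle / area")

fig.savefig("visual_cues.png", dpi=100)
```

Comparing the two panels side by side, the small differences between categories are usually easier to read from the bars than from the wedges.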
CRITERIA FOR DATA VISUALIZATION
Which chart type?
Try different ones!
Example of chart-chooser → Abela (2009)
CRITERIA FOR DATA VISUALIZATION
Which colormap?
Think about the following color wheel …
(source: Wikimedia Commons)
(source: colorbrewer2.org)
Colormap classes: diverging, sequential, qualitative
… and the story that you want to tell!
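A minimal sketch of the three colormap classes in Matplotlib follows. The specific colormaps (`viridis`, `RdBu`, `Set2`) are one possible choice per class, picked for this example, and the data are synthetic. The qualitative map is applied to a continuous field only for side-by-side comparison; it is meant for unordered categories.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to file, no display needed
import matplotlib.pyplot as plt
import numpy as np

# One Matplotlib colormap per class (an assumption made for this sketch).
COLORMAPS = {
    "sequential": "viridis",  # ordered data, low to high
    "diverging": "RdBu",      # data with a meaningful midpoint (e.g., anomalies around 0)
    "qualitative": "Set2",    # unordered categories
}

# Synthetic grid with positive and negative values around zero.
x = np.linspace(-3, 3, 50)
data = np.outer(np.sin(x), np.cos(x))

fig, axes = plt.subplots(1, 3, figsize=(10, 3))
for ax, (kind, name) in zip(axes, COLORMAPS.items()):
    if kind == "diverging":
        # Symmetric limits keep the neutral color at zero.
        limit = np.abs(data).max()
        image = ax.imshow(data, cmap=name, vmin=-limit, vmax=limit)
    else:
        image = ax.imshow(data, cmap=name)
    ax.set_title(f"{kind}: {name}")
    fig.colorbar(image, ax=ax)

fig.savefig("colormap_classes.png", dpi=100)
```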
CRITERIA FOR DATA VISUALIZATION
\[ \text{Lie Factor} = \frac{\text{Size of Effect Shown in Graphic}}{\text{Size of Effect in Data}} \]
(Tufte, 1991)
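A worked example of the Lie Factor, under the assumption that "size of effect" is measured as the relative change between two values. The chart is hypothetical: a bar chart whose vertical axis starts at 95 instead of 0. The function names are introduced here for illustration only.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Ratio between the effect shown in the graphic and the effect in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


# Hypothetical data: a 10% increase.
data_first, data_second = 100.0, 110.0

# Hypothetical graphic: bars drawn from an axis that starts at 95.
axis_start = 95.0
bar_first = data_first - axis_start    # drawn height: 5
bar_second = data_second - axis_start  # drawn height: 15

factor = lie_factor(bar_first, bar_second, data_first, data_second)
print(f"Lie factor: {factor:.1f}")  # prints "Lie factor: 20.0"
```

Here the drawn bars grow by 200% while the data grow by 10%, so the graphic overstates the effect by a factor of 20. A value of 1 means the graphic represents the effect faithfully.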
CRITERIA FOR DATA VISUALIZATION
\[ \text{Data Ink Ratio} = \frac{\text{Data Ink}}{\text{Total Ink in the Graphic}} \]
(Tufte, 1983)
[Figure: two charts compared ("vs"); data source: http://pypl.github.io/PYPL.html]
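One way to experiment with the data-ink ratio in Matplotlib is sketched below: the same bar chart is drawn twice, once with extra decoration and once with non-data ink reduced. The language names and shares are made up for illustration (they are not taken from the PYPL index), and the styling choices are just one possible reading of the idea.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to file, no display needed
import matplotlib.pyplot as plt

# Made-up shares, for illustration only.
languages = ["Lang A", "Lang B", "Lang C", "Lang D"]
share = [30, 20, 10, 5]

fig, (ax_default, ax_lean) = plt.subplots(1, 2, figsize=(9, 3))

# Left: heavier decoration (grid, bar edges, hatching, full frame).
ax_default.bar(languages, share, edgecolor="black", hatch="//")
ax_default.grid(True)
ax_default.set_title("More non-data ink")

# Right: same data with non-data ink reduced.
bars = ax_lean.bar(languages, share)
for side in ("top", "right", "left"):
    ax_lean.spines[side].set_visible(False)
ax_lean.set_yticks([])         # values are written on the bars instead
ax_lean.tick_params(length=0)  # no tick marks
for bar, value in zip(bars, share):
    ax_lean.text(bar.get_x() + bar.get_width() / 2, value, f"{value}%",
                 ha="center", va="bottom")
ax_lean.set_title("Less non-data ink")

fig.savefig("data_ink.png", dpi=100)
```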
Experiment on Data Ink Ratio
(Inbar et al., 2007)
• Approach: 87 students rated two graphs from Tufte (1983).
• Findings: a clear preference for non-minimalist bar graphs.
• Take-away message: “People did not like Tufte’s minimalist design of bar-graphs; they seem to prefer ‘chartjunk’ instead”.
DATA VISUALIZATION WITH PYTHON
(VanderPlas, 2017)
• Matplotlib is a well-tested, popular tool → First release: 2003
• Designed like MATLAB → Eases the switch from MATLAB
• Many rendering backends → Cross-platform, multiple output formats
• A major weakness is the rendering speed for large data → Slow!
• Able to create just about any chart (with some effort)
(source: Matplotlib gallery)
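The points above on the MATLAB-like design and the multiple output formats can be sketched as follows. The example selects the non-interactive `Agg` backend so that it runs without a display, then draws one figure with the MATLAB-style `pyplot` calls and one with the object-oriented interface. File names are arbitrary choices for this sketch.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # select a non-interactive rendering backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# MATLAB-style (state-based) interface: calls act on the "current" figure.
plt.figure()
plt.plot(x, np.sin(x), label="sin(x)")
plt.xlabel("x")
plt.legend()
plt.savefig("matlab_style.png")
plt.close()

# Object-oriented interface: explicit Figure and Axes objects.
fig, ax = plt.subplots()
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.legend()

# The same figure written to raster (PNG) and vector (PDF, SVG) formats.
output_files = [Path(f"oo_style.{ext}") for ext in ("png", "pdf", "svg")]
for path in output_files:
    fig.savefig(path)
```

The object-oriented form is generally easier to reuse inside functions and classes, because the target `Axes` is passed around explicitly instead of relying on global state.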
MODULE TASK → ADD PLOTTING CAPABILITIES
1. MINIMAL DEFINITION
2.+ __init__()
3.+ INITIALIZATION PARAMETER
4.+ ERROR CHECK
5.+ __str__()
6.+ read()
7.+ plot()
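The steps listed above could be put together roughly as follows. This is only a sketch: the class name `Profile`, its attributes, and the two-column comma-separated file layout are hypothetical and are not taken from the course module.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to file, no display needed
import matplotlib.pyplot as plt


class Profile:
    """Hypothetical container for (x, y) samples read from a text file."""

    def __init__(self, data_path):
        # Initialization parameter with an error check.
        self.data_path = Path(data_path)
        if not self.data_path.exists():
            raise RuntimeError(f"The passed path does not exist: {self.data_path}")
        self.x = []
        self.y = []

    def __str__(self):
        return f"Profile(path={self.data_path.name}, samples={len(self.x)})"

    def read(self):
        """Read two comma-separated numeric columns, skipping blank lines."""
        self.x = []
        self.y = []
        with self.data_path.open() as fid:
            for line in fid:
                if not line.strip():
                    continue
                first, second = line.split(",")
                self.x.append(float(first))
                self.y.append(float(second))

    def plot(self, ax=None):
        """Plot y against x and return the Axes for further customization."""
        if ax is None:
            _, ax = plt.subplots()
        ax.plot(self.x, self.y, marker="o")
        ax.set_xlabel("x")
        ax.set_ylabel("y")
        ax.set_title(str(self))
        return ax


# Usage with a small made-up file.
sample_path = Path("profile_sample.csv")
sample_path.write_text("0,1.5\n1,2.0\n2,1.0\n")
profile = Profile(sample_path)
profile.read()
axes = profile.plot()
axes.figure.savefig("profile_sample.png")
print(profile)  # prints "Profile(path=profile_sample.csv, samples=3)"
```

Returning the `Axes` from `plot()` is a design choice made here so that the caller can add labels or overlay other data before saving.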
GO TO “INTRODUCTION TO MATPLOTLIB” NOTEBOOK
QUESTIONS?
Contact me at: gmasetti@ccom.unh.edu

ePOM - Intro to Ocean Data Science - Data Visualization
