2. About me
Marco Torchiano
Associate Professor, Politecnico di Torino
Senior Member IEEE
Faculty Fellow – Nexa Center for Internet
and Society
Member UNI CT504–Software Engineering
Contacts:
– mailto:marco.torchiano@polito.it
– http://softeng.polito.it/torchiano/
– Twitter: @mtorchiano
3. Current Research Interests
Mobile UI Automated Testing
PhD student working on fragility
(Open)Data Quality
PhD student working on KB quality
Software Energy Consumption
Several collaborations
Also: MDD, Survey methodology, code
obfuscation, SE education, …
4. Agenda
Introduction to Data Visualization
A little bit of history
Visual perception
Graphical integrity
Visual encoding
Visual relationships
9. A simple literature review
ICSE 2017 Main Track
Analyzed 68 papers
118 graphs (figures with quantitative
information)
– ~ 1.7 per paper on average
199 tables
– ~ 3 per paper on average
11. Typical graph mistakes
Severe
Pie chart
Non zero-based bars
Double scale
Major
Grid with different axis ranges
Unlabeled Axis
Clarity
Rotated labels
Heavy Grid or Background
Too similar colors
Pattern fill
Raster image
Overplotting
12. Mistake frequency
            Graphs        Papers
Severity    Freq  Prop    Freq  Prop
Severe        15   15%      10    9%
Major          7    6%       5    4%
Clarity       37   31%      21   18%
any           53   45%      26   22%
Frequency of mistakes over all graphs
13. Tables general guidelines
Never, ever use vertical rules
Never use double rules
Put the units in the column heading
Not in the body of the table
Always precede a decimal point by a
digit; thus 0.1 not just .1
Provide all the values
Avoid …
Booktabs - Publication quality tables in LaTeX
14. Typical table mistakes
Formatting
Use of avoidable rules
– Mostly vertical and also horizontal
Misaligned numbers
Variable number of decimals
Table as image
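The table guidelines above can be applied mechanically when a table is generated from data. The following is a minimal sketch, not taken from the slides: the helper name `format_table` and the sample data are hypothetical. It prints a fixed number of decimals (always with a leading digit), right-aligns numeric columns so the decimal points line up, keeps units in the column headings, and uses horizontal rules only.

```python
def format_table(header, rows, decimals=2):
    """Render a plain-text table with horizontal rules only.

    header: column titles; units go here (e.g. "Time [s]"), not in the body.
    rows:   sequences whose first cell is a label and the rest are numbers.
    """
    # A fixed number of decimals keeps decimal points aligned; the "f"
    # format always emits the leading digit (0.10, never .10).
    body = [[str(row[0])] + [f"{value:.{decimals}f}" for value in row[1:]]
            for row in rows]
    widths = [max([len(title)] + [len(cells[i]) for cells in body])
              for i, title in enumerate(header)]

    def render(cells):
        # Labels are left-aligned, numeric columns right-aligned.
        first = cells[0].ljust(widths[0])
        rest = [cell.rjust(width) for cell, width in zip(cells[1:], widths[1:])]
        return "  ".join([first] + rest)

    rule = "-" * (sum(widths) + 2 * (len(widths) - 1))
    lines = [rule, render(header), rule]
    lines += [render(cells) for cells in body]
    lines.append(rule)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    print(format_table(["Config", "Error [%]", "Time [s]"],
                       [("Baseline", 0.1, 12), ("Tuned", 0.25, 9.5)]))
```

Generating the table as text (rather than pasting a screenshot) also avoids the "table as image" mistake.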
28. Categorical encoding
Encoding of categorical levels
Position (along an axis)
Size
Color
– Intensity
– Saturation
– Hue
Shape
Fill pattern
Line style
Ordinal
29. Gestalt principles
Visual features that lead the viewer to
group visual objects together
Similarity Connection Closure
Proximity Enclosure Continuity
30. Similarity in shape + color
[Figure: Booking and Billing by quarter (Q1 to Q4); value axis from 600,000 to 850,000]
Still difficult to evaluate the trend
36. Principles of integrity
Proportionality
Representation as physical quantities
should be proportional to the represented
numbers
Utility
Graphical elements should convey useful
information
Clarity
Labeling should counter graphical
distortion and ambiguity
39. Data-ink ratio (Utility)
Proportion of a graphic’s ink devoted
to the non-redundant display of data
information
1 – (proportion of a graphic that can
be erased without loss of information)
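The two formulations above are equivalent, which a small calculation makes concrete. This is only a sketch: how "ink" is measured (pixels, area, stroke length) is left open by the slide, and the function names and numbers are hypothetical.

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of the graphic's ink that displays non-redundant data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


def data_ink_ratio_from_erasable(erasable_proportion):
    """Same ratio, computed as 1 - (proportion that can be erased)."""
    if not 0 <= erasable_proportion <= 1:
        raise ValueError("erasable_proportion must lie between 0 and 1")
    return 1 - erasable_proportion


# Hypothetical graphic: 30 units of data ink out of 120 units in total,
# i.e. 75% of the ink could be erased without losing information.
ratio = data_ink_ratio(30, 120)
same_ratio = data_ink_ratio_from_erasable(0.75)
```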
46. Clarity
Textual elements should provide
effective support to understanding
Hierarchical
– Size and position reflect importance
Readable
– Large enough
Horizontal
Close to data (avoid legends)
Always label the axes
48. Expenses by Category and Function
[Figure: 3D bar chart of expenses by category (Payroll, Equipment, Travel, Consumables, Software, Other) and function (Research, Sales, Management, Accounting); value axis from 0 to 70]
Proportionality: the 3D perspective falsifies sizes
Utility: the shading conveys no information
Clarity: overlapping bars prevent identification and assessment
50. Information Visualization
Visual Perception
Visual Properties & Objects
Quantitative Reasoning
Quantitative Relationship & Comparison
Information Visualization
Visual Patterns, Trends, Exceptions
Understanding
Data
Representation/Encoding
Quantitative
51. Relationships
Within a category
Nominal comparison
Ranking
Part-to-whole
Distribution
Between measures
Time series
Deviation
Correlation
52. Nominal comparison
Compare quantitative values
corresponding to categorical levels
Small differences are difficult to see
– A non-zero-based scale can emphasize them
Dot plots can be used for small
differences
– They do not require a zero-based scale
53. Bars
[Figure: bar graph of the number of companies by size (Large, Medium, Small, Micro); axis from 0 to 100]
54. Bars must be zero-based
Proportionality:
Clarity: missing axis + angled labels
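One way to see why bars need a zero baseline is to compare the ratio of the drawn bar lengths with the ratio of the underlying values. The sketch below is an illustration under stated assumptions: the function names and the numbers are hypothetical and are not measurements from the slide's figure, and values are assumed to be positive.

```python
def bar_length_ratio(a, b, baseline=0.0):
    """Ratio between the drawn lengths of two bars that start at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


def exaggeration(a, b, baseline):
    """How much a truncated baseline inflates the visual ratio.

    1.0 means the bars are proportional to the data; larger values mean
    the difference looks bigger than it is.
    """
    return bar_length_ratio(a, b, baseline) / bar_length_ratio(a, b, 0.0)


# Hypothetical values: 110 vs 100 drawn on an axis that starts at 90.
# The bars are 20 and 10 units long, so one looks twice as long as the
# other although the values differ by only 10%.
visual = bar_length_ratio(110, 100, baseline=90)
factor = exaggeration(110, 100, baseline=90)
```

A dot plot avoids the problem because it encodes values by position rather than by length.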
59. Ranking
Purpose                      Sort order   Bars orientation
Highlight the highest value  Descending   H: highest on top; V: highest on left
Highlight the lowest value   Ascending    H: lowest on top; V: lowest on left
• Same type as nominal comparison
• Pay attention to order
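The ordering rule above amounts to sorting the categories by value before plotting. A minimal sketch, with a hypothetical helper `rank_categories` and made-up data:

```python
def rank_categories(values, highlight="highest"):
    """Return (category, value) pairs in plotting order.

    The first pair goes on top for horizontal bars, or on the left for
    vertical bars. `highlight` selects which end of the ranking comes first.
    """
    if highlight not in ("highest", "lowest"):
        raise ValueError("highlight must be 'highest' or 'lowest'")
    descending = highlight == "highest"
    return sorted(values.items(), key=lambda item: item[1], reverse=descending)


# Hypothetical data, for illustration only.
scores = {"A": 12, "B": 30, "C": 7}
top_first = rank_categories(scores, highlight="highest")
bottom_first = rank_categories(scores, highlight="lowest")
```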
60. Part-to-whole
Best unit: percentage
Stacked bar graph
Difficult to read individual values
Area
Perceptual limitations
68. Pie Charts
Are a bad idea!
But if you insist…
Labels placed close to slices
Labels include values (percentages)
Only with a small number of categories
– Up to four
– Avoid rainbow pie
When proportions are distinct enough
69. Pie Misuse
[Figure: pie chart "Invesco Global Targeted Returns Fund class E EUR Acc Top 5 Assets"; slices 13.60%, 15.00%, 16.70%, 17.80%, 20.40%; legend: CONTRA FUTURE FUTURE DEC 18 15; CASH - EU PRINCIPAL; POUND STERLING PAYABLE 16OCT15 DEU; EUROS RECEIV. 16OCT15 DEU; US DOLLARS PAYABLE 08OCT15 DEU]
70. Fund Portfolio
[Figure: pie chart of the Invesco Global Targeted Returns Fund class E EUR Acc; slices 13.60%, 15.00%, 16.70%, 17.80%, 20.40%]
Proportionality: area graphs have perception issues
Utility: shadow and gradient fill convey no information
Clarity: the separate legend with color coding makes identification difficult
Data: slices do not sum to 100%
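The "Data" remark can be checked before drawing any part-to-whole graph. The sketch below uses the five slice values shown on the slide; the helper name `check_part_to_whole` and the tolerance are assumptions made for illustration.

```python
import math


def check_part_to_whole(percentages, tolerance=0.5):
    """Return (total, ok): ok is True when the parts sum to about 100%."""
    total = math.fsum(percentages)
    return total, abs(total - 100.0) <= tolerance


# Slice values from the fund's "Top 5 Assets" pie chart.
slices = [13.60, 15.00, 16.70, 17.80, 20.40]
total, ok = check_part_to_whole(slices)
# The five slices cover only about 83.5% of the portfolio, so a pie
# (which implies a whole) misrepresents them.
```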
72. Distribution
Two main types
Show the distribution of a single set of values
Show and compare two or more
distributions
73. Single distribution
Histogram
Vertical bar graph
Frequency for subdivision
– Quantitative ranges
– Categories
Emphasis on number of occurrences
Frequency polygon
Line graphs
Frequency density function
Emphasis on the shape of the distribution
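The "frequency for subdivision" behind a histogram over quantitative ranges can be sketched as a simple binning step. This is an illustrative implementation with equal-width bins; the function name, the bin width, and the data are hypothetical.

```python
def histogram(values, bin_width, start=0):
    """Count occurrences in equal-width bins [lower, lower + bin_width)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not values:
        return []
    if min(values) < start:
        raise ValueError("all values must be >= start")
    counts = {}
    for value in values:
        index = int((value - start) // bin_width)
        counts[index] = counts.get(index, 0) + 1
    # Empty bins are kept so the shape of the distribution stays visible.
    return [(start + i * bin_width, start + (i + 1) * bin_width, counts.get(i, 0))
            for i in range(max(counts) + 1)]


# Hypothetical data, for illustration only.
bins = histogram([1, 2, 2, 3, 7, 8, 9, 9, 9], bin_width=3)
# Each tuple is (lower bound, upper bound, frequency).
```

Plotting the frequencies as bars gives the histogram; connecting them with a line gives the frequency polygon.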
74. Box plot
Outlier
Max value
75th percentile
Median
50th percentile
25th percentile
Min value
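The elements labeled above can be computed with the Python standard library. This sketch makes two assumptions that the slide does not state: quartiles use the "inclusive" method of `statistics.quantiles`, and outliers are points beyond 1.5 times the interquartile range from the box (Tukey's usual convention), so "min" and "max" here are the most extreme values that are not outliers.

```python
import statistics


def box_plot_summary(values, whisker=1.5):
    """Five-number summary plus outliers, as drawn in a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": inside[0],        # lowest value that is not an outlier
        "q1": q1,                # 25th percentile
        "median": median,        # 50th percentile
        "q3": q3,                # 75th percentile
        "max": inside[-1],       # highest value that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical data, for illustration only.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```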
77. Confidence Intervals
Error Bars Considered Harmful: Exploring Alternate Encodings for Mean and Error
Michael Correll and Michael Gleicher
IEEE Transactions on Visualization and Computer Graphics, Dec. 2014
78. Interval may be Asymmetric
It is physically
impossible to
modify -6 files
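A symmetric interval around the mean can extend below zero even when the quantity is a count. One possible alternative, shown here only as a sketch, is a percentile bootstrap interval, whose bounds cannot leave the range of the observed data. The technique is a choice made for this example and is not prescribed by the slide; the function names and the data are hypothetical.

```python
import math
import random
import statistics


def normal_interval(values, z=1.96):
    """Symmetric interval: mean +/- z standard errors."""
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean + half_width


def bootstrap_percentile_interval(values, level=0.95, resamples=2000, seed=0):
    """Percentile bootstrap interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(resamples)
    )
    tail = (1 - level) / 2
    low = means[round(tail * resamples)]
    high = means[round((1 - tail) * resamples) - 1]
    return low, high


# Hypothetical, skewed counts of files modified per change.
files_modified = [0, 1, 1, 2, 2, 3, 4, 6, 15, 50]
symmetric = normal_interval(files_modified)                 # lower bound is negative
bootstrap = bootstrap_percentile_interval(files_modified)   # stays within 0..50
```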
79. Correlation
Relationships between two paired sets
of quantitative values
Scatter plot w/possible trend line
– Ok for educated audience
Correlation bar graph
Paired bar graph
88. Multiple variables
Correlation between 3+ variables
E.g. two measures in time series
Multiple units of measure
Double quantitative (y) axis
Multiple graphs
One variable not encoded explicitly
94. Small multiples
A.k.a.
Trellis
Lattice
Grid
Set of aligned graphs sharing (at least
one) scale and axis
Enable ease of comparison among
different measures
95. Small multiples
FT EU unemployment tracker
http://blogs.ft.com/ftdata/2015/04/17/eu-unemployment-tracker/
96. Time series
Series of relationships between
quantitative values that are associated
with categorical subdivisions of time
Communicate change
Time grows horizontally from left to
right
Cultural convention
Bars highlight individual points and hide
the overall trend
102. Suggested Readings
Stephen Few, 2004.
Show me the numbers.
Analytics Press.
http://www.perceptualedge.com/blog/
Edward R. Tufte, 1983.
The Visual Display of
Quantitative Information.
Graphics Press.
103. Suggested readings
Andy Kirk, 2016
Data Visualization –
A Handbook for Data Driven Design
Sage
Tamara Munzner, 2014
Visualization Analysis and Design
CRC Press
Nathan Yau, 2011
Visualize This: The FlowingData Guide to
Design, Visualization, and Statistics
Wiley
104. References
Stephanie Evergreen, 2013.
Presenting Data Effectively:
Communicating Your
Findings for Maximum
Impact, SAGE Publications.
Alberto Cairo, 2012. The
Functional Art: An
introduction to information
graphics and visualization,
New Riders.
105. References
John W. Tukey, 1977,
Exploratory Data Analysis,
Pearson
William S. Cleveland, 1994,
The Elements of Graphing
Data, Hobart Press
106. References
C. Ware. Information Visualization: Perception
for Design. Morgan Kaufmann Publishers,
Inc., San Francisco, California, 2000
C. Healey, and J. Enns. Attention and Visual
Memory in Visualization and Computer
Graphics. IEEE Transactions on Visualization
and Computer Graphics, 18(7), 2012
I. Inbar, N. Tractinsky and J. Meyer.
Minimalism in information visualization:
attitudes towards maximizing the data-ink
ratio.
http://portal.acm.org/citation.cfm?id=1362587
107. References
S. Few, "Practical Rules for Using Color in Charts"
http://www.perceptualedge.com/articles/visual_business_intelligence/rules_for_using_color.pdf
D. Borland and R. M. Taylor II, "Rainbow Color
Map (Still) Considered Harmful," in IEEE Computer
Graphics and Applications, vol. 27, no. 2, pp. 14-17,
March-April 2007.
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4118486
http://www.color-blindness.com
http://www.csc.ncsu.edu/faculty/healey/PP/index.html
Editor's Notes
Visualization compares multiple values and puts the information into context.
A single number means nothing.
Try to spot the 5s and count them
16
7 41
8.5 2.5
Which of the squares is darker? A or B?
ELEMENT (Geometry, Aesthetic )
Gestalt – pattern
Only length matters, which requires a ZERO-BASED SCALE
The width of the bars does not encode any information
(11.4/4.2)/(13.4/11.2)
Lie factor: ≈ 2.27
Large Medium Micro Small
80 50 30 10
Where is the axis?
Data: The percentages do not sum to 100%, as is implicitly expected of a pie chart
Proportionality: The diagrams have perceptual problems with respect to proportionality
Utility: The shading and the gradient fill convey no information
Clarity: The separate legend, linked through color coding, makes it difficult to identify the slices
We suggest the use of gradient plots (which use transparency to encode uncertainty) and violin plots (which use width) as better alternatives for inferential tasks than bar charts with error bars.