A saying often presented as a Chinese proverb states that "a picture is worth a thousand words"... and it may be worth even more. Building on this idea, this talk goes beyond aesthetics and introduces data visualization as a powerful tool for data exploration and knowledge communication. However, while data visualizations can make story narratives easier to comprehend and statistics easier to digest, they can also be used for deceit, misinformation and even propaganda. The negative impact of storytelling through data will be a prominent part of this talk: we will cover how misinformation can prevail unintentionally, by misinterpreting the knowledge extracted from data, and intentionally, by “fitting” the visualization to the message that must be conveyed.
Telling a Story – or Even Propaganda – Through Data Visualization
Demetris Trihinas
Department of Computer Science
ailab @ University of Nicosia
trihinas.d@unic.ac.cy
Lead Cyprus: Disinformation Battles | Limassol, July 2019 (7/16/19)
Full-Time Faculty Member
University of Nicosia
“Designing and developing scalable and self-adaptive tools for data
management, exploration and visualization”
@dtrihinas
http://dtrihinas.info
https://ailab.unic.ac.cy/
https://www.slideshare.net/DemetrisTrihinas
@AilabUnic
Unemployment Data in the US
Color-coded visualization of unemployment per area
Which areas have low unemployment?
Seismic Activity in California
[Map labels: Alcatraz, National Park, Hollywood]
Is there no seismic activity at the national park?
Is this a good place to live?
Data Visualization
It is easier for humans to understand data conceptually by visually focusing on the main information.
Data visualization is both a tool for disseminating knowledge and a form of knowledge communication.
Why Visual Representations?
CIS 467, Spring 2015
Dataset I (x, y) | Dataset II (x, y) | Dataset III (x, y) | Dataset IV (x, y)
10.0 8.04 | 10.0 9.14 | 10.0 7.46 | 8.0 6.58
8.0 6.95 | 8.0 8.14 | 8.0 6.77 | 8.0 5.76
13.0 7.58 | 13.0 8.74 | 13.0 12.74 | 8.0 7.71
9.0 8.81 | 9.0 8.77 | 9.0 7.11 | 8.0 8.84
11.0 8.33 | 11.0 9.26 | 11.0 7.81 | 8.0 8.47
14.0 9.96 | 14.0 8.10 | 14.0 8.84 | 8.0 7.04
6.0 7.24 | 6.0 6.13 | 6.0 6.08 | 8.0 5.25
4.0 4.26 | 4.0 3.10 | 4.0 5.39 | 19.0 12.50
12.0 10.84 | 12.0 9.13 | 12.0 8.15 | 8.0 5.56
7.0 4.82 | 7.0 7.26 | 7.0 6.42 | 8.0 7.91
5.0 5.68 | 5.0 4.74 | 5.0 5.73 | 8.0 6.89
Summary statistics (the same for all four datasets):
Mean of x: 9
Sample variance of x: 11
Mean of y: 7.50
Sample variance of y: 4.125 (±0.003)
Correlation of x and y: 0.816
[F. J. Anscombe]
What is the data “telling” us?
How about letting me “see” the data first?
Visualization is also a Data Exploration Tool
[Figure (from CIS 602, Fall 2014): scatter plots of the four datasets, x1 vs y1, x2 vs y2, x3 vs y3 and x4 vs y4. Annotations: Linear dependency; …“perfect” linear dependency; Without “outlier…”]
[F. J. Anscombe]
Should we just consider this an error and throw this point away?
Dashboards, spreadsheets and visuals
only tell you what is happening.
But they do not tell you why…
Today’s Talk
• Data visualization as a communication and data exploration tool
• Data storytelling
• Give your data a voice!
• The unintentional and intentional “bewares”
• Tools of the trade
What is a Story?
A story is a set of observations, facts, or events (true or invented) that are presented in a specific order such that they create an emotional reaction in the audience.
Data Storytelling
Data storytelling uses a narrative
tailored to a specific audience with the
intent to communicate information
extracted from (raw) data.
What Makes a (Good) Story?
hypothesis -> data -> insights -> narrative -> visuals
The narrative through visuals is the key vehicle to convey insights extracted from the data.
you start here…
Data Storytelling – It’s a Brain Thing…
Narratives aid the memory process through the emotional aspect of a story, which can engage more parts of the brain, making the story and its elements easier to recall.
How Stories Change the Brain. P. Zak, Berkeley, 2013.
Camcorder vs Digital Camera Sales
[Chart series: Camcorder, Digital Camera. Annotations: “the data…”, “…but also part of the insights”]
A Good Story… has a Personal Touch
It’s not about how much
money you spent but how
many miles you traveled
and the equivalent of
those miles.
A Good Story… can be Interactive
Harsh reality: 10-12% of our lives is spent travelling between work, leisure and our homes.
A Good Story… can be Interactive
Focus even more on interesting information: time is factored into the visual.
Data Journalism is the Future…
Journalists need to be data-savvy. It used to be that you
would get stories by chatting to people in bars, and it
still might be that you’ll do it that way some times. But
now it’s also going to be about equipping yourself with
the tools to analyze data and picking out what is
interesting. And keeping it in perspective, helping
people out by really seeing where it all fits together, and
what’s going on in the world.
Sir Tim Berners-Lee (2013)
WSJ: The Impact of Vaccines (2015)
Data from CDC 1920-2014 (US)
Heatmap with a “cool to warm” color scale denoting the number of infection cases
ProPublica: A Disappearing Planet
https://projects.propublica.org/extinctions/
Sliding time window
Data from the IUCN Red List of Threatened Species (2013)
Stacked bar plot: quantities out of total
Stacked bar plot “clustered” by species
Bloomberg: Most dangerous jobs (2015)
Data from U.S. Department of Labor
Tagline
Stacked bar plot with
highlighting on focused
category
What is the Intended Story?
Mean arrival delay versus distance from New York City
Each point represents a destination, and the size of each point represents the number of
flights from New York to that destination in 2013.
Which is the best airline?
Make a Figure for the “Generals”
• Common dataviz misconceptions:
• The audience sees your figures
and immediately infers the points
you are trying to make.
• The audience can rapidly process complex visualizations and
understand the key trends and relationships that are shown.
• Follow your audience’s “language” and thinking process.
Claus Wilke, “Fundamentals of Data Visualization”, https://serialmentor.com/dataviz/
Death to Pie Charts [and Comic Sans]
“I hate pie charts. I mean, really hate them.” (Cole Nussbaumer, www.storytellingwithdata.com/2011/07/death-to-pie-charts.html)
Share of coverage on TechCrunch
Redesign
Storytelling Pie vs Bars
So, what to use instead?
http://www.storytellingwithdata.com/blog/2014/06/alternatives-to-pies
Imagine you just completed a pilot summer learning program on science, aimed at improving perceptions of the field among 2nd- and 3rd-grade elementary school children.
Correlation
• Correlation is a statistical technique that tells us how strongly pairs of variables are related (Pearson’s r, for example, measures linear association).
• But… correlation does not tell us the why and how behind the relationship.
• So… correlation just says that a relationship exists.
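To make "how strongly related" concrete, here is a minimal sketch of Pearson's correlation coefficient written out from its definition (covariance divided by the product of the spreads). The monthly sales figures are hypothetical, chosen only to echo the ice-cream and sunglasses example; `pearson` is an illustrative helper, not a library function.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences.

    r is near +1 or -1 for a strong linear relationship and near 0 for none.
    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical monthly sales, January to December (illustrative numbers only).
ice_cream = [20, 25, 40, 60, 85, 110, 120, 115, 80, 50, 30, 22]
sunglasses = [5, 6, 9, 14, 20, 26, 29, 27, 19, 12, 7, 5]

print(f"r = {pearson(ice_cream, sunglasses):.3f}")
```

A value close to 1 here says only that the two series move together; nothing in the formula says which one drives the other, or whether either does.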
Ice-Cream and Sunglass Sales
As ice-cream sales increase, so do sunglass sales.
Causation
• Causation means that a change in the value of one variable causes a change in the value of another variable.
• In other words, one variable makes the other happen.
Exercise and Calories
• When a person is exercising, the amount of calories burned increases with every minute.
• The former (exercise) is causing the latter (calories burned) to happen.
Ice-Cream and Homicides in New York
• A study in the 1990s showed that ice-cream sales are the cause of homicides in New York.
• As ice-cream sales rise and fall, so does the number of homicides -> correlation.
• But… does the consumption of ice-cream actually cause the death of people in NY?
https://www.nytimes.com/2009/06/19/nyregion/19murder.html
Correlation Does NOT Imply Causation
• The two things are, yes, correlated.
• But this does NOT mean one causes the other.
Correlation is what we observe when we can’t see under the covers.
So the less information we have, the more we are forced to rely on observed correlations.
There is NO Correlation without Causation
If neither A nor B causes the other, and the two are
correlated, there must be some common cause. It may not
be a direct cause of each of them, but it’s there somewhere
“upstream” in the picture.
Bottom line:
you have to keep “digging”… don’t be lazy!
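One common way to "keep digging" is to control for a suspected common cause. The sketch below simulates entirely hypothetical daily data in which temperature (assumed here, for illustration only, to be the upstream cause) drives both ice-cream sales and homicides, while neither affects the other. It then compares the raw correlation with the partial correlation, computed by removing temperature's linear effect from both variables and correlating what is left. The helper names are illustrative; note that this only removes a linear effect of a confounder you have actually measured.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def residuals(ys, zs):
    """What remains of ys after subtracting a least-squares line fitted on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    return pearson(residuals(xs, zs), residuals(ys, zs))


# Hypothetical data: temperature drives BOTH series; neither causes the other.
rng = random.Random(1990)  # fixed seed for a reproducible run
temperature = [rng.uniform(0, 35) for _ in range(1_000)]
ice_cream = [50 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
homicides = [10 + 0.5 * t + rng.gauss(0, 3) for t in temperature]

r_raw = pearson(ice_cream, homicides)
r_partial = partial_correlation(ice_cream, homicides, temperature)
print(f"raw correlation:              {r_raw:.2f}")
print(f"controlling for temperature:  {r_partial:.2f}")
```

The raw correlation comes out strong, while the partial correlation is close to zero: once the common cause is accounted for, the apparent link between the two series largely disappears.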
Using a Sample of the Data
• How many football games do US citizens go to?
• To get an exact answer (100% correct), you must ask everyone in the US (more than 320M people) -> Not practical!
• Use a random sample, meaning ask (many) fewer people -> but we won’t be 100% correct.
Small Sample Sizes
• Picking an adequate sample size is “part science and
part art”
• But statements like “75% of (some group) plan to use (some product) this year” become suspect when the sample size is just 24 companies.
• Even worse… sometimes the sample size is NOT mentioned in the study or visual at all.
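Why 24 companies is too few can be quantified. Assuming a simple random sample, the standard error of a sample proportion p is sqrt(p(1 - p)/n), and an approximate 95% margin of error is 1.96 times that. The sketch below applies this normal (Wald) approximation to the 75% figure; `margin_of_error` is an illustrative helper, and the approximation is itself rough for samples as small as 24, which only strengthens the point.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Normal (Wald) approximation: z * sqrt(p_hat * (1 - p_hat) / n).
    Assumes a simple random sample.
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return z * standard_error


for n in (24, 1_000):
    moe = margin_of_error(0.75, n)
    print(f"n = {n:>5}: 75% +/- {moe:.1%}")
```

With 24 respondents the "75%" could plausibly be anywhere from the high 50s to the low 90s; with 1,000 respondents the uncertainty shrinks to a few points.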
Biased Sampling
• This involves over- or under-polling a group, so the sample is not representative.
• A survey reveals that “81% of bank customers would use mobile banking if it were available…”
• This is meaningless if the survey only polled people on their mobile devices.
Random Sample Selection
• Random… means random!
• You cannot just select 1000 people from one city; the sample won’t represent the whole country.
• You cannot just send FB messages to 1000 random people; you will get a sample of FB users, and of course not all of the country’s citizens use FB.
• So… constructing a random sample is actually hard!
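The difference between a random and a biased sample can be simulated. The sketch below builds a hypothetical population of 100,000 bank customers (all proportions are made up for illustration): 40% are mobile users, 90% of whom would use mobile banking, while only 30% of the remaining customers would. It then compares a simple random sample with a sample drawn only from mobile users, echoing the mobile-banking survey on the previous slide.

```python
import random

# Hypothetical population of 100,000 bank customers (illustrative numbers only):
# 40,000 mobile users (90% would use mobile banking) and
# 60,000 other customers (30% would).
population = (
    [("mobile", True)] * 36_000 + [("mobile", False)] * 4_000
    + [("other", True)] * 18_000 + [("other", False)] * 42_000
)


def share_would_use(people):
    """Fraction of people who say they would use mobile banking."""
    return sum(1 for _, would_use in people if would_use) / len(people)


rng = random.Random(2019)  # fixed seed so the run is reproducible
true_share = share_would_use(population)

# Simple random sample: every customer has the same chance of being picked.
random_sample = rng.sample(population, 1_000)

# Biased sample: only customers reachable on their mobile devices are polled.
mobile_only = [person for person in population if person[0] == "mobile"]
biased_sample = rng.sample(mobile_only, 1_000)

print(f"true share:            {true_share:.1%}")
print(f"random sample (1,000): {share_would_use(random_sample):.1%}")
print(f"mobile-only sample:    {share_would_use(biased_sample):.1%}")
```

Both samples have the same size, but only the random one lands near the true 54%; the mobile-only sample reports roughly 90%, and a larger biased sample would not fix that.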
Poorly Chosen Lying with Statistics
Using the mean to summarize values across non-uniform populations.
What is the starting salary at a company?
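A minimal sketch of the problem, using made-up starting salaries for ten new hires (eight entry-level roles, one senior hire and one executive; all numbers are hypothetical): two high earners pull the mean well above what a typical hire is offered, while the median stays with the majority.

```python
import statistics

# Hypothetical starting salaries for ten new hires (illustrative only).
salaries = [30_000] * 8 + [45_000, 250_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
```

Advertising the mean here suggests a starting salary that eight out of ten hires never see.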
Poorly Chosen Lying with Statistics
Using the median to hide skewed data.
“Invest with me. My portfolio’s median profit is 8%.”
[Chart labels: median, mean]
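The reverse trick can be sketched with seven hypothetical investment returns (made-up numbers): the median is a healthy 8%, but two large losses sit in the tail that the median ignores, so the mean return is negative.

```python
import statistics

# Hypothetical returns of seven investments in one portfolio (illustrative only).
returns = [0.08, 0.08, 0.09, 0.10, 0.07, -0.60, -0.45]

median_return = statistics.median(returns)
mean_return = statistics.mean(returns)

print(f"median return: {median_return:.0%}")
print(f"mean return:   {mean_return:.0%}")
```

The median answers "what does a typical investment do?", not "how much money did the portfolio make?"; quoting it alone hides the skew.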
Poorly Chosen Lying with Statistics
A survey is only as accurate as its standard error.
The Semi-Attached Figure
• Stating one thing as proof of something else.
• For example, if an ad says “15% of CEOs drive a BMW; more than any other brand”, what does that prove?
• The implication is that CEOs are some sort of authority on cars, or, the other way around, that BMWs “make” CEOs.
The Lie Factor
• The size of the graphic effect should be directly
proportional to the numerical quantities:
Edward Tufte: Principles of Graphical Integrity
The Lie Factor:
$$\text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$$
Proportional Data -> Proportional Viz
Rule: use a visual channel proportional to the data!
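A frequent violation of this rule is scaling a circle's radius with the data, so the area (what the eye compares) grows with the square of the value. The sketch below contrasts that with sizing circles so their area is proportional to the value; the reference bubble (value 100, radius 10) and the helper names are hypothetical.

```python
import math


def faithful_radius(value, ref_value, ref_radius):
    """Radius that makes the circle's AREA proportional to value."""
    return ref_radius * math.sqrt(value / ref_value)


def naive_radius(value, ref_value, ref_radius):
    """Radius scaled linearly with value (a common mistake)."""
    return ref_radius * value / ref_value


def area(radius):
    return math.pi * radius ** 2


ref_value, ref_radius = 100, 10.0  # hypothetical reference bubble
value = 200                        # twice the reference value

for name, rule in [("faithful", faithful_radius), ("naive", naive_radius)]:
    ratio = area(rule(value, ref_value, ref_radius)) / area(ref_radius)
    print(f"{name}: area ratio {ratio:.2f} for a data ratio of {value / ref_value:.2f}")
```

Doubling the value doubles the faithful circle's area, but quadruples the naive one's, exaggerating the difference by a factor of two.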
Lying through Graphics
Lie Factor - Graphical Integrity
Magnitude in data must correspond to magnitude of mark.
Example (source: Flowing Data): bar chart comparing 35% and 39.6%
Effect in Data: 39.6 / 35 ≈ 1.13
Effect in Graphic: factor 5
Lie Factor: 5 / 1.13 ≈ 4.4
Scale Distortions
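The arithmetic behind the lie factor can be sketched as a small function. It follows the ratio-based reading used on this slide (second value divided by first); note that Tufte's own definition measures each effect as a relative change, (second - first) / first, which yields a much larger lie factor for the same chart. The bar lengths 1 and 5 are assumed so the drawn bars differ by the slide's factor of 5, and the 34% axis baseline is purely hypothetical.

```python
def bar_height(value, baseline=0.0):
    """Drawn bar length when the value axis starts at `baseline` instead of 0."""
    return value - baseline


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Change shown in the graphic divided by the change in the data.

    Both effects are measured as ratios (second / first).
    A result of 1.0 means the graphic is faithful to the data.
    """
    data_effect = data_second / data_first
    graphic_effect = drawn_second / drawn_first
    return graphic_effect / data_effect


# 35% vs 39.6%, drawn with bars whose lengths differ by a factor of 5 (assumed).
print(round(lie_factor(35, 39.6, 1, 5), 2))

# Hypothetical truncated axis that starts at 34%.
print(round(lie_factor(35, 39.6, bar_height(35, 34), bar_height(39.6, 34)), 2))

# Bars starting at zero are faithful by construction.
print(lie_factor(35, 39.6, bar_height(35), bar_height(39.6)))
```

With bars drawn from zero the factor is exactly 1; every unit the baseline moves toward the data values inflates it.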
Conceptualizing Scale via Comparison
To truly understand
scale a comparison
must be made.
This is a good visualization because it uses the UK as a reference.
Diverging Value-Scale
Who won the election?
Election maps carry significant bias.
US Election 2016 – Displaying Polls
• Donald Trump’s campaign used the actual US map to present poll results.
• Influencing swing voters by feeding them “your” news.
[Images: “…a lot of red folks… we’re winning...” vs. the “reality”]
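Why a plain geographic map can mislead: a map colored by winner paints each whole region in one color, so large, sparsely populated regions dominate the picture regardless of how many people live there. The sketch below uses four entirely hypothetical regions (names, areas and voter counts are made up) to compare the share of map area a party "wins" with its share of voters.

```python
# Hypothetical regions: (name, land_area_km2, voters, winner). Illustrative only.
REGIONS = [
    ("Rural A", 400_000, 2_000_000, "red"),
    ("Rural B", 350_000, 3_000_000, "red"),
    ("City C", 20_000, 6_000_000, "blue"),
    ("City D", 30_000, 5_000_000, "blue"),
]
AREA, VOTERS = 1, 2  # tuple positions used as weights


def share(regions, party, weight_index):
    """Fraction of the total weight (area or voters) in regions won by `party`."""
    total = sum(region[weight_index] for region in regions)
    won = sum(region[weight_index] for region in regions if region[3] == party)
    return won / total


print(f"red share of map area: {share(REGIONS, 'red', AREA):.1%}")
print(f"red share of voters:   {share(REGIONS, 'red', VOTERS):.1%}")
```

In this made-up example most of the map is red even though red regions hold under a third of the voters, which is the gap between "a lot of red folks" and the reality.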
Instead of a Conclusion…
Data can be the source of a story, or it can be the
tool with which the story is told – or it can be both.
Like any source, it should be treated with
skepticism; and like any tool, we should be
conscious of how it can shape the stories that are
created with it.
Questions?