SlideShare uma empresa Scribd logo
1 de 58
Introduction to
Data Storytelling
Anand Madhav
2
3
4
Soho, London - Today Soho, London – 1853/54
Cholera deaths in London - 1854
5
Water Pump at Broad Street,
London
We’ve been telling stories with data for a long time
6
How a nurse changed the course
of a war using data storytelling.
Created by Florence Nightingale for
Queen Victoria during England’s war
with France.
Visualizes deaths due to:
Red: War wounds
Black: Other war-related causes
Blue: Avoidable hospital diseases
“Can’t I just show the data?”
“Why do I need visuals?”
How many numbers are above 100?
23 32 71 72 58 87 11 77 70 16
17 21 56 44 68 51 84 20 60 40
37 8 107 14 12 41 69 14 18 71
62 55 59 64 33 55 71 58 103 92
101 56 45 34 43 15 73 78 6 93
39 53 22 26 26 94 60 82 99 74
11 12 36 67 70 71 97 59 73 99
75 74 69 69 51 48 2 66 92 98
15 10 41 58 104 94 92 84 74 82
12 52 10 57 33 77 88 81 81 91
15 56 25 30 21 7 66 66 78 87
29 23 5 34 11 96 74 99 99 88
37 10 43 15 50 71 65 60 101 98
46 34 19 102 57 70 95 84 63 91
3 34 39 37 60 81 65 63 9 71
48 46 25 50 22 64 91 76 71 79
How many numbers are below 10?
23 32 71 72 58 87 11 77 70 16
17 21 56 44 68 51 84 20 60 40
37 8 107 14 12 41 69 14 18 71
62 55 59 64 33 55 71 58 103 92
101 56 45 34 43 15 73 78 6 93
39 53 22 26 26 94 60 82 99 74
11 12 36 67 70 71 97 59 73 99
75 74 69 69 51 48 2 66 92 98
15 10 41 58 104 94 92 84 74 82
12 52 10 57 33 77 88 81 81 91
15 56 25 30 21 7 66 66 78 87
29 23 5 34 11 96 74 99 99 88
37 10 43 15 50 71 65 60 101 98
46 34 19 102 57 70 95 84 63 91
3 34 39 37 60 81 65 63 9 71
48 46 25 50 22 64 91 76 71 79
Which quadrant has the highest total?
23 32 71 72 58 87 11 77 70 16
17 21 56 44 68 51 84 20 60 40
37 8 107 14 12 41 69 14 18 71
62 55 59 64 33 55 71 58 103 92
101 56 45 34 43 15 73 78 6 93
39 53 22 26 26 94 60 82 99 74
11 12 36 67 70 71 97 59 73 99
75 74 69 69 51 48 2 66 92 98
15 10 41 58 104 94 92 84 74 82
12 52 10 57 33 77 88 81 81 91
15 56 25 30 21 7 66 66 78 87
29 23 5 34 11 96 74 99 99 88
37 10 43 15 50 71 65 60 101 98
46 34 19 102 57 70 95 84 63 91
3 34 39 37 60 81 65 63 9 71
48 46 25 50 22 64 91 76 71 79
Answer them again,
with design changes…
11
23 32 71 72 58 87 11 77 70 16
17 21 56 44 68 51 84 20 60 40
37 8 107 14 12 41 69 14 18 71
62 55 59 64 33 55 71 58 103 92
101 56 45 34 43 15 73 78 6 93
39 53 22 26 26 94 60 82 99 74
11 12 36 67 70 71 97 59 73 99
75 74 69 69 51 48 2 66 92 98
15 10 41 58 104 94 92 84 74 82
12 52 10 57 33 77 88 81 81 91
15 56 25 30 21 7 66 66 78 87
29 23 5 34 11 96 74 99 99 88
37 10 43 15 50 71 65 60 101 98
46 34 19 102 57 70 95 84 63 91
3 34 39 37 60 81 65 63 9 71
48 46 25 50 22 64 91 76 71 79
How many numbers are above 100?
How many numbers are below 10?
23 32 71 72 58 87 11 77 70 16
17 21 56 44 68 51 84 20 60 40
37 8 107 14 12 41 69 14 18 71
62 55 59 64 33 55 71 58 103 92
101 56 45 34 43 15 73 78 6 93
39 53 22 26 26 94 60 82 99 74
11 12 36 67 70 71 97 59 73 99
75 74 69 69 51 48 2 66 92 98
15 10 41 58 104 94 92 84 74 82
12 52 10 57 33 77 88 81 81 91
15 56 25 30 21 7 66 66 78 87
29 23 5 34 11 96 74 99 99 88
37 10 43 15 50 71 65 60 101 98
46 34 19 102 57 70 95 84 63 91
3 34 39 37 60 81 65 63 9 71
48 46 25 50 22 64 91 76 71 79
Which quadrant has the highest total?
23 32 71 72 58 87 11 77 70 16
17 21 56 44 68 51 84 20 60 40
37 8 107 14 12 41 69 14 18 71
62 55 59 64 33 55 71 58 103 92
101 56 45 34 43 15 73 78 6 93
39 53 22 26 26 94 60 82 99 74
11 12 36 67 70 71 97 59 73 99
75 74 69 69 51 48 2 66 92 98
15 10 41 58 104 94 92 84 74 82
12 52 10 57 33 77 88 81 81 91
15 56 25 30 21 7 66 66 78 87
29 23 5 34 11 96 74 99 99 88
37 10 43 15 50 71 65 60 101 98
46 34 19 102 57 70 95 84 63 91
3 34 39 37 60 81 65 63 9 71
48 46 25 50 22 64 91 76 71 79
Visually representing data helps us to see patterns in the data quickly
“The greatest value of a picture is
when it forces us to notice
what we never expected to see.”
— John Tukey
15
Datasauras dataset, animated by Autodesk Research
With the growth of self-service BI, 85% of companies have lost track of how many
dashboards they generated
What QUESTION does
the dashboard answer?
Is the ANSWER evident
from the dashboard?
What ACTION should the
user take now?
BUT 3 THINGS
ARE UNCLEAR ON
MOST DASHBOARDS
16
Stories have a huge impact on humans
17
Storytelling has a 30X Return on Investment
Rob Walker and Joshua Glenn auctioned common
items like mugs, golf balls, toys, etc. The item
descriptions were stories purpose-written by 200+
contributing writers.
Items that were bought for $250 sold for over
$8,000 – a return of over 3,000% for storytelling!
Stories are memorable and viral
People remember stories. They’ll act on them.
People share stories. That enables collective
action.
We analyze data to improve people’s decision
making. For this to be effective, data stories are
needed more than ever before.
http://significantobjects.com/
Data alone can be useless and misleading without context
Data storytelling is a critical skill for data scientists, analysts & managers
19
Share your data & analysis as data stories
Whenever you share inferences from data – whether
it’s as a presentation, or an email or document with
your analysis, or as a dashboard – craft it as a story.
Today we will look at a few engaging data stories and
find out how to convert an analysis into a memorable
story – even if you’ve never told a story before.
But analysts present their work, not their message
Data scientists present their analysis – what they did,
and what they found. That’s not what the audience
needs.
Audiences need a message that tells them what to do,
and why. Told in an engaging way. As a story.
New digital tools allow us to do this at scale
New digital tools allow us to do this at scale
Sachin
R Tendulkar
Sourav
C Ganguly
Rahul
Dravid
Mohammad
Azharuddin
Yuvraj
Singh
Virender
Sehwag
Mahendra
S Dhoni
Alaysinhji
D Jadeja
Navjot
S Sidhu
Gautam
Gambhir
Krishnamachari
Srikkanth
Kapil
Dev
Dillip
B Vengsarkar
Suresh
K Raina
Ravishankar
J Shastri
Sunil
M Gavaskar
Mohammad
Kaif
Virat
Kohli
Vinod
G Kambli
Vangipurappu V
S Laxman
Rabindra
R Singh
Sanjay
V Manjrekar
Mohinder
Amarnath
Manoj
M Prabhakar
Rohit
G Sharma
Irfan
K Pathan
Nayan
R Mongia
Ajit
B Agarkar
Dinesh
Mongia
Harbhajan
Singh
Krishna K
D Karthik
Sandeep
M Patil
Anil
Kumble
Yashpal
Sharma
Javagal
Srinath
Hemang
K Badani
Yusuf
K Pathan
Robin
V Uthappa
Raman
Lamba
Zaheer
Khan
Ravindra
A Jadeja
Pathiv
A Patel
Sadagopan
Ramesh
Roger M
H Binny
Woorkeri
V Raman
Sunil
B Joshi
Kiran
S More
Praveen
K Amre
Ashok
Malhotra
Chetan
Sharma
Kapil Dev’s 175 against
Zimbabwe in 1983
Gavaskar’s 107 against
New Zealand in 1987
Srikkanth’s 95 against Sri
Lanka in 1982
Siddhu’s 134 against
England in Gwalior, 1993
Sachin’s 200 against South
Africa in 2010
Dhoni’s 198 against Sri
Lamka in 2005
Sachin’s 134 against
Australia in 1998
Ganguly’s 183 against Sri
Lanka in 1999
Sehwag’s 146 against Sri
Lanka in 2009
Yusuf Pathan’s 123 against New
Zealand in 2010
Kohli’s 107 against
Engalnd in 2011
Segmenting India Geo-demographically
Previously, the client was treating contiguous regions as a
homogenous entity, from a channel content perspective.
To deliver targeted content, we divided India into 6 clusters
based on their demographic behavior. Specifically, three
composite indices were created based on the economic
development lifecycle:
• Education (literacy, higher education) that leads to...
• Skilled jobs (in mfg. or services) that leads to...
• Purchasing power (higher income, asset ownership)
Districts were divided (at the average cut-off) by:
Offering targeted content to these clusters will reach a
more homogenous demographic population.
Skilled
Poorer Richer
Unskilled Skilled
Uneducated Educated Uneducated Educated
Unskilled
Purchasing power
Skilled jobs
Education
Poor Breakout Aspirant Owner Business Rich
Poor
Rural, uneducated agri
workers. Young population
with low income and asset
ownership. Mostly in Bihar,
Jharkhand, UP, MP.
Breakout
Rural, educated agri workers
poised for skilled labor.
Higher asset ownership.
Parts of UP, Bihar, MP.
Aspirant
Regions with skilled labor
pools but low purchasing
power. Cusp of economic
development. Mostly WB,
Odisha, parts of UP
Owner
Regions with unskilled labor
but high economic prosperity
(landlords, etc..) Mostly AP,
TN, parts of Karnataka,
Gujarat
Business
Lower education but working
in skilled jobs, and
prosperous. Typical of
business communities. Parts
of Gujarat, TN, Urban UP,
Punjab, etc.
Rich
Urban educated
population
working in skilled
jobs. All metros,
large cities, parts
of Kerala, TN
The 6 clusters are
This is a dataset (1975 – 1990) that
has been around for several years and
has been studied extensively. Yet, a
visualization can reveal patterns that
are neither obvious nor well known.
For example,
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?
Very high births in September.
But this is fairly well known. Most
conceptions happen during the
winter holiday season
Relatively few births during the
Christmas and Thanksgiving
holidays, as well as New Year
and Independence Day.
Most people prefer not
to have children on the
13th of any month, given
that it’s an unlucky day
Some special days like April
Fool’s day are avoided, but
Valentine’s Day is quite
popular
More births Fewer births … on average, for each day of the year (from 1975 to 1990)
Let’s look at 15 years of US Birth Data
Education
LINK
Fraud
The pattern in India is quite different
This is a birth date dataset that’s
obtained from school admission data
for over 10 million children. When we
compare this with births in the US, we
see none of the same patterns.
For example,
• Is there an aversion to the 13th or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
Very few children are born in the
month of August, and thereafter.
Most births are concentrated in
the first half of the year
We see a large number of
children born on the 5th, 10th,
15th, 20th and 25th of each month
– that is, round numbered dates
Such round numbered patterns a
typical indication of fraud. Here,
birthdates are brought forward to
aid early school admission
More births Fewer births … on average, for each day of the year (from 2007 to 2013)
Education
LINK
Fraud
This adversely impacts children’s marks
It’s a well-established fact that older
children tend to do better at school in
most activities. Since many children
have had their birth dates brought
forward, these younger children suffer.
The average marks of children “born” on the 1st, 5th, 10th, 15th etc.. of
the month tend to score lower marks.
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
Higher marks Lower marks … on average, for children born on a given day of the year (from 2007 to 2013)
Children “born” on round numbered days score lower marks on average,
due to a higher proportion of younger children
Education
LINK
Fraud
European brewery identified €15 m cost savings after consolidating vendors
26
A leading European brewery’s plants
purchased commodity raw materials from
several vendors each – and had low volume
discounts.
Plants also placed multiple orders placed
every week, leading to higher logistics cost.
Gramener built a custom analytics solution
that showed how each plant performed
compared to peers – shaming those with
poor performance.
With this, they identified savings of €15 m
— which the plant managers couldn’t refute.
€15 m 40%
savings potential identified
annually
vendor based reduction
identified
Global airline reduced cargo turnaround time by 15% with scenario modeling
27
SEE LIVE DEMO
A global airline company took up a service
level agreement to deliver cargo from the
flight to the warehouse in under 1.5 hours.
This target was 15% lower than their current
best.
Several factors affect cargo delay across
airports. Availability of forklifts, staff size,
cargo type, part shipment, and many others.
Altering any of these is expensive and takes
long.
Gramener built a visual analytics solution
that showed where cargo was delayed.
This allowed the airline to reduce the
turnaround time by 15% from 1.76 hrs to 1.5
hrs. The worst-case turnaround time also
reduced by 34% from 2.9 hrs to 1.92 hrs.
15% 34%
cargo turnaround time
reduction (from 1.76 to 1.5 hrs)
reduction in worst-case
turnaround time
Evening Morning Night
Fri Mon Sat Sun Thu Tue Wed
FAH N70 RPP TDS ZDH
20-40% 40-60% 60-80% <20% Full
Recovery times are neutral during the evening and morning shifts (mornings are slightly worse), night times are the best.
Recovery times are worst on Fridays, and best on Saturdays & Wednesdays.
Specifically, Friday mornings are particularly bad. So are Thursday mornings.
The FAH product category has the best recovery time, while ZDH is much worse.
However, RPP on Sundays is unusually slow.
Part shipped products tend to perform worse than full-shipments. Specifically the <20% and 40-60% part-shipments.
This is especially problematic for ZDH
Product category
Part shipment
Weekday
Shift
This slide is best viewed in slideshow mode. The animations tell a story that isn’t obvious on the static version.
An energy utility detected billing fraud
This plot shows the frequency of all meter readings from Apr-
2010 to Mar-2011. An unusually large number of readings are
aligned with the slab boundaries.
Below is a simple histogram (or frequency distribution) of usage levels. Each bar represents the
number of customers with a customers with a specific bill amount (in units, or KWh).
Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than
someone with 100 units. So people have a strong incentive to stay at or within a slab boundary.
An energy utility (with over 50 million subscribers)
had 10 years worth of customer billing data
available.
Most fraud detection software failed to load the
data, and sampled data revealed little or no insight.
This can happen in one of two ways.
First, people may be monitoring their usage very
carefully, and turn of their lights and fans the instant
their usage hits the slab boundary.
Or, more realistically, there’s probably some level of corruption
involved, where customers pay a small sum to the meter reading staff
to ensure that it stays exactly at the slab boundary, giving them the
advantage of a lower price.
This plot shows the frequency of all meter readings from Apr-
2010 to Mar-2011. An unusually large number of readings are
aligned with the tariff slab boundaries.
This clearly shows collusion of some
form with the customers.
Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
217 219 200 200 200 200 200 200 200 350 200 200
250 200 200 200 201 200 200 200 250 200 200 150
250 150 150 200 200 200 200 200 200 200 200 150
150 200 200 200 200 200 200 200 200 200 200 50
200 200 200 150 180 150 50 100 50 70 100 100
100 100 100 100 100 100 100 100 100 100 110 100
100 150 123 123 50 100 50 100 100 100 100 100
0 111 100 100 100 100 100 100 100 100 50 50
0 100 27 100 50 100 100 100 100 100 70 100
1 1 1 100 99 50 100 100 100 100 100 100
This happens with specific
customers, not randomly. Here are
such customers’ meter readings.
Section Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
Section 1 70% 97% 136% 65% 110% 116% 121% 107% 114% 88% 74% 109%
Section 2 66% 92% 66% 87% 70% 64% 63% 50% 58% 38% 41% 54%
Section 3 90% 46% 47% 43% 28% 31% 50% 32% 19% 38% 8% 34%
Section 4 44% 24% 36% 39% 21% 18% 24% 49% 56% 44% 31% 14%
Section 5 4% 63% -27% 20% 41% 82% 26% 34% 43% 2% 37% 15%
Section 6 18% 23% 30% 21% 28% 33% 39% 41% 39% 18% 0% 33%
Section 7 36% 51% 33% 33% 27% 35% 10% 39% 12% 5% 15% 14%
Section 8 22% 21% 28% 12% 24% 27% 10% 31% 13% 11% 22% 17%
Section 9 19% 35% 14% 9% 16% 32% 37% 12% 9% 5% -3% 11%
If we define the “extent of fraud” as
the percentage excess of the 100
unit
meter reading, the value varies
considerably across sections,
and time
New section
manager arrives
… and is
transferred out
… with some explainable
anomalies.
Why would
these happen?
“So how can I make data stories?”
30
You have data.
You have analysis.
Now what?
Understanding the audience & intent
Finding insights
Storylining
Designing data stories
Understanding
audience & intent
Step 1
Understanding the audience & intent
Finding insights
Storylining
Designing data stories
DO IT: Who might be an audience for your
analysis?
• Lookback at your recent analytics project.
• Who do you know that can use this analysis?
(Come up with a real or hypothetical personas)
CHECK IT: Verify these yourself
 Is there a name for the individual?
 Was the role specific enough? (Head of sales
instead of just executive)
The same data analysis can be relevant to
many people — each group is called persona.
• The trends in sales data for an organization is
relevant for a CEO, head of sales, region leads,
individual sales team member & every
employee.
• The analysis of polio cases in UP is relevant to
the Minister of health, polio campaign manager,
field workers, NGOs, journalists & general
public.
This section will dive deeper into defining a persona
Define your audience, they determine the story
DO IT: Start with your own hypothesis
• Pick one of the personas you had listed earlier.
• What problems do you think your persona is facing?
• How do you feel the persona will use the analysis?
• Frame it as a user scenario.
CHECK IT: Verify user scenario with a partner
 Is it framed as “As a [persona], I’m in [situation]
where I face [problem], leading to
[consequence]. Solving it by [action] leads to
[impact]”
 Would the persona relate to this user scenario if
they heard it?
List scenario(s) for each persona
For each persona, answer the following questions:
1. What situation are they currently in?
2. What problems do they face?
3. What is the consequence?
4. What action can they need to take using your analysis?
5. What is the impact of this action?
Combine these as a user scenario:
“As a [persona], I’m in [situation] where I face [problem], leading
to [consequence]. Solving it by [action] leads to [impact]”
• John: As a Marketing manager, I have to create region-wise
budget for the next quarter. I don’t know which regions give the
highest RoI, so my spend isn’t optimized. Solving it by prioritizing
the region will lead to maximum ROI.
Clear needs & future scenario leads to effective communication.
Know your audience’s needs, that helps align the message
Reference: SPIN Selling by Neil Rackham
Here are three examples in real life
35
Purchasing Commodities Cargo Delay Customer Churn
Person, Role Adam, the purchasing head of a
leading European brewery
Cris, the operations head of a
leading US airline
Ravi, the marketing manager of
an Asian telecom company
Situation had plants that purchased
commodities from several vendors.
Discounts were low. Number of
weekly orders were high.
had an SLA to deliver cargo from
the flight to the warehouse in under
1.5 hours – 15% lower than their
current best performance.
Found that the cost of replacing
customers was thrice the cost of
retention.
Problem But he didn’t know which plants
and commodities were a problem.
Every plant denied it.
But she didn’t know what were the
biggest drivers of this delay –
people, assets, or type of cargo.
But he didn’t know which
customers to make offers to in
order to retain them.
Action By consolidating vendors and
reducing order frequency,
By adding resources only to the
largest levers of delay,
By predicting which customer was
likely to churn,
Impact they could increase their discounts
and reduce logistics cost.
she could reduce turnaround time
with the lowest spend.
they could tailor a retention offer
and reduce re-acquisition cost.
Finding Insights
Step 2
Understanding the audience & intent
Finding insights
Storylining
Designing data stories
Insights must be Big, Useful, and Surprising
Filter the analyses using these as a checklist
IS THE INSIGHT
BIG
IS THE INSIGHT
USEFUL
IS THE INSIGHT
SURPRISING
The analysis must, of course, be statistically significant.
But it should also be numerically significant.
We want a result that substantially changes the outcome.
What should the audience do after hearing the insight?
Can they take an action that improves their objective?
Even if it’s informational, what should they do next?
Is this something they didn’t know? Is it non-obvious?
Does it overturn a domain-driven belief or a gut feel?
Or does it bring consensus to a group with divided opinion?
Marking each analysis as Big, Useful or Surprising (High, Medium, Low)
38
Only those that are high or medium on all aspects are insights
Insights Big Useful Surprising
Twice as many Detractors talk about our Product’s ease of use. Low Medium High
Typing with capitalization in a credit application indicates creditworthiness Low Low High
Almost 20% of all voice search queries are triggered by just 25 words Low High Medium
More engaged employees have fewer accidents Low High Low
About 50% of American small businesses do not have a website High Medium Low
The recommendation system influences about 80% of content streamed on
Netflix
Big Low Low
Storylining
Step 3
Understanding the audience & intent
Finding insights
Storylining
Designing data stories
A business storyline
• Our NPS improved 6%
• It was 34% in 4Q18. Now it’s at 40% in 2Q19
• Despite lower satisfaction with our Support,
our NPS grew
• This increase in NPS was mainly due to better
Product Quality & Research
Gladiator’s storyline
• The Emperor asks General Maximus to take
control of Rome and give it back to people
• The ambitious Prince murders the emperor.
• Maximus is sold as a gladiator slave. His family
is murdered
• Maximus grows famous, fights the Prince in the
arena, and wins
• He joins his family in death. Rome is in the
hands of the people
Outlines are the backbone on which you flesh out your story.
This section explains how to create storylines
Storylines are plot outlines. They summarize the entire story
Notice “characters” in red. All stories
have characters, human or otherwise.
40
DO IT: Write your takeaway as one sentence
What’s the one thing you want the audience to
remember from your story?
What’s the one message that the audience
should take away?
CHECK IT: Verify these yourself
 Is it a single, complete, sentence?
 Does it deliver what you want the audience to
remember?
 Will your audience care a lot about this?
Close your eyes. Think of a childhood tale.
Summarize the moral of the story in one line
We easily we remember these stories and their
summary as a moral several years later.
Close your eyes. Think of a business
presentation from last week. Can you easily
summarize the message in one line?
Stories are designed around a moral. A single
takeaway. An “elevator pitch”
It’s a one-sentence summary of the most important message for the audience.
1. Start with the takeaway (The elevator pitch. The moral of the story.)
41
DO IT: Write your supporting analysis
1. List all possible analysis
2. Re-word them as sentences
3. Strike off what’s not relevant
CHECK IT: Verify these yourself
 Is each necessary? Does each analysis
support the takeaway?
 Are they sufficient? Do the analyses prove
the takeaway?
What supports your takeaway from
“The Lion and the Mouse”?
http://www.read.gov/aesop/007.html
 The lion was an Asiatic lion
 The lion had a huge paw
 The lion spared the mouse it caught
 The lion was caught by a hunter’s net
 It was stalking its prey when it got caught
 The mouse was nibbling grass nearby
 The mouse took few minutes to cut the net
Only include analysis that proves the takeaway.
Ensure that they fully prove the takeaway.
2. Find analysis that supports your takeaway. Ignore irrelevant content
There’s no right or wrong answer. Think
about how it supports your takeaway.
42
3. Convert analysis into messages by adding context
43
DO IT: Add context to your analysis
1. Take each relevant analysis
2. Convert it to a message for the audience by
adding context
CHECK IT: Verify these yourself
 Will your audience understand the messages
without explanation?
 Will your audience understand why this
message is relevant?
Analysis doesn’t mean anything to people. When
it does, it’s a message. We do this by adding
context. Three ways to add context are:
1. Compare with similar numbers.
Our $15 mn sales is $3 mn more than last
year, $1 mn below budget, and twice our
nearest competitors.
2. Explain with analogies.
If we stopped producing, it’ll take 3 months to
dispose our excess inventory of $2 mn.
3. Add business interpretation.
Usage is correlated with discounts. For every
$1 discount, customer LTV increases by $24.
Frame each analysis as a message that the audience will understand and find relevant
4. Structure the messages into a pyramid or a tree
44
Example of a business tree
Launch sales were 30% less than target due to
high competition
• Launch sales were projected at $20 mn in the
first month, but achieved only $14 mn
o Sales in every region were 20-50% lower.
o Only Philippines & Korea were on target
• Competitors discounted price by 35% - which
is unsustainable for them
o 80 store discounts increased from 15% to 35%
o The maximum sustainable discount is 20%
• Stores offered higher discounts saw less than
20% of our target sales
Construct a pyramid or tree-like outline
• Start with the takeaway at the root of the tree
• Add a message that supports the takeaway
• Add further details or supporting messages
• Messages must prove the first message, and
only the first message
• Strike off any message that isn’t required to
prove or support the takeaway
• Add next message that supports takeaway
• Add details to prove the second message
• Remaining messages for the takeaway
• Add details as required
Arrange messages hierarchically to prove & support the parent message
4. Structure the messages into a pyramid or a tree
Conventional approach is to explain how we did
the analysis & found the insight
Insight is lost in the set of slides, takes too long to
reach to the first insight.
Instead, start with insight first, and then take the
audience through arguments to support it.
Starts with the main message, and then answers
why & how the insight makes sense.
Title
Analysis
section 1
Methodology Insight
Analysis
section 2
Methodology Insight
Insight that answers
a business question
Supporting
argument 1
Methodology
Supporting
reference
Supporting
argument 2
Methodology
Supporting
reference
5. Re-order the messages to increase memorability and motivation
46
Order messages into an emotionally contrasting,
motivating sequence.
Take this aspects-based flow:
• Our profits doubled. But our sales only grew
20%. Our gross margins stayed flat.
The “emotional arc” is falling,
and not motivating.
Here’s the same message re-ordered:
• Our sales grew mildly at 20%. Our margins
didn’t improve at all. But our profits doubled!
This emotional arc falls before
rising. This is more motivating.
Structure your supporting messages into a
memorable flow. Here are 7 flows that help:
1. Time: e.g. Past, Present, Future
Sales was $15 mn. Now it’s $18 mn. We
expect it to grow to $20 mn.
2. Place: e.g. NA, EU, APAC
3. Aspects: e.g. company, competition, context
4. Benefits: e.g. better, faster, cheaper
5. Scale: e.g. local, regional, global
6. Balance: e.g. pros, cons
7. Priority or climactic: least to most important
Remember: Emotional contrast requires bad news – it makes good look better
Designing
data stories
Step 4
Understanding the audience & intent
Finding insights
Storylining
Designing data stories
How the data should be interpreted decides the type of chart to be used
48
https://gramener.github.io/visual-vocabulary-vega/
Deviation
Change-
over-Time
Spatial Ranking
Correlation
Part-to-
Whole
Flow
Magnitude
Distribution
We use visual design cues to support our annotations & message
49
• Pre-attentive processing drives our attention
towards certain elements more than others.
• We can leverage these to highlight aspects
of the chart that are relevant to our story.
• For ex, when listing a set of countries, if the
relevant insight is for one country, we can
make it stand out as below:
Position is the most powerful encoding.
The eye and brain are naturally wired to detect mis-
alignment of the smallest order
1
Colour, when used in context, is powerful.
We can detect miniscule changes or variations in colour
when comparing an element with neighbouring elements.
This is what makes true colour (32-pixel colour, i.e. 4
billion) a necessity in computer graphics
2
Size is a useful differentiator.
The eye can detect moderate size variations at
moderate distances. Size also has a natural
interpretation: that of priority.
3
Several other
encodings are possible
Aesthetics such as angle,
shadows, shapes, patterns,
density, labelling, enclosures,
etc. can each be used to map
data.
4
Class Xth English Marks Distribution
0
5,000
10,000
15,000
20,000
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
4 type of annotations help the audience understand your intent
0
5,000
10,000
15,000
20,000
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
Marks
# students
Teachers add marks to stop some students from failing
This chart shows Class 10 students’ English marks in Tamil Nadu, India, in
2011. The X-axis has the mark a student has scored. The Y-axis has the # of
students who scored that mark.
This is a bell curve. But the spike at 35 (the mark at which students pass) is
unusual. Teachers must be adding marks to some of the students who are
likely to fail by a small margin.
Large number of students score
exactly 35 marks
Few (but not 0) students score
between 30-35
What’s unusual
Large number of students
score 35 marks.
Few (but not 0) students score
between 30-35
Only some students get this benefit.
Identify a fair policy that will be applied consistently.
Summarize the chart in its title
Don’t describe the chart.
Don’t write the question to answer.
Write the answer itself. Like a headline.
Explain the chart
How should the user read it?
What do you say when you talk through it?
Explain what the visual is. Then the axes. Then
its contents. Then the inference.
Recommend an action
How should I act on this?
You need to change the audience.
(Otherwise, you made no difference.)
Highlight essential elements
What should the user focus their eyes on?
Point it out.
Interpret what they’re seeing – in words.
Communicating data analysis is not enough
This is a chart in a dashboard for state Health Department
Some users of the dashboard know how to interpret this well.
What do you think this dashboard says?
Please take a minute and share your responses.
Annotations convert visuals in data stories that are clearer
4 out of the 8 aspirational Districts in UP moved up into the
moderate performance group since last year
The table below shows all 8 aspirational districts in
Uttar Pradesh. “Rank” shows the district’s rank on
overall performance. “Monthly “ show the composite
index on May’19. “FY value” shows the YTD value.
Sonbhadra, Fatehpur, Balrampur, and Chitrakoot
have steadily improved performance over last 6 months
by ~30% between Dec 2018 and May 2019. They are
now up into the moderate performance group.
These four
aspirational
districts moved
from the poor
to the moderate
performance
group.
Heading
as a summary
Explanatio
n
of the visual
Highlights
of the key points
Takeaway
with actions for the
user
Action: Explore why these 4 districts improved more than the others.
Apply the learnings to other aspirational districts
Insights and Story telling approach
54
Stage 1- Identify
Business Problem
Define the problem
statement by understanding:
• What is the basic need
and desired outcome?
• Who will benefit?
• What is the impact?
• What is the success
criteria?
Stage 2- Translate to Data
Problem
• Breakdown the problem
statement into multiple use-
cases
• Connect each use case with
a data set
• Understand any limitations
on data sources- Internal
and External?
Stage 4- Translate to
Business Answer
• Stitch insights from
individual use case to
create a story
• Connect data story to help
in better decision making
• Measure success
Stage 3- Data Answer
Target each use case with
data through:
• EDA and transformation
• Modelling
• Generating insights
• Sales Rep
• Data Consultant
• Account Manager
• Solution Lead
• Analyst Lead
• Data Consultant
• Account Manager
• Solution Architect
• Solution Lead
• Analyst Lead
• Data Consultant
• Data Scientist
• Solution Architect
• Solution Lead
• Data Consultant
• Account Manager
• Solution Lead
8 super spreaders are responsible for 2/3rd of all suspected cases in the district
55
This visual represents the Covid-19 suspects in
a district
Each dot is a suspect. Red tested positive,
Green negative, Grey awaiting results and Blue
not tested yet
Contact tracing for 40 positive cases resulted in
~1,400 suspected cases, an average of 35
contacts per person
Top 8 super spreaders are responsible for two-
thirds of all suspected cases
Of these 1,400 suspected cases, test results
are waited for roughly 5% of the cases.
Among the declared results 11% are positive
One positive patient also came in contact with
11 people from another district
“Most of us need to listen to the music
to understand how beautiful it is.
But often that’s how we present statistics: we
just show the notes,
we don’t play the music.”
— Hans Rosling
Introduction to Data Storytelling by Anand Madhav
Thank You!
Anand Madhav
@classicwild
/anandmadhav
Share feedback at
https://bit.ly/3h4BdYc
THANK YOU

Mais conteúdo relacionado

Mais de Gramener

5 Key Foundations To Build An Effective CX Program
5 Key Foundations To Build An Effective CX Program5 Key Foundations To Build An Effective CX Program
5 Key Foundations To Build An Effective CX ProgramGramener
 
Using Power BI To Improve Media Buying & Ad Performance
Using Power BI To Improve Media Buying & Ad PerformanceUsing Power BI To Improve Media Buying & Ad Performance
Using Power BI To Improve Media Buying & Ad PerformanceGramener
 
Recession Proofing With Data : Webinar
Recession Proofing With Data : WebinarRecession Proofing With Data : Webinar
Recession Proofing With Data : WebinarGramener
 
Engage Your Audience With PowerPoint Decks: Webinar
Engage Your Audience With PowerPoint Decks: WebinarEngage Your Audience With PowerPoint Decks: Webinar
Engage Your Audience With PowerPoint Decks: WebinarGramener
 
Structure Your Data Science Teams For Best Outcomes
Structure Your Data Science Teams For Best OutcomesStructure Your Data Science Teams For Best Outcomes
Structure Your Data Science Teams For Best OutcomesGramener
 
Dawn Of Geospatial AI - Webinar
Dawn Of Geospatial AI - WebinarDawn Of Geospatial AI - Webinar
Dawn Of Geospatial AI - WebinarGramener
 
5 Steps To Become A Data-Driven Organization : Webinar
5 Steps To Become A Data-Driven Organization : Webinar5 Steps To Become A Data-Driven Organization : Webinar
5 Steps To Become A Data-Driven Organization : WebinarGramener
 
5 Steps To Measure ROI On Your Data Science Initiatives - Webinar
 5 Steps To Measure ROI On Your Data Science Initiatives - Webinar 5 Steps To Measure ROI On Your Data Science Initiatives - Webinar
5 Steps To Measure ROI On Your Data Science Initiatives - WebinarGramener
 
Saving Lives with Geospatial AI - Pycon Indonesia 2020
Saving Lives with Geospatial AI - Pycon Indonesia 2020Saving Lives with Geospatial AI - Pycon Indonesia 2020
Saving Lives with Geospatial AI - Pycon Indonesia 2020Gramener
 
Driving Transformation in Industries with Artificial Intelligence (AI)
Driving Transformation in Industries with Artificial Intelligence (AI)Driving Transformation in Industries with Artificial Intelligence (AI)
Driving Transformation in Industries with Artificial Intelligence (AI)Gramener
 
The Art of Storytelling Using Data Science
The Art of Storytelling Using Data ScienceThe Art of Storytelling Using Data Science
The Art of Storytelling Using Data ScienceGramener
 
Storyfying your Data: How to go from Data to Insights to Stories
Storyfying your Data: How to go from Data to Insights to StoriesStoryfying your Data: How to go from Data to Insights to Stories
Storyfying your Data: How to go from Data to Insights to StoriesGramener
 
The value of storytelling through data
The value of storytelling through dataThe value of storytelling through data
The value of storytelling through dataGramener
 
Humanizing Data Storytelling for Greater Business Impact
Humanizing Data Storytelling for Greater Business ImpactHumanizing Data Storytelling for Greater Business Impact
Humanizing Data Storytelling for Greater Business ImpactGramener
 
Data and Storytelling | What Now?
Data and Storytelling | What Now?Data and Storytelling | What Now?
Data and Storytelling | What Now?Gramener
 
Data & Storytelling - What Now?
Data & Storytelling  - What Now? Data & Storytelling  - What Now?
Data & Storytelling - What Now? Gramener
 
Introduction to Data Storytelling | Rasagy Sharma - Gramener
Introduction to Data Storytelling | Rasagy Sharma - GramenerIntroduction to Data Storytelling | Rasagy Sharma - Gramener
Introduction to Data Storytelling | Rasagy Sharma - GramenerGramener
 
How AI Can Help You Make Your Audience Sit Up and Take Notice
How AI Can Help You Make Your Audience Sit Up and Take NoticeHow AI Can Help You Make Your Audience Sit Up and Take Notice
How AI Can Help You Make Your Audience Sit Up and Take NoticeGramener
 
The ultimate guide to data storytelling | Materclass
The ultimate guide to data storytelling | MaterclassThe ultimate guide to data storytelling | Materclass
The ultimate guide to data storytelling | MaterclassGramener
 
Recession-proofing your business with data
Recession-proofing your business with dataRecession-proofing your business with data
Recession-proofing your business with dataGramener
 

Mais de Gramener (20)

5 Key Foundations To Build An Effective CX Program
5 Key Foundations To Build An Effective CX Program5 Key Foundations To Build An Effective CX Program
5 Key Foundations To Build An Effective CX Program
 
Using Power BI To Improve Media Buying & Ad Performance
Using Power BI To Improve Media Buying & Ad PerformanceUsing Power BI To Improve Media Buying & Ad Performance
Using Power BI To Improve Media Buying & Ad Performance
 
Recession Proofing With Data : Webinar
Recession Proofing With Data : WebinarRecession Proofing With Data : Webinar
Recession Proofing With Data : Webinar
 
Engage Your Audience With PowerPoint Decks: Webinar
Engage Your Audience With PowerPoint Decks: WebinarEngage Your Audience With PowerPoint Decks: Webinar
Engage Your Audience With PowerPoint Decks: Webinar
 
Structure Your Data Science Teams For Best Outcomes
Structure Your Data Science Teams For Best OutcomesStructure Your Data Science Teams For Best Outcomes
Structure Your Data Science Teams For Best Outcomes
 
Dawn Of Geospatial AI - Webinar
Dawn Of Geospatial AI - WebinarDawn Of Geospatial AI - Webinar
Dawn Of Geospatial AI - Webinar
 
5 Steps To Become A Data-Driven Organization : Webinar
5 Steps To Become A Data-Driven Organization : Webinar5 Steps To Become A Data-Driven Organization : Webinar
5 Steps To Become A Data-Driven Organization : Webinar
 
5 Steps To Measure ROI On Your Data Science Initiatives - Webinar
 5 Steps To Measure ROI On Your Data Science Initiatives - Webinar 5 Steps To Measure ROI On Your Data Science Initiatives - Webinar
5 Steps To Measure ROI On Your Data Science Initiatives - Webinar
 
Saving Lives with Geospatial AI - Pycon Indonesia 2020
Saving Lives with Geospatial AI - Pycon Indonesia 2020Saving Lives with Geospatial AI - Pycon Indonesia 2020
Saving Lives with Geospatial AI - Pycon Indonesia 2020
 
Driving Transformation in Industries with Artificial Intelligence (AI)
Driving Transformation in Industries with Artificial Intelligence (AI)Driving Transformation in Industries with Artificial Intelligence (AI)
Driving Transformation in Industries with Artificial Intelligence (AI)
 
The Art of Storytelling Using Data Science
The Art of Storytelling Using Data ScienceThe Art of Storytelling Using Data Science
The Art of Storytelling Using Data Science
 
Storyfying your Data: How to go from Data to Insights to Stories
Storyfying your Data: How to go from Data to Insights to StoriesStoryfying your Data: How to go from Data to Insights to Stories
Storyfying your Data: How to go from Data to Insights to Stories
 
The value of storytelling through data
The value of storytelling through dataThe value of storytelling through data
The value of storytelling through data
 
Humanizing Data Storytelling for Greater Business Impact
Humanizing Data Storytelling for Greater Business ImpactHumanizing Data Storytelling for Greater Business Impact
Humanizing Data Storytelling for Greater Business Impact
 
Data and Storytelling | What Now?
Data and Storytelling | What Now?Data and Storytelling | What Now?
Data and Storytelling | What Now?
 
Data & Storytelling - What Now?
Data & Storytelling  - What Now? Data & Storytelling  - What Now?
Data & Storytelling - What Now?
 
Introduction to Data Storytelling | Rasagy Sharma - Gramener
Introduction to Data Storytelling | Rasagy Sharma - GramenerIntroduction to Data Storytelling | Rasagy Sharma - Gramener
Introduction to Data Storytelling | Rasagy Sharma - Gramener
 
How AI Can Help You Make Your Audience Sit Up and Take Notice
How AI Can Help You Make Your Audience Sit Up and Take NoticeHow AI Can Help You Make Your Audience Sit Up and Take Notice
How AI Can Help You Make Your Audience Sit Up and Take Notice
 
The ultimate guide to data storytelling | Materclass
The ultimate guide to data storytelling | MaterclassThe ultimate guide to data storytelling | Materclass
The ultimate guide to data storytelling | Materclass
 
Recession-proofing your business with data
Recession-proofing your business with dataRecession-proofing your business with data
Recession-proofing your business with data
 

Último

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 

Último (20)

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 

Intro to Data Storytelling

  • 2. 2
  • 3. 3
  • 4. 4 Soho, London - Today Soho, London – 1853/54
  • 5. Cholera deaths in London - 1854 5 Water Pump at Broad Street, London
  • 6. We’ve been telling stories with data for a long time 6 How a nurse changed the course of a war using data storytelling. Created by Florence Nightingale for Queen Victoria during England’s war with France. Visualizes deaths due to: Red: War wounds Black: Other war-related causes Blue: Avoidable hospital diseases
  • 7. “Can’t I just show the data?” “Why do I need visuals?”
  • 8. How many numbers are above 100? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79
  • 9. How many numbers are below 10? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79
  • 10. Which quadrant has the highest total? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79
  • 11. Answer them again, with design changes… 11
  • 12. 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79 How many numbers are above 100?
  • 13. How many numbers are below 10? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79
  • 14. Which quadrant has the highest total? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79
  • 15. Visually representing data helps us to see patterns in the data quickly “The greatest value of a picture is when it forces us to notice what we never expected to see.” — John Tukey 15 Datasauras dataset, animated by Autodesk Research
  • 16. With the growth of self-service BI, 85% of companies have lost track of how many dashboards they generated What QUESTION does the dashboard answer? Is the ANSWER evident from the dashboard? What ACTION should the user take now? BUT 3 THINGS ARE UNCLEAR ON MOST DASHBOARDS 16
  • 17. Stories have a huge impact on humans 17 Storytelling has a 30X Return on Investment Rob Walker and Joshua Glenn auctioned common items like mugs, golf balls, toys, etc. The item descriptions were stories purpose-written by 200+ contributing writers. Items that were bought for $250 sold for over $8,000 – a return of over 3,000% for storytelling! Stories are memorable and viral People remember stories. They’ll act on them. People share stories. That enables collective action. We analyze data to improve people’s decision making. For this to be effective, data stories are needed more than ever before. http://significantobjects.com/
  • 18. Data alone can be useless and misleading without context
  • 19. Data storytelling is a critical skill for data scientists, analysts & managers 19 Share your data & analysis as data stories Whenever you share inferences from data – whether it’s as a presentation, or an email or document with your analysis, or as a dashboard – craft it as a story. Today we will look at a few engaging data stories and find out how to convert an analysis into a memorable story – even if you’ve never told a story before. But analysts present their work, not their message Data scientists present their analysis – what they did, and what they found. That’s not what the audience needs. Audiences need a message that tells them what to do, and why. Told in an engaging way. As a story.
  • 20. New digital tools allow us to do this at scale
  • 21. New digital tools allow us to do this at scale Sachin R Tendulkar Sourav C Ganguly Rahul Dravid Mohammad Azharuddin Yuvraj Singh Virender Sehwag Mahendra S Dhoni Alaysinhji D Jadeja Navjot S Sidhu Gautam Gambhir Krishnamachari Srikkanth Kapil Dev Dillip B Vengsarkar Suresh K Raina Ravishankar J Shastri Sunil M Gavaskar Mohammad Kaif Virat Kohli Vinod G Kambli Vangipurappu V S Laxman Rabindra R Singh Sanjay V Manjrekar Mohinder Amarnath Manoj M Prabhakar Rohit G Sharma Irfan K Pathan Nayan R Mongia Ajit B Agarkar Dinesh Mongia Harbhajan Singh Krishna K D Karthik Sandeep M Patil Anil Kumble Yashpal Sharma Javagal Srinath Hemang K Badani Yusuf K Pathan Robin V Uthappa Raman Lamba Zaheer Khan Ravindra A Jadeja Pathiv A Patel Sadagopan Ramesh Roger M H Binny Woorkeri V Raman Sunil B Joshi Kiran S More Praveen K Amre Ashok Malhotra Chetan Sharma Kapil Dev’s 175 against Zimbabwe in 1983 Gavaskar’s 107 against New Zealand in 1987 Srikkanth’s 95 against Sri Lanka in 1982 Siddhu’s 134 against England in Gwalior, 1993 Sachin’s 200 against South Africa in 2010 Dhoni’s 198 against Sri Lamka in 2005 Sachin’s 134 against Australia in 1998 Ganguly’s 183 against Sri Lanka in 1999 Sehwag’s 146 against Sri Lanka in 2009 Yusuf Pathan’s 123 against New Zealand in 2010 Kohli’s 107 against Engalnd in 2011
  • 22. Segmenting India Geo-demographically Previously, the client was treating contiguous regions as a homogenous entity, from a channel content perspective. To deliver targeted content, we divided India into 6 clusters based on their demographic behavior. Specifically, three composite indices were created based on the economic development lifecycle: • Education (literacy, higher education) that leads to... • Skilled jobs (in mfg. or services) that leads to... • Purchasing power (higher income, asset ownership) Districts were divided (at the average cut-off) by: Offering targeted content to these clusters will reach a more homogenous demographic population. Skilled Poorer Richer Unskilled Skilled Uneducated Educated Uneducated Educated Unskilled Purchasing power Skilled jobs Education Poor Breakout Aspirant Owner Business Rich Poor Rural, uneducated agri workers. Young population with low income and asset ownership. Mostly in Bihar, Jharkhand, UP, MP. Breakout Rural, educated agri workers poised for skilled labor. Higher asset ownership. Parts of UP, Bihar, MP. Aspirant Regions with skilled labor pools but low purchasing power. Cusp of economic development. Mostly WB, Odisha, parts of UP Owner Regions with unskilled labor but high economic prosperity (landlords, etc..) Mostly AP, TN, parts of Karnataka, Gujarat Business Lower education but working in skilled jobs, and prosperous. Typical of business communities. Parts of Gujarat, TN, Urban UP, Punjab, etc. Rich Urban educated population working in skilled jobs. All metros, large cities, parts of Kerala, TN The 6 clusters are
  • 23. This is a dataset (1975 – 1990) that has been around for several years and has been studied extensively. Yet, a visualization can reveal patterns that are neither obvious nor well known. For example, • Are birthdays uniformly distributed? • Do doctors or parents exercise the C-section option to move dates? • Is there any day of the month that has unusually high or low births? • Are there any months with relatively high or low births? Very high births in September. But this is fairly well known. Most conceptions happen during the winter holiday season Relatively few births during the Christmas and Thanksgiving holidays, as well as New Year and Independence Day. Most people prefer not to have children on the 13th of any month, given that it’s an unlucky day Some special days like April Fool’s day are avoided, but Valentine’s Day is quite popular More births Fewer births … on average, for each day of the year (from 1975 to 1990) Let’s look at 15 years of US Birth Data Education LINK Fraud
  • 24. The pattern in India is quite different This is a birth date dataset that’s obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns. For example, • Is there an aversion to the 13th or is there a local cultural nuance? • Are holidays avoided for births? • Which months have a higher propensity for births, and why? • Are there any patterns not found in the US data? Very few children are born in the month of August, and thereafter. Most births are concentrated in the first half of the year We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month – that is, round numbered dates Such round numbered patterns a typical indication of fraud. Here, birthdates are brought forward to aid early school admission More births Fewer births … on average, for each day of the year (from 2007 to 2013) Education LINK Fraud
  • 25. This adversely impacts children’s marks It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer. The average marks of children “born” on the 1st, 5th, 10th, 15th etc.. of the month tend to score lower marks. • Are holidays avoided for births? • Which months have a higher propensity for births, and why? • Are there any patterns not found in the US data? Higher marks Lower marks … on average, for children born on a given day of the year (from 2007 to 2013) Children “born” on round numbered days score lower marks on average, due to a higher proportion of younger children Education LINK Fraud
  • 26. European brewery identified €15 m cost savings after consolidating vendors 26 A leading European brewery’s plants purchased commodity raw materials from several vendors each – and had low volume discounts. Plants also placed multiple orders placed every week, leading to higher logistics cost. Gramener built a custom analytics solution that showed how each plant performed compared to peers – shaming those with poor performance. With this, they identified savings of €15 m — which the plant managers couldn’t refute. €15 m 40% savings potential identified annually vendor based reduction identified
  • 27. Global airline reduced cargo turnaround time by 15% with scenario modeling 27 SEE LIVE DEMO A global airline company took up a service level agreement to deliver cargo from the flight to the warehouse in under 1.5 hours. This target was 15% lower than their current best. Several factors affect cargo delay across airports. Availability of forklifts, staff size, cargo type, part shipment, and many others. Altering any of these is expensive and takes long. Gramener built a visual analytics solution that showed where cargo was delayed. This allowed the airline to reduce the turnaround time by 15% from 1.76 hrs to 1.5 hrs. The worst-case turnaround time also reduced by 34% from 2.9 hrs to 1.92 hrs. 15% 34% cargo turnaround time reduction (from 1.76 to 1.5 hrs) reduction in worst-case turnaround time Evening Morning Night Fri Mon Sat Sun Thu Tue Wed FAH N70 RPP TDS ZDH 20-40% 40-60% 60-80% <20% Full Recovery times are neutral during the evening and morning shifts (mornings are slightly worse), night times are the best. Recovery times are worst on Fridays, and best on Saturdays & Wednesdays. Specifically, Friday mornings are particularly bad. So are Thursday mornings. The FAH product category has the best recovery time, while ZDH is much worse. However, RPP on Sundays is unusually slow. Part shipped products tend to perform worse than full-shipments. Specifically the <20% and 40-60% part-shipments. This is especially problematic for ZDH Product category Part shipment Weekday Shift This slide is best viewed in slideshow mode. The animations tell a story that isn’t obvious on the static version.
  • 28. An energy utility detected billing fraud This plot shows the frequency of all meter readings from Apr- 2010 to Mar-2011. An unusually large number of readings are aligned with the slab boundaries. Below is a simple histogram (or frequency distribution) of usage levels. Each bar represents the number of customers with a customers with a specific bill amount (in units, or KWh). Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than someone with 100 units. So people have a strong incentive to stay at or within a slab boundary. An energy utility (with over 50 million subscribers) had 10 years worth of customer billing data available. Most fraud detection software failed to load the data, and sampled data revealed little or no insight. This can happen in one of two ways. First, people may be monitoring their usage very carefully, and turn of their lights and fans the instant their usage hits the slab boundary. Or, more realistically, there’s probably some level of corruption involved, where customers pay a small sum to the meter reading staff to ensure that it stays exactly at the slab boundary, giving them the advantage of a lower price.
  • 29. This plot shows the frequency of all meter readings from Apr- 2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries. This clearly shows collusion of some form with the customers. Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11 217 219 200 200 200 200 200 200 200 350 200 200 250 200 200 200 201 200 200 200 250 200 200 150 250 150 150 200 200 200 200 200 200 200 200 150 150 200 200 200 200 200 200 200 200 200 200 50 200 200 200 150 180 150 50 100 50 70 100 100 100 100 100 100 100 100 100 100 100 100 110 100 100 150 123 123 50 100 50 100 100 100 100 100 0 111 100 100 100 100 100 100 100 100 50 50 0 100 27 100 50 100 100 100 100 100 70 100 1 1 1 100 99 50 100 100 100 100 100 100 This happens with specific customers, not randomly. Here are such customers’ meter readings. Section Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11 Section 1 70% 97% 136% 65% 110% 116% 121% 107% 114% 88% 74% 109% Section 2 66% 92% 66% 87% 70% 64% 63% 50% 58% 38% 41% 54% Section 3 90% 46% 47% 43% 28% 31% 50% 32% 19% 38% 8% 34% Section 4 44% 24% 36% 39% 21% 18% 24% 49% 56% 44% 31% 14% Section 5 4% 63% -27% 20% 41% 82% 26% 34% 43% 2% 37% 15% Section 6 18% 23% 30% 21% 28% 33% 39% 41% 39% 18% 0% 33% Section 7 36% 51% 33% 33% 27% 35% 10% 39% 12% 5% 15% 14% Section 8 22% 21% 28% 12% 24% 27% 10% 31% 13% 11% 22% 17% Section 9 19% 35% 14% 9% 16% 32% 37% 12% 9% 5% -3% 11% If we define the “extent of fraud” as the percentage excess of the 100 unit meter reading, the value varies considerably across sections, and time New section manager arrives … and is transferred out … with some explainable anomalies. Why would these happen?
  • 30. “So how can I make data stories?” 30
  • 31. You have data. You have analysis. Now what? Understanding the audience & intent Finding insights Storylining Designing data stories
  • 32. Understanding audience & intent Step 1 Understanding the audience & intent Finding insights Storylining Designing data stories
  • 33. DO IT: Who might be an audience for your analysis? • Lookback at your recent analytics project. • Who do you know that can use this analysis? (Come up with a real or hypothetical personas) CHECK IT: Verify these yourself  Is there a name for the individual?  Was the role specific enough? (Head of sales instead of just executive) The same data analysis can be relevant to many people — each group is called persona. • The trends in sales data for an organization is relevant for a CEO, head of sales, region leads, individual sales team member & every employee. • The analysis of polio cases in UP is relevant to the Minister of health, polio campaign manager, field workers, NGOs, journalists & general public. This section will dive deeper into defining a persona Define your audience, they determine the story
  • 34. DO IT: Start with your own hypothesis • Pick one of the personas you had listed earlier. • What problems do you think your persona is facing? • How do you feel the persona will use the analysis? • Frame it as a user scenario. CHECK IT: Verify user scenario with a partner  Is it framed as “As a [persona], I’m in [situation] where I face [problem], leading to [consequence]. Solving it by [action] leads to [impact]”  Would the persona relate to this user scenario if they heard it? List scenario(s) for each persona For each persona, answer the following questions: 1. What situation are they currently in? 2. What problems do they face? 3. What is the consequence? 4. What action can they need to take using your analysis? 5. What is the impact of this action? Combine these as a user scenario: “As a [persona], I’m in [situation] where I face [problem], leading to [consequence]. Solving it by [action] leads to [impact]” • John: As a Marketing manager, I have to create region-wise budget for the next quarter. I don’t know which regions give the highest RoI, so my spend isn’t optimized. Solving it by prioritizing the region will lead to maximum ROI. Clear needs & future scenario leads to effective communication. Know your audience’s needs, that helps align the message Reference: SPIN Selling by Neil Rackham
  • 35. Here are three examples in real life 35 Purchasing Commodities Cargo Delay Customer Churn Person, Role Adam, the purchasing head of a leading European brewery Cris, the operations head of a leading US airline Ravi, the marketing manager of an Asian telecom company Situation had plants that purchased commodities from several vendors. Discounts were low. Number of weekly orders were high. had an SLA to deliver cargo from the flight to the warehouse in under 1.5 hours – 15% lower than their current best performance. Found that the cost of replacing customers was thrice the cost of retention. Problem But he didn’t know which plants and commodities were a problem. Every plant denied it. But she didn’t know what were the biggest drivers of this delay – people, assets, or type of cargo. But he didn’t know which customers to make offers to in order to retain them. Action By consolidating vendors and reducing order frequency, By adding resources only to the largest levers of delay, By predicting which customer was likely to churn, Impact they could increase their discounts and reduce logistics cost. she could reduce turnaround time with the lowest spend. they could tailor a retention offer and reduce re-acquisition cost.
  • 36. Finding Insights Step 2 Understanding the audience & intent Finding insights Storylining Designing data stories
  • 37. Insights must be Big, Useful, and Surprising Filter the analyses using these as a checklist IS THE INSIGHT BIG IS THE INSIGHT USEFUL IS THE INSIGHT SURPRISING The analysis must, of course, be statistically significant. But it should also be numerically significant. We want a result that substantially changes the outcome. What should the audience do after hearing the insight? Can they take an action that improves their objective? Even if it’s informational, what should they do next? Is this something they didn’t know? Is it non-obvious? Does it overturn a domain-driven belief or a gut feel? Or does it bring consensus to a group with divided opinion?
  • 38. Marking each analysis as Big, Useful or Surprising (High, Medium, Low) 38 Only those that are high or medium on all aspects are insights Insights Big Useful Surprising Twice as many Detractors talk about our Product’s ease of use. Low Medium High Typing with capitalization in a credit application indicates creditworthiness Low Low High Almost 20% of all voice search queries are triggered by just 25 words Low High Medium More engaged employees have fewer accidents Low High Low About 50% of American small businesses do not have a website High Medium Low The recommendation system influences about 80% of content streamed on Netflix Big Low Low
  • 39. Storylining Step 3 Understanding the audience & intent Finding insights Storylining Designing data stories
  • 40. A business storyline • Our NPS improved 6% • It was 34% in 4Q18. Now it’s at 40% in 2Q19 • Despite lower satisfaction with our Support, our NPS grew • This increase in NPS was mainly due to better Product Quality & Research Gladiator’s storyline • The Emperor asks General Maximus to take control of Rome and give it back to people • The ambitious Prince murders the emperor. • Maximus is sold as a gladiator slave. His family is murdered • Maximus grows famous, fights the Prince in the arena, and wins • He joins his family in death. Rome is in the hands of the people Outlines are the backbone on which you flesh out your story. This section explains how to create storylines Storylines are plot outlines. They summarize the entire story Notice “characters” in red. All stories have characters, human or otherwise. 40
  • 41. DO IT: Write your takeaway as one sentence What’s the one thing you want the audience to remember from your story? What’s the one message that the audience should take away? CHECK IT: Verify these yourself  Is it a single, complete, sentence?  Does it deliver what you want the audience to remember?  Will your audience care a lot about this? Close your eyes. Think of a childhood tale. Summarize the moral of the story in one line We easily we remember these stories and their summary as a moral several years later. Close your eyes. Think of a business presentation from last week. Can you easily summarize the message in one line? Stories are designed around a moral. A single takeaway. An “elevator pitch” It’s a one-sentence summary of the most important message for the audience. 1. Start with the takeaway (The elevator pitch. The moral of the story.) 41
  • 42. DO IT: Write your supporting analysis 1. List all possible analysis 2. Re-word them as sentences 3. Strike off what’s not relevant CHECK IT: Verify these yourself  Is each necessary? Does each analysis support the takeaway?  Are they sufficient? Do the analyses prove the takeaway? What supports your takeaway from “The Lion and the Mouse”? http://www.read.gov/aesop/007.html  The lion was an Asiatic lion  The lion had a huge paw  The lion spared the mouse it caught  The lion was caught by a hunter’s net  It was stalking its prey when it got caught  The mouse was nibbling grass nearby  The mouse took few minutes to cut the net Only include analysis that proves the takeaway. Ensure that they fully prove the takeaway. 2. Find analysis that supports your takeaway. Ignore irrelevant content There’s no right or wrong answer. Think about how it supports your takeaway. 42
  • 43. 3. Convert analysis into messages by adding context 43 DO IT: Add context to your analysis 1. Take each relevant analysis 2. Convert it to a message for the audience by adding context CHECK IT: Verify these yourself  Will your audience understand the messages without explanation?  Will your audience understand why this message is relevant? Analysis doesn’t mean anything to people. When it does, it’s a message. We do this by adding context. Three ways to add context are: 1. Compare with similar numbers. Our $15 mn sales is $3 mn more than last year, $1 mn below budget, and twice our nearest competitors. 2. Explain with analogies. If we stopped producing, it’ll take 3 months to dispose our excess inventory of $2 mn. 3. Add business interpretation. Usage is correlated with discounts. For every $1 discount, customer LTV increases by $24. Frame each analysis as a message that the audience will understand and find relevant
  • 44. 4. Structure the messages into a pyramid or a tree 44 Example of a business tree Launch sales were 30% less than target due to high competition • Launch sales were projected at $20 mn in the first month, but achieved only $14 mn o Sales in every region were 20-50% lower. o Only Philippines & Korea were on target • Competitors discounted price by 35% - which is unsustainable for them o 80 store discounts increased from 15% to 35% o The maximum sustainable discount is 20% • Stores offered higher discounts saw less than 20% of our target sales Construct a pyramid or tree-like outline • Start with the takeaway at the root of the tree • Add a message that supports the takeaway • Add further details or supporting messages • Messages must prove the first message, and only the first message • Strike off any message that isn’t required to prove or support the takeaway • Add next message that supports takeaway • Add details to prove the second message • Remaining messages for the takeaway • Add details as required Arrange messages hierarchically to prove & support the parent message
  • 45. 4. Structure the messages into a pyramid or a tree Conventional approach is to explain how we did the analysis & found the insight Insight is lost in the set of slides, takes too long to reach to the first insight. Instead, start with insight first, and then take the audience through arguments to support it. Starts with the main message, and then answers why & how the insight makes sense. Title Analysis section 1 Methodology Insight Analysis section 2 Methodology Insight Insight that answers a business question Supporting argument 1 Methodology Supporting reference Supporting argument 2 Methodology Supporting reference
  • 46. 5. Re-order the messages to increase memorability and motivation 46 Order messages into an emotionally contrasting, motivating sequence. Take this aspects-based flow: • Our profits doubled. But our sales only grew 20%. Our gross margins stayed flat. The “emotional arc” is falling, and not motivating. Here’s the same message re-ordered: • Our sales grew mildly at 20%. Our margins didn’t improve at all. But our profits doubled! This emotional arc falls before rising. This is more motivating. Structure your supporting messages into a memorable flow. Here are 7 flows that help: 1. Time: e.g. Past, Present, Future Sales was $15 mn. Now it’s $18 mn. We expect it to grow to $20 mn. 2. Place: e.g. NA, EU, APAC 3. Aspects: e.g. company, competition, context 4. Benefits: e.g. better, faster, cheaper 5. Scale: e.g. local, regional, global 6. Balance: e.g. pros, cons 7. Priority or climactic: least to most important Remember: Emotional contrast requires bad news – it makes good look better
  • 47. Designing data stories Step 4 Understanding the audience & intent Finding insights Storylining Designing data stories
  • 48. How the data should be interpreted decides the type of chart to be used 48 https://gramener.github.io/visual-vocabulary-vega/ Deviation Change- over-Time Spatial Ranking Correlation Part-to- Whole Flow Magnitude Distribution
  • 49. We use visual design cues to support our annotations & message 49 • Pre-attentive processing drives our attention towards certain elements more than others. • We can leverage these to highlight aspects of the chart that are relevant to our story. • For ex, when listing a set of countries, if the relevant insight is for one country, we can make it stand out as below: Position is the most powerful encoding. The eye and brain are naturally wired to detect mis- alignment of the smallest order 1 Colour, when used in context, is powerful. We can detect miniscule changes or variations in colour when comparing an element with neighbouring elements. This is what makes true colour (32-pixel colour, i.e. 4 billion) a necessity in computer graphics 2 Size is a useful differentiator. The eye can detect moderate size variations at moderate distances. Size also has a natural interpretation: that of priority. 3 Several other encodings are possible Aesthetics such as angle, shadows, shapes, patterns, density, labelling, enclosures, etc. can each be used to map data. 4
  • 50. Class Xth English Marks Distribution 0 5,000 10,000 15,000 20,000 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
  • 51. 4 type of annotations help the audience understand your intent 0 5,000 10,000 15,000 20,000 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 Marks # students Teachers add marks to stop some students from failing This chart shows Class 10 students’ English marks in Tamil Nadu, India, in 2011. The X-axis has the mark a student has scored. The Y-axis has the # of students who scored that mark. This is a bell curve. But the spike at 35 (the mark at which students pass) is unusual. Teachers must be adding marks to some of the students who are likely to fail by a small margin. Large number of students score exactly 35 marks Few (but not 0) students score between 30-35 What’s unusual Large number of students score 35 marks. Few (but not 0) students score between 30-35 Only some students get this benefit. Identify a fair policy that will be applied consistently. Summarize the chart in its title Don’t describe the chart. Don’t write the question to answer. Write the answer itself. Like a headline. Explain the chart How should the user read it? What do you say when you talk through it? Explain what the visual is. Then the axes. Then its contents. Then the inference. Recommend an action How should I act on this? You need to change the audience. (Otherwise, you made no difference.) Highlight essential elements What should the user focus their eyes on? Point it out. Interpret what they’re seeing – in words.
  • 52. Communicating data analysis is not enough This is a chart in a dashboard for state Health Department Some users of the dashboard know how to interpret this well. What do you think this dashboard says? Please take a minute and share your responses.
  • 53. Annotations convert visuals in data stories that are clearer 4 out of the 8 aspirational Districts in UP moved up into the moderate performance group since last year The table below shows all 8 aspirational districts in Uttar Pradesh. “Rank” shows the district’s rank on overall performance. “Monthly “ show the composite index on May’19. “FY value” shows the YTD value. Sonbhadra, Fatehpur, Balrampur, and Chitrakoot have steadily improved performance over last 6 months by ~30% between Dec 2018 and May 2019. They are now up into the moderate performance group. These four aspirational districts moved from the poor to the moderate performance group. Heading as a summary Explanatio n of the visual Highlights of the key points Takeaway with actions for the user Action: Explore why these 4 districts improved more than the others. Apply the learnings to other aspirational districts
  • 54. Insights and Story telling approach 54 Stage 1- Identify Business Problem Define the problem statement by understanding: • What is the basic need and desired outcome? • Who will benefit? • What is the impact? • What is the success criteria? Stage 2- Translate to Data Problem • Breakdown the problem statement into multiple use- cases • Connect each use case with a data set • Understand any limitations on data sources- Internal and External? Stage 4- Translate to Business Answer • Stitch insights from individual use case to create a story • Connect data story to help in better decision making • Measure success Stage 3- Data Answer Target each use case with data through: • EDA and transformation • Modelling • Generating insights • Sales Rep • Data Consultant • Account Manager • Solution Lead • Analyst Lead • Data Consultant • Account Manager • Solution Architect • Solution Lead • Analyst Lead • Data Consultant • Data Scientist • Solution Architect • Solution Lead • Data Consultant • Account Manager • Solution Lead
  • 55. 8 super spreaders are responsible for 2/3rd of all suspected cases in the district 55 This visual represents the Covid-19 suspects in a district Each dot is a suspect. Red tested positive, Green negative, Grey awaiting results and Blue not tested yet Contact tracing for 40 positive cases resulted in ~1,400 suspected cases, an average of 35 contacts per person Top 8 super spreaders are responsible for two- thirds of all suspected cases Of these 1,400 suspected cases, test results are waited for roughly 5% of the cases. Among the declared results 11% are positive One positive patient also came in contact with 11 people from another district
  • 56. “Most of us need to listen to the music to understand how beautiful it is. But often that’s how we present statistics: we just show the notes, we don’t play the music.” — Hans Rosling Introduction to Data Storytelling by Anand Madhav
  • 57. Thank You! Anand Madhav @classicwild /anandmadhav Share feedback at https://bit.ly/3h4BdYc

Notas do Editor

  1. Source: How to Vanish Management Reporting Mania, Sep 2014: https://www.cfo.com/management-accounting/2014/09/vanquish-management-report-mania/
  2. Data alone can be useless and misleading without the context. Data needs to tell a story. Data needs to be associated with its surroundings, context, methods, ways and so on. That is where stories come into picture. Stories build a connection point with your audience, it removes complexity. We as humans have been looking at numbers for last 50 years max 100 years. We’ve been looking at texts since Gutenberg press. That’s precisely when as a civilization we started reading texts, we as a species have been looking at imagery ever since as a species we had a pair of eyes i.e. several millions of years. Visual imagery short circuits the brain’s pathway and leads you to jump to conclusion than any kind of analytical process and it frees the mind to start interpreting the results
  3. A leading European brewery’s plants purchased commodity raw materials from several vendors each – and had low volume discounts. Plants also placed multiple orders placed every week, leading to higher logistics cost. When plant managers were shown the data, they objected, saying “That’s not always the case.” Or, “That’s the only way– no one else does better.” Gramener built a custom analytics solution that sourced their SAP order data, automatically identified which plants ordered which commodities the most from multiple vendors – and when. It showed how each plant performed compared to peers – shaming those with poor performance. With this, they identified savings of €15 m — which the plant managers couldn’t refute.
  4. We did the simplest possible thing – plot the number of customers who had meter readings of 0, 1, 2, 3, etc.. – all the way up to 300 and beyond. (Effectively, we drew a histogram.) As expected, it was log-normal. Relatively few users with low meter readings, and few with high meter readings. But what was striking were the spikes – at 50 units, 100 units, 200 units and 300 units – precisely at the slab boundaries. Given the metering system, there is a strong economic incentive to stay at or within a slab boundary. Exceeding it increases the unit rate. However, there are two ways this could happen. Either the consumer watches their meter carefully, and the instant it hits 100, stops using their lights and fans – or a certain amount of money changes hands. It was easy to see from this that there was fraud happening, but what stumped us were the spikes at 10, 20, 30, 40, etc.. Here, there’s no economic incentive. There’s no significant difference between a meter reading of 10 vs 11, so there was no incentive to commit fraud. However, we later learnt that we were looking at this the wrong way. This was not a case of fraud, but of laziness. These were the meter readings taken by staff that never visited the premises, and were cooking up numbers. When people cook up numbers, they cook up round numbers. (An official said that he had to let go of one person who had not taken readings in a colony of houses for as long as six months. “Sir, there’s a pack of dogs in the colony” was his official statement.) The other question is, what is the nature of this fraudulent contract. Is it monthly? The meter reading guy appears and charges a small sum to adjust the reading? Or is it an annual contract that’s paid upfront? We looked at the meter readings of some of the people who were consistently at the slab boundaries. For example, the table in the middle has the readings of 10 customers, one per row. In the first row, the readings are consistently at 200 for 9 of the 12 months. However, there’s a spike in Jan-11 to 350 units. This indicated a monthly contract with a failure to pay in just one month. However, we later learnt that many of the people on this list were famous personalities. In fact, the lady in the first row had an event at their place in Jan-11, and the actual reading was expected to be well over a thousand units. But since the electricity board has a policy of not often auditing those that were in the highest slab (above 300), a more likely explanation was a collusion of the lineman with the customer to place her in the highest slab just this month, to avoid scrutiny. Lastly, we were examining the level at which fraud can be controlled. The last table above shows the extent of fraud of each section in one city, month on month. (The extent of fraud can be measured by the relative height of the spikes compared to the expected value.) Sections vary in the level of fraud, with Section 1 having significantly more fraud than Section 9. We also observe that fraud generally decreases in the winter season (Dec – Feb) when the need for cooling is less. But what’s most striking is the negative fraud in Section 5 in Jun-10. It stays low for a couple of months, and then, as if to compensate, shoots up to 82% in Sep-10. We learnt that this coincided with the appointment and transfer of a new section manager – under whose “regime”, fraud seems to have been dramatically controlled. It appears that a good organization level to control fraud is at the 5,000 people strong section manager level, rather than the 100,000 people strong staff level.
  5. Add steps on the right side
  6. Who do you know that will use this analysis? (If no one, make someone up). Edit exercise
  7. Activities to be checked yourself
  8. Add steps on the right side
  9. Think of good anecdotes here
  10. Add steps on the right side
  11. Instructors: Give the audience 1 minute to write down a one-sentence takeaway. Ask 2 people to read it out. Apply the checklist. If they don’t meet the checklist, prompt them to revise it. Allow them to struggle through it before taking help.
  12. Instructors: Give the audience 2 minutes to write down supporting material. Ask 1 person to read it out. Apply the checklist. Debate with them whether each point is necessary: “How does it support the takeaway?” Check if the analyses, when put together, logically prove the takeaway. There must be no alternative conclusion possible. If not, we need additional material to prove the takeaway.
  13. Instructors: Give the audience 2 minutes to expand on the context for any one analysis. Ask 1-2 people to read it out. Apply the checklist. Debate with them how they could make the point clearer and meaningful to the audience.. Explore alternatives, using comparisons, analogies or interpretation.
  14. Instructors: Ask 1-2 people from the audience to add supporting points to their takeaway or any message. Ask others to debate whether these points are necessary and sufficient to prove the parent message. Ask the audience if some of them are sub-bullets to a supporting point.
  15. Specificity? Structure?
  16. Instructors: The emotional arc is an interesting device and can attract questions. Clarify a few points. It’s not a graph of the data. It’s a graph of the EMOTION the graph triggers. There’s no right or wrong level to the emotion arc. It’s subjective, and audience driven. It’s important to ensure high contrast, and end with the feeling you want to deliver. That’s all. What most people are afraid to do is deliver bad news. Encourage the audience to deliver bad news as a vehicle to make the good news look better. Practice this by asking people for examples of related good and bad news. Then ask how they would word it so that the good news looks better. For example, “Sales fell. Margins improved” can become “Despite lower sales, our profits have not worsened proportionally, thanks to a significant improvement in our margins.”
  17. Make a longer ordered list of what cues should I use? Source: Designing Data Visualizations by Noah Iliinsky and Julie Steele (O’Reilly). Copyright 2011 Julie Steele and Noah Iliinsky, 978-1-449-31228-2.
  18. Specificity? Structure?