Humanizing Data Storytelling for Greater Business Impact

This presentation was shared by Gramener's Kanishk Kumar Abhishek during his guest lecture session at School of Business Management, NMIMS Mumbai.

Check out Gramener's data storytelling workshop for analysts and data scientists at https://gramener.com/data-storytelling-workshop

Published in: Data & Analytics

  1. 1. Humanizing Data Stories for Greater Business Impact. NMIMS, 30 July 2020
  2. 2. How a nurse changed the course of a war using data storytelling
  3. 3. Nightingale helped cut the death rate from a whopping 40% to a mere 2%
     The chart was created by Florence Nightingale for Queen Victoria during the Crimean War. It visualizes deaths due to:
       • Red: war wounds
       • Black: other war-related causes
       • Blue: avoidable hospital diseases
  4. 4. Let’s look at 15 years of US birth data
     The US birth dataset (1975 to 1990) has been around for several years and has been studied extensively. Yet a visualization can reveal patterns that are neither obvious nor well known. For example:
       • Are birthdays uniformly distributed?
       • Do doctors or parents exercise the C-section option to move dates?
       • Is there any day of the month that has unusually high or low births?
       • Are there any months with relatively high or low births?
     [Chart: average births for each day of the year, 1975 to 1990, shaded from fewer births to more births]
     Chart annotations:
       • Very high births in September. But this is fairly well known: most conceptions happen during the winter holiday season.
       • Relatively few births during the Christmas and Thanksgiving holidays, as well as New Year and Independence Day.
       • Most people prefer not to have children on the 13th of any month, since it is considered an unlucky day.
       • Some special days like April Fool’s Day are avoided, but Valentine’s Day is quite popular.
  5. 5. The pattern in India is quite different
     This birth date dataset was obtained from school admission data for over 10 million children. When we compare it with births in the US, we see none of the same patterns. For example:
       • Is there an aversion to the 13th or is there a local cultural nuance?
       • Are holidays avoided for births?
       • Which months have a higher propensity for births, and why?
       • Are there any patterns not found in the US data?
     [Chart: average births for each day of the year, 2007 to 2013, shaded from fewer births to more births]
     Chart annotations:
       • Very few children are born in August or later in the year. Most births are concentrated in the first half of the year.
       • We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month, that is, on round-numbered dates.
       • Such round-numbered patterns are a typical indication of fraud. Here, birth dates are brought forward to aid early school admission.
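
The round-date check described on this slide can be sketched in a few lines. This is a minimal illustration, not the analysis behind the chart: the function name `round_day_share`, the uniform baseline, and the sample dates are all assumptions made for the example.

```python
from collections import Counter
from datetime import date

# The round-numbered days of the month named on the slide.
ROUND_DAYS = {5, 10, 15, 20, 25}

def round_day_share(birth_dates):
    """Return the fraction of births that fall on a round-numbered day."""
    if not birth_dates:
        raise ValueError("need at least one birth date")
    by_day = Counter(d.day for d in birth_dates)
    on_round = sum(by_day[day] for day in ROUND_DAYS)
    return on_round / len(birth_dates)

# Assumed baseline: births spread evenly over a non-leap year.
# 5 round days in each of 12 months = 60 of 365 days.
UNIFORM_SHARE = 60 / 365

# Made-up sample, NOT the school admission data from the slide.
sample = [
    date(2010, 1, 5), date(2010, 1, 10), date(2010, 2, 15),
    date(2010, 3, 7), date(2010, 4, 20), date(2010, 6, 25),
    date(2010, 6, 12), date(2010, 7, 5),
]

share = round_day_share(sample)
print(f"share on round days: {share:.0%}, uniform baseline: {UNIFORM_SHARE:.0%}")
```

A share far above the uniform baseline is the kind of signal the slide describes. Whether it indicates altered dates still needs the domain context given above.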
  6. 6. This adversely impacts children’s marks
     It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer. Children “born” on the 1st, 5th, 10th, 15th, etc. of the month tend to score lower marks on average.
       • Are holidays avoided for births?
       • Which months have a higher propensity for births, and why?
       • Are there any patterns not found in the US data?
     [Chart: average marks for children born on a given day of the year, 2007 to 2013, shaded from lower marks to higher marks]
     Chart annotation: Children “born” on round-numbered days score lower marks on average, due to a higher proportion of younger children.
  7. 7. An energy utility detected billing fraud
     An energy utility (with over 50 million subscribers) had 10 years’ worth of customer billing data available. Most fraud detection software failed to load the data, and sampled data revealed little or no insight.
     Below is a simple histogram (or frequency distribution) of usage levels. Each bar represents the number of customers with a specific bill amount (in units, or kWh).
     Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than someone with 100 units. So people have a strong incentive to stay at or within a slab boundary.
     This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the slab boundaries.
     This can happen in one of two ways. First, people may be monitoring their usage very carefully, and turn off their lights and fans the instant their usage hits the slab boundary. Or, more realistically, there’s probably some level of corruption involved, where customers pay a small sum to the meter reading staff to ensure that the reading stays exactly at the slab boundary, giving them the advantage of a lower price.
  8. 8. This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries. This clearly shows collusion of some form with the customers.
     This happens with specific customers, not randomly. Here are such customers’ meter readings.

     Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
        217    219    200    200    200    200    200    200    200    350    200    200
        250    200    200    200    201    200    200    200    250    200    200    150
        250    150    150    200    200    200    200    200    200    200    200    150
        150    200    200    200    200    200    200    200    200    200    200     50
        200    200    200    150    180    150     50    100     50     70    100    100
        100    100    100    100    100    100    100    100    100    100    110    100
        100    150    123    123     50    100     50    100    100    100    100    100
          0    111    100    100    100    100    100    100    100    100     50     50
          0    100     27    100     50    100    100    100    100    100     70    100
          1      1      1    100     99     50    100    100    100    100    100    100

     If we define the “extent of fraud” as the percentage excess of the 100 unit meter reading, the value varies considerably across sections and over time.

     Section   Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
     Section 1    70%    97%   136%    65%   110%   116%   121%   107%   114%    88%    74%   109%
     Section 2    66%    92%    66%    87%    70%    64%    63%    50%    58%    38%    41%    54%
     Section 3    90%    46%    47%    43%    28%    31%    50%    32%    19%    38%     8%    34%
     Section 4    44%    24%    36%    39%    21%    18%    24%    49%    56%    44%    31%    14%
     Section 5     4%    63%   -27%    20%    41%    82%    26%    34%    43%     2%    37%    15%
     Section 6    18%    23%    30%    21%    28%    33%    39%    41%    39%    18%     0%    33%
     Section 7    36%    51%    33%    33%    27%    35%    10%    39%    12%     5%    15%    14%
     Section 8    22%    21%    28%    12%    24%    27%    10%    31%    13%    11%    22%    17%
     Section 9    19%    35%    14%     9%    16%    32%    37%    12%     9%     5%    -3%    11%

     Chart annotations: New section manager arrives … and is transferred out … with some explainable anomalies. Why would these happen?
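
The slide defines the “extent of fraud” only loosely. Here is a minimal sketch of one possible reading. It assumes the expected count at the boundary is the average count of the readings within 5 units on either side. The function name, the `window` parameter and that baseline are illustrative assumptions, not the utility’s actual method, and the sample readings are made up.

```python
from collections import Counter

def extent_of_fraud(readings, boundary=100, window=5):
    """Percentage excess of readings exactly at `boundary` over an
    assumed baseline: the mean count of readings within `window`
    units on either side of the boundary."""
    counts = Counter(readings)
    neighbours = [
        counts[boundary + k]
        for k in range(-window, window + 1)
        if k != 0
    ]
    expected = sum(neighbours) / len(neighbours)
    if expected == 0:
        raise ValueError("no neighbouring readings to build a baseline from")
    return (counts[boundary] - expected) / expected * 100

# Made-up readings: 3 customers at each of 95..99 and 101..105,
# but 6 customers sitting exactly on the 100-unit slab boundary.
sample = [100] * 6 + [95, 96, 97, 98, 99, 101, 102, 103, 104, 105] * 3

print(f"extent of fraud: {extent_of_fraud(sample):.0f}%")
```

Under this reading the value is negative when fewer customers sit on the boundary than the neighbouring readings suggest, which would be consistent with the occasional negative entries in the section table.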
  9. 9. Class X English marks distribution
     [Chart: number of students (0 to 20,000) by marks scored (0 to 100)]
  10. 10. Stories have four types of narratives to explain visualizations
      Remember “SEAR”: Summarize, Explain, Annotate, Recommend.
      [Chart: number of students (0 to 20,000) by marks scored (0 to 100)]
        • Summarize the visual in its title. Don’t describe the chart. Don’t write the user’s question. Write the answer itself. Like a headline.
          On the chart: “Teachers add marks to stop some students from failing”
        • Explain & interpret the visual. How should the user read it? What do you say when you talk through it? Explain what the visual is. Then the axes. Then its contents. Then the inference.
          On the chart: “This chart shows Class 10 students’ English marks in Tamil Nadu, India, in 2011. The X-axis has the mark a student has scored. The Y-axis has the # of students who scored that mark. This is a bell curve. But the spike at 35 (the mark at which students pass) is unusual. Teachers must be adding marks to some of the students who are likely to fail by a small margin.”
        • Annotate essential elements. What should the user focus their eyes on? Point it out, or highlight it with colors. Interpret what they’re seeing, in words.
          On the chart: “Large number of students score exactly 35 marks”; “Few (but not 0) students fail at 31-34 marks”; “No one scores 0-4 marks”
          What’s unusual: Large number of students score 35 marks. Few (but not 0) students score between 30-35.
        • Recommend an action. How should I act on this? You need to change the audience. (Otherwise, you made no difference.)
          On the chart: “Only some students get this benefit. Identify a fair policy that will be applied consistently.”
  11. 11. Our focus at Gramener is on narrating insights from data as stories
      Gramener combines data, insight and story.
        • Facts are not useful. E.g. delay in cargo delivery grew 8% last quarter.
        • Insights are useful, non-obvious and big. Insights enable action. Lack of forklifts and fewer trained staff led to the delay. Improving these can reduce cargo delay by 15%.
        • Stories are memorable and viral. Numbers are not enough; stories explain them. Delays are due to fragile cargo. Trained staff and forklifts reduce the risk of breakage, and hence reduce delay.
      Today, I’ll share how we apply these at organizations like HDFC.
  12. 12. Data storytelling is a critical skill for data scientists, analysts & managers
      Stories are memorable. They spread virally.
        • People remember stories. They’ll act on them.
        • People share stories. That enables collective action.
        • For people to act on analysis, data stories are critical.
      But analysts present analysis, not stories.
        • We present what we did. Not what you need.
        • You need to know what happened, why, & what to do. Narrated in an engaging way. As a story.
        • We’ll learn how to do that in this session.
      Storytelling has a 30X return on investment.
        • Rob Walker and Joshua Glenn auctioned common items like mugs, golf balls, toys, etc.
        • The item descriptions were stories purpose-written by 200+ contributing writers.
        • Items that were bought for $250 sold for over $8,000, a return of over 3,000% for storytelling!
      Example item. Original price: $2.00. Final price: $50.00.
      “This little statue stood on the window-sill in my favorite aunt’s front hall. Perched between plants of varying shapes and sizes, surrounded by shards of broken pottery and miniature ceramic elephants from the Red Rose Tea box, dappled with sunlight shining through the leaded glass figures of St. Francis in his garden and the mossy Celtic Cross, …”
  13. 13. You have data. You have analysis. Now what?
        • Understanding the audience & intent
        • Finding insights
        • Storylining
        • Designing data stories
  14. 14. Step 1: Understanding audience & intent. Align key message
  15. 15. DO IT: Who is the audience for your analysis?
        • Role: _____________ Be specific. “Head of sales”, not “executive”.
        • Example name: ______________ Name a real person. “Jim Fry”, not “any sales head”.
      Different people want different things from the same data. Given sales data:
        • The Board: “Predict next quarter’s sales”
        • Product head: “Which product grew the most?”
        • Sales head: “Did we meet our target?”
      They are not interested in each other’s questions.
      Who is your audience? They determine the story.
  16. 16. DO IT: Write it in this structure
      “[Person, Role] is in [situation], and faces this [problem]. By taking [action], she can drive [impact].”
      Example:
        • Person, role: John, the Marketing head,
        • Situation: must create a region-wise budget,
        • Problem: and doesn’t know the region-wise ROI.
        • Action: By prioritizing the region,
        • Impact: he can maximize ROI.
      For each person, answer the following questions:
        1. What’s their situation?
        2. What problems do they face?
        3. What action can they take?
        4. What is the impact of this action?
      What is their problem? That defines how you align the message.
      Clear needs & a clear future scenario lead to effective communication.
  17. 17. Here are three examples in real life
      Purchasing Commodities
        • Person, role: Adam, the purchasing head of a leading European brewery
        • Situation: had plants that purchased commodities from several vendors. Discounts were low. The number of weekly orders was high.
        • Problem: But he didn’t know which plants and commodities were a problem. Every plant denied it.
        • Action: By consolidating vendors and reducing order frequency,
        • Impact: they could increase their discounts and reduce logistics cost.
      Cargo Delay
        • Person, role: Cris, the operations head of a leading US airline
        • Situation: had an SLA to deliver cargo from the flight to the warehouse in under 1.5 hours, 15% lower than their current best performance.
        • Problem: But she didn’t know what the biggest drivers of this delay were: people, assets, or type of cargo.
        • Action: By adding resources only to the largest levers of delay,
        • Impact: she could reduce turnaround time with the lowest spend.
      Customer Churn
        • Person, role: Ravi, the marketing manager of an Asian telecom company
        • Situation: found that the cost of replacing customers was thrice the cost of retention.
        • Problem: But he didn’t know which customers to make offers to in order to retain them.
        • Action: By predicting which customer was likely to churn,
        • Impact: they could tailor a retention offer and reduce re-acquisition cost.
  18. 18. Step 2: Finding insights. Big, impactful & surprising
  19. 19. Insights must be Big, Useful, and Surprising
      Filter the analyses using these as a checklist.
        • Is the insight big? The analysis must, of course, be statistically significant. But it should also be numerically significant. We want a result that substantially changes the outcome.
        • Is the insight useful? What should the audience do after hearing the insight? Can they take an action that improves their objective? Even if it’s informational, what should they do next?
        • Is the insight surprising? Is this something they didn’t know? Is it non-obvious? Does it overturn a domain-driven belief or a gut feel? Or does it bring consensus to a group with divided opinion?
  20. 20. Marking each analysis as Big, Useful or Surprising (High, Medium, Low)
      Only those that are high or medium on all aspects are insights.
        • Twice as many Detractors talk about our Product’s ease of use. (Big: Low; Useful: Medium; Surprising: High)
        • Typing with capitalization in a credit application indicates creditworthiness. (Big: Low; Useful: Low; Surprising: High)
        • Almost 20% of all voice search queries are triggered by just 25 words. (Big: Low; Useful: High; Surprising: Medium)
        • More engaged employees have fewer accidents. (Big: Low; Useful: High; Surprising: Low)
        • About 50% of American small businesses do not have a website. (Big: High; Useful: Medium; Surprising: Low)
        • The recommendation system influences about 80% of content streamed on Netflix. (Big: High; Useful: Medium; Surprising: Low)
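
The filtering rule on this slide (“high or medium on all aspects”) can be written as a short check. This is a sketch only: the names `PASSING`, `is_insight` and `analyses` are illustrative, and the ratings are copied from the slide as transcribed.

```python
# Ratings that count as passing under the slide's rule.
PASSING = {"High", "Medium"}

def is_insight(big, useful, surprising):
    """An analysis is an insight only if no aspect is rated Low."""
    return all(rating in PASSING for rating in (big, useful, surprising))

# (analysis, Big, Useful, Surprising), as listed on the slide.
analyses = [
    ("Twice as many Detractors talk about our Product's ease of use",
     "Low", "Medium", "High"),
    ("Typing with capitalization in a credit application indicates creditworthiness",
     "Low", "Low", "High"),
    ("Almost 20% of all voice search queries are triggered by just 25 words",
     "Low", "High", "Medium"),
    ("More engaged employees have fewer accidents",
     "Low", "High", "Low"),
    ("About 50% of American small businesses do not have a website",
     "High", "Medium", "Low"),
    ("The recommendation system influences about 80% of content streamed on Netflix",
     "High", "Medium", "Low"),
]

insights = [text for text, *ratings in analyses if is_insight(*ratings)]
print(f"{len(insights)} of {len(analyses)} analyses pass the filter")
```

With the ratings as transcribed, every row has at least one Low, so the filtered list comes out empty. Each example is strong on some aspects but fails the checklist on another.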
  21. 21. Here are the analyses & filters for the problems we saw earlier
      Each analysis is flagged B (Big), U (Useful) or S (Surprising) where it passes that filter.
      Purchasing Commodities
        • The most common commodity was ordered 10 times a week across 2.4 vendors. (no flags)
        • The number of orders is correlated with the number of vendors. Reducing one will reduce the other. (U)
        • Plant P126 was the plant with the most violations, especially on the largest commodity. (B, U)
      Cargo Delay
        • Fragile cargo is a big factor in the delay, with a 20% impact. (B, S)
        • Fridays are when cargo is delayed the most. (no flags)
        • Trained staff and forklifts impact delay the most. (B, U, S)
      Customer Churn
        • Number of inbound calls does not impact churn. (S)
        • Customers who haven’t made any calls in the last 15 days are the most likely to churn. (B)
        • Customers making infrequent calls, recharging small amounts infrequently, are most at risk. (B, U, S)
  22. 22. Step 3: Storylining. Summarize
  23. 23. Storylines are plot outlines. They summarize the entire story
      Outlines are the backbone on which you flesh out your story. This section explains how to create storylines.
      A business storyline:
        • Our NPS improved 6 points
        • It was 34% in 4Q18. Now it’s at 40% in 2Q19
        • Despite lower satisfaction with our Support, our NPS grew
        • This increase in NPS was mainly due to better Product Quality & Research
      Gladiator’s storyline:
        • The Emperor asks General Maximus to take control of Rome and give it back to the people
        • The ambitious Prince murders the Emperor
        • Maximus is sold as a gladiator slave. His family is murdered
        • Maximus grows famous, fights the Prince in the arena, and wins
        • He joins his family in death. Rome is in the hands of the people
      Notice “characters” in red. All stories have characters, human or otherwise.
  24. 24. Convert analysis into messages by adding context
      Analysis doesn’t mean anything to people. When it does, it’s a message. We do this by adding context. Three ways to add context are:
        1. Compare with similar numbers. Our $15 mn sales is $3 mn more than last year, $1 mn below budget, and twice our nearest competitor’s.
        2. Explain with analogies. If we stopped producing, it would take 3 months to dispose of our excess inventory of $2 mn.
        3. Add business interpretation. Usage is correlated with discounts. For every $1 discount, customer LTV increases by $24.
      DO IT: Add context to your analysis
        1. Take each relevant analysis
        2. Convert it to a message for the audience by adding context
      CHECK IT: Verify these yourself
        • Will your audience understand the messages without explanation?
        • Will your audience understand why this message is relevant?
      Frame each analysis as a message that the audience will understand and find relevant.
  25. 25. Structure the messages into a pyramid or a tree
      The conventional approach is to explain how we did the analysis & found the insight. The insight is lost in the set of slides, and it takes too long to reach the first insight.
        • Title
          • Analysis section 1: Methodology, Insight
          • Analysis section 2: Methodology, Insight
      Instead, start with the insight first, and then take the audience through arguments to support it. This starts with the main message, and then answers why & how the insight makes sense.
        • Insight that answers a business question
          • Supporting argument 1: Methodology, Supporting reference
          • Supporting argument 2: Methodology, Supporting reference
  26. 26. Step 4: Designing data stories. Data interpretation
  27. 27. Pick a format based on how your audience will consume the story
  28. 28. How the data should be interpreted decides the type of chart to be used
      https://gramener.github.io/visual-vocabulary-vega/
        • Deviation
        • Change-over-Time
        • Spatial
        • Ranking
        • Correlation
        • Part-to-Whole
        • Flow
        • Magnitude
        • Distribution
  29. 29. Start with the takeaway. Summarize your entire story
      Close your eyes. Think of a childhood tale. Summarize the moral of the story in one line. We easily remember these stories, and their summary as a moral, several years later.
      Close your eyes. Think of a business presentation from last week. Can you easily summarize the message in one line?
      Stories are designed around a moral. A single takeaway. An “elevator pitch”. It’s a one-sentence summary of the most important message for the audience.
      DO IT: Write your takeaway as one sentence
        • What’s the one thing you want the audience to remember from your story?
        • What’s the one message that the audience should take away?
      CHECK IT: Verify these yourself
        • Is it a single, complete sentence?
        • Does it deliver what you want the audience to remember?
        • Will your audience care a lot about this?
  30. 30. Structure supporting analyses as a tree
      Arrange messages hierarchically to prove & support the parent message.
      Construct a pyramid or tree-like outline:
        • Start with the takeaway at the root of the tree
        • Add a message that supports the takeaway
        • Add further details or supporting messages
        • Messages must prove the first message, and only the first message
        • Strike off any message that isn’t required to prove or support the takeaway
        • Add the next message that supports the takeaway
        • Add details to prove the second message
        • Add the remaining messages for the takeaway
        • Add details as required
      Example of a business tree:
      Launch sales were 30% less than target due to high competition
        • Launch sales were projected at $20 mn in the first month, but achieved only $14 mn
          o Sales in every region were 20-50% lower.
          o Only Philippines & Korea were on target
        • Competitors discounted price by 35%, which is unsustainable for them
          o 80 store discounts increased from 15% to 35%
          o The maximum sustainable discount is 20%
        • Stores that offered higher discounts saw less than 20% of our target sales
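
The message tree on this slide maps naturally onto a small recursive data structure. The sketch below is one assumed representation: the `Message` class and the `outline` helper are hypothetical names, and the text is taken from the slide’s business example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Message:
    """One node of the storyline tree: a message plus its supporting messages."""
    text: str
    children: List["Message"] = field(default_factory=list)

def outline(node, depth=0):
    """Return the tree as indented outline lines, takeaway first."""
    lines = ["  " * depth + "- " + node.text]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines

# The business tree from the slide: takeaway at the root,
# supporting messages below it, details below those.
root = Message(
    "Launch sales were 30% less than target due to high competition",
    [
        Message(
            "Launch sales were projected at $20 mn in the first month, "
            "but achieved only $14 mn",
            [
                Message("Sales in every region were 20-50% lower"),
                Message("Only Philippines & Korea were on target"),
            ],
        ),
        Message(
            "Competitors discounted price by 35%, which is unsustainable for them",
            [
                Message("80 store discounts increased from 15% to 35%"),
                Message("The maximum sustainable discount is 20%"),
            ],
        ),
        Message("Stores that offered higher discounts saw less than 20% "
                "of our target sales"),
    ],
)

print("\n".join(outline(root)))
```

Keeping each supporting message as a child of the message it proves makes it easy to strike off a branch that does not support the takeaway.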
  31. 31. Four types of annotations help the audience understand your intent
      [Chart: number of students (0 to 20,000) by marks scored (0 to 100)]
        • Summarize the chart in its title. Don’t describe the chart. Don’t write the question to answer. Write the answer itself. Like a headline.
          On the chart: “Teachers add marks to stop some students from failing”
        • Explain the chart. How should the user read it? What do you say when you talk through it? Explain what the visual is. Then the axes. Then its contents. Then the inference.
          On the chart: “This chart shows Class 10 students’ English marks in Tamil Nadu, India, in 2011. The X-axis has the mark a student has scored. The Y-axis has the # of students who scored that mark. This is a bell curve. But the spike at 35 (the mark at which students pass) is unusual. Teachers must be adding marks to some of the students who are likely to fail by a small margin.”
        • Highlight essential elements. What should the user focus their eyes on? Point it out. Interpret what they’re seeing, in words.
          On the chart: “Large number of students score exactly 35 marks”; “Few (but not 0) students score between 30-35”
          What’s unusual: Large number of students score 35 marks. Few (but not 0) students score between 30-35.
        • Recommend an action. How should I act on this? You need to change the audience. (Otherwise, you made no difference.)
          On the chart: “Only some students get this benefit. Identify a fair policy that will be applied consistently.”
  32. 32. Here is the storyline for the analyses we saw earlier
      Purchasing Commodities
        • Takeaway: Focus on reducing the number of vendors for products ICG (in P126), FRS (in P121) and SWB (in P074) for a potential 40% reduction in logistics & vendor cost.
        • Supporting point: ICG spend is among the highest, at €6.9m. P126 typically orders 40 times a week, often from 15-20 vendors.
        • Supporting point: FRS spend is €3.2m. P121 orders from 3 vendors 8-14 times a week.
      Cargo Delay
        • Takeaway: To reduce the TAT to 1.5 hours at Airport XYZ, increase the number of forklifts from 1 to 2, and the number of trained staff from 4 to 6.
        • Supporting point: The number of forklifts is the biggest driver of TAT. Each forklift typically reduces TAT by 15-30%.
        • Supporting point: Total staff count does not impact TAT. Increasing trained staff has a more tangible impact of ~5-10% per person.
      Customer Churn
        • Takeaway: If a customer has not called in the last 5-14 days, and they have made only 1 recharge under $20 last quarter, make them an offer to retain them.
        • Supporting point: The biggest driver of retention is when the customer made the outgoing call. The 5-14 days bucket has the highest variation.
        • Supporting point: Customers who make at most 1 recharge under $20 are 280% more likely to churn than others.
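
The customer churn takeaway is concrete enough to express as a rule. The sketch below is one reading of it: it assumes “only 1 recharge under $20” means exactly one recharge in the quarter, with an amount below $20. The `Customer` fields and the function name are hypothetical, and nothing here reflects the telecom company’s actual model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Customer:
    days_since_last_call: int            # days since the last outgoing call
    recharges_last_quarter: List[float]  # recharge amounts in dollars

def should_offer_retention(customer):
    """Apply the slide's takeaway as a simple rule (assumed reading)."""
    # "has not called in the last 5-14 days"
    quiet = 5 <= customer.days_since_last_call <= 14
    # "made only 1 recharge under $20 last quarter"
    recharges = customer.recharges_last_quarter
    low_recharge = len(recharges) == 1 and recharges[0] < 20
    return quiet and low_recharge

at_risk = Customer(days_since_last_call=7, recharges_last_quarter=[15.0])
active = Customer(days_since_last_call=2, recharges_last_quarter=[15.0, 30.0])

print(should_offer_retention(at_risk))  # True
print(should_offer_retention(active))   # False
```

A rule like this is easy for a marketing manager to act on, which is the point of phrasing the takeaway as an action rather than as a model description.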
  33. 33. You have a story. Let’s present it
  34. 34. To understand business performance, dashboards are not enough. We need stories
      Gramener combines data, insight and story.
  35. 35. WELCOME TO DATA STORYTELLING. Insights as data stories
  36. 36. HUMANIZE DATA STORYTELLING
  37. 37. A FRIENDSHIP IN DATA, DRAWING & POSTCARDS
  38. 38. A FRIENDSHIP IN DATA, DRAWING & POSTCARDS
  39. 39. A FRIENDSHIP IN DATA, DRAWING & POSTCARDS
  40. 40. A BETTER WORLD DATA PORTRAIT
  41. 41. Data stories through Comicgen. Example: COVID-19 data explained by data comics
  42. 42. Insights and storytelling approach
      Stage 1: Identify the business problem. Define the problem statement by understanding:
        • What is the basic need and desired outcome?
        • Who will benefit?
        • What is the impact?
        • What are the success criteria?
      Stage 2: Translate to a data problem
        • Break down the problem statement into multiple use cases
        • Connect each use case with a data set
        • Understand any limitations on data sources, internal and external
      Stage 3: Data answer. Target each use case with data through:
        • EDA and transformation
        • Modelling
        • Generating insights
      Stage 4: Translate to a business answer
        • Stitch insights from individual use cases to create a story
        • Connect the data story to help in better decision making
        • Measure success
      Roles listed on the slide, in four groups:
        • Sales Rep, Data Consultant, Account Manager, Solution Lead
        • Analyst Lead, Data Consultant, Account Manager, Solution Architect, Solution Lead
        • Analyst Lead, Data Consultant, Data Scientist, Solution Architect, Solution Lead
        • Data Consultant, Account Manager, Solution Lead
  43. 43. Samuel L. Jackson, Morgan Freeman, Tom Hanks, Harrison Ford, Gary Oldman
  44. 44. Samuel L. Jackson, Harrison Ford, Morgan Freeman, Tom Hanks, Tom Cruise
  45. 45. In summary, here are the 9 steps to go from data to a data story
        1. Who is your audience? They determine the story
        2. What is their problem? That defines your analysis
        3. Find the right analysis to solve the problem
        4. Filter for big, useful, surprising insights
        5. Start with the takeaway. Summarize your entire story
        6. Add supporting analyses as a tree
        7. Pick a format based on how your audience will consume the story
        8. Pick a visual design based on the takeaway
        9. Annotate to explain & engage. Use four types of narratives
  46. 46. To recap, we narrate insights as data stories. But this is not scalable without technology
      Gramener combines data, insight and story.
  47. 47. THANK YOU
