2. I’m Not a Data Journalist, So…
I spoke to three really good ones:
• James Ball, special projects editor, The
Guardian
• Tasneem Raja, interactives editor, Mother
Jones
• Sarah Cohen, CAR team editor, The New York
Times
4. “There is a huge difference between records and
statistics … I really want to work with records
— I don’t want to work with statistics.” —
Sarah Cohen
5. Interview the Data
• “The first thing [I’m] looking for is completeness. Do I have
everything I was supposed to get? If I know 3 million
deportations over X number of years, then there
should be 3 million rows.” — Cohen
• Compare it to the form it was generated from.
• “Look for typos if things have been manually entered”
— Ball
• Are there blank rows or columns? Are there duplicate
rows or columns? Have dates/times been rendered
correctly?
• Are the data consistent within rows/columns?
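The checks on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions, not the method any of the interviewees described: the function name `interview_rows`, the column name `date`, the date format, and the sample rows are all hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def interview_rows(csv_text, expected_rows, date_column, date_format="%Y-%m-%d"):
    """Run basic completeness checks on CSV text whose first line is a header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    date_index = header.index(date_column)

    def is_blank(row):
        # A row counts as blank when every cell is empty or whitespace.
        return all(cell.strip() == "" for cell in row)

    # Duplicates: count every repeat of an identical non-blank row.
    counts = Counter(tuple(row) for row in rows if not is_blank(row))
    duplicate_rows = sum(n - 1 for n in counts.values())

    # Dates: collect (row number, value) pairs that do not parse.
    bad_dates = []
    for number, row in enumerate(rows, start=2):  # row 1 is the header
        if is_blank(row):
            continue
        value = row[date_index].strip() if date_index < len(row) else ""
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            bad_dates.append((number, value))

    return {
        "row_count": len(rows),
        # Completeness: does the count match what you were told to expect?
        "complete": len(rows) == expected_rows,
        "blank_rows": sum(1 for row in rows if is_blank(row)),
        "duplicate_rows": duplicate_rows,
        "bad_dates": bad_dates,
    }


# Hypothetical sample: one repeated row, one impossible month, one blank row.
sample = "id,date\n1,2012-01-05\n2,2012-13-01\n1,2012-01-05\n,\n"
report = interview_rows(sample, expected_rows=5, date_column="date")
```

A report like this only tells you where to look. Each flagged row still has to be compared with the form or source it came from.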
6. Interview the Data cont…
• “Looking for missing things as opposed to things
that are there.” — Cohen
• Does it answer the question you had when you
decided to ask for the data?
• Find an expert. “There are people out there who
know datasets really well and what they should
be saying. They’re going to be able to tell you
whether you made some fundamental flaw in
your logic.” — Cohen
7. Beware of Spreadsheet Limits
• Excel 2003: 65,536 rows by 256 columns.
• Excel 2007/2010: 1,048,576 rows by 16,384
columns
• Google Docs: 256 columns
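One way to act on these limits is to count rows and columns before opening a file in a spreadsheet, since a file that exceeds a limit may not load completely. The sketch below uses only the figures quoted on this slide. The function name `tools_that_would_truncate` is hypothetical, and the Google Docs row limit is left as `None` because the slide does not give one.

```python
import csv
import io

# (max rows, max columns) as quoted on the slide; None means "not stated".
LIMITS = {
    "Excel 2003": (65_536, 256),
    "Excel 2007/2010": (1_048_576, 16_384),
    "Google Docs": (None, 256),
}


def tools_that_would_truncate(csv_file):
    """Count rows and columns, then list the tools whose limits are exceeded."""
    n_rows = 0
    n_cols = 0
    for row in csv.reader(csv_file):
        n_rows += 1  # the header counts: it occupies a spreadsheet row too
        n_cols = max(n_cols, len(row))

    too_small = []
    for tool, (max_rows, max_cols) in LIMITS.items():
        over_rows = max_rows is not None and n_rows > max_rows
        if over_rows or n_cols > max_cols:
            too_small.append(tool)
    return n_rows, n_cols, too_small


# Hypothetical data: 70,000 two-column rows, more than Excel 2003 can hold.
big = io.StringIO("a,b\n" * 70_000)
result = tools_that_would_truncate(big)
```

With a real file, pass an open file object instead of the `StringIO` used here, for example `open(path, newline="")`.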
8. Rules of Thumb
• “Having a good set of rules of thumb and
applying them can prevent a lot of horror
shows.” — Ball
• Collect credible, relevant data to compare
with what you’ve gathered.
9. Standardize the Data
• “We’ve trained all of our reporters to use
Google spreadsheets … We need a method of
collecting data that’s not a barrier to entry for
non-programmer people like fact checkers,
copy editors and editors.” — Raja
10. Look for Outliers
• Visualize your data in multiple ways to see it with fresh eyes
(bar, line, pie, etc.)
• “I’ll make a visualization in 30 different ways and I’ll
look at them on different scales…. Try to look at it in so
many different ways that it’s not possible for you to
have just gotten tunnel vision.” — Cohen
• What doesn’t look right?
• What’s not the way you expected?
• “Anytime you see a big outlier or something
counterintuitive assume it’s probably because
something has gone wrong.” — Ball
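Alongside charts, a quick numeric screen can point at values worth a second look. The sketch below uses the common 1.5 × interquartile-range rule. That rule is an assumption chosen for illustration and is not something the interviewees named; the function name `flag_outliers` and the sample figures are hypothetical.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return (index, value) pairs lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Hypothetical monthly counts; the last one looks like a data-entry slip.
counts = [10, 12, 11, 13, 12, 11, 95]
suspects = flag_outliers(counts)
```

Here only the value 95 is flagged. In keeping with Ball's advice, a flagged value is a reason to check the source before treating it as a finding.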
11. Visualize with Care
• “Be really careful on visualizing data. It loses
all the ambiguity. Most [people] look at a graphic and
very few read the small print at the bottom. If
it’s in a graphic it gives [a] sense of urgency.” —
Ball
12. Annotate/Describe
• “I tend to go back over all my notes and the
programs and the interviews and make sure
there is nothing I had internalized but forgot I
should tell people about it.” — Cohen
• “Nerd box.”
• Try to get your data details/caveats into the
main piece, too.
13. Data Diary
• “Essentially a reporter’s notebook that describes
the whole process of where does this data
live, how did we clean it and merge it, what
APIs did we use, what calls did we send out to
the API ...” — Raja
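A data diary can be as plain as a text file, but a structured entry per step makes it easier to search later. The sketch below is one possible shape and not Mother Jones's actual format: the function name `diary_entry`, the field names, and the example URL are all hypothetical.

```python
import json
from datetime import datetime, timezone


def diary_entry(step, source, detail, when=None):
    """Build one diary line as JSON: what was done, to which data, and when."""
    when = when or datetime.now(timezone.utc)
    record = {
        "time": when.isoformat(timespec="seconds"),
        "step": step,      # e.g. "download", "clean", "merge", "api_call"
        "source": source,  # where the data lives (file path, URL, API endpoint)
        "detail": detail,  # what exactly was done, in plain words
    }
    return json.dumps(record, sort_keys=True)


# Hypothetical entry with a fixed timestamp so the result is repeatable.
line = diary_entry(
    step="api_call",
    source="https://example.org/api/records?year=2012",
    detail="Requested all 2012 records; saved the raw response before cleaning.",
    when=datetime(2012, 5, 1, 12, 0, tzinfo=timezone.utc),
)
```

Appending each line to a file kept alongside the data gives a record that a colleague can replay step by step.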
14. Publish the Data
• “There’s a big risk of being a black box and
saying, ‘We have great data and here’s the
core result’ and then not putting it up.” — Ball
• More eyes = better information.
• Helps you get more data.
• Make corrections.
15. General Tips
• Be aware of the tendency to oversimplify or magnify data.
• Data are just like any other facts: you have to verify them.
• Establish credible rules of thumb for your data.
• Visualize data in multiple ways to get a sense of what you
have.
• What you do with data also has to be verified.
• Be aware of the tools you use, and how they can affect
accuracy.
• Explain your methods and the data, and share the data.
17. Accuracy Fundamentals
• Don’t assume, verify.
• Mistakes are natural and a byproduct of how
journalists/humans work.
• Just because someone official tells you or gives you
something doesn’t mean it’s accurate.
• Develop and consult credible sources.
• Verification is a team sport.