4. Let’s Look at Some Data
        I               II              III              IV
    x      y        x      y        x      y        x      y
   10    8.04      10    9.14      10    7.46       8    6.58
    8    6.95       8    8.14       8    6.77       8    5.76
   13    7.58      13    8.74      13   12.74       8    7.71
    9    8.81       9    8.77       9    7.11       8    8.84
   11    8.33      11    9.26      11    7.81       8    8.47
   14    9.96      14    8.1       14    8.84       8    7.04
    6    7.24       6    6.13       6    6.08       8    5.25
    4    4.26       4    3.1        4    5.39      19   12.5
   12   10.84      12    9.13      12    8.15       8    5.56
    7    4.82       7    7.26       7    6.42       8    7.91
    5    5.68       5    4.74       5    5.73       8    6.89
5. Let’s Look at Some Data
Property                                    Value
Mean of x in each case                      9 (exact)
Variance of x in each case                  11 (exact)
Mean of y in each case                      7.50 (to 2 decimal places)
Variance of y in each case                  4.122 or 4.127 (to 3 decimal places)
Correlation between x and y in each case    0.816 (to 3 decimal places)
Linear regression line in each case         y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively)
6. Let’s Look at Some Data … Visually
“Anscombe’s Quartet”
Source: Wikipedia
Welcome to Visual Analytics Best Practices. My name is Jewel Loree and I am a Data Analyst with the Product Marketing Team. Let’s get started!
You found some great data… now what? Maybe you were exploring data.gov and came across a dataset that looked like it might hold something interesting. Or maybe you had a lead and filed a records request.
No matter how you got it, you probably have something similar to this. This is data for Graffiti in New York City. Assuming you know your way around an Excel calculation, you could do some basic statistical work right here in the spreadsheet: counting records to see how many times a certain address got hit, or the number of incidents per borough… you can just report those numbers in your story. But does knowing the stats really give you the best idea of the trend? And will data like that connect with your reader?
Let’s see another example. Here’s some data. There are 4 sets of data here, each with 11 pairs of x-y coordinates. For the purposes of this exercise, let’s assume the x data represents, in millions, the net sales of a single retail store over the course of a month. Let’s say the y data represents, in millions, the total profit from that store. So we’re looking here at a set of points that represent profit by sales, where each point is a single store. The four data sets represent regions, say, West, Central, South and East. Let’s say you’re a manager responsible for maximizing profit at these stores. What’s your move? [Pause for 10 seconds, let people try to say smart things.]
OK, OK, so you’d typically have a bit more information than that when you’re making a decision. So now let’s look at some more information about these data sets. Maybe we can learn something about them from their means, or their variances. When we’re crunching numbers, we rely a lot on things like means and variances. And probably looking at correlation or doing a linear regression would help, too. It turns out that these four data sets all have the same means, the same variances, the same x-y correlations, and even boil down to an identical linear regression. So … what’s your move? [Pause for 10 seconds or so]
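If you want to check those summary numbers yourself, here is a minimal Python sketch. It is not part of the Tableau workflow shown in this deck; it uses only the standard library, and the names `QUARTET` and `summarize` are made up for this illustration. The values are transcribed from the data slide, and the variances are sample variances (dividing by n - 1), which is an assumption about how the slide's figures were computed.

```python
from statistics import mean, variance

# Anscombe's quartet, transcribed from the data slide.
# Sets I, II and III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.1,
                  6.13, 3.1, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.5, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, correlation and least-squares line."""
    mx, my = mean(xs), mean(ys)
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(xs),
        "mean_y": my,
        "var_y": variance(ys),
        "corr": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(
        f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
        f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} "
        f"corr={s['corr']:.3f} "
        f"line: y = {s['intercept']:.2f} + {s['slope']:.3f}x"
    )
```

Each of the four rows it prints should agree with the summary table to the stated number of decimal places, which is exactly why the numbers alone don't tell you what your move should be.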
Here are these same four data sets, plotted visually, with trend lines. Now, what’s your move? [Let the audience make some suggestions. You can chime in with things like, “Yeah, you might want to talk to the manager of the outlier in set 3 and see what she’s doing right” or “You might want to talk to the managers of some of the stores in set 4 and see why their profits are underperforming compared to stores with similar sales.”] What other pieces of information might you want? [Let them make suggestions, and if necessary you can chime in with things like “You might want to see how many orders each store is producing, or what categories of product they’re selling most, or how frequently they offer discounts.”] It would be nice to be able to encode some of that information on these graphs, like maybe have a larger circle for stores that offer, on average, larger discounts, or to be able to quickly split this data up to show sales by product category by store. Like, maybe with just one click. And then it would be nice to be able to share this view with your individual store managers, with just a couple of clicks. And it would be nice to have that view you shared update with real-time data, so those store managers could see day-by-day how their stores were performing compared to their peers, and interact with that live data to understand why their stores are succeeding or lagging. So that they are each empowered to explore the information they need to meet and exceed their profit goals. That’s what Tableau does.
Bringing your data into a data visualization tool serves two purposes. Not only do you end up with a cool, interactive piece of media that’ll make your readers connect with your story better… it also helps shape the questions you ask and the insights you draw.
Emily La Coz of the Jackson Clarion-Ledger and John Kelly of USA Today pulled data from the U.S. Centers for Medicare & Medicaid Services to create this set of dashboards on nursing home problems across the USA. This is a good example of an “exploratory dashboard” where the user can drill down and find what is interesting to them.
Melissa Maynard created a set of visualizations to accompany a story about the changing landscape of local governance. She created two different dashboards for this project: an exploratory dashboard, where you can click into a state and see how many local governments there are and of what kinds, and an explanatory dashboard showing how local governance has changed over time.
I worked very closely with Sarah Ryley on this project about the New York Stop and Frisk program. We started out with a dataset of over 500k rows: one row for every stop that occurred in 2012. This was some wide data, and we weren’t sure what we were going to find. We both explored the data thoroughly, following our instincts about where the stories were, going down rabbit holes and really getting into it. Some ideas we had didn’t work out; Sarah was disappointed to find the data didn’t support the idea that cops make more stops at the end of the month… which would indicate a quota. But we did find plenty about the efficacy of the program as a whole. We created several dashboards together… I was really blown away by how creative and analytical the team at NYDN was, especially with this being their first project!
Tableau builds software for people, not specialists. We believe anyone should be able to harness the power of data. That’s our mission.