Many librarians need to express data visually in reports, papers, and presentations. The goal of this talk is to cover the basics of effectively displaying quantitative data visually. It will include an overview of quantitative data types and common quantitative relationships that can be expressed visually. The talk will emphasize practical considerations and guidance for effectively selecting and designing data visualizations, such as those found in everyday tools like Microsoft Excel and the Google Visualization API.
Data visualization seems to have become a bit of a buzzword. So, at the risk of disappointing some of you, I'm not going to show a lot of fancy graphics. My first goal for this talk is to dispel the myth that data visualization is something new. I also want to provide a framework for thinking about data visualization that you can apply to the kind of data visualization most of us do. This will involve approaching data with good questions and knowing the material you're working with: in this case, data and basic tables and graphs. It also involves understanding how the human visual perceptual system affects what makes for useful displays of quantitative information. I'll also show a few applications you might want to try out and point you in some directions for future reading if you want to know more.
Now for some history
The first maps were of the sky. Cave paintings at Lascaux contain star maps. Image from Flickr user williamcromar.
Maps of land came later. There seem to be several contenders for the first town map, but here is a frequently cited example from Konya, Turkey, from 6200 BCE.
This graph by an unknown author attempts to show the movement of the planets over time. I can't vouch for its accuracy.
René Descartes invents the Cartesian coordinate system. This has a significant impact on how we visualize quantitative information.
William Playfair is credited with inventing statistical graphics. He invented the bar chart. This is a later example that shows the rise in the price of wheat along with the rise in wages over time.
A local example: Ben Shneiderman invented the treemap as a way to visualize usage of his Macintosh's hard drive. It's useful for displaying hierarchical data.
Hans Rosling invents the Motion Bubble Chart, which is now part of Google's visualization API. It's an interactive chart that displays several variables at once and animates changes over time. It's featured in a popular TED talk.
Computers are powerful tools, and yet we still need a human brain to tell the computer what to process and how to process it. It's our job to approach the computer with the right questions. I want to emphasize the importance of asking good questions.
1913 London Underground Map - http://homepage.ntlworld.com/clive.billson/tubemaps/1913.html Here is an example of a data visualization (or map) that is accurate but may not work well for its intended purpose. Things to notice: It's a standard map projection. Subway lines appear where they would geographically if they were on the surface. Roads and various municipal boundaries are visible. It works, but it's not optimal.
Harry Beck's 1933 Underground Map. Beck took a step back and considered the problem that the subway map was attempting to solve. What matters is the relation of stops and transfer stations to each other, and the legibility of stop names: where to get on and off. The subway is underground, so you don't need roads. For simplicity and legibility, lines are drawn at 90 and 45 degree angles, similar to electrical circuit diagrams. http://sites.google.com/site/tombowersites/harry-beck
2010 Boston T Map. This basic design is so successful that it is still used for subway maps around the world.
I've leaned heavily on Stephen Few's Show Me the Numbers – which is a great book for getting a handle on how to use tables and graphs effectively.
Here is data. It's a quantity. But we don't know enough to know what it is quantifying.
This just happens to be the number of keyword searches performed on the NCSU Libraries website last spring.
In order for data to mean something, in order for it to be information, it needs to express a relationship.
Nominal comparison: differences in particular values
Time series: how values change over time
Ranking: the order of values
Part to whole (%): what part of this whole is made up of that
Deviation: difference from some standard value
Distribution: how a set of values is distributed over a range
Correlation: whether two different values change together
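To make a few of these relationships concrete, here is a minimal sketch in Python. The monthly search counts are made up for illustration and are not the real NCSU numbers; the variable names are hypothetical too. It derives a ranking, a part-to-whole share, a deviation from the mean, and a month-to-month change from the same four values.

```python
# Hypothetical monthly search counts; the numbers are invented for illustration.
searches = {"Jan": 1200, "Feb": 1500, "Mar": 900, "Apr": 1400}

total = sum(searches.values())
mean = total / len(searches)

# Ranking: months ordered from most to fewest searches.
ranking = sorted(searches, key=searches.get, reverse=True)

# Part to whole: each month's share of the total, as a percentage.
share = {month: 100 * count / total for month, count in searches.items()}

# Deviation: difference of each month from the mean (the "standard value" here).
deviation = {month: count - mean for month, count in searches.items()}

# Time series: change from one month to the next, in calendar order.
months = list(searches)
change = {
    months[i]: searches[months[i]] - searches[months[i - 1]]
    for i in range(1, len(months))
}

print(ranking)
print(share)
print(deviation)
print(change)
```

The same four numbers answer four different questions, which is why it helps to decide which relationship you want to show before picking a chart.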
It turns out that it's important to understand the kind of quantitative relationships you're working with, because particular methods of display are better at conveying particular quantitative relationships.
Most data is or can be arranged in tables. A table is often the perfect starting place and sometimes the right format for presenting quantitative information. Graphs aren't always very good at the things where tables excel.
Back to my library website search example. This table has precise values with mixed units of measure, and even a part-to-whole relationship on the bottom row.
Likewise, graphs excel in areas where tables aren't so useful: when meaning that is hidden in a table is revealed by the shape of the values.
13,000 pages of data – how to make this understandable?
Before looking more closely at different kinds of graphs and the kinds of quantitative relationships they're good at expressing, I want to introduce the role that visual perception plays in data visualization. See Stephen Few's Show Me the Numbers and Christopher G. Healey's Perception in Visualization http://www.csc.ncsu.edu/faculty/healey/PP/index.html
We don't just see stuff that's out there. Light reflects off objects. That light gets collected by our eyes and stimulates the retina. The signals from the retina are interpreted by the brain. There are particular ways that our brain processes visual information that have a bearing on what is and isn't useful for visualizing data. Image credits: Brain image originally posted to Flickr and uploaded to Commons by Kaldari using Flickr upload bot at 22:05, 20 October 2008 (UTC). Eye image: public domain, credit to NIH National Eye Institute requested. Mountain: some rights reserved by Ian BC North.
Here is a series of numbers. If I asked you to pick out and count all the 0's, you'd have to scan the numbers serially and count as you moved your eye from one digit to the next. This would take you some time, maybe 20 to 30 seconds. Example adapted from Stephen Few's Show Me the Numbers.
If I increase the intensity of the color of the 0's, suddenly you can pick out the zeroes without having to process all of the visual information serially. You can pick out, without thinking about it, all the items with increased intensity.
An important initial result was the discovery of a limited set of visual properties that are detected very rapidly and accurately by the low-level visual system. These properties were initially called preattentive, since their detection seemed to precede focused attention. One way to think about this is that we can process preattentive features all at once, while other features we have to process serially. Another thing to keep in mind is that the more of these attributes are present, the less effective they are.
There is a small subset of these that we can interpret quantitatively. Notice that line length and 2d spatial position are the most effective attributes. Others can be used but they pose challenges. I am going to ignore flicker and direction to focus on static images, but you could also use these to display quantitative information. This is an incomplete list. For a really in-depth discussion of preattentive processing and attributes see http://www.csc.ncsu.edu/faculty/healey/PP/index.html
Scatterplot – takes advantage of 2D spatial position
A line chart also takes advantage of 2D spatial position. A line chart is really a scatterplot with lines drawn between points in some sequence.
Bar chart takes advantage of line length and 2D spatial position
This was created using Protovis. You can also consider using small multiples. In this case they are intended to show differences in rate of change over time across different departments. Another example shows differences in wait times for different devices: the lines show the pattern within each device type, and the color intensity shows higher average wait time across different devices. If you read Consumer Reports, you're familiar with this: they use their colored dot matrix to rate various products. That is an effective use of small multiples displays to enhance comparisons across categories. http://en.wikipedia.org/wiki/File:Smallmult.png (public domain image)
Both charts show the same data in the same order. Which makes it easier to determine whether B or C is larger? It turns out we're better at distinguishing small differences in length than small differences in area, which is why I think pie charts are usually a bad idea, and I pretty much never use them.
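One piece of arithmetic that may help explain why area is a harder read than length is sketched below in Python. This is geometry only, not perceptual evidence, and it uses an area-encoded circle as a stand-in for area encodings in general (a pie slice also varies in angle). The values for B and C are hypothetical.

```python
import math

# Hypothetical values; C is 10% larger than B.
b, c = 100, 110

# Bar chart: bar length is proportional to the value,
# so the visible difference in length is the full 10%.
bar_length_ratio = c / b

# Area-encoded circle: area is proportional to the value,
# so the diameter grows only with the square root of the value.
diameter_ratio = math.sqrt(c / b)

print(f"bar length ratio:      {bar_length_ratio:.3f}")
print(f"circle diameter ratio: {diameter_ratio:.3f}")
```

Under these assumptions, a 10% difference in value shows up as a 10% difference in bar length but as less than a 5% difference in the linear size of an area-encoded mark.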
No data visualization presentation would be complete without mentioning Edward Tufte. The graph should reveal more than the data can reveal in its raw form. Don't worry about doing something pretty or cool; do something effective. Don't lie, and 3D effects are probably a bad idea. A really good visualization lets you see some things quickly, but can also reveal depth upon closer examination. Know what you're showing and why you're showing it.
Now that we have a historical context, know that having good questions is important, know the common kinds of quantitative relationships, know how our visual perceptual system influences what makes for a good data visualization, and know that particular graphs are better at displaying particular quantitative relationships, let's look at some of the tools that are out there.
This is an alternative to Excel, which can also produce great charts. By the way, I think Excel is a great tool for exploring datasets, because the cost of trying things is so low.
It has the advantage of looking and working like the familiar spreadsheet applications. Different visualization options can be accessed by inserting Gadgets or Charts into the document. Here I've selected a Gadget.
Menu of available Google Gadgets.
So in this case I've created a treemap from some collection management data about our spending on resources. But what I really want to point out is the Publish button. Because this app is in the cloud, you get some advantages over Excel. One of the more interesting features of Google Documents is the ability to publish Documents and Gadgets. Publishing generates a code snippet that can be added to a web page to display the chart or document, which is useful if you want to create a web-accessible report. As for the treemap itself, the area of each of the rectangles corresponds to that node's value. "Treemaps display hierarchical data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. A leaf node's rectangle has an area proportional to a specified dimension on the data." -- http://en.wikipedia.org/wiki/Treemapping
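To show what "area proportional to value" means in practice, here is a minimal Python sketch of a simple slice-and-dice treemap layout. It is not how the Google gadget works internally (that isn't described here), and the spending figures, category names, and function names are all hypothetical. Each branch's rectangle is split among its children in proportion to their values, alternating between left-to-right and top-to-bottom splits at each level.

```python
def node_value(node):
    """Return a leaf's value, or the sum of the leaves beneath a branch."""
    _name, payload = node
    if isinstance(payload, list):
        return sum(node_value(child) for child in payload)
    return payload


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Give every leaf a rectangle (x, y, w, h) with area proportional to its value."""
    if out is None:
        out = {}
    name, payload = node
    if not isinstance(payload, list):
        # Leaf: it takes the whole rectangle it was handed.
        out[name] = (x, y, w, h)
        return out
    total = node_value(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in payload:
        fraction = node_value(child) / total
        if depth % 2 == 0:
            # Even depth: split the rectangle left to right.
            slice_and_dice(child, x + offset * w, y, fraction * w, h, depth + 1, out)
        else:
            # Odd depth: split the rectangle top to bottom.
            slice_and_dice(child, x, y + offset * h, w, fraction * h, depth + 1, out)
        offset += fraction
    return out


# Hypothetical spending tree: (name, value) for leaves, (name, [children]) for branches.
spending = ("Collections", [
    ("Journals", [("Science", 40), ("Humanities", 20)]),
    ("Books", 30),
    ("Databases", 10),
])

rects = slice_and_dice(spending, x=0, y=0, w=100, h=50)
for name, (rx, ry, rw, rh) in rects.items():
    print(f"{name:<11} x={rx:6.2f} y={ry:6.2f} w={rw:6.2f} h={rh:6.2f} area={rw * rh:8.2f}")
```

With a 100 by 50 canvas and a total of 100, each unit of spending gets 50 units of area, so the rectangle sizes can be compared directly. Real treemap tools usually use layouts that keep rectangles closer to square, which makes them easier to compare.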
Another web tool that doesn't require any programming.
After some struggle about whether I should present on two different tools from The Google, I decided I would be honest and go ahead and reveal the tools I use most often.
Google Visualization API: somewhat like Google Gadgets but more powerful. It's a collection of JavaScript visualizations that you can customize and embed in web pages. It requires some programming know-how.
Relatively simple JavaScript embedded in a web page generates the chart. You can modify this directly and create a chart, but the data will be static.
The advantage of this is that I can use PHP to generate JavaScript. In this case, every time I load this page, PHP processes the most current log data and generates the JavaScript, so I see an up-to-date chart of search activity.
There are lots of tools out there for doing various sorts of data visualizations. Thanks to Hilary Davis and Joe Ryan for some of the following tool, book, and website recommendations. With Many Eyes you have to make your dataset public, which can be a consideration.
More advanced tools are more flexible but often require some comfort with JavaScript and/or PHP, depending on what you want to accomplish.
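The idea of generating chart data on the server can be sketched as follows. I use PHP for this; the sketch below swaps in Python to show the same pattern, and it is not my actual script. The log format, the sample lines, and the function names are all assumptions made for illustration. It counts searches per day and writes the result out as a JavaScript array literal that a page could hand to a charting library (for Google's charts, an array like this is the shape that `google.visualization.arrayToDataTable` accepts).

```python
import json
from collections import Counter

# Hypothetical log lines in the form "YYYY-MM-DD<TAB>query".
# The real log format is an assumption; adjust the parsing to match yours.
log_lines = [
    "2010-03-01\tjstor",
    "2010-03-01\tcourse reserves",
    "2010-03-02\thours",
]


def rows_from_log(lines):
    """Count searches per day and return a header row followed by one row per day."""
    counts = Counter(line.split("\t", 1)[0] for line in lines if line.strip())
    return [["Date", "Searches"]] + [[day, counts[day]] for day in sorted(counts)]


def chart_data_script(rows):
    """Return a JavaScript statement that embeds the rows as an array literal."""
    return "var rows = " + json.dumps(rows) + ";"


rows = rows_from_log(log_lines)
script = chart_data_script(rows)
print(script)
```

Because the array is rebuilt from the log on every request, the chart reflects whatever the log contains at load time. If the rows ever include free text such as the queries themselves, that text would need escaping before being placed inside a script tag.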
There seem to be more of these.
Adobe Illustrator often creates cleaner-looking charts than Excel, at the cost of some effort to learn the application. OmniGraffle is fantastic for making diagrams. Visio plays a similar role for PCs.
Edward Tufte gets a lot of attention. Personally, I think he's overrated. He popularized the idea of displaying information visually, and his first book is worth checking out. But Few will be more helpful for practical advice. I really like Show Me the Numbers: a very practical guide to statistics and basic charts.
There are also a number of websites that are worth checking out on your own time.