1. An Introduction to Data
Visualisation for Analysis
Exploring the Dataset -
Textual, Numerical and Otherwise
http://www.slideshare.net/shawnday/m-phil-datavisforanalysis
2. Agenda
Thoughts from last week - wordpress.com?
Introduction
What do we mean by Data Analysis?
Some foundation terms and concepts
The Data Visualisation Process
Tools and Methods
Extending your toolset
An Exercise
3. Objective
To appreciate the rich variety of techniques and
tools available to digital humanities scholars for
data visualisation and analysis.
The intention is to be able to add tools to your
arsenal and to have a sense of where to look for
more.
4. Breakpoint
One of the keys to good visualization is
understanding what your immediate goals are.
Are you visualizing data to understand what’s in it,
or are you trying to communicate meaning to
others?
You - Visualisation for Data Analysis
Others - Visualisation for Presentation
6. So Why Would You Want to Visualise
Your Data?
Bypass language centres to tap directly into the
visual cortex
Leverage ability to recognise patterns - what they
call visual sense-making
Powerful graphics engines now allow for live
data processing and sophisticated animations
and interactive research environments
Sources: Geoff McGhee, Getting Started with Data Viz
7. So Why Would You Want to Visualise
Your Data?
Work with new data to create new knowledge
Explore data to discover things that used to be
unknown, unknowable or impractical to know
Take a new perspective on the familiar to reveal
previously hidden insights
11. How Could You Use Data Analysis
“In the Lab” - for your own analysis
Online as part of collaborative groups
Through dissemination for extension of own work
- crowdsourcing
Others?
15. Diaries: the raw materials
• 100s of pages
• Varying hands
• Varying quality
16. The Process
• Generate word frequency (Voyeur, TAPoR)
• Isolate known farm activities (NLP -
LanguageWare)
• Collocate to link activity references to time,
duration, and resources (Voyeur)
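The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample diary text, the `FARM_ACTIVITIES` list and the window size are invented for the example, and it uses the standard library rather than Voyeur, TAPoR or LanguageWare.

```python
import re
from collections import Counter

# Hypothetical sample standing in for a transcribed diary entry.
SAMPLE = (
    "Monday ploughing the north field all day. "
    "Tuesday sowing oats in the morning. "
    "Wednesday ploughing again with two horses."
)

# Hypothetical list of known farm activities to isolate.
FARM_ACTIVITIES = {"ploughing", "sowing", "threshing"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def collocates(tokens, targets, window=3):
    """For each target word, count the words within `window` tokens of it."""
    result = {t: Counter() for t in targets}
    for i, tok in enumerate(tokens):
        if tok in targets:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    result[tok][tokens[j]] += 1
    return result


tokens = tokenize(SAMPLE)

# Step 1: word frequency across the whole text.
freq = Counter(tokens)

# Step 2: keep only the known activities that actually occur.
activity_counts = {w: freq[w] for w in FARM_ACTIVITIES if freq[w]}

# Step 3: words appearing near each activity (days, resources, and so on).
near = collocates(tokens, FARM_ACTIVITIES)

print(activity_counts)
print(near["ploughing"].most_common(5))
```

A fixed token window is the simplest possible notion of collocation; dedicated tools typically offer more refined measures.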
22. What is the Value of this Visualisation
• Easier to compare over intervals
• Multiple vectors with greater granularity in a
compressed space
• The challenge is to find rich enough source
materials to yield substantive datasets
26. Case Study:
Occupations of Politicians
• What are we studying?
– Self-declared occupations of politicians
• Why?
– What bias might they bring to their job?
• How?
– Visualising past occupations and mapping them to the political
platform of the affiliated party
30. The Result/ New Patterns
• The emergence of the professional politician with
no private sector experience
• Occupational continuity across changes in
governing party
32. The Value of Data Vis for Analysis
• New ways of presenting allow new ways of seeing
• Hidden patterns become evident
• Suggests other hypotheses to test
33. Basic Terms
Datamining
Statistics
Structured/Unstructured Data
Visualisation
Modelling
34. Types of Data to Visualise
Audio Data
Categorical Data
Cartographic Data
Collections
Image Data
– Still
– Moving
Metadata
Multimedia Data
Network Data
– Social
– Other
Numerical Data
Temporal Data
Textual Data
– Narrative
– Qualitative
????
35. General Steps in Data Vis for DH
Discovery / Acquisition
Cleaning / ‘Munging’
Analysis / Exploratory Vis
Presentation
36. Discovery / Acquisition
Original Research
Spreadsheets
Databases
Digitized Media
Other Downloads
Public Data
Archives/Libraries
Academic Partners
Purchase
Scraping
– Junar
– Outwit Hub
– ScraperWiki
38. Cleaning / Munging
(Normalisation, Format Conversion)
Tools:
Data Wrangler
Google Refine
Mr. Data Converter
Data Wrangler
Performs simple split, clear, and fold/unfold transforms on data
See example --> Data and Script
Google Refine
Works with larger datasets
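To make normalisation, format conversion and a "fold" transform concrete, here is a minimal Python sketch using only the standard library. It is not how Data Wrangler, Google Refine or Mr. Data Converter work internally; the table, its column names and its figures are hypothetical and exist only to show the idea.

```python
import csv
import io
import json

# Hypothetical messy "wide" table: one column per year, stray spaces,
# inconsistent capitalisation. The figures are invented.
RAW_CSV = """County , 1901, 1911
 dublin,100 ,120
CORK ,80, 90
"""


def read_clean(raw):
    """Parse CSV text, trimming whitespace from every header and cell."""
    reader = csv.reader(io.StringIO(raw))
    rows = [[cell.strip() for cell in row] for row in reader if row]
    return rows[0], rows[1:]


def fold(header, body):
    """Fold a wide table into long form: one record per county and year."""
    records = []
    for row in body:
        county = row[0].title()  # normalise capitalisation
        for year, value in zip(header[1:], row[1:]):
            records.append(
                {"county": county, "year": int(year), "value": int(value)}
            )
    return records


header, body = read_clean(RAW_CSV)
records = fold(header, body)

# Format conversion: the cleaned, folded table as JSON.
print(json.dumps(records, indent=2))
```

Many visualisation services expect the long form, one observation per row, which is why folding is often needed before charting.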
42. Analysis / Exploratory Visualisation
Web Services
Google Fusion Tables
Google Spreadsheets
IBM ManyEyes
TimeFlow
Applications
Tableau/Tableau Public
MS Office
OpenOffice
Gephi
Node XL (plug-in for Excel)
Spotfire
R
Processing
43. Google Ngram Viewer
Examine word frequency in digitised books
Currently about 4% of books ever published
In English, Chinese, French, German, Hebrew,
Russian, and Spanish
Changes in word usage
Trends
Check out the Cultural Observatory @ Harvard
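The general idea of charting word usage over time can be sketched as a relative frequency per year: occurrences of a word divided by all words from that year. The Python below is an illustration only, not Google's implementation, and the three-line `CORPUS` is a hypothetical stand-in for a collection of dated texts.

```python
from collections import Counter

# Hypothetical mini-corpus of (year, text) pairs.
CORPUS = [
    (1900, "the telegraph office sent the telegraph"),
    (1900, "a letter arrived by post"),
    (1950, "the telephone rang and the telegraph was silent"),
]


def yearly_relative_frequency(corpus, word):
    """Return {year: share of that year's tokens equal to `word`}."""
    totals = Counter()  # all tokens seen per year
    hits = Counter()    # occurrences of `word` per year
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


trend = yearly_relative_frequency(CORPUS, "telegraph")
for year, share in trend.items():
    print(year, round(share, 3))
```

Dividing by the yearly total matters because the amount of surviving text varies from year to year; raw counts alone would mostly reflect corpus size.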
45. Wordle
Visually present word frequency using size,
weight, colour
Consider "Word Clouds Considered Harmful"
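Mapping frequency to size can be sketched as a simple linear scale. This is an assumption for illustration, not Wordle's actual layout or scaling method; the function name `font_sizes`, the point sizes and the sample words are all hypothetical.

```python
from collections import Counter


def font_sizes(counts, min_size=10, max_size=48):
    """Linearly map each word's count onto a font size in points."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # If every word has the same count, give them all the largest size.
        scale = (n - lo) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


counts = Counter("farm farm farm horse horse oats".split())
sizes = font_sizes(counts)
print(sizes)
```

Note that size here encodes frequency and nothing else: the context in which a word appears is lost, which is worth keeping in mind when reading any word cloud.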
46. Exercise
Choose a dataset from a source such as:
The CSO
Project Gutenberg
or your own material
Choose an appropriate data visualisation from a web service we explored
in the workshop.
Explain the process and how you made your choice, and embed it in your
own blog using wordpress.com as we explored last week.
Suggest a research question that can be answered by using this data
visualisation as a research environment
Send the link to me at: days@tcd.ie
Maybe: http://politicalreform.ie/2011/12/04/state-of-enda-sunday-business-post-red-c-poll-4th-september-2011/