SlideShare uma empresa Scribd logo
1 de 84
Baixar para ler offline
Hacking Data Visualisations 
MELINDA SECKINGTON 
! 
@MSECKINGTON
@mseckington
Hacking data visualisations 
@mseckington
Why?
https://www.flickr.com/photos/laurenmanning/6632168961/
https://www.flickr.com/photos/jamjar/5491205608
“I feel that everyday, all of us now are being blasted by information design. It's 
being poured into our eyes through the Web, and we're all visualizers now; we're all 
demanding a visual aspect to our information. There's something almost quite 
magical about visual information. It's effortless, it literally pours in. And if you're 
navigating a dense information jungle, coming across a beautiful graphic or a lovely 
data visualization, it's a relief, it's like coming across a clearing in the jungle.” 
DAVID MCCANDLESS - THE BEAUTY OF DATA VISUALIZATION 
@mseckington
THE BANDWIDTH OF OUR SENSES 
Tor Norretranders 
@mseckington
A brief history of data 
visualisations
Theatrum Orbis Terrarum 
May 20, 1570 
The first modern atlas, collected by Abraham 
Ortelis. 
! 
This was a first attempt to gather all maps 
that were known to man at the time and bind 
them together. 
A BRIEF HISTORY OF DATA VISUALISATION
https://www.flickr.com/photos/smailtronic/2361594300
A BRIEF HISTORY OF DATA VISUALISATION 
Bills of Mortality 
From 1603, London parish clerks collected health-related 
population data in order to monitor plague 
deaths, publishing the London Bills of Mortality on 
a weekly basis. 
! 
John Graunt amalgamated 50 years of information 
from the bills, producing the first known tables of 
public health data. 
BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - 
THE GUARDIAN
A BRIEF HISTORY OF DATA VISUALISATION 
1644: First known graph of statistical data 
! 
MICHAEL VAN LANGREN - 
ESTIMATES OF DISTANCE IN LONGITUDE BETWEEN TOLEDO AND ROME
A BRIEF HISTORY OF DATA VISUALISATION
A BRIEF HISTORY OF DATA VISUALISATION 
1786 first bar chart 
William Playfair 
Exports and imports of Scotland to and from 
different parts for one Year from Christmas 
1780 to Christmas 1781
A BRIEF HISTORY OF DATA VISUALISATION 
Street map of cholera deaths in Soho 
1853 John Snow 
Snow's 'ghost map' shows deaths from cholera 
around Broad Street between 19 August and 30 
September 1854. Snow simplified the street layout, 
highlighting the 13 water pumps serving the area 
and representing each death as a black bar. His 
map demonstrates how cholera was spreading, not 
by a 'miasma' rising from the Thames, but in water 
contaminated by human waste 
BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - 
THE GUARDIAN
A BRIEF HISTORY OF DATA VISUALISATION 
Diagram of the Causes of Mortality 
in the Army in the East 
! 
1858 Florence Nightingale 
In her seminal ‘rose diagram’, Nightingale 
demonstrated that far more soldiers died 
from preventable epidemic diseases (blue) 
than from wounds inflicted on the 
battlefield (red) or other causes (black) 
during the Crimean War (1853-56) 
BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - 
THE GUARDIAN
How?
HOW? 
https://www.flickr.com/photos/jdhancock/8031897271
https://www.flickr.com/photos/laurenmanning/5658951917/
HOW? 
@mseckington
HOW? 
@mseckington
HOW? 
@mseckington
HOW? 
@mseckington
HOW? 
@mseckington
A quick intro to R
A QUICK INTRO TO R 
What is R? 
! 
@mseckington
A QUICK INTRO TO R 
What is R? 
! 
R is a free programming language and environment for statistical 
computing and graphics. 
! 
@mseckington
A QUICK INTRO TO R 
What is R? 
! 
R is a free programming language and environment for statistical 
computing and graphics. 
! 
Created by statisticians for statisticians. 
@mseckington
A QUICK INTRO TO R 
What is R? 
! 
R is a free programming language and environment for statistical 
computing and graphics. 
! 
Created by statisticians for statisticians. 
! 
Comes with a lot of facilities for data manipulation, calculation, data 
analysis and graphical display. 
@mseckington
A QUICK INTRO TO R 
What is R? 
! 
R is a free programming language and environment for statistical 
computing and graphics. 
! 
Created by statisticians for statisticians. 
! 
Comes with a lot of facilities for data manipulation, calculation, data 
analysis and graphical display. 
! 
Highly and easily extensible. 
@mseckington
A QUICK INTRO TO R
! 
> data()! 
! 
list all datasets available 
! 
@mseckington
! 
> data()! 
! 
list all datasets available 
! 
> movies = data(movies)! 
> movies <- data(movies)! 
! 
assign movies data to movies variable 
! 
@mseckington
! 
> data()! 
! 
list all datasets available 
! 
> movies = data(movies)! 
> movies <- data(movies)! 
! 
assign movies data to movies variable 
! 
> dim(movies)! 
[1] 58788! 24! 
! 
@mseckington
! 
> data()! 
! 
list all datasets available 
! 
> movies = data(movies)! 
> movies <- data(movies)! 
! 
assign movies data to movies variable 
! 
> dim(movies)! 
[1] 58788! 24! 
! 
> names(movies)! 
[1] "title" “year" “length" “budget" "rating" “votes" ! 
[7] “r1" “r2" “r3" “r4" “r5" “r6"! 
[13] “r7" “r8" “r9" “r10" “mpaa" “Action" ! 
[19] “Animation" "Comedy" “Drama" “Documentary" “Romance”"Short"! 
@mseckington
! 
> movies[7079,]! 
! 
!! title ! ! ! ! ! year ! length budget rating votes ! 
7079 Bourne Identity, The 2002 !119!! 75000000 7.3 ! 29871 ! 
! 
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 mpaa ! 
4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13! 
! 
Action Animation Comedy Drama Documentary Romance Short! 
1 0 0 1 0 0 0! 
! 
returns 1 row => all the data for 1 movies 
! 
@mseckington
! 
> movies[7079,]! 
! 
!! title ! ! ! ! ! year ! length budget rating votes ! 
7079 Bourne Identity, The 2002 !119!! 75000000 7.3 ! 29871 ! 
! 
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 mpaa ! 
4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13! 
! 
Action Animation Comedy Drama Documentary Romance Short! 
1 0 0 1 0 0 0! 
! 
returns 1 row => all the data for 1 movies 
! 
> movies[1:10,]! 
. . . ! 
! 
returns rows 1 to 10 
@mseckington
! 
> movies[,1]! 
. . .! 
! 
returns 1 column => titles of all movies 
@mseckington
! 
> movies[,1]! 
. . .! 
! 
returns 1 column => titles of all movies 
! 
> movies$title! 
. . .! 
! 
same as movies[,1]! 
returns column with the label ‘title 
! 
@mseckington
! 
> movies[,1]! 
. . .! 
! 
returns 1 column => titles of all movies 
! 
> movies$title! 
. . .! 
! 
same as movies[,1]! 
returns column with the label ‘title 
! 
> movies[,1:10]! 
. . .! 
! 
returns columns 1 to 10 
@mseckington
! 
> hist(movies$year) 
@mseckington
! 
> hist(movies$year) 
Histogram of movies$year 
movies$year 
Frequency 
1900 1920 1940 1960 1980 2000 
0 2000 4000 6000 8000 
@mseckington
! 
> hist(movies$year)! 
! 
> hist(movies$rating) 
@mseckington
! 
> hist(movies$year)! 
! 
> hist(movies$rating) 
Histogram of movies$rating 
movies$rating 
Frequency 
2 4 6 8 10 
0 2000 4000 6000 8000 
@mseckington
! 
> hist(movies$year)! 
! 
> hist(movies$rating)! 
! 
> library(ggplot2) 
@mseckington
! 
> hist(movies$year)! 
! 
> hist(movies$rating)! 
! 
> library(ggplot2)! 
! 
> qplot(rating, !! ! 
!! data=movies, ! 
!! geom="histogram") 
@mseckington
! 
> hist(movies$year)! 
! 
> hist(movies$rating)! 
! 
> library(ggplot2)! 
! 
> qplot(rating, !! ! 
!! data=movies, ! 
!! geom=“histogram")! 
! 
> qplot(rating, !! 
!! data=movies, ! 
!! geom="histogram", 
!! binwidth=1) 
@mseckington
! 
> m = ggplot(movies, aes(rating))! 
! 
> m + geom_histogram() 
@mseckington
! 
> m = ggplot(movies, aes(rating))! 
! 
> m + geom_histogram()! 
! 
> m + geom_histogram(! 
! ! ! aes(fill = ..count..)) 
@mseckington
! 
> m = ggplot(movies, aes(rating))! 
! 
> m + geom_histogram()! 
! 
> m + geom_histogram(! 
! ! ! aes(fill = ..count..))! 
! 
> m + geom_histogram(! 
! ! ! colour = "darkgreen", ! 
! ! ! fill = "white", ! 
! ! ! binwidth = 0.5)! 
! 
@mseckington
! 
> m = ggplot(movies, aes(rating))! 
! 
> m + geom_histogram()! 
! 
> m + geom_histogram(! 
! ! ! aes(fill = ..count..))! 
! 
> m + geom_histogram(! 
! ! ! colour = "darkgreen", ! 
! ! ! fill = "white", ! 
! ! ! binwidth = 0.5)! 
! 
> x = m + geom_histogram(! 
! ! ! ! binwidth = 0.5)! 
> x + facet_grid(Action ~ Comedy)! 
@mseckington
! 
> library(twitteR)! 
! 
> setup_twitter_oauth(! 
! ! "API key”, "API secret", "Access token", "Access secret”)! 
! 
@mseckington
FUTURELEARN STATS
! 
> fl = read.csv(! 
! ! "futurelearn_dataset.csv", 
! ! header=TRUE)! 
! 
@mseckington
! 
> fl = read.csv(! 
! ! "futurelearn_dataset.csv", 
! ! header=TRUE)! 
! 
> source_table = table(fl$age)! 
> pie(source_table) 
@mseckington
! 
> fl = read.csv(! 
! ! "futurelearn_dataset.csv", 
! ! header=TRUE)! 
! 
> source_table = table(fl$age)! 
> pie(source_table)! 
! 
> pie(source_table, ! 
! ! radius=0.6, ! 
! ! col=rainbow(8)) 
@mseckington
! 
> library(twitteR)! 
! 
> setup_twitter_oauth(! 
! ! "API key”, "API secret", "Access token", "Access secret”)! 
! 
> tweets <- searchTwitter('futurelearn', n=100) 
@mseckington
! 
> library(twitteR)! 
! 
> setup_twitter_oauth(! 
! ! "API key”, "API secret", "Access token", "Access secret”)! 
! 
> tweets <- searchTwitter('futurelearn', n=100)! 
! 
> library(“tm”)! 
! 
> tweet_text <- sapply(tweets, function(x) x$getText())! 
> tweet_corpus <- Corpus(VectorSource(tweet_text))! 
! 
@mseckington
! 
> library(twitteR)! 
! 
> setup_twitter_oauth(! 
! ! "API key”, "API secret", "Access token", "Access secret”)! 
! 
> tweets <- searchTwitter('futurelearn', n=100)! 
! 
> library(“tm”)! 
! 
> tweet_text <- sapply(tweets, function(x) x$getText())! 
> tweet_corpus <- Corpus(VectorSource(tweet_text))! 
! 
> tweet_corpus <- tm_map(tweet_corpus, !! 
! ! ! ! ! ! ! ! ! content_transformer(tolower))! 
> tweet_corpus <- tm_map(tweet_corpus, removePunctuation)! 
> tweet_corpus <- tm_map(tweet_corpus, !! ! 
! ! ! ! ! ! ! ! function(x)removeWords(x,stopwords()))
! 
> library(wordcloud)! 
! 
> wordcloud(tweet_corpus) 
@mseckington
! 
> library(wordcloud)! 
! 
> wordcloud(tweet_corpus) 
@mseckington
What next?
A QUICK INTRO TO R
A QUICK INTRO TO R
WHAT NEXT? 
@mseckington
https://www.flickr.com/photos/jamjar/5491205608
@mseckington
Recap
Data visualisations 
are awesome 
@mseckington
R is awesome 
@mseckington
Any questions? 
! 
@mseckington

Mais conteúdo relacionado

Semelhante a Hacking data visualisations

DIBI Conference: Visualising Data
DIBI Conference: Visualising DataDIBI Conference: Visualising Data
DIBI Conference: Visualising Data
briansuda
 
James Davenport NerdNite 2013
James Davenport NerdNite 2013James Davenport NerdNite 2013
James Davenport NerdNite 2013
James Davenport
 

Semelhante a Hacking data visualisations (12)

GIS and Google Earth In Geography
GIS and Google Earth In GeographyGIS and Google Earth In Geography
GIS and Google Earth In Geography
 
DIBI Conference: Visualising Data
DIBI Conference: Visualising DataDIBI Conference: Visualising Data
DIBI Conference: Visualising Data
 
Curation Nation
Curation Nation Curation Nation
Curation Nation
 
The Daily Grind - Milling Stories to Reduce Risk
The Daily Grind - Milling Stories to Reduce RiskThe Daily Grind - Milling Stories to Reduce Risk
The Daily Grind - Milling Stories to Reduce Risk
 
Rockford Area Convention & Visitors Bureau
Rockford Area Convention & Visitors BureauRockford Area Convention & Visitors Bureau
Rockford Area Convention & Visitors Bureau
 
James Davenport NerdNite 2013
James Davenport NerdNite 2013James Davenport NerdNite 2013
James Davenport NerdNite 2013
 
Who was william shakespeare ?
Who was william shakespeare ?Who was william shakespeare ?
Who was william shakespeare ?
 
Vale rda presentation
Vale rda presentationVale rda presentation
Vale rda presentation
 
Conversation as a platform
Conversation as a platformConversation as a platform
Conversation as a platform
 
ServerSide Javascript on Freebase - SF JavaScript meetup #9
ServerSide Javascript on Freebase - SF JavaScript meetup #9ServerSide Javascript on Freebase - SF JavaScript meetup #9
ServerSide Javascript on Freebase - SF JavaScript meetup #9
 
2015 MUSE Awards presentation
2015 MUSE Awards presentation2015 MUSE Awards presentation
2015 MUSE Awards presentation
 
Al Fazl International - 25 December 2015 Weekly UK
Al Fazl International - 25 December 2015 Weekly UKAl Fazl International - 25 December 2015 Weekly UK
Al Fazl International - 25 December 2015 Weekly UK
 

Mais de Melinda Seckington

Mais de Melinda Seckington (13)

Why I love escape rooms
Why I love escape roomsWhy I love escape rooms
Why I love escape rooms
 
What I Learnt From Event Organising
What I Learnt From Event OrganisingWhat I Learnt From Event Organising
What I Learnt From Event Organising
 
How to succeed at hiring without really trying
How to succeed at hiring without really tryingHow to succeed at hiring without really trying
How to succeed at hiring without really trying
 
How and Why We Run Internal Hackdays
How and Why We Run Internal HackdaysHow and Why We Run Internal Hackdays
How and Why We Run Internal Hackdays
 
Marvel Guide For Developers
Marvel Guide For DevelopersMarvel Guide For Developers
Marvel Guide For Developers
 
Learn Reflect Repeat
Learn Reflect RepeatLearn Reflect Repeat
Learn Reflect Repeat
 
Un-artificial intelligence
Un-artificial intelligenceUn-artificial intelligence
Un-artificial intelligence
 
Being Miss Geeky - WIT
Being Miss Geeky - WITBeing Miss Geeky - WIT
Being Miss Geeky - WIT
 
Being a Social Introvert
Being a Social IntrovertBeing a Social Introvert
Being a Social Introvert
 
Movie aspect ratios
Movie aspect ratiosMovie aspect ratios
Movie aspect ratios
 
Gadgets
GadgetsGadgets
Gadgets
 
DDD: Disney Driven Development
DDD: Disney Driven DevelopmentDDD: Disney Driven Development
DDD: Disney Driven Development
 
#AgileHack - Mr and Mrs Geeky
#AgileHack - Mr and Mrs Geeky#AgileHack - Mr and Mrs Geeky
#AgileHack - Mr and Mrs Geeky
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Hacking data visualisations

  • 1. Hacking Data Visualisations MELINDA SECKINGTON ! @MSECKINGTON
  • 3.
  • 4.
  • 5.
  • 10. “I feel that everyday, all of us now are being blasted by information design. It's being poured into our eyes through the Web, and we're all visualizers now; we're all demanding a visual aspect to our information. There's something almost quite magical about visual information. It's effortless, it literally pours in. And if you're navigating a dense information jungle, coming across a beautiful graphic or a lovely data visualization, it's a relief, it's like coming across a clearing in the jungle.” DAVID MCCANDLESS - THE BEAUTY OF DATA VISUALIZATION @mseckington
  • 11. THE BANDWIDTH OF OUR SENSES Tor Norretranders @mseckington
  • 12.
  • 13.
  • 14.
  • 15.
  • 16. A brief history of data visualisations
  • 17. Theatrum Orbis Terrarum May 20, 1570 The first modern atlas, collected by Abraham Ortelis. ! This was a first attempt to gather all maps that were known to man at the time and bind them together. A BRIEF HISTORY OF DATA VISUALISATION
  • 19. A BRIEF HISTORY OF DATA VISUALISATION Bills of Mortality From 1603, London parish clerks collected health-related population data in order to monitor plague deaths, publishing the London Bills of Mortality on a weekly basis. ! John Graunt amalgamated 50 years of information from the bills, producing the first known tables of public health data. BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN
  • 20. A BRIEF HISTORY OF DATA VISUALISATION 1644: First known graph of statistical data ! MICHAEL VAN LANGREN - ESTIMATES OF DISTANCE IN LONGITUDE BETWEEN TOLEDO AND ROME
  • 21. A BRIEF HISTORY OF DATA VISUALISATION
  • 22. A BRIEF HISTORY OF DATA VISUALISATION 1786 first bar chart William Playfair Exports and imports of Scotland to and from different parts for one Year from Christmas 1780 to Christmas 1781
  • 23. A BRIEF HISTORY OF DATA VISUALISATION Street map of cholera deaths in Soho 1853 John Snow Snow's 'ghost map' shows deaths from cholera around Broad Street between 19 August and 30 September 1854. Snow simplified the street layout, highlighting the 13 water pumps serving the area and representing each death as a black bar. His map demonstrates how cholera was spreading, not by a 'miasma' rising from the Thames, but in water contaminated by human waste BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN
  • 24. A BRIEF HISTORY OF DATA VISUALISATION Diagram of the Causes of Mortality in the Army in the East ! 1858 Florence Nightingale In her seminal ‘rose diagram’, Nightingale demonstrated that far more soldiers died from preventable epidemic diseases (blue) than from wounds inflicted on the battlefield (red) or other causes (black) during the Crimean War (1853-56) BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN
  • 25. How?
  • 34. A QUICK INTRO TO R What is R? ! @mseckington
  • 35. A QUICK INTRO TO R What is R? ! R is a free programming language and environment for statistical computing and graphics. ! @mseckington
  • 36. A QUICK INTRO TO R What is R? ! R is a free programming language and environment for statistical computing and graphics. ! Created by statisticians for statisticians. @mseckington
  • 37. A QUICK INTRO TO R What is R? ! R is a free programming language and environment for statistical computing and graphics. ! Created by statisticians for statisticians. ! Comes with a lot of facilities for data manipulation, calculation, data analysis and graphical display. @mseckington
  • 38. A QUICK INTRO TO R What is R? ! R is a free programming language and environment for statistical computing and graphics. ! Created by statisticians for statisticians. ! Comes with a lot of facilities for data manipulation, calculation, data analysis and graphical display. ! Highly and easily extensible. @mseckington
  • 40.
  • 41. ! > data()! ! list all datasets available ! @mseckington
  • 42. ! > data()! ! list all datasets available ! > movies = data(movies)! > movies <- data(movies)! ! assign movies data to movies variable ! @mseckington
  • 43. ! > data()! ! list all datasets available ! > movies = data(movies)! > movies <- data(movies)! ! assign movies data to movies variable ! > dim(movies)! [1] 58788! 24! ! @mseckington
  • 44. ! > data()! ! list all datasets available ! > movies = data(movies)! > movies <- data(movies)! ! assign movies data to movies variable ! > dim(movies)! [1] 58788! 24! ! > names(movies)! [1] "title" “year" “length" “budget" "rating" “votes" ! [7] “r1" “r2" “r3" “r4" “r5" “r6"! [13] “r7" “r8" “r9" “r10" “mpaa" “Action" ! [19] “Animation" "Comedy" “Drama" “Documentary" “Romance”"Short"! @mseckington
  • 45. ! > movies[7079,]! ! !! title ! ! ! ! ! year ! length budget rating votes ! 7079 Bourne Identity, The 2002 !119!! 75000000 7.3 ! 29871 ! ! r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 mpaa ! 4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13! ! Action Animation Comedy Drama Documentary Romance Short! 1 0 0 1 0 0 0! ! returns 1 row => all the data for 1 movies ! @mseckington
  • 46. ! > movies[7079,]! ! !! title ! ! ! ! ! year ! length budget rating votes ! 7079 Bourne Identity, The 2002 !119!! 75000000 7.3 ! 29871 ! ! r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 mpaa ! 4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13! ! Action Animation Comedy Drama Documentary Romance Short! 1 0 0 1 0 0 0! ! returns 1 row => all the data for 1 movies ! > movies[1:10,]! . . . ! ! returns rows 1 to 10 @mseckington
  • 47. ! > movies[,1]! . . .! ! returns 1 column => titles of all movies @mseckington
  • 48. ! > movies[,1]! . . .! ! returns 1 column => titles of all movies ! > movies$title! . . .! ! same as movies[,1]! returns column with the label ‘title ! @mseckington
  • 49. ! > movies[,1]! . . .! ! returns 1 column => titles of all movies ! > movies$title! . . .! ! same as movies[,1]! returns column with the label ‘title ! > movies[,1:10]! . . .! ! returns columns 1 to 10 @mseckington
  • 50. ! > hist(movies$year) @mseckington
  • 51. ! > hist(movies$year) Histogram of movies$year movies$year Frequency 1900 1920 1940 1960 1980 2000 0 2000 4000 6000 8000 @mseckington
  • 52. ! > hist(movies$year)! ! > hist(movies$rating) @mseckington
  • 53. ! > hist(movies$year)! ! > hist(movies$rating) Histogram of movies$rating movies$rating Frequency 2 4 6 8 10 0 2000 4000 6000 8000 @mseckington
  • 54. ! > hist(movies$year)! ! > hist(movies$rating)! ! > library(ggplot2) @mseckington
  • 55. ! > hist(movies$year)! ! > hist(movies$rating)! ! > library(ggplot2)! ! > qplot(rating, !! ! !! data=movies, ! !! geom="histogram") @mseckington
  • 56. ! > hist(movies$year)! ! > hist(movies$rating)! ! > library(ggplot2)! ! > qplot(rating, !! ! !! data=movies, ! !! geom=“histogram")! ! > qplot(rating, !! !! data=movies, ! !! geom="histogram", !! binwidth=1) @mseckington
  • 57. ! > m = ggplot(movies, aes(rating))! ! > m + geom_histogram() @mseckington
  • 58. ! > m = ggplot(movies, aes(rating))! ! > m + geom_histogram()! ! > m + geom_histogram(! ! ! ! aes(fill = ..count..)) @mseckington
  • 59. ! > m = ggplot(movies, aes(rating))! ! > m + geom_histogram()! ! > m + geom_histogram(! ! ! ! aes(fill = ..count..))! ! > m + geom_histogram(! ! ! ! colour = "darkgreen", ! ! ! ! fill = "white", ! ! ! ! binwidth = 0.5)! ! @mseckington
  • 60. ! > m = ggplot(movies, aes(rating))! ! > m + geom_histogram()! ! > m + geom_histogram(! ! ! ! aes(fill = ..count..))! ! > m + geom_histogram(! ! ! ! colour = "darkgreen", ! ! ! ! fill = "white", ! ! ! ! binwidth = 0.5)! ! > x = m + geom_histogram(! ! ! ! ! binwidth = 0.5)! > x + facet_grid(Action ~ Comedy)! @mseckington
  • 61. ! > library(twitteR)! ! > setup_twitter_oauth(! ! ! "API key”, "API secret", "Access token", "Access secret”)! ! @mseckington
  • 63. ! > fl = read.csv(! ! ! "futurelearn_dataset.csv", ! ! header=TRUE)! ! @mseckington
  • 64. ! > fl = read.csv(! ! ! "futurelearn_dataset.csv", ! ! header=TRUE)! ! > source_table = table(fl$age)! > pie(source_table) @mseckington
  • 65. ! > fl = read.csv(! ! ! "futurelearn_dataset.csv", ! ! header=TRUE)! ! > source_table = table(fl$age)! > pie(source_table)! ! > pie(source_table, ! ! ! radius=0.6, ! ! ! col=rainbow(8)) @mseckington
  • 66.
  • 67. ! > library(twitteR)! ! > setup_twitter_oauth(! ! ! "API key”, "API secret", "Access token", "Access secret”)! ! > tweets <- searchTwitter('futurelearn', n=100) @mseckington
  • 68.
  • 69. ! > library(twitteR)! ! > setup_twitter_oauth(! ! ! "API key”, "API secret", "Access token", "Access secret”)! ! > tweets <- searchTwitter('futurelearn', n=100)! ! > library(“tm”)! ! > tweet_text <- sapply(tweets, function(x) x$getText())! > tweet_corpus <- Corpus(VectorSource(tweet_text))! ! @mseckington
  • 70. ! > library(twitteR)! ! > setup_twitter_oauth(! ! ! "API key”, "API secret", "Access token", "Access secret”)! ! > tweets <- searchTwitter('futurelearn', n=100)! ! > library(“tm”)! ! > tweet_text <- sapply(tweets, function(x) x$getText())! > tweet_corpus <- Corpus(VectorSource(tweet_text))! ! > tweet_corpus <- tm_map(tweet_corpus, !! ! ! ! ! ! ! ! ! ! content_transformer(tolower))! > tweet_corpus <- tm_map(tweet_corpus, removePunctuation)! > tweet_corpus <- tm_map(tweet_corpus, !! ! ! ! ! ! ! ! ! ! function(x)removeWords(x,stopwords()))
  • 71.
  • 72. ! > library(wordcloud)! ! > wordcloud(tweet_corpus) @mseckington
  • 73. ! > library(wordcloud)! ! > wordcloud(tweet_corpus) @mseckington
  • 77.
  • 81. Recap
  • 82. Data visualisations are awesome @mseckington
  • 83. R is awesome @mseckington
  • 84. Any questions? ! @mseckington