Hacking Data Visualisations 
MELINDA SECKINGTON 
@mseckington
Why?
https://www.flickr.com/photos/laurenmanning/6632168961/
https://www.flickr.com/photos/jamjar/5491205608
“I feel that everyday, all of us now are being blasted by information design. It's 
being poured into our eyes through the Web, and we're all visualizers now; we're all 
demanding a visual aspect to our information. There's something almost quite 
magical about visual information. It's effortless, it literally pours in. And if you're 
navigating a dense information jungle, coming across a beautiful graphic or a lovely 
data visualization, it's a relief, it's like coming across a clearing in the jungle.” 
DAVID MCCANDLESS - THE BEAUTY OF DATA VISUALIZATION 
THE BANDWIDTH OF OUR SENSES 
Tor Nørretranders
A brief history of data 
visualisations
Theatrum Orbis Terrarum 
May 20, 1570 
The first modern atlas, compiled by Abraham
Ortelius.
This was a first attempt to gather all maps 
that were known to man at the time and bind 
them together. 
A BRIEF HISTORY OF DATA VISUALISATION
https://www.flickr.com/photos/smailtronic/2361594300
Bills of Mortality 
From 1603, London parish clerks collected health-related 
population data in order to monitor plague 
deaths, publishing the London Bills of Mortality on 
a weekly basis. 
John Graunt amalgamated 50 years of information 
from the bills, producing the first known tables of 
public health data. 
BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - 
THE GUARDIAN
1644: First known graph of statistical data
MICHAEL VAN LANGREN - 
ESTIMATES OF DISTANCE IN LONGITUDE BETWEEN TOLEDO AND ROME
1786 first bar chart 
William Playfair 
Exports and imports of Scotland to and from 
different parts for one Year from Christmas 
1780 to Christmas 1781
Street map of cholera deaths in Soho 
1854 John Snow
Snow's 'ghost map' shows deaths from cholera 
around Broad Street between 19 August and 30 
September 1854. Snow simplified the street layout, 
highlighting the 13 water pumps serving the area 
and representing each death as a black bar. His 
map demonstrates how cholera was spreading, not 
by a 'miasma' rising from the Thames, but in water 
contaminated by human waste 
BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - 
THE GUARDIAN
Diagram of the Causes of Mortality 
in the Army in the East 
1858 Florence Nightingale 
In her seminal ‘rose diagram’, Nightingale 
demonstrated that far more soldiers died 
from preventable epidemic diseases (blue) 
than from wounds inflicted on the 
battlefield (red) or other causes (black) 
during the Crimean War (1853-56) 
BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - 
THE GUARDIAN
How?
https://www.flickr.com/photos/jdhancock/8031897271
https://www.flickr.com/photos/laurenmanning/5658951917/
A quick intro to R
What is R?

R is a free programming language and environment for statistical
computing and graphics.

Created by statisticians for statisticians.

Comes with a lot of facilities for data manipulation, calculation, data
analysis and graphical display.

Highly and easily extensible.
> data()

list all datasets available

> data(movies)

loads the movies data into a variable named movies
(data() returns only the dataset's name, so its result should not be assigned;
the dataset shipped with ggplot2 at the time and now lives in ggplot2movies)

> dim(movies)
[1] 58788    24

> names(movies)
 [1] "title"       "year"        "length"      "budget"      "rating"      "votes"
 [7] "r1"          "r2"          "r3"          "r4"          "r5"          "r6"
[13] "r7"          "r8"          "r9"          "r10"         "mpaa"        "Action"
[19] "Animation"   "Comedy"      "Drama"       "Documentary" "Romance"     "Short"
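The loading step can be tried without the movies data. A minimal sketch, assuming only base R and its built-in mtcars dataset (a stand-in, not part of the talk), showing why the result of data() should not be assigned:

```r
# data() loads a dataset into the workspace as a side effect;
# its return value is just the dataset's name as a string.
loaded <- data(mtcars)
print(loaded)          # the name, not the data

# The data frame itself is now available under its own name.
print(dim(mtcars))     # rows, then columns
print(names(mtcars))   # column labels
str(mtcars[, 1:3])     # compact summary of the first three columns
```

The same three calls (dim, names, str) are a reasonable first look at any data frame before plotting it.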
> movies[7079,]

                    title year length   budget rating votes
7079 Bourne Identity, The 2002    119 75000000    7.3 29871

 r1  r2  r3  r4  r5   r6   r7   r8   r9 r10  mpaa
4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13

Action Animation Comedy Drama Documentary Romance Short
     1         0      0     1           0       0     0

returns 1 row => all the data for 1 movie

> movies[1:10,]
. . .

returns rows 1 to 10
> movies[,1]
. . .

returns 1 column => titles of all movies

> movies$title
. . .

same as movies[,1]
returns the column with the label 'title'

> movies[,1:10]
. . .

returns columns 1 to 10
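The indexing forms above can be sketched on any data frame. A minimal example, assuming the built-in mtcars dataset as a stand-in for movies; the logical filter at the end is an extra illustration that does not appear in the slides:

```r
data(mtcars)

one_row   <- mtcars[3, ]       # row 3, every column (still a data frame)
some_rows <- mtcars[1:10, ]    # rows 1 to 10
one_col   <- mtcars[, 1]       # column 1 as a plain vector
by_name   <- mtcars$mpg        # the same column, selected by label
some_cols <- mtcars[, 1:4]     # columns 1 to 4

# Positional and named access agree, because mpg is the first column.
print(identical(one_col, by_name))

# A logical condition in the row position keeps only matching rows.
efficient <- mtcars[mtcars$mpg > 30, ]
print(efficient[, c("mpg", "cyl")])
```

Selecting by label is usually the safer habit, since it keeps working if the column order changes.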
> hist(movies$year)

[Figure: "Histogram of movies$year"; x-axis: movies$year (1900 to 2000), y-axis: Frequency (0 to 8000)]
> hist(movies$rating)

[Figure: "Histogram of movies$rating"; x-axis: movies$rating (2 to 10), y-axis: Frequency (0 to 8000)]
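hist() also returns the binning it draws, which helps when checking what a histogram is actually showing. A small sketch, assuming the built-in faithful dataset as a stand-in for movies and hand-picked bin edges:

```r
data(faithful)   # built-in dataset: Old Faithful eruption times

# plot = FALSE skips drawing and just returns the binning.
h <- hist(faithful$eruptions, plot = FALSE)
print(h$breaks)  # bin edges
print(h$counts)  # observations per bin

# breaks controls the bins; here, explicit edges every 0.5 minutes.
h_half <- hist(faithful$eruptions,
               breaks = seq(1.5, 5.5, by = 0.5),
               plot = FALSE)
print(h_half$counts)
```

Dropping plot = FALSE draws the same bins, so the counts can be compared directly with the picture.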
> library(ggplot2)

> qplot(rating,
    data=movies,
    geom="histogram")

> qplot(rating,
    data=movies,
    geom="histogram",
    binwidth=1)
> m = ggplot(movies, aes(rating))

> m + geom_histogram()

> m + geom_histogram(
      aes(fill = ..count..))

> m + geom_histogram(
      colour = "darkgreen",
      fill = "white",
      binwidth = 0.5)

> x = m + geom_histogram(
      binwidth = 0.5)
> x + facet_grid(Action ~ Comedy)
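The layering idea carries over to any data frame. A minimal sketch, assuming a reasonably recent ggplot2 (3.3.0 or later, where after_stat(count) is the newer spelling of ..count..) and the built-in faithful dataset in place of movies:

```r
library(ggplot2)

data(faithful)

# Same pattern as in the slides: a base plot object plus a geom layer.
p <- ggplot(faithful, aes(eruptions)) +
  geom_histogram(aes(fill = after_stat(count)), binwidth = 0.5)

# layer_data() exposes the computed bins, which is handy for checking
# what the histogram will draw.
bins <- layer_data(p, 1)
print(bins[, c("xmin", "xmax", "count")])

# print(p) would render the plot itself.
```

Inspecting the bins this way makes the effect of binwidth visible as numbers rather than only as bar shapes.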
FUTURELEARN STATS
> fl = read.csv(
    "futurelearn_dataset.csv",
    header=TRUE)

> source_table = table(fl$age)
> pie(source_table)

> pie(source_table,
    radius=0.6,
    col=rainbow(8))
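Since futurelearn_dataset.csv is not included here, the table-then-pie step can be sketched with made-up values. The age vector below is hypothetical and only stands in for the fl$age column:

```r
# Hypothetical stand-in for the age column of the FutureLearn CSV.
age <- c("18-25", "26-35", "26-35", "36-45",
         "18-25", "26-35", "46-55", "36-45")

age_table <- table(age)                  # counts per category
print(age_table)
print(round(prop.table(age_table), 2))   # the shares the slices represent

# One colour per slice, so the palette size follows the data.
pie(age_table, radius = 0.6, col = rainbow(length(age_table)))
```

Sizing the palette with length(age_table) avoids hard-coding the number of categories.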
> library(twitteR)

> setup_twitter_oauth(
    "API key", "API secret", "Access token", "Access secret")

> tweets <- searchTwitter('futurelearn', n=100)

> library("tm")

> tweet_text <- sapply(tweets, function(x) x$getText())
> tweet_corpus <- Corpus(VectorSource(tweet_text))

> tweet_corpus <- tm_map(tweet_corpus,
                         content_transformer(tolower))
> tweet_corpus <- tm_map(tweet_corpus, removePunctuation)
> tweet_corpus <- tm_map(tweet_corpus, removeWords, stopwords())

> library(wordcloud)

> wordcloud(tweet_corpus)
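The cleaning steps behind the word cloud can be followed in base R without Twitter credentials. A rough sketch: the tweets and the stop word list below are hypothetical, and plain string functions are swapped in for the tm transformations:

```r
# Hypothetical tweets standing in for the searchTwitter() results.
tweet_text <- c(
  "Loving the new FutureLearn course!",
  "FutureLearn: free online courses, great course design.",
  "Is the course any good? The course is great."
)

# Hypothetical short stop word list; tm's stopwords() is far longer.
stop_words <- c("the", "is", "any", "new")

cleaned <- tolower(tweet_text)                 # lower-case everything
cleaned <- gsub("[[:punct:]]", "", cleaned)    # strip punctuation
words   <- unlist(strsplit(cleaned, "\\s+"))   # split on whitespace
words   <- words[!(words %in% stop_words)]     # drop stop words

freq <- sort(table(words), decreasing = TRUE)  # most frequent first
print(head(freq, 5))
```

A word cloud is essentially a picture of this frequency table, with each word sized by its count.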
What next?
https://www.flickr.com/photos/jamjar/5491205608
Recap
Data visualisations 
are awesome 
R is awesome 
Any questions? 
