A Survey of R Graphics
June 18, 2009, R Users Group of LA
Michael E. Driscoll, Principal, Dataspora
mike@dataspora.com
“The sexy job in the next ten years will be statisticians…” - Hal Varian
Hypothesis (from Jessica Hagy’s thisisindexed.com)
Munge & Model
    gdp <- read.csv('gdp.csv')
    hours <- read.csv('hours.csv')
    gdp.hours <- merge(hours, gdp)
    gdp.hours$freetime <- 4380 - gdp.hours$hours
    attach(gdp.hours)
    plot(freetime ~ gdp)
    m <- lm(freetime ~ gdp, data=gdp.hours)
    abline(m, col=3, lwd=2)
    pm <- loess(freetime ~ gdp)
    lines(spline(gdp, fitted(pm)))
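The script on this slide reads gdp.csv and hours.csv, which are not part of this transcript. As a minimal sketch of the same merge, derive, and fit steps, the block below uses simulated stand-in data. The values are made up and are not the OECD figures from the talk, and the `country` key column is an assumption.

```r
# Simulated stand-in data (assumption): NOT the OECD figures used in the talk.
set.seed(1)
gdp   <- data.frame(country = paste0("C", 1:29),
                    gdp     = round(runif(29, 10000, 60000)))
hours <- data.frame(country = paste0("C", 1:29),
                    hours   = round(runif(29, 1400, 2300)))

# Join the two tables on the shared key, then derive free time as on the slide
gdp.hours <- merge(hours, gdp, by = "country")
gdp.hours$freetime <- 4380 - gdp.hours$hours

# Fit a straight line and a loess curve to the same relationship
m  <- lm(freetime ~ gdp, data = gdp.hours)
pm <- loess(freetime ~ gdp, data = gdp.hours)

# Scatter plot with both fits; loess fitted values are drawn in x order
plot(freetime ~ gdp, data = gdp.hours)
abline(m, col = 3, lwd = 2)
ord <- order(gdp.hours$gdp)
lines(gdp.hours$gdp[ord], fitted(pm)[ord], lty = 2)
```

Passing `data = gdp.hours` to each call avoids the need for `attach()`, which can mask other objects in the workspace.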
Visualization
    library(ggplot2)
    qplot(gdp, freetime, data=gdp.hours, geom=c("point", "smooth"), span=1)
basic graphics
R’s Two Graphics Systems
plot() graphs objects
    plot(freetime ~ gdp, data=gdp.hours)
    model <- lm(freetime ~ gdp, data=gdp.hours)
    abline(model)
plot() graphs objects
    abline(model, col="red", lwd=3)
par sets graphical parameters
    par(pch=20, cex=5, col="#5050a0BB")   # RGB hex alpha blending!
    plot(freetime ~ gdp, data=gdp.hours)
    help(par)
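The eight-digit hex string above packs red, green, blue, and alpha into one value. As a small sketch, the same colour can be built from its components with `rgb()`; the simulated points are an assumption, used only to make the overlap visible.

```r
# Build a semi-transparent colour from red, green, blue and alpha (0-255 scale)
translucent <- rgb(80, 80, 160, alpha = 187, maxColorValue = 255)
translucent   # the same colour as "#5050a0BB", returned in upper case

# Simulated points (assumption): overlapping regions appear darker
set.seed(42)
x <- rnorm(500)
y <- x + rnorm(500)
plot(x, y, pch = 20, cex = 2, col = translucent)
```

Here the colour is passed directly to `plot()` rather than set globally with `par()`, so later plots keep the default colour.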
par sets graphical parameters
parameters for par(): pch, col, adj, srt, pt.cex
graphing functions: points(), text(), xlab(), legend()
Paneling Graphics
By setting one parameter in particular, mfrow, we can partition the graphics display into a grid of panels that is filled row by row.
    par(mfrow = c(nrow, ncol))   # nrow = number of rows, ncol = number of columns
Paneling Graphics
    par(mfrow=c(2,2))
    hist(D$wg, main='Histogram', xlab='Weight Gain', ylab='Frequency', col=heat.colors(14))
    boxplot(wg.7a$wg, wg.8a$wg, wg.9a$wg, wg.10a$wg, wg.11a$wg, wg.12p$wg,
            main='Weight Gain', ylab='Weight Gain (lbs)', xlab='Shift',
            names=c('7am','8am','9am','10am','11am','12pm'))
    plot(D$metmin, D$wg, main='Met Minutes vs. Weight Gain',
         xlab='Mets (min)', ylab='Weight Gain (lbs)', pch=2)
    plot(t1, D2$Intel, type="l", main='Closing Stock Prices', xlab='Time', ylab='Price $')
    lines(t1, D2$DELL, lty=2)
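The panel code above relies on objects (D, wg.7a, t1, D2) that are not defined in the deck. A runnable sketch of the same 2 x 2 layout follows; it substitutes the `mtcars` and `EuStockMarkets` datasets that ship with R, so the variables plotted are stand-ins rather than the speaker's data.

```r
# Switch to a 2 x 2 grid; par() returns the previous settings so we can restore them
op <- par(mfrow = c(2, 2))

# Panel 1: histogram
hist(mtcars$mpg, main = "Histogram", xlab = "Miles per gallon",
     col = heat.colors(14))

# Panel 2: side-by-side boxplots, one per group
boxplot(mpg ~ cyl, data = mtcars, main = "MPG by cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")

# Panel 3: scatter plot
plot(mtcars$wt, mtcars$mpg, main = "Weight vs. MPG",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon", pch = 2)

# Panel 4: two time series on one set of axes
t1  <- as.numeric(time(EuStockMarkets))
dax <- as.numeric(EuStockMarkets[, "DAX"])
smi <- as.numeric(EuStockMarkets[, "SMI"])
plot(t1, dax, type = "l", main = "Closing Stock Prices",
     xlab = "Time", ylab = "Index value")
lines(t1, smi, lty = 2)

panel_layout <- par("mfrow")   # record the layout that was in effect
par(op)                        # restore the single-panel layout
```

Saving and restoring the old settings keeps the 2 x 2 layout from leaking into plots drawn later in the session.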
Paneling Graphics
Working with Graphics Devices
Starting up a new graphic X11 window:
    x11()
To write graphics to a file, open a device, write to it, and close it:
    pdf("mygraphic.pdf", width=7, height=7)
    plot(x)
    dev.off()
In Linux, the package "Cairo" is recommended for a device that renders high-quality vector and raster images (alpha blending!). The command would read Cairo("mygraphic.pdf", …
Common gotcha: lattice and ggplot2 functions return plot objects that are drawn only when printed. In non-interactive code (inside functions, loops, or sourced scripts), you should explicitly invoke print() to send a plot object to an open device. For example:
    print(p)   # p is a lattice or ggplot2 plot object
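A short sketch of the open, draw, close cycle, including the print() gotcha for plot objects. The temporary file paths and the use of `mtcars` are assumptions made so the example can run anywhere.

```r
library(lattice)

# Base graphics: open a PDF device, draw, then close it to flush the file
base_file <- file.path(tempdir(), "mygraphic.pdf")
pdf(base_file, width = 7, height = 7)
plot(1:10)
dev.off()

# lattice: the call only builds a "trellis" object; print() does the drawing
write_lattice_plot <- function(path) {
  p <- xyplot(mpg ~ wt, data = mtcars)
  pdf(path, width = 7, height = 7)
  print(p)          # without this line nothing is drawn inside a function
  dev.off()
  invisible(path)
}
lattice_file <- write_lattice_plot(file.path(tempdir(), "mylattice.pdf"))
```

Base `plot()` draws as a side effect, so it needs no print(); the explicit call matters only for functions that return a plot object.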
library(ggplot2)
ggplot2 = grammar of graphics
Visualizing 50,000 Diamonds with ggplot2
qplot(carat, price, data = diamonds)
qplot(log(carat), log(price), data = diamonds)
OR
qplot(carat, price, log="xy", data = diamonds)
qplot(log(carat), log(price), data = diamonds,  alpha = I(1/20))
qplot(log(carat), log(price), data = diamonds,  alpha = I(1/20), colour=color)
Achieving small multiples with “facets”
    qplot(log(carat), log(price), data = diamonds, alpha=I(1/20)) + facet_grid(. ~ color)
old / new
    qplot(color, price/carat, data = diamonds, alpha = I(1/20), geom="jitter")
    qplot(color, price/carat, data = diamonds, geom="boxplot")
library(lattice)
lattice = trellis (source: http://lmdvr.r-forge.r-project.org )
visualizing six dimensions of MLB pitches with lattice
xyplot(x ~ y, data=pitch)
xyplot(x ~ y, groups=type, data=pitch)
xyplot(x ~ y | type, data=pitch)
xyplot(x ~ y | type, data=pitch,
       fill.color = pitch$color,
       panel = function(x, y, fill.color, ..., subscripts) {
         fill <- fill.color[subscripts]
         panel.xyplot(x, y, fill=fill, ...)
       })
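The `pitch` data set is not included in the deck, so here is a hedged sketch of the same custom panel function applied to the built-in `iris` data. The per-row colour column and the choice of `pch = 21` (a fillable symbol) are assumptions for illustration.

```r
library(lattice)

# Work on a copy and add one fill colour per row (hypothetical colour mapping)
d <- iris
d$color <- c("tomato", "seagreen", "steelblue")[as.integer(d$Species)]

p <- xyplot(Sepal.Length ~ Sepal.Width | Species, data = d,
            fill.color = d$color,
            panel = function(x, y, fill.color, ..., subscripts) {
              # subscripts holds the row indices of the points in this panel,
              # so it selects the matching entries of the full-length vector
              fill <- fill.color[subscripts]
              panel.xyplot(x, y, pch = 21, fill = fill, col = "black", ...)
            })
print(p)
```

The key step is indexing by `subscripts`: each panel sees only its own x and y values, while `fill.color` arrives as the full-length vector.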
A Story of Two Pitchers Hamels Webb
list of lattice functions
    densityplot(~ speed | type, data=pitch)
plotting big data
xyplot with 1m points = Bad Idea Jeans
    xyplot(log(price)~log(carat), data=diamonds)
efficient plotting with hexbinplot
    hexbinplot(log(price)~log(carat), data=diamonds, xbins=40)
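Binning helps because the device draws one cell per bin instead of one symbol per point. The sketch below illustrates that idea in base R with rectangular bins, a simpler stand-in for the hexagonal bins that hexbinplot uses; the simulated data and the 40 x 40 grid are assumptions.

```r
# Simulated data (assumption): 100,000 correlated points
set.seed(7)
n <- 100000
x <- rnorm(n)
y <- x + rnorm(n)

# Assign each point to a cell of a 40 x 40 grid and count the points per cell
xbin   <- cut(x, breaks = 40)
ybin   <- cut(y, breaks = 40)
counts <- table(xbin, ybin)

# Draw 1,600 coloured cells rather than 100,000 overlapping symbols;
# log1p() compresses the range so sparse cells remain visible
image(log1p(unclass(counts)), col = heat.colors(20),
      xlab = "x bin", ylab = "y bin")
```

The drawing cost now depends on the number of bins, not the number of observations.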
100 thousand  gene measures
efficient plotting with geneplotter
beautiful colors with colorspace
    library("colorspace")
    red <- LAB(50, 64, 64)
    blue <- LAB(50, -48, -48)
    mixcolor(10, red, blue)
R-->web
Linux Apache MySQL R
http://labs.dataspora.com/gameday
Configuring rapache
Hello world script
    setContentType("text/html")
    png("/var/www/hello.png")
    plot(sample(100,100), col=1:8, pch=19)
    dev.off()
    cat("<html>")
    cat("<body>")
    cat("<h1>hello world</h1>")
    cat('<img src="../hello.png">')
    cat("</body>")
    cat("</html>")
Data Visualization References
ggplot2: Elegant Graphics for Data Analysis, by Hadley Wickham, http://had.co.nz/ggplot2
Lattice: Multivariate Data Visualization with R, by Deepayan Sarkar, http://lmdvr.r-forge.r-project.org/

A Survey of R Graphics Techniques

Editor's Notes

  1. “A Survey of R Graphics” – presented to the LA R Users Group, June 18, 2009. Today I’m going to go through a survey of data visualization functions and packages in R. In particular, I’ll discuss three approaches for data visualization in R: (i) the built-in base graphics functions, (ii) the ggplot2 package, and (iii) the lattice package. I’ll also discuss some methods for visualizing large data sets. I’ll end with an overview of Rapache, a tool for embedding R in web applications. For questions beyond this talk, I can be contacted at: Michael E. Driscoll, http://www.dataspora.com, mike@dataspora.com
  2. Hal Varian said that “The sexy job in the next ten years will be statisticians…” (in a 2009 interview with McKinsey Quarterly). Data visualization is the fastest means to feed our brains data, because it leverages our highest-bandwidth sensory organ: our eyes. Statistical visualization is sexy both because high-density information plots tickle our brains – we crave information – and because it is hard to do well.
  3. A data visualization is often the final step in a three-step data sense-making process, whereby data is (i) “munged” (e.g. collected, cleansed, and structured), (ii) modeled (relationships in the data are explored and hypotheses tested), and finally (iii) visualized (a particular model of the data is represented graphically). At Facebook, their data engineers are called “data scientists.” I like this term because it conveys that working with data involves the scientific method, predicated on making hypotheses and testing them. Ultimately, we are interested in using data to make hypotheses about the world.
  4. Like this one, from Jessica Hagy’s witty blog, thisisindexed.com. She visualizes a hypothesis that free time and money are related, i.e. that you have the most free time when you’re broke and when you’re rich. I decided to test this hypothesis with data on working hours (its complement = free time) and GDP from 29 OECD countries.
  5. Using R, I decided to test this hypothesis, with data for 29 countries in the OECD: 2006 figures on annual hours worked and GDP per capita. I modeled it with both a linear regression and a local polynomial (loess) regression. Just a few lines of code.
  6. And using R, I visualized it. The wealth-free time hypothesis was half-right. Here’s the result: for OECD countries, Jessica was partially right, in that the richer you are, the more free time you have (the extreme rightmost point is Luxembourg). But at least for the subset of countries that we examined, the relationship is essentially linear: the poorest OECD countries have the least free time. (In the code shown on the right, I’m using ggplot2, not the base graphics plot function of the previous slide; ggplot2 will automatically do a loess fit for us.)
  7. In this section, I describe R’s built-in graphics functions, which require no external packages.
  8. First, a peek under the covers of the R graphics stack. At the top-most level are packages, like “maps”, “lattice”, and “ggplot2”. These packages make calls to a lower-level graphics system, of which in R there are two, called “graphics” and “grid”. According to Nicholas Lewin-Koh, the goal of these graphics systems is to “create coordinates for each graphical object and render them to a device or canvas. In addition the system may manage (i) a stack of graphics objects, (ii) local state information, (iii) redrawing and resizing.” Finally, these graphics systems are capable of rendering output to a variety of devices, which for our purposes can be considered image formats such as PNG, JPG, and PDF. Devices also include interactive displays, such as those in Windows or Mac OS X, which R sends its output to by default during an interactive session. Grid is a newer system, and both “lattice” and “ggplot2”, which I’ll discuss later, use Grid.
  9. plot() is a “do the right thing” graphics command. plot() is the simplest R command for generating a visualization of an R object. It’s an overloaded function that just “does the right thing”, and yields a quick view of many R objects that are passed to it. These built-in basic plotting commands are useful if you’re just doing quick, exploratory analysis, and publication quality graphs are not what you’re looking for.
  10. We can interactively add layers (lines, points, and text) to plots using basic graphics functions. One such example is abline, so named for the a (intercept) and b (slope) parameters it uses to draw a line (from that old saw, y = a + bx). It also accepts a fitted model object, as in abline(model).
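As a minimal sketch of the two ways abline() is typically called, the following uses R's built-in cars data set as a stand-in for the talk's gdp.hours data (an assumption made only so that the example is self-contained).

```r
# Built-in 'cars' data stands in for gdp.hours (illustrative assumption).
m <- lm(dist ~ speed, data = cars)

plot(dist ~ speed, data = cars)

# Pass the fitted model: abline() extracts the intercept and slope itself.
abline(m, col = "red", lwd = 3)

# Equivalent call with an explicit a (intercept) and b (slope).
ab <- unname(coef(m))
abline(a = ab[1], b = ab[2], lty = 2, col = "blue")
```

The dashed blue line should fall exactly on top of the red one, since both calls describe the same fitted line.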
  11. par is a function for setting graphical parameters for base graphics – and, nota bene, these parameters are often shared by the higher level packages I discuss later. Once parameters are defined via par, graphics functions like plot will use these new parameters in subsequent plots. The example above shows the setting of three parameters: pch to set a plotting character (20 denotes a small filled circle), cex to set size or character expansion (1 is default, 5 is bigger), and col to set color, which is definable as a name (“blue”), an integer (1-7 for primaries), or an RGB value (as above).
  12. Graphics parameters can be set via par(), or passed directly to graphics functions. Above are some more parameters that you can set using par(). For a full list, type help(par) at the R prompt. You can also pass these parameters directly to graphics functions, for example, points(5, 3, pch=19, col="blue"). The chart on the right is an example of a plot painstakingly created with the low-level plotting parameters and functions above. This was done by interactively layering additional text labels and legends on after the initial points were plotted.
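The difference between setting a parameter globally with par() and passing it to a single call can be sketched as follows. The built-in cars data and the coordinates of the extra point are illustrative assumptions, not values from the talk.

```r
# Set defaults for subsequent plots; par() returns the previous values.
old <- par(pch = 20, cex = 2, col = "#5050a0BB")  # hex RGB plus alpha
plot(dist ~ speed, data = cars)

# Override parameters for one call only, without touching par().
points(15, 60, pch = 19, col = "blue")
text(15, 60, "a labelled point", pos = 3, cex = 0.5)

# Restore the previous settings.
par(old)
```

Saving the return value of par() and restoring it afterwards is a common habit, because otherwise the new settings persist for every later plot on that device.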
  13. Edward Tufte has lauded the value of “small multiples” in information graphics: namely, the incorporation of many small plots in a single graphic. R provides a basic facility for the subdivision of a display device (or ultimately its printed representation) into several panels. This can be achieved by setting the graphics parameter mfrow, which stands for multiple figures plotted row-wise.
  14. With the mfrow parameter, a 2 x 2 matrix of sub-panels (as in the example above) can be set up, and plots will be drawn into these sub-panels in turn, row by row. The code above illustrates the creation of four figures in a single graphic, and the result is shown in the next slide. (There is also an mfcol parameter for plotting multiple figures in a column-wise manner.)
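The slide's code depends on data sets that are not included here, so the following is a rough self-contained sketch of the same 2 x 2 layout, using the built-in mtcars and cars data as stand-ins (an assumption for illustration).

```r
old <- par(mfrow = c(2, 2))   # 2 rows, 2 columns, filled row by row

hist(mtcars$mpg, main = "Histogram", xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars, main = "Boxplot",
        xlab = "Cylinders", ylab = "Miles per gallon")
plot(mpg ~ wt, data = mtcars, main = "Scatter plot", pch = 2)
plot(cars$speed, type = "l", main = "Line plot", ylab = "Speed")

par(old)                      # back to a single panel
```

Each high-level plotting call advances to the next panel; swapping mfrow for mfcol would fill the same grid column by column instead.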
  15. Unless a data visualization is of unusually high density, most modern display devices allow for upwards of 16 figures to be suitably resolved on a single device. See the splom() function for automatic creation of such dense graphics.
  16. R graphics devices can present some “gotchas”. Normally one need not have any knowledge of the graphics devices that underlie the R graphics system. But in a few cases, it’s worth knowing something about them. While typical users can save R graphics in Windows or Mac OS X (via a “Save As” dialog in the graphics window), if one is not using a GUI, exporting graphics requires manually opening a device with one of several device commands (such as pdf() or png()) and closing it properly (using dev.off()). Also, when exporting lattice or ggplot2 graphics in a non-interactive environment (via a script, for instance), it’s critical to invoke the print() function on the plot object, which will properly write the graphic to the open device; base graphics functions such as plot() draw to the device directly. This “print” issue can be a real gotcha for scripts.
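A minimal sketch of the open, draw, close cycle, including the explicit print() that grid-based plot objects need inside functions, loops, or sourced scripts. The output path is a temporary file chosen for illustration, and the cars data is a stand-in.

```r
library(lattice)

out <- tempfile(fileext = ".pdf")   # hypothetical output path
pdf(out, width = 7, height = 7)     # open the device

plot(dist ~ speed, data = cars)     # base graphics: drawn immediately

p <- xyplot(dist ~ speed, data = cars)  # lattice: returns a plot object
print(p)                                # explicit print() draws it

dev.off()                           # close the device to flush the file
```

The resulting PDF has two pages, one per plot. Forgetting dev.off() is the other common failure: the file may be left empty or incomplete until the device is closed.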
  17. Okay, now I want you to try and forget everything you just heard about base graphics. ggplot2 is a new visualization package formally released in 2009, developed by Professor Hadley Wickham. It is based on a different perspective on developing graphics, and has its own set of functions and parameters.
  18. The ‘gg’ in ggplot2 is a reference to a book called The Grammar of Graphics, written by Leland Wilkinson. The book conceives of graphics as compositional: made up of colors, visual shapes, and coordinates, much as sentences are made up of parts of speech.
  19. I’ve illustrated an incomplete version of Wilkinson’s grammar in this slide, to convey how graphics are built up – and out of – their component parts. As such, Wilkinson advocates that graphical tools should leave behind what he deems “chart typologies”: rigid casts of pie charts, bar graphs, or scatter plots, into which data is poured. (These programs might be thought of as the Mad Libs analogs of graphics, with pre-defined structure and limited degrees of freedom.) Conceived as compositional, a graphical grammar allows for an infinite variety of graphical constructions.
  20. In the upcoming examples, drawn directly from Hadley Wickham’s book on ggplot2, we’ll visualize data concerning ~ 50,000 diamonds. We’ll start simple and build to more complex graphs by specifying additional elements of the graphical grammar. This data is in the ggplot2 package; more information is available with help(diamonds) (after loading ggplot2). For our purposes, we’re concerned with examining relationships between just a few dimensions of this data, namely: carat, cut, clarity, price.
  21. In ggplot2, the command to build this plot is qplot(), which stands for “quick plot”. We pass qplot() two dimensions of our data (carat and price), and it defaults to a scatter plot representation. Also worth noting: ggplot2’s visual defaults are quite easy on the eyes, in contrast to most of R’s base graphics. We begin with a basic scatter plot of these 50,000 diamonds. This plot reveals that, not surprisingly, the price of diamonds increases as they get bigger (in terms of carats). Somewhat more interesting is how: price seems to increase faster than linearly (and we test this hypothesis in the next slide).
  22. Next, we log-transform our data, and reveal that the relationship between a diamond’s price and its carat is close to a straight line on log-log axes, i.e. price grows roughly as a power of carat. It should be noted that we can achieve this transformation in two equivalent ways: (i) we can directly transform our data with the log function, or (ii) we can transform the coordinate scales on which our data is plotted. In ggplot2, this latter approach is achieved by passing the parameter log="xy" to qplot. Because the two approaches rely on different parts of graphical speech (data and scale), this nicely illustrates that, as in language, there is more than one way to express data visually using this grammar of graphics and ggplot2.
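The two routes to a log-log plot can be sketched as below. This uses the longer ggplot() form rather than the talk's qplot() shorthand, which is my choice for illustration; the final lm() fit is also an addition, included only to show how the slope on log-log axes can be estimated.

```r
library(ggplot2)

# (i) Transform the data inside the aesthetic mapping (natural logs).
p_data <- ggplot(diamonds, aes(log(carat), log(price))) + geom_point()

# (ii) Leave the data alone and transform the scales (base-10 logs).
p_scale <- ggplot(diamonds, aes(carat, price)) + geom_point() +
  scale_x_log10() + scale_y_log10()

# The slope of a log-log fit estimates the power-law exponent.
fit <- lm(log(price) ~ log(carat), data = diamonds)
coef(fit)
```

The two plots have the same shape; they differ only in how the axes are labelled (log units in the first, original units on log-spaced ticks in the second).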
  23. Another element of the graphical grammar is the aesthetic appearance of plotting points. Here, we pass a parameter, alpha, which controls the transparency of the points plotted. The parameter’s value, I(1/20), indicates that each point should have 1/20th of full intensity: thus 20 overplotted points are required at any given location to achieve full saturation (in this case, to black). (Note: the “I” function in R inhibits further interpretation of its argument, so I(1/20) can be thought of as simply the fraction 1/20.) This method uncovers some interesting distributions in the data that were previously obscured by overplotting. For example, we can detect that points are highly concentrated around specific carat sizes. Contrast this method with our earlier approach to alpha blending with base graphics, which required manually specifying the RGB hex code.
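For comparison, here is a hedged sketch of alpha blending in both systems. The ggplot() call is the long-form equivalent of the slide's qplot(), and rgb() is used to build the base-graphics color rather than typing a hex code by hand; both choices are mine, for illustration.

```r
library(ggplot2)

# ggplot2: alpha is set as a constant, not mapped to a variable.
p <- ggplot(diamonds, aes(log(carat), log(price))) +
  geom_point(alpha = 1/20)

# Base graphics: build the semi-transparent colour with rgb().
faint_black <- rgb(0, 0, 0, alpha = 1/20)
plot(log(price) ~ log(carat), data = diamonds, pch = 20, col = faint_black)
```

rgb() returns a "#RRGGBBAA" string, the same hex-plus-alpha format used in the earlier par() example.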
  24. Here we layer on yet another element of the grammar, color, to show how clearer stones are more expensive. ggplot2 automatically creates a legend for the mapping of the color variable onto plotting colors. (Note: Wickham’s choice of a default color palette is not accidental: the colors are of equal luminance, thus no one dominates over the others. For more than you ever want to know about color choice, see http://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture13.pdf).
  25. Now we use another element of the grammar, what is termed ‘facets’, to splinter our graphic into a number of subplots along a given dimension. Here we achieve the small multiples that we previously created using the par function and mfrow parameter. These sorts of sub-divided plots are what the lattice system excels at, which we’ll see later. What can we say from this plot? Well, if anything, clear colored diamonds (“D”) seem to get more expensive more quickly (slightly steeper slope as a function of their size) versus yellower diamonds.
  26. Let’s take another view of the data. Here we’re interested in seeing how color influences the per carat cost of a diamond. The boxplot on the left shows that nearly clear diamonds (color categories ‘D’ and ‘E’) have a greater number of high-priced outliers, but their median (the center line of each box) is nearly identical to the others. The so-called jitter plot on the right shows this same view of the data, but all of the points are shown. In this case, the points are plotted into bins according to a categorical variable, diamond color, and “jittered” within each bin to prevent overplotting and to give a sense of the local density at different values along the common y-dimension of price/carat.
  27. A display of 50,000 data points. Why not? Our eyes can handle, and I submit crave, these kinds of rich visualizations. This also allows us to detect features of the data (for example, several thin white bands across the bottom of the bars – perhaps preferred price/carat combinations?) that may be missing from more simplified data views.
  28. lattice is an alternative high-level graphics package for R. Like ggplot2, it is built on the grid graphics system.
  29. lattice is named in honor of its predecessor, trellis, which was a visualization library developed for the S language by William Cleveland. trellis was so named because of how it visualizes higher dimensions of data: it splinters these dimensions across space, producing a grid of small multiples that resemble a trellis. In the next series of slides I show how we can use lattice to visualize up to six dimensions of data in a single plot.
  30. To demonstrate lattice’s multivariate visualizing abilities, we’ll use a fascinating data set called MLB Gameday. Since 2007, Major League Baseball has tracked the path and velocity of > 1 million pitches thrown. Sample data is here: http://gd2.mlb.com/components/game/mlb/year_2008/month_03/day_30/gid_2008_03_30_atlmlb_wasmlb_1/pbp/pitchers/400010.xml
  31. With just two dimensions of data to describe (the x and y location in the strike zone) we can use lattice’s xyplot function. Unlike ggplot2, the first arguments that we pass to lattice’s plotting functions (of which xyplot is just one) are formulas that describe a relationship in the data to be plotted. In this case, “x ~ y” can be read as “x depends on y”. Note the visual defaults: not as easy on the eyes as ggplot2 (which has a lower contrast gray background), but an improvement on R’s base graphics plots.
  32. In this plot, I’ve layered a third dimension, pitch type, into our plot by using lattice’s “groups” parameter, which uses a different plotting symbol for each type, and includes a legend across the top. Alas, this is not a particularly informative chart. The symbols are overplotted on top of each other: trends among the pitch types are hard to discern. With lattice, we can use yet another approach.
  33. Now we’re doing what lattice does best: splintering a dimension, in this case pitch type, into space. We do this by using R’s “condition” operator in the formula we pass to lattice (the formula “x ~ y | type” can be read as “x depends on y conditioned on type”).
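The contrast between the groups argument and the conditioning operator can be sketched with any data frame that has a categorical column. The pitch data is not available here, so this assumes the built-in mtcars data as a stand-in, with cylinder count playing the role of pitch type.

```r
library(lattice)

# Built-in 'mtcars' stands in for the pitch data (illustrative assumption).
cars_df <- transform(mtcars, cyl = factor(cyl))

# groups=: one panel, a different symbol/colour per level, plus a legend.
p_groups <- xyplot(mpg ~ wt, groups = cyl, data = cars_df, auto.key = TRUE)

# "| cyl": one panel per level of the conditioning variable.
p_cond <- xyplot(mpg ~ wt | cyl, data = cars_df)

print(p_groups)
print(p_cond)
```

With three levels of cyl, the conditioned plot produces three panels that share common axes, which is what makes comparisons across panels easy.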
  34. Now we include a fourth dimension in our plot, pitch speed, by using color. The speed to color mapping is relatively intuitive (seen in upper right): red is fast, blue is slow. How we achieve this is not particularly simple: we must use what lattice deems “panel functions”, which allow us to extend the default appearance of the chart.
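A runnable sketch of the panel-function idiom from the slide, assuming simulated data: the pitch data frame, its pitch-type labels, and the blue-to-red palette below are all made up for illustration and are not the talk's data or palette. The key step is indexing the per-point colors by subscripts, so that each panel receives only the colors for its own rows.

```r
library(lattice)

# Simulated stand-in for the pitch data (hypothetical values).
set.seed(1)
n <- 200
pitch <- data.frame(
  x     = rnorm(n),
  y     = rnorm(n),
  type  = sample(c("FA", "CU", "SL", "CH"), n, replace = TRUE),
  speed = runif(n, 70, 98)
)

# Map speed onto a 10-step blue (slow) to red (fast) palette.
pal <- colorRampPalette(c("blue", "red"))(10)
pitch$color <- pal[cut(pitch$speed, 10, labels = FALSE)]

p <- xyplot(x ~ y | type, data = pitch, fill.color = pitch$color,
            panel = function(x, y, fill.color, ..., subscripts) {
              # subscripts holds the row numbers shown in this panel
              fill <- fill.color[subscripts]
              panel.xyplot(x, y, pch = 21, col = "transparent",
                           fill = fill, ...)
            })
print(p)
```

pch = 21 is used because it is a fillable symbol; with a non-fillable symbol the fill argument has no visible effect.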
  35. Finally we add a fifth dimension, local density, to our plots using a two-dimensional color palette, where speed is related to chroma, and local density to luminance. This is an attempt to control for some overplotting that might otherwise occur when we shrink these pitch plots down in size.
  36. Now we can compare two different pitchers (the sixth dimension) in a single graphic. The six dimensions of data we visualized with lattice are thus: (1, 2) the x and y location of the pitch; (3) pitch type; (4) pitch speed; (5) pitch density (lots of pitches make for darker luminosity without changing hue); (6) pitcher (Hamels or Webb).
  37. As mentioned, the lattice package provides several other graphics functions besides xyplot. Some are listed above, and the densityplot() function is highlighted at the bottom. This is a particularly useful alternative to standard histograms, which can suffer from binning artifacts.
  38. In this section I mention a couple of techniques for handling large data sets.
  39. This is bad for two reasons: (1) overplotting obscures data, even when alpha blending is used; (2) it’s highly inefficient, both on screen and especially if saved as a vector graphic (huge PDFs). Two solutions: (i) resort to sampling, or (ii) map the density of points onto some other attribute, such as color. hexbinplot and geneplotter do just this.
  40. hexbinplot() is a graphics function (in the hexbin package) that divides a scatter plot area into hexagons, counts occurrences within each of these hexagonal areas, and maps these counts to a color scale. The result is a plot, as shown, where the graphics device need only draw as many shapes as there are hexagons. In the case of the diamond data, rather than 50,000 points being graphed, just ~ 2000 hexagons are. This also reveals some of the clumpiness in the data, though not as well as ggplot2’s alpha-blended scatterplots.
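To see the reduction in drawn objects directly, one can call the underlying hexbin() function and count the non-empty cells. This is a sketch that assumes the hexbin and ggplot2 packages are installed; the variable names are mine.

```r
library(hexbin)
data(diamonds, package = "ggplot2")

# Bin the 2-D point cloud into hexagons; xbins sets the number of bins
# across the x range.
hb <- hexbin(log(diamonds$carat), log(diamonds$price), xbins = 40)

n_points   <- nrow(diamonds)
n_hexagons <- length(hb@count)   # only non-empty hexagons are stored

plot(hb)   # draws n_hexagons shapes instead of n_points symbols
```

Every point lands in exactly one hexagon, so the counts sum to the number of rows; comparing n_hexagons with n_points shows how much less the device has to draw.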
  41. This is an Affymetrix gene chip, with 100,000 data points. On the right we have the output of a typical microarray assay: the colors correspond to RNA expression levels. With R, I can distill these 100,000 data points down to a simple model, and visualize it.
  42. The data visualization on the right, called an M-A plot, is a variation of an XY scatter plot, in which we compare the observed signals for a particular microarray to a composite background distribution. Both are ordered by intensity of signal, and deviations from the straight line show differences between our array and the background (in this case, our array tends to have higher signals across the board). Typically we generate an M-A plot for every array in our compendium to yield a big-picture view of the consistency of our arrays across experiments: the flatter the red lines, the better (remember that in most models of cellular behavior we expect only a small fraction of genes to change in expression).
  43. Ross Ihaka’s colorspace package provides access to useful colorspaces beyond RGB, like LAB and HSV. These colorspaces are preferred by artists and designers for their more intuitive properties. This is the package I used to design the palettes in the pitching plots shown earlier. For my opinionated comments on using color in data visualizations, visit: http://dataspora.com/blog/how-to-color-multivariate-data/
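A small sketch of mixing two colors in LAB space with the colorspace package. The mixing fraction of 0.5 is my assumption for illustration (the slide passes 10); as I understand the function, its first argument is the share of the second color, so 0.5 gives the midpoint.

```r
library(colorspace)

red  <- LAB(50,  64,  64)
blue <- LAB(50, -48, -48)

# 0.5 gives the midpoint between the two colours in LAB space.
mid <- mixcolor(0.5, red, blue)

coords(mid)              # L, A, B coordinates of the mixed colour
hex(mid, fixup = TRUE)   # nearest displayable sRGB hex code
```

Because both endpoints share the same L value, every mix along this line keeps the same lightness, which is the property that makes LAB convenient for designing palettes where no color visually dominates.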
  44. Before we end, some thoughts on how R can be used as a visualization engine on the web.
  45. So I’ve pushed this pitch visualization application into a web app, using RApache. I can do this because R is open source, without licensing restrictions. Data and the processing can both live on the server, which is important when your data set is huge (this one is around 20 gigabytes). And when the data changes, the dashboard updates. No local software installation is needed, and updates are instantly available to all web users. It can be part of the open source web-analytics stack, with a catchy name: LAMR. If you can think of something less lame, let me know.
  46. Why embed R into a web-based architecture? We immediately gain the many benefits of a web architecture that is: * Stateless/Scalable – URL requests can be distributed across one or many servers * Cacheable – common requests made to the R server can be cached by Apache * Secure – we can piggyback on existing HTTPS architecture for analysis of sensitive data
  47. rapache: Embedding R within the Apache Server. Our tool of choice is rapache, developed by Jeff Horner at Vanderbilt University. http://biostat.mc.vanderbilt.edu/rapache/
  48. Naturally this is just scratching the surface of what rapache can do. An alternative approach to printing HTML directly is to use a templating system, similar to PHP. This is available via the R package brew (also developed by Jeffrey Horner), downloadable on CRAN and at: http://www.rforge.net/brew/
  49. The ggplot2 and lattice books are both published by Springer (ggplot2 as of July 2009), available via Amazon. Example code and figures from the ggplot2 book are at http://had.co.nz/ggplot2 and from the lattice book at http://lmdvr.r-forge.r-project.org/