R for Data Visualization and Graphics
Rob Kabacoff, Ph.D.
Vice President of Research

Source code for presentation: http://tinyurl.com/Kabacoff-CS20
R is a Statistical and Graphical Platform
R Homepage - http://www.r-project.org/
CRAN Mirrors - http://cran.r-project.org/

• Free
• Open source
• State-of-the-art data analysis
• Platform for programming new methods
• Runs on Windows, Linux, Mac OS X
• Enormous user base
• Reproducible research
Data Input
R reads data from:
• Keyboard
• ASCII text files
• Excel
• netCDF, HDF5
• XML
• Web scraping
• Statistical packages: SAS, SPSS, Stata
• Database management systems (SQL): MySQL, Oracle, Access, other
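As a minimal sketch of the text-file input route on this slide, base R's read.csv() can parse delimited text. The inline data and its column names are made up for illustration; in practice the argument would usually be a file path.

```r
# Hypothetical comma-separated data (an assumption, not from the slides).
csv_text <- "name,salary,rank
A,85000,AsstProf
B,120000,Prof
C,98000,AssocProf"

# read.csv() accepts a `text` argument in place of a file name.
salaries <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(salaries)            # structure: 3 observations of 3 variables
mean(salaries$salary)    # average of the numeric column
```

Other sources (Excel, SAS, SQL databases, and so on) typically go through add-on packages rather than base R.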
Statistical Methods
Descriptive Statistics
Experimental Design
Linear, Generalized, Nonlinear, and Hierarchical Models
Analysis of Categorical Data
Nonparametric Analysis
Survival Analysis
Latent Variable Models
Bayesian Models
Missing Values Analysis
Cluster Analysis
Decision Trees
Data Mining

Classical Test Theory
Item Response Theory
Correspondence Analysis
Multidimensional Scaling
Meta Analysis
Structural Equation Modeling
Complex Survey Design
Time Series Analysis
Longitudinal Analysis
Social Network Analysis
Study of Mediation and Moderation
Power Analysis
Clinical Trials

and …
Graphs!
[Figure: a collage of example graphs: a conditioning plot of lat vs. long (Given : depth); "A Topographic Map of Maunga Whau" (10 Meter Contour Spacing; Meters North vs. Meters West); a 3-D surface of Sinc(r) over X and Y; "Survival on the Titanic" (Age, Sex, Survived, shaded by Pearson residuals); "University Salaries by Discipline" (Salary vs. Years Since Ph.D., Theoretical vs. Applied)]
A High Level Tour
• General Systems
– base
– lattice
– ggplot2

• Interactive
– iplots
– rggobi
– googleVis
– Shiny

• Specialized
– vcd (categorical data)
– VIM (missing data)
– likert (Likert data)
– scatterplot3d (3-D scatterplot)
– car (regression)
– corrplot (correlations)
– (decision trees)
– (dendrograms)
– effects (glm/ANOVA)
3 complete graphics systems
[Figure: the same histogram of Salary (dollars) against Frequency drawn three ways: Base Graphics, Lattice Graphics, ggplot2 Graphics]
BASE GRAPHICS

histograms
[Figure: three histograms of Salary (dollars) with Frequency or Density on the y-axis: "Histogram with Rug plot", "Histogram with Normal Curve", "Histogram of Kernel Density Curve"]
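A rough sketch of how histograms like these can be built in base graphics. The salary values are simulated stand-ins (an assumption; the slide's own data set is not reproduced here), and the number of breaks and the colors are arbitrary choices.

```r
# Simulated salaries stand in for the slide's data (an assumption).
set.seed(1)
salary <- rnorm(400, mean = 110000, sd = 30000)

# Density-scaled histogram, so curves can be overlaid on the same axis.
h <- hist(salary, freq = FALSE, breaks = 12, col = "lightblue",
          xlab = "Salary (dollars)",
          main = "Histogram with Rug, Density and Normal Curves")

# Rug plot: one tick per observation along the x-axis.
rug(salary)

# Kernel density estimate of the same data.
lines(density(salary), lwd = 2)

# Normal curve using the sample mean and standard deviation.
curve(dnorm(x, mean = mean(salary), sd = sd(salary)),
      add = TRUE, lty = 2, col = "red")
```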
bar charts

box plots
[Figure: "Singer Height by Voice Part": horizontal box plots of Heights in Inches for Soprano 1, Soprano 2, Alto 1, Alto 2, Tenor 1, Tenor 2, Bass 1, Bass 2]
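A plot of this kind can be sketched with base boxplot(). The sketch assumes the figure is based on the singer data set that ships with the lattice package, which matches the voice parts shown but is not stated on the slide.

```r
# `singer` comes with the lattice package (assumed to be the slide's data).
data(singer, package = "lattice")

# One horizontal box per voice part; boxplot() also returns the summary stats.
b <- boxplot(height ~ voice.part, data = singer, horizontal = TRUE,
             las = 1, ylab = "",
             xlab = "Heights in Inches",
             main = "Singer Height by Voice Part")

b$names   # the eight voice parts, in plotting order
```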
line charts
[Figure: "Monthly Airline Passengers" (Passengers (K) vs. Time, 1950 to 1960, shown in two versions) and "UK Lung Cancer Deaths" (Total, Male, Female by year, 1974 to 1980)]
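Line charts of this kind can be sketched with the time series that ship with R. The sketch assumes the figures use AirPassengers and the ldeaths/mdeaths/fdeaths series from the datasets package; line types and colors are arbitrary.

```r
# Two plots side by side; restore the layout afterwards.
op <- par(mfrow = c(1, 2))

# A single time series: plot() dispatches to the ts method.
plot(AirPassengers, type = "l", ylab = "Passengers (K)",
     main = "Monthly Airline Passengers")

# Several series on a common time axis.
cols <- c("black", "blue", "red")
ts.plot(ldeaths, mdeaths, fdeaths,
        gpars = list(xlab = "year", ylab = "Deaths", lty = 1:3, col = cols))
title(main = "UK Lung Cancer Deaths")
legend("topright", legend = c("Total", "Male", "Female"),
       lty = 1:3, col = cols, bty = "n")

par(op)
```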
time series
[Figure: "Season Decomposition of a Time Series" for Monthly Air Passengers, with panels data, seasonal, trend, and remainder plotted against time (1950 to 1960)]
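One way to produce a decomposition like this is stl() from the stats package. This is a sketch under the assumption that the figure decomposes the built-in AirPassengers series; the slide does not say which function or settings were used.

```r
# Seasonal-trend decomposition by loess; "periodic" forces a fixed seasonal shape.
fit <- stl(AirPassengers, s.window = "periodic")

# The plot method draws data, seasonal, trend and remainder panels.
plot(fit, main = "Seasonal Decomposition of a Time Series")

# The three estimated components are columns of fit$time.series.
colnames(fit$time.series)
```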
scatterplots
[Figure: "High Density Scatterplot (n=10,000)" (Y vs. X) and "Iris Data" (Petal Length (cm) vs. Sepal Length (cm))]
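A sketch of both kinds of scatterplot in base graphics. The high-density data are simulated (an assumption), and overplotting is handled here with semi-transparent points, which may differ from the technique used for the slide.

```r
# Simulated, correlated data for the high-density case (an assumption).
set.seed(42)
n <- 10000
x <- rnorm(n, sd = 3)
y <- x + rnorm(n, sd = 3)

op <- par(mfrow = c(1, 2))

# Semi-transparent points let dense regions appear darker.
plot(x, y, pch = 16, col = adjustcolor("blue", alpha.f = 0.1),
     xlab = "X", ylab = "Y", main = "High Density Scatterplot (n=10,000)")

# Ordinary scatterplot of the built-in iris data, colored by species.
plot(Petal.Length ~ Sepal.Length, data = iris,
     col = as.integer(Species), pch = 19,
     xlab = "Sepal Length (cm)", ylab = "Petal Length (cm)",
     main = "Iris Data")

par(op)
```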
scatterplot matrix
[Figure: "Anderson's Iris Data -- 3 species": scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]
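A scatterplot matrix like this one can be sketched with base pairs() on the built-in iris data. The fill colors are an arbitrary choice.

```r
# One fill color per species (colors chosen for illustration).
species_cols <- c("red", "green3", "blue")
point_fill <- species_cols[unclass(iris$Species)]

# All pairwise scatterplots of the four measurements.
pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species",
      pch = 21, bg = point_fill)
```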
dot plot
[Figure: "MPG by Automobile": dot plot of miles per gallon (axis 10 to 30) for 32 car models, ordered from Toyota Corolla to Cadillac Fleetwood]
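A sorted dot plot like this can be sketched with base dotchart(), assuming the figure uses the built-in mtcars data (the car names match).

```r
# Sort ascending: dotchart() draws the first row at the bottom,
# so the most economical car ends up at the top.
cars <- mtcars[order(mtcars$mpg), ]

dotchart(cars$mpg, labels = rownames(cars), cex = 0.7, pch = 19,
         xlab = "Miles per Gallon", main = "MPG by Automobile")
```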
contour plots
[Figure: "A Topographic Map of Maunga Whau": contour plot with 10 Meter Contour Spacing; axes Meters North and Meters West]
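A contour map of this kind can be sketched from the built-in volcano matrix (elevations of Maunga Whau on a 10 m grid). The exact contour levels used on the slide are not given; 10 m steps are assumed from the subtitle.

```r
# Grid coordinates in meters: rows run north, columns run west.
x <- 10 * seq_len(nrow(volcano))
y <- 10 * seq_len(ncol(volcano))

# Contour lines every 10 meters of elevation.
contour(x, y, volcano, levels = seq(90, 200, by = 10),
        xlab = "Meters North", ylab = "Meters West",
        main = "A Topographic Map of Maunga Whau")
mtext("10 Meter Contour Spacing", side = 3, line = 0.3, cex = 0.8)
```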
LATTICE GRAPHICS

lattice graphs
• expands base graphics to include trellis plots
• seeks to improve on the graph defaults (symbols, axes, labels) of base graphics
• grouping
– color, fill, line type can be mapped to variable values

• facets
– subgroups can be plotted in an array based on the levels of (usually) one or two variables

• customizable panel functions give you fine-grained control over what is plotted in each facet
• comments
– clean and fast
– high degree of customization possible
3D graphs with faceting

lattice graph with faceting and a customized panel function
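A minimal sketch of grouping, faceting, and a customized panel function in lattice. It uses the built-in mtcars data rather than the data behind the slide, and the choice of variables is purely illustrative.

```r
library(lattice)

# Facet by number of cylinders, group (color) by transmission type.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            groups = factor(am, labels = c("automatic", "manual")),
            layout = c(3, 1), auto.key = TRUE,
            xlab = "Weight (lb/1000)", ylab = "Miles per gallon",
            # Custom panel function: runs once per facet.
            panel = function(x, y, ...) {
              panel.grid(h = -1, v = -1)    # reference grid at the tick marks
              panel.xyplot(x, y, ...)       # points, colored by group
              panel.lmline(x, y, lty = 2)   # one regression line per facet
            })

print(p)  # lattice plots are objects; printing draws them
```

Because the plot is an object, it can be stored, modified with update(), and printed later.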
GGPLOT2 GRAPHICS
ggplot2
• Grammar of Graphics
• graphs built up in layers by plotting "geoms"
• grouping
– color, fill, shape, size can be mapped to variable values

• facets
– subgroups can be plotted in an array based on the levels of (usually) one or two variables

• comments
– allows you to create novel plots
– can be slow for large problems
– no 3D graphs
– HOT!
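A minimal sketch of the layering, grouping, and faceting ideas above. It assumes the ggplot2 package is installed and uses the built-in mtcars data for illustration, not the data behind the slides.

```r
library(ggplot2)  # assumes install.packages("ggplot2") has been run

# Map variables to aesthetics, then add geoms layer by layer.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(am))) +
  geom_point() +                                           # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # layer 2: fit lines per group
  facet_wrap(~ cyl) +                                      # one panel per cylinder count
  labs(x = "Weight (lb/1000)", y = "Miles per gallon",
       color = "Transmission (am)")

print(p)
```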
kernel density plots with grouping

histogram with faceting
[Figure: histograms of salary (count) faceted by discipline (Theoretical, Applied) and rank (AsstProf, AssocProf, Prof)]
boxplots
[Figure: box plots of salary by rank (AsstProf, AssocProf, Prof), grouped by sex (Female, Male) and faceted by discipline (Theoretical, Applied)]
jittered plots

scatter plot with smooth line

scatterplot with fit lines, grouping, and faceting
SPECIALIZED GRAPHS

visualizing missing data
VIM package
[Figure: missing-data plots for the variables BodyWgt, BrainWgt, NonD, Dream, Sleep, Span, Gest, Pred, Exp, Danger: Number of missings per variable, and Combinations of missing values with their counts]
scatterplot matrices
car package
[Figure: scatterplot matrix of yrs.since.phd, yrs.service, and salary, with histograms (Frequency) on the diagonal]
visualizing correlations
corrplot package
[Figure: correlation matrix of cyl, disp, wt, hp, carb, qsec, gear, am, drat, vs, mpg, with coefficients shown multiplied by 100 and a color scale from -1 to 1]
variables reordered to find clusters
non-significant (.05) correlations indicated with an X
Heatmap
stats package
[Figure: heatmap of Car Models (32 automobiles, Toyota Corona through Maserati Bora) against Specification Variables (disp, hp, mpg, qsec, gear, drat, wt, carb, vs, am, cyl)]
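A heatmap of this kind can be sketched with heatmap() from the stats package, assuming the figure uses the built-in mtcars data. Scaling by column and the margin sizes are choices made here, not taken from the slide.

```r
# heatmap() needs a numeric matrix: rows are cars, columns are variables.
m <- as.matrix(mtcars)

# Columns are standardized so variables on different scales are comparable;
# rows and columns are reordered by hierarchical clustering.
hm <- heatmap(m, scale = "column", margins = c(5, 8),
              xlab = "Specification Variables", ylab = "Car Models",
              main = "Heatmap")

# The clustering order is returned invisibly.
rownames(m)[hm$rowInd]
```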
visualizing categorical data
vcd package
[Figure: bar charts of counts by Sex (Male, Female), Survived (No, Yes), and Class (1st, 2nd, 3rd)]
visualizing effects (linear models)
2 x 3 ANCOVA

rank by sex interaction (means) adjusting for other variables
effects package
[Figure: "rank*sex effect plot": salary by rank (AsstProf, AssocProf, Prof) in separate panels for sex : Female and sex : Male]
visualizing effects (generalized linear models)
Logistic regression with 8 predictors

rating effects (prob) by gender adjusting for other variables
effects package
3D Scatterplot
scatterplot3d package
[Figure: "Automobile Data": 3-D scatterplot of Miles/(US) Gallon, Weight (lb/1000), and Displacement (cu. in.), with points labeled by car model]
INTERACTIVE GRAPHICS

iplots
hold [Ctrl] and mouse over graph for info
rggobi
• GGobi is an open source visualization program for exploring high-dimensional data
• rggobi provides an R command line interface to GGobi
Installation
1. install GGobi: download from www.ggobi.org
2. in R: install.packages("rggobi")

see: http://www.ggobi.org/rggobi/introduction.pdf
[Figure: GGobi window with annotations]
• Display to open new windows
• Interaction to select, identify, or brush
• View to change type of xy plot
• right mouse to select
googleVis
• Provides access to Google Chart Tools
– motion charts
– annotated time lines
– maps
– other (e.g. line, bar, bubble, column, area, scatter, candlestick, pie, org charts)
– https://developers.google.com/chart/

• output is html code containing data and references to JavaScript functions hosted by Google
• an internet connection is required to view the graphs

demo(WorldBank)
Hans Rosling in his TED talks
Shiny
• Package for building interactive web applications with R
– homepage - http://www.rstudio.com/shiny/
– examples - http://www.rstudio.com/shiny/showcase/

• Distribution
– self hosted (requires free Shiny Server on Linux server)
– RStudio hosted
– distribute as a package

pkgs <- c("Rcpp", "httpuv", "shiny")
install.packages(pkgs)
library(shiny)
runExample("06_tabsets")
shiny example

RESOURCES
www.statmethods.net
Books
• R in Action, Robert I. Kabacoff
• R Graphics Cookbook, Winston Chang
• Lattice, Deepayan Sarkar
• ggplot2, Hadley Wickham
additional websites
• Cookbook for R
http://www.cookbook-r.com/

• ggplot2 documentation
http://docs.ggplot2.org/current/

• R-Bloggers
http://www.r-bloggers.com/
R for data visualization and graphics

  • 1. R for Data Visualizaiton and Graphics Rob Kabacoff, Ph.D. Vice President of Research Source code for presentation: http://tinyurl.com/Kabacoff-CS20
  • 2. R is a Statistical and Graphical R Homepage - http://www.r-project.org/ Platform CRAN Mirrors – http://cran.r-project.org/ • • • • • • • Free Open source State-of-the-art data analysis Platform for programming new methods Runs on Windows, Linux, Mac OS X Enormous user base Reproducible research 2
  • 3. Data Input Statistical Packages SAS SPSS Stata Keyboard ASCII Text Files Excel netCDF HDF5 R XML Webscraping SQL MySQL Oracle Other Access Database Management Systems 3
  • 4. Statistical Methods Descriptive Statistics Experimental Design Linear , Generalized, Nonlinear, and Hierarchical Models Analysis of Categorical Data Nonparametric Analysis Survival Analysis Latent Variable Models Bayesian Models Missing Values Analysis Cluster Analysis Decision Trees Data Mining Classical Test Theory Item Response Theory Correspondence Analysis Multidimensional Scaling Meta Analysis Structural Equation Modeling Complex Survey Design Time Series Analysis Longitudinal Analysis Social Network Analysis Study of Mediation and Moderation Power Analysis Clinical Trials and … 4
  • 5. Given : depth Graphs! 200 300 400 500 10 Meter Contour Spacing 165 170 175 180 185 -35 -25 -15 lat -35 -25 Meters West -15 165 170 175 180 185 A Topographic Map of Maunga Whau 600 100 200 300 400 500 600 100 165 170 175 180 185 0 long 0 200 400 600 800 Meters North Sinc( 8 6 4 2 0 -2 -10 10 r) 5 Y 0 -5 0 X -5 5 10 -10 Survival on the Titanic Child University Salaries by Discipline Age Adult Pearson residuals: 14.3 Male No 200000 Yes Salary Sex Survived discipline 4.0 2.0 0.0 -2.0 -4.0 150000 Theoretical Applied Yes No Female 100000 -11.1 p-value = <2e-16 50000 0 20 Years Since Ph.D. 40 5
  • 6. A High Level Tour • General Systems – base – lattice – ggplot2 • Interactive – – – – iplots rggobi googleVis Shiny • Specialized – – – – – – – – – vcd (categorical data) VIM (missing data) likert (likert data) scatterplot3d (3-D scatterplot) car (regression) corrplot (correlations) (decision trees) (dendograms) effects (glm/ANOVA) 6
  • 7. 60 40 20 0 3 complete graphics systems Frequency 80 100 Base Graphics 50000 100000 150000 200000 Salary (dollars) Lattice Graphics ggplot2 Graphics 40 100 30 Frequency Frequency 80 60 20 40 10 20 0 0 50000 50000 100000 150000 Salary (dollars) 200000 100000 150000 Salary (dollars) 200000
  • 9. histograms Histogram with Rug plot 150000 8.0e-06 1.2e-05 100000 200000 50000 Salary (dollars) 100000 150000 200000 Salary (dollars) 0 20 40 60 80 100 Histogram with Normal Curve Frequency 50000 0.0e+00 4.0e-06 Density 8.0e-06 0.0e+00 4.0e-06 Density 1.2e-05 Histogram of Kernal Density Curve 50000 100000 150000 Salary (dollars) 200000 9
  • 11. box plots Singer Height by Voice Part Soprano 1 Soprano 2 Alto 1 Alto 2 Tenor 1 Tenor 2 Bass 1 Bass 2 60 65 70 75 Heights in Inches 11
  • 12. Monthly Airline Passengers line charts Passengers (K) 600 4000 UK Lung Cancer Deaths 3500 Total Male Female 500 400 300 200 3000 100 1950 1952 1954 1956 1958 1960 2500 Time 2000 Monthly Airline Passengers 500 1000 Passengers (K) 1500 600 1974 1975 1976 1977 year 1978 1979 1980 500 400 300 200 100 1950 1952 1954 1956 1958 1960 Time 12
  • 13. time series 300 -60 Season Decomposition of a Time Series 300 Season Decomposition of a Time Series 0 20 remainder 60 200 trend 400 500 Season Decomposition -20 0 20 seasonal 60 100 data 500 Monthly Air Passengers -40 Season Decomposition of a Time Series 1950 1952 1954 1956 1958 1960 time Season Decomposition of a Time Series 13
  • 14. scatterplots 10 15 High Density Scatterplot (n=10,000) 5 Iris Data Y 7 0 5 -5 4 3 -10 Petal Length (cm) 6 2 -5 1 0 5 10 X 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 Sepal Length (cm) 14
  • 15. scatterplot matrix Anderson's Iris Data -- 3 species 3.0 3.5 4.0 0.5 1.0 1.5 2.0 2.5 6.5 7.5 2.0 2.5 4.0 4.5 5.5 Sepal.Length 5 6 7 2.0 3.0 Sepal.Width 1.5 2.5 1 2 3 4 Petal.Length 0.5 Petal.Width 4.5 5.5 6.5 7.5 1 2 3 4 5 6 7 15
  • 16. dot plot MPG by Automobile Toyota Corolla Fiat 128 Lotus Europa Honda Civic Fiat X1-9 Porsche 914-2 Merc 240D Merc 230 Datsun 710 Toyota Corona Volvo 142E Hornet 4 Drive Mazda RX4 Wag Mazda RX4 Ferrari Dino Pontiac Firebird Merc 280 Hornet Sportabout Valiant Merc 280C Merc 450SL Merc 450SE Ford Pantera L Dodge Challenger AMC Javelin Merc 450SLC Maserati Bora Chrysler Imperial Duster 360 Camaro Z28 Lincoln Continental Cadillac Fleetwood 10 15 20 25 30 16
  • 17. contour plots [figure: "A Topographic Map of Maunga Whau", 10 Meter Contour Spacing; x-axis: Meters North; y-axis: Meters West]
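The built-in `volcano` matrix holds the Maunga Whau elevations on a 10 m grid, so a contour map like this one can be sketched as follows. The exact contour levels are an assumption chosen to give 10 m spacing.

```r
# volcano is a matrix of elevations on a 10 m by 10 m grid
x <- 10 * seq_len(nrow(volcano))  # meters north
y <- 10 * seq_len(ncol(volcano))  # meters west

contour(x, y, volcano, levels = seq(90, 200, by = 10),
        xlab = "Meters North", ylab = "Meters West",
        main = "A Topographic Map of Maunga Whau")
mtext("10 Meter Contour Spacing", side = 3, line = 0.3)
```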
  • 19. lattice graphs • expands base graphics to include trellis plots • seeks to improve on the graph defaults (symbols, axes, labels) of base graphics • grouping – color, fill, line type can be mapped to variable values • facets – subgroups can be plotted in an array based on the levels of (usually) one or two variables • customizable panel functions give you fine-grained control over what is plotted in each facet • comments – clean and fast – high degree of customization possible
  • 20. 3D graphs with faceting
  • 21. lattice graph with faceting and a customized panel function
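Faceting with a customized panel function might look roughly like the sketch below. The data set (`mtcars`) and the choice of panel elements are assumptions for illustration; the slide's actual figure is not in the transcript.

```r
library(lattice)

# One facet per number of cylinders; the panel function controls
# what is drawn inside every facet
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),
            xlab = "Weight (lb/1000)", ylab = "Miles per Gallon",
            main = "MPG by Weight, Faceted by Cylinders",
            panel = function(x, y, ...) {
              panel.grid(h = -1, v = -1)   # reference grid at the tick marks
              panel.xyplot(x, y, ...)      # the points themselves
              panel.lmline(x, y, lty = 2)  # per-facet least-squares line
            })

print(p)  # lattice plots are objects; printing draws them
```

Because the panel function runs once per facet with that facet's `x` and `y`, each regression line is fit only to the cars in its own cylinder group.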
  • 23. ggplot2 • Grammar of Graphics • graphs built up in layers by plotting "geoms" • grouping – color, fill, shape, size can be mapped to variable values • facets – subgroups can be plotted in an array based on the levels of (usually) one or two variables • comments – allows you to create novel plots – can be slow for large problems – no 3D graphs – HOT!
  • 24. kernel density plots with grouping
  • 28. scatter plot with smooth line
  • 29. scatterplot with fit lines, grouping, and faceting
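The layered approach described for ggplot2 can be illustrated with a sketch that combines fit lines, grouping, and faceting. It assumes the ggplot2 package is installed and uses `mtcars` as a stand-in, since the slide's data are not in the transcript.

```r
library(ggplot2)  # assumption: ggplot2 is installed

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(am))) +
  geom_point() +                                  # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                       # layer 2: fit line per group
  facet_wrap(~ cyl) +                             # one panel per cylinder count
  labs(x = "Weight (lb/1000)", y = "Miles per Gallon",
       color = "Transmission (am)")

print(p)
```

Mapping `color` inside `aes()` is what creates the groups, so each transmission type gets its own fit line within each facet.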
  • 32. car package: scatterplot matrices [figure: variables yrs.since.phd, yrs.service, salary, with a histogram of each variable on the diagonal]
  • 33. corrplot package: visualizing correlations • variables reordered to find clusters • non-significant (.05) correlations indicated with an X [figure: correlation matrix of cyl, wt, hp, carb, qsec, disp, am, drat, vs, gear, mpg; color scale from -1 to 1]
  • 34. Heatmap (stats package) [figure: x-axis: Specification Variables (disp, hp, mpg, qsec, gear, drat, wt, carb, vs, am, cyl); y-axis: Car Models]
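The axis labels match the `mtcars` example in the help page for `heatmap()`, so a sketch like the following is probably close to the original; the margin sizes are an assumption.

```r
# heatmap() needs a numeric matrix; rows are cars, columns are variables
m <- as.matrix(mtcars)

# scale = "column" standardizes each variable so that variables on
# different scales are comparable; rows and columns are reordered by
# hierarchical clustering
hv <- heatmap(m, scale = "column", margins = c(5, 10),
              xlab = "Specification Variables", ylab = "Car Models")
```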
  • 35. visualizing categorical data (vcd package) [figure: count plots for Sex (Male, Female), Survived (No, Yes), and Class (1st, 2nd, 3rd)]
  • 36. visualizing effects (linear models): 2 x 3 ANCOVA
  • 37. effects package: rank by sex interaction (means), adjusting for other variables [figure: "rank*sex effect plot"; panels: sex : Female, sex : Male; x-axis: rank (AsstProf, AssocProf, Prof); y-axis: salary]
  • 38. visualizing effects (generalized linear models): logistic regression with 8 predictors
  • 39. effects package: rating effects (prob) by gender, adjusting for other variables
  • 40. scatterplot3d package [figure: "3D Scatterplot", Automobile Data, points labeled by car model; axes: Displacement (cu. in.), Weight (lb/1000), Miles/(US) Gallon]
  • 42. iplots: hold [Ctrl] and mouse over graph for info
  • 43. rggobi • GGobi is an open source visualization program for exploring high-dimensional data • rggobi provides an R command-line interface to GGobi • Installation: 1. install GGobi: download from www.ggobi.org 2. in R: install.packages("rggobi") • see: http://www.ggobi.org/rggobi/introduction.pdf
  • 44. Display: open new windows • Interaction: select, identify, or brush • View: change type of xy plot • right mouse to select
  • 45. googleVis • Provides access to Google Chart Tools – motion charts (as used by Hans Rosling in his TED talks) – annotated time lines – maps – other (e.g. line, bar, bubble, column, area, scatter, candlestick, pie, org charts) – https://developers.google.com/chart/ • output is HTML code containing data and references to JavaScript functions hosted by Google • an internet connection is required to view the graphs • example: demo(WorldBank)
  • 47. Shiny • Package for building interactive web applications with R – homepage: http://www.rstudio.com/shiny/ – examples: http://www.rstudio.com/shiny/showcase/ • Distribution – self hosted (requires free Shiny Server on Linux server) – RStudio hosted – distribute as a package • to try it: pkgs <- c("Rcpp", "httpuv", "shiny"); install.packages(pkgs); library(shiny); runExample("06_tabsets")
  • 51. Books • R in Action, Robert I. Kabacoff • R Graphics Cookbook, Winston Chang • Lattice, Deepayan Sarkar • ggplot2, Hadley Wickham
  • 52. additional websites • Cookbook for R: http://www.cookbook-r.com/ • ggplot2 documentation: http://docs.ggplot2.org/current/ • R-Bloggers: http://www.r-bloggers.com/