INTRODUCTION TO R
ComputeFest 2012
Alex Storer
Institute for Quantitative Social Science at Harvard
astorer@iq.harvard.edu
http://www.people.fas.harvard.edu/~astorer/computefest2012/
Introduction
• Research Technology Consulting at
IQSS
• Dataverse
• Research Computing Environment
• OpenScholar
• Packages for computational text
analysis, multiple imputation, and
statistics modeling in R
• Trainings on R, Stata, SAS
1/20/12 Introduction to R (Alex Storer, IQSS) 2
http://www.iq.harvard.edu
Goals
1. Why should anyone use R?
2. Basic Familiarity with R
3. Getting Help and Installing Packages
4. Data Input and Output
5. Making Plots
6. Performing Statistical Analysis
7. Functions, loops and building blocks of
programming
A foundation to take the next step.
We'll learn the basics as we go while
working on real problems.
Why should I use R?
“I already know Matlab. Why would I want to use R?”
1. Matlab is expensive.
2. R is open-source and community developed.
3. R is platform independent (and can even be run online)
4. Beautiful graphics with clear syntax
5. The most actively developed cutting-edge statistics
applications can be installed from inside R!
Why should I use R?
“I already know Python. Why would I ever want to learn a
new language that already does a subset of what Python
already does?”
1. R is designed to do data-processing and statistical
analysis, and it makes it easy.
2. More support in R (e.g., the Use R! series)
3. More active development in R for data analysis
4. Who are you sharing your code with?
Starting R
Syntax Notes
R                    Matlab                   Python
x <- seq(1,10)       x = 1:10                 x = range(1,11)
# or x <- 1:10       % a less flexible        # indices start at 0
# or x = 1:10        % version of linspace

for (i in x)         for (i = x)              for i in x:
{print("hello")}     disp("hello")                print("hello")
                     end

foo.bar <- 10        foo.bar = 10             foo.bar = 10
> foo.bar            > foo.bar                NameError: name
[1] 10               foo =                    'foo' is not defined
                     bar: 10
Vectors in R
> x <- c(1,1,2,3,5,8,13,21)
> length(x)
[1] 8
> x[5]
[1] 5
The c() command concatenates items into a vector of
the same type!
Note that R counts from 1.
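A small sketch of the two notes above, using made-up values that are not from the slides: c() quietly converts its arguments to one common type, and the first element sits at index 1.

```r
# Mixing types in c(): everything is converted to one common type.
a <- c(1L, 2.5)      # integer + double -> double ("numeric")
b <- c(TRUE, 3)      # logical + double -> double; TRUE becomes 1
class(a)
class(b)

# R counts from 1: the first element is x[1], the last is x[length(x)].
x <- c(1, 1, 2, 3, 5, 8, 13, 21)
first <- x[1]
last  <- x[length(x)]
```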
Logical Indexing, Growing Vectors
> x<5
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
> x[x<5]
[1] 1 1 2 3
> x[10] = 55
> x
[1] 1 1 2 3 5 8 13 21 NA 55
> x %% 2
[1] 1 1 0 1 1 0 1 1 NA 1
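The same ideas on a second, hypothetical vector (the names y, is.big and so on are only for illustration). It also shows one thing the slide hints at: once a vector contains NA, it is safest to filter the NA out explicitly.

```r
y <- c(4, 9, 16, 25)          # made-up values

is.big <- y > 10              # logical vector, one entry per element
big    <- y[is.big]           # keeps 16 and 25
n.big  <- sum(is.big)         # TRUE counts as 1, so this counts the matches

y[6] <- 49                    # assigning past the end grows the vector...
gap  <- is.na(y[5])           # ...and the skipped slot is filled with NA

# NA propagates through comparisons, so drop it explicitly when indexing.
big.clean <- y[!is.na(y) & y > 10]
```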
Exercise 1
1. Use logical indexing to find the area of Georgia. Use
the included vectors state.name and state.area
2. What happens when you combine 2 and "2" using the
c() command?
3. Make a vector containing 100 numbers, and use plot to
visualize a mathematical function of your choice!
One more function…
> help("foo")
> ?foo
• Basics: simple description, function call, arguments, return values
• Details: how the function works, the author, relevant notes
• See Also: other relevant functions
• Examples: several sample calls
Exercise 2
1. Which is bigger, or ?
2. What does the function rep do?
3. Make a vector containing 50 ones and call it fib
4. Use a for loop to compute the first 50 Fibonacci
numbers (store them in fib)
Recall: F_n = F_{n-1} + F_{n-2}, F_1 = 1, F_2 = 1
5. What does the function table do?
6. How many of the first 50 Fibonacci numbers are
divisible by 3? (Recall: a %% b)
7. What is the mean of the first 15 Fibonacci numbers?
Data Frames
• Data frames are the most common way to store data in R
• Like a matrix, but columns and rows can have names
> head(mtcars)
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
• You can access columns by name using the $ operator
• e.g., table(mtcars$gear)
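As a minimal sketch of column access with $, here is one way to count and summarize a column using only base R's table and mean. The variable names are just for illustration.

```r
# mtcars ships with R, so no download is needed.
gears <- mtcars$gear            # one column, returned as a plain vector
gear.counts <- table(gears)     # how many cars have 3, 4 or 5 gears
gear.counts

mean.mpg <- mean(mtcars$mpg)    # columns work with ordinary vector functions
```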
Data Frames: viewing and manipulating
> summary(mtcars)
> write.csv(mtcars,'/tmp/blah.csv')
> mtcars$logmpg <- log(mtcars$mpg)
> rownames(mtcars)
> colnames(mtcars)
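A possible round trip for the write.csv call above, sketched with a temporary file instead of '/tmp/blah.csv' (the path and variable names here are assumptions). Reading the file back is a quick way to check what was actually saved.

```r
# Write to a temporary file (hypothetical location) and read it back.
csv.path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv.path)

# The first column of the file holds the row names (the car names).
cars.back <- read.csv(csv.path, row.names = 1)

same.shape <- identical(dim(cars.back), dim(mtcars))
same.names <- identical(rownames(cars.back), rownames(mtcars))
```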
Data Frames: viewing and manipulating
> mtcars["Valiant",]
> mtcars["Valiant",c("hp","gear")]
> mtcars[c(1,8,3,5,2),c("hp","gear")]
> guzzlers <- subset(mtcars,mpg<15)
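To make the connection between these forms explicit, a short sketch: subset() gives the same rows as logical indexing, and conditions can be combined. The name small.fast and its cutoff are made up for the example.

```r
# Rows by name, columns by name.
valiant <- mtcars["Valiant", c("hp", "gear")]

# subset() is shorthand for logical indexing on the rows.
guzzlers  <- subset(mtcars, mpg < 15)
guzzlers2 <- mtcars[mtcars$mpg < 15, ]
all.equal(guzzlers, guzzlers2)

# Conditions can be combined with & (and) and | (or).
small.fast <- subset(mtcars, cyl == 4 & qsec < 18)
```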
Exercise 2
1. What is the median mpg for cars with horsepower ("hp")
less than 100?
2. Use the order command to print the state.name sorted
by state.area
3. Use the order command and write.csv to save the
mtcars dataset sorted by horsepower
4. What is the name of the car with the most horsepower
out of all cars with 4 cylinders ("cyl")?
The best thing about R…
install.packages(" ")
3546 Possibilities.
Curated Lists of Packages:
• Chemometrics and
Computational Physics
• Medical Image Analysis
• Phylogenetics
• Ecological and
Environmental Data
http://cran.r-project.org/web/views/
Example: Rainfall Anomalies
• http://jisao.washington.edu/data/doe_prec/
Goals: Make a 'movie' of rainfall anomalies on a map of the
Earth and compare data in Boston and Tokyo
• This is a long example!
• We'll look at a lot of R tricks along the way
• Most important: how to solve general "how do I do"
problems!
Example: Rainfall Anomalies
Sub-goal: Just load the data into R
Data is in NetCDF format…
…there's a package for that!
> install.packages("ncdf")
> library("ncdf")
We only need to install the package once.
We have to load it each time we wish to use it.
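One way to capture "install once, load every time" in a script is sketched below. The helper use.package is hypothetical (it is not part of R or of the slides); it only relies on requireNamespace, install.packages and library.

```r
# Hypothetical helper: install a package only if it is not already available,
# then load it. requireNamespace() checks availability without attaching.
use.package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # needed at most once per machine
  }
  library(pkg, character.only = TRUE)   # needed in every new R session
  invisible(TRUE)
}

use.package("stats")   # already ships with R, so nothing is downloaded
```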
Example: Rainfall Anomalies
Sub-goal: Just load the data into R
Let's just get the data!
> sfile <- "http://jisao.washington.edu/data/doe_prec/precipanom18511989.nc"
> download.file(sfile,"/tmp/recip.nc")
> dat <- open.ncdf("/tmp/recip.nc")
> dat
Notice the use of '/' (this can trip you up in Windows!)
[1] "file /tmp/recip.nc has 3 dimensions:"
[1] "lat Size: 44"
[1] "lon Size: 72"
[1] "time Size: 556"
[1] "------------------------"
[1] "file /tmp/recip.nc has 1 variables:"
[1] "short data[lon,lat,time] Longname:djf,mam,jja,son mm. Missval:32767"
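On the path note above: a hedged sketch of building file paths without typing separators by hand. tempdir() stands in for "/tmp" here, and nc.path and win.style are illustrative names only.

```r
# Build the path from pieces instead of typing separators by hand.
# tempdir() stands in for "/tmp" (an assumption; any writable folder works).
nc.path <- file.path(tempdir(), "recip.nc")
nc.path

# On Windows a literal backslash must be doubled inside a string,
# which is why forward slashes are the safer habit.
win.style <- "C:\\temp\\recip.nc"
nchar(win.style)       # each doubled backslash counts as one character

# For binary files such as NetCDF, download.file() on Windows typically
# needs mode = "wb"; the call is commented out to avoid a download here.
# download.file(sfile, nc.path, mode = "wb")
```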
Example: Rainfall Anomalies
Sub-goal: Just load the data into R
The data is in R, but we have to load the variables.
> rain.data <- get.var.ncdf(dat, 'data')
> rain.lat <- get.var.ncdf(dat, 'lat')
> rain.lon <- get.var.ncdf(dat, 'lon')
> rain.time <- get.var.ncdf(dat, 'time')
Aside: a few more R basics
We can examine the variables in our workspace:
> ls()
[1] "dat"       "rain.data" "rain.lat"  "rain.lon"  "rain.time"
…guesses on how to remove a variable?
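One possible answer, sketched with a throwaway variable (tmp.value is a made-up name):

```r
tmp.value <- 42                     # hypothetical throwaway variable
exists("tmp.value")                 # TRUE: it is in the workspace

rm(tmp.value)                       # remove it by name
still.there <- exists("tmp.value")  # FALSE after rm()

# rm(list = ls()) would clear the whole workspace; use with care.
```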
Aside: a few more R basics
> class(dat)
[1] "ncdf"
> class(rain.data)
[1] "array"
Some basic R commands are overloaded:
• e.g., plot can plot classes "array" and "data.frame"
Most analyses can be stored as an object that can be
summarized, plotted and contain relevant information
R is object oriented and each object's class can be queried.
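A small illustration of these points on the built-in mtcars data rather than the rainfall data. The model mpg ~ wt is chosen only as an example.

```r
# Fit a linear model; the result is an object of class "lm".
fit <- lm(mpg ~ wt, data = mtcars)
class(fit)

# The same generic function does different things depending on the class.
summary(mtcars$mpg)           # numeric vector: quartiles and mean
summary(fit)                  # lm object: coefficients, R-squared, ...

# An object carries its results with it.
slope <- coef(fit)[["wt"]]    # negative: heavier cars get fewer mpg
```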
Exercises 3
1. Use the diff command to investigate the spacing of
the data points in time
2. Make a new variable called rain.lat.inc that
contains the latitude values in increasing order. (Hint:
can you reverse the array? Use the seq function to
index the array from the end to the beginning.)
3. Use the ls command to verify that this variable now
exists and check its object type using class.
4. Make an array containing all points at time 300.
5. Bonus! Figure out how to write a function that reverses
an array. (Try ?function)
Example: Rainfall Anomalies
Sub-goal: Just plot one time point
> image(rain.data[,,300])
[Figure: image of rain.data[,,300]; both axes run from 0.0 to 1.0]
Axes are strange… (plot annotations: South, North)
Example: Rainfall Anomalies
Sub-goal: Fix our plot
> image(rain.lon,rain.lat.inc,rain.data[,,300])
Example: Rainfall Anomalies
Sub-goal: Fix our plot
revinds <- length(rain.lat):1
image(x=rain.lon,y=rain.lat.inc,rain.data[,revinds,300],
xlab="Longitude",ylab="Latitude",
main="Rainfall Anomalies")
[Figure: "Rainfall Anomalies"; x axis Longitude (0 to 350), y axis Latitude (-50 to 50)]
Map?
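Why both the latitudes and the data have to be reversed can be seen on a tiny synthetic grid. Everything below (lon, lat, z) is made up for illustration and is not the rainfall data.

```r
# Hypothetical grid: longitudes increasing, latitudes stored north to south.
lon <- seq(0, 355, by = 5)
lat <- seq(85, -85, by = -10)

# Fake "data": one value per (lon, lat) cell that depends only on latitude.
z <- outer(lon, lat, function(x, y) y + 0 * x)

# image() expects increasing x and y, so reverse the latitudes AND the
# matching dimension of the data, otherwise the picture is flipped.
lat.inc <- rev(lat)
revinds <- length(lat):1
image(x = lon, y = lat.inc, z = z[, revinds],
      xlab = "Longitude", ylab = "Latitude")
```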
Package: maps
> install.packages("maps")
> library("maps")
> map("world")
> map("world2")
Example: Rainfall Anomalies
Sub-goal: Add map to our plot
image(x=rain.lon,y=rain.lat.inc,rain.data[,revinds,300],
xlab="Longitude",ylab="Latitude",
main="Rainfall Anomalies")
map("world2",add=TRUE)
[Figure: "Rainfall Anomalies" with the world2 map outline added; x axis Longitude, y axis Latitude]
Example: Rainfall Anomalies
Goal: Make a "movie"
1. Display all time points one after another
   → Use a for loop
2. Make sure we pause after a plot is displayed!
   → New function: Sys.sleep(time in sec)
3. Add a title that tells us what time we're looking at
   → Figure out how to convert the time index to the correct year and season
Example: Rainfall Anomalies
Goal: Make a "movie"
seasons = c("Winter","Spring","Summer","Fall")
year.start = 1855
for (i in 1:length(rain.time)) {
  Sys.sleep(0.05)
  # assumes time index 1 is Winter (djf) of year.start
  this.season <- seasons[((i-1) %% 4) + 1]
  this.year <- floor((i-1)/4)+year.start
  image(x=rain.lon,y=rain.lat.inc,
        rain.data[,revinds,i],xlab="Longitude",
        ylab="Latitude", main=paste('Rainfall Anomalies',
        '\n',this.year,':',this.season))
  map("world2",add=TRUE)
}
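The index arithmetic is easy to get off by one, so it can help to check it in isolation. The helper index.to.label below is hypothetical, and it assumes that time index 1 is Winter of year.start with exactly four time points per year.

```r
seasons    <- c("Winter", "Spring", "Summer", "Fall")
year.start <- 1855     # value used on the slide

# Hypothetical helper. Assumes time index 1 is Winter of year.start and
# that there are exactly four time points per year.
index.to.label <- function(i) {
  season <- seasons[((i - 1) %% 4) + 1]
  year   <- year.start + (i - 1) %/% 4
  paste(year, season, sep = ": ")
}

index.to.label(1)      # "1855: Winter"
index.to.label(4)      # "1855: Fall"
index.to.label(5)      # "1856: Winter"
```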
Example: Rainfall Anomalies
Goal: Compare Anomalies in Boston and Tokyo
1. Find the lat/lon values of Boston and Tokyo
   → Google it!
2. Find the indices for those values
   → The which function
3. Make a time-series for each location
   → R provides a ts object
Example: Rainfall Anomalies
Sub-Goal: Return the index to the matrix for a lat/lon value
> rain.lon
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65
70 75 80 85 90 95 100 105 110 115 120 125 130 135
[29] 140 145 150 155 160 165 170 175 180 185 190 195 200 205
210 215 220 225 230 235 240 245 250 255 260 265 270 275
[57] 280 285 290 295 300 305 310 315 320 325 330 335 340 345
350 355
Boston: Lon 289, Lat 42
Closest Value
Example: Rainfall Anomalies
Sub-Goal: Find the index of a given lat/lon value
boston.lon <- 289; boston.lat <- 42
tokyo.lon <- 139; tokyo.lat <- 35
bos.lon.dist <- abs(boston.lon-rain.lon)
bos.lon.ind = which(bos.lon.dist==min(bos.lon.dist))
bos.lat.ind = which.min(abs(boston.lat-rain.lat))
tok.lon.ind = which.min(abs(tokyo.lon-rain.lon))
tok.lat.ind = which.min(abs(tokyo.lat-rain.lat))
Get the index of the value that equals the minimum distance
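The closest-value lookup can be wrapped in a small function. nearest.index and grid.lon are hypothetical names; the grid simply mimics the 5-degree spacing of rain.lon shown earlier.

```r
# Hypothetical helper: index of the grid value closest to a target.
nearest.index <- function(grid, target) {
  which.min(abs(grid - target))   # first index if there is a tie
}

grid.lon <- seq(0, 355, by = 5)   # same spacing as rain.lon on the slide
bos.ind  <- nearest.index(grid.lon, 289)
grid.lon[bos.ind]                 # 290, the closest grid longitude to 289

# which() with == returns every tied index, which.min() only the first.
ties <- which(abs(c(0, 10) - 5) == min(abs(c(0, 10) - 5)))
```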
Exercise 4
1. Pick a city and find its indices in the array
2. Use summary to investigate the range of the data in your
city
3. Use hist to make a histogram of your city's rainfall data
4. Does this data look normal? Use qqnorm to investigate.
5. Figure out how to run a Shapiro-Wilk test and interpret
its results
Example: Rainfall Anomalies
Sub-Goal: Make a time-series object for Boston and Tokyo
bos.ts <- ts(rain.data[bos.lon.ind,bos.lat.ind,],
start=year.start,frequency=4)
tok.ts <- ts(rain.data[tok.lon.ind,tok.lat.ind,],
start=year.start,frequency=4)
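To see what start and frequency do, here is a sketch on twelve made-up quarterly values (vals and demo.ts are illustrative names, not the rainfall series).

```r
# Made-up quarterly values: 3 years x 4 seasons.
vals <- c(5, -2, 1, 0,  3, -1, 2, 4,  -3, 0, 1, 2)
demo.ts <- ts(vals, start = 1855, frequency = 4)

demo.ts                    # printed as a year-by-quarter table
time(demo.ts)[1:5]         # 1855.00 1855.25 1855.50 1855.75 1856.00

# window() cuts out a stretch by time rather than by index.
second.year <- window(demo.ts, start = c(1856, 1), end = c(1856, 4))
```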
Time Series Analysis
First Step: Plot your data!
plot(bos.ts,type='l',ylab="Anomalies")
lines(tok.ts,col="red")
Missing Data Strategies
Ignore
• Easy to do!
• Precludes some analysis
• Might bias analysis
Replace with the mean
• Still easy
• Analysis still runs
• Might bias analysis
Replace using a model
• Called "imputation"
• A good strategy if you have a good model
• Repeating this gives estimates of uncertainty
• Amelia package in R
Missing Data Strategies
• Probably makes sense to ignore
• Many procedures in R will stop with an error (or return NA) by default when given NA values!
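How NA values behave in practice, sketched on a short made-up vector. It covers the "ignore" and "replace with the mean" strategies only; model-based imputation is not shown.

```r
x <- c(2, 4, NA, 8, 6, 5, 7, 3)   # made-up data with one missing value

mean(x)                           # NA: the missing value propagates
m <- mean(x, na.rm = TRUE)        # "ignore": drop NAs before averaging

# "Replace with the mean": fill each NA with the mean of the observed values.
x.filled <- x
x.filled[is.na(x.filled)] <- m

# Some functions stop with an error instead. acf() does so by default and
# needs na.action = na.pass to accept missing values.
a <- acf(x, na.action = na.pass, plot = FALSE)
```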
Time Series Analysis
Is there structure in the Rainfall Anomalies?
Start with just Boston…
> lag.plot(bos.ts,lags=9)
• Lag One: "Plot this season
versus the previous season"
• Lag Four: "Plot this season
versus the same season last
year"
• If they're correlated, points
will lie along a line
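What a lag plot shows can be reproduced by hand. The series below is synthetic (a repeating four-season pattern plus noise), so the numbers say nothing about the Boston data.

```r
set.seed(1)                                   # reproducible made-up data
n <- 80
seasonal <- rep(c(10, 0, -10, 0), length.out = n) + rnorm(n)

# "This season versus the same season last year" is a lag of 4.
now  <- seasonal[5:n]
then <- seasonal[1:(n - 4)]
lag4.cor <- cor(now, then)                    # close to 1 for this series

# Lag 1 pairs each season with the previous one.
lag1.cor <- cor(seasonal[2:n], seasonal[1:(n - 1)])

plot(then, now, xlab = "lag 4", ylab = "seasonal")   # points fall near a line
```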
Time Series Analysis
Is there structure in the Rainfall Anomalies?
Start with just Boston…
> acf(bos.ts)
• Look at the
correlation
coefficients
all at once
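The numbers behind the acf plot can also be pulled out directly. This is a sketch on a synthetic seasonal series, not the Boston data.

```r
set.seed(1)
seasonal <- rep(c(10, 0, -10, 0), length.out = 80) + rnorm(80)

a <- acf(seasonal, lag.max = 8, plot = FALSE)   # plot = FALSE just returns numbers
r <- drop(a$acf)                                # r[1] is lag 0, r[5] is lag 4

r[1]          # lag 0 is always 1
r[5]          # strongly positive: same season, one year earlier
r[3]          # strongly negative: opposite season
```

For a ts object with frequency = 4, the lag axis of acf is in years, so a lag of 1.0 means four seasons.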
Time Series Analysis
Is there structure in the Rainfall Anomalies?
Compare Boston to Tokyo
> ccf(bos.ts,tok.ts,na.action = na.pass)
• "Plot this season
against last
season in Tokyo"
• Look at the cross-
correlation
coefficients all at
once
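Reading a cross-correlation is easier on data where the answer is known. In this sketch two made-up series are built so that one repeats the other two steps later; ahead and behind are illustrative names.

```r
set.seed(2)
base <- rnorm(202)            # made-up white noise

ahead  <- base[3:202]         # this series shows each value first...
behind <- base[1:200]         # ...and this one shows it two steps later

r <- ccf(ahead, behind, lag.max = 5, plot = FALSE)

# ccf(x, y) at lag k estimates the correlation of x[t + k] with y[t].
best.lag <- r$lag[which.max(r$acf)]
best.lag
```

With real data containing missing values, na.action = na.pass is needed as in the call above.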
More Time Series Data
[Figure: sunspots plotted against Time, 1750 to 1950]
> data(sunspots); plot(sunspots)
Exercise!
• Install the forecast package
• Use auto.arima to fit a model to the data
• Use forecast to plot the model prediction
• Can you find a model that gives a better prediction?
Thank You!
• Doing research in social science?
support@help.hmdc.harvard.edu
Feedback:
bit.ly/cf2012-IQSS

Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

ComputeFest 2012: Intro To R for Physical Sciences

  • 1. INTRODUCTION TO R ComputeFest 2012 Alex Storer Institute for Quantitative Social Science at Harvard astorer@iq.harvard.edu http://www.people.fas.harvard.edu/~astorer/computefest2012/
  • 2. Introduction • Research Technology Consulting at IQSS • Dataverse • Research Computing Environment • OpenScholar • Packages for computational text analysis, multiple imputation, and statistics modeling in R • Trainings on R, Stata, SAS 1/20/12 Introduction to R (Alex Storer, IQSS) 2 http://www.iq.harvard.edu
  • 3. Goals 1. Why should anyone use R? 2. Basic Familiarity with R 3. Getting Help and Installing Packages 4. Data Input and Output 5. Making Plots 6. Performing Statistical Analysis 7. Functions, loops and building blocks of programming A foundation to take the next step. We'll learn the basics as we go while working on real problems.
  • 4. Why should I use R? “I already know Matlab. Why would I want to use R?” 1. Matlab is expensive. 2. R is open-source and community developed. 3. R is platform independent (and can even be run online) 4. Beautiful graphics with clear syntax 5. The most actively developed cutting-edge statistics applications can be installed from inside R!
  • 5. Why should I use R? “I already know Python. Why would I ever want to learn a new language that already does a subset of what Python already does?” 1. R is designed to do data-processing and statistical analysis, and it makes it easy. 2. More support in R (e.g., the Use R! series) 3. More active development in R for data analysis 4. Who are you sharing your code with?
  • 6. Starting R
  • 7. Syntax Notes
      R                 | Matlab                | Python
      x <- seq(1,10)    | x = 1:10              | x = range(1,11)
      # or x <- 1:10    | %a less flexible      | # indices start at 0
      # or x = 1:10     | %version of linspace  |
      for (i in x)      | for (i = x)           | for i in x:
      {print("hello")}  | disp("hello")         |     print("hello")
                        | end                   |
      foo.bar <- 10     | foo.bar = 10          | foo.bar = 10
      > foo.bar         | > foo.bar             | NameError: name 'foo' is not defined
      [1] 10            | foo = bar: 10         |
  • 8. Vectors in R
      > x <- c(1,1,2,3,5,8,13,21)
      > length(x)
      [1] 8
      > x[5]
      [1] 5
      The c() command concatenates items into a vector of the same type! Note that R counts from 1.
  • 9. Logical Indexing, Growing Vectors
      > x<5
      [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
      > x[x<5]
      [1] 1 1 2 3
      > x[10] = 55
      > x
      [1] 1 1 2 3 5 8 13 21 NA 55
      > x %% 2
      [1] 1 1 0 1 1 0 1 1 NA 1
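The indexing steps on this slide can be rerun as a short script. This is only a sketch: the names small, x.small, remainders and n.missing are made up for illustration and are not from the slides.

```r
# The vector from the slides
x <- c(1, 1, 2, 3, 5, 8, 13, 21)

# A comparison returns a logical vector of the same length as x
small <- x < 5

# Indexing with a logical vector keeps only the TRUE positions
x.small <- x[small]

# Assigning past the end grows the vector; the skipped slot is filled with NA
x[10] <- 55

# Arithmetic carries the NA through; is.na() locates the gap
remainders <- x %% 2
n.missing <- sum(is.na(x))

print(x.small)
print(x)
print(remainders)
```

Growing a vector this way is convenient interactively, but the NA it leaves behind has to be handled by whatever runs on the vector next.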
  • 10. Exercise 1 1. Use logical indexing to find the area of Georgia. Use the included vectors state.name and state.area 2. What happens when you combine 2 and "2" using the c() command? 3. Make a vector containing 100 numbers, and use plot to visualize a mathematical function of your choice!
  • 11. One more function…
      > help("foo")
      > ?foo
      Basics: Simple Description; Function call; Arguments; Return Values
      Details: How the function works; The author; Relevant Notes
      See Also: Other relevant functions
      Examples: Several sample calls
  • 12. Exercise 2 1. Which is bigger, or ? 2. What does the function rep do? 3. Make a vector containing 50 ones and call it fib 4. Use a for loop to compute the first 50 Fibonacci numbers (store them in fib) Recall: F_n = F_{n-1} + F_{n-2}, F_1 = 1, F_2 = 1 5. What does the function table do? 6. How many of the first 50 Fibonacci numbers are divisible by 3? (Recall: a %% b) 7. What is the mean of the first 15 Fibonacci numbers?
  • 13. Data Frames
      • Data frames are the most common way to store data in R
      • Like a matrix, but columns and rows can have names
      > head(mtcars)
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
      Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
      Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
      Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
      Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
      Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
      • You can access columns by name using the $ operator
      • e.g., table(mtcars$gear)
  • 14. Data Frames: viewing and manipulating
      > summary(mtcars)
      > write.csv(mtcars,'/tmp/blah.csv')
      > mtcars$logmpg <- log(mtcars$mpg)
      > rownames(mtcars)
      > colnames(mtcars)
  • 15. Data Frames: viewing and manipulating
      > mtcars["Valiant",]
      > mtcars["Valiant",c("hp","gear")]
      > mtcars[c(1,8,3,5,2),c("hp","gear")]
      > guzzlers <- subset(mtcars,mpg<15)
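A minimal sketch of the data frame operations from the last two slides, run against the built-in mtcars data. The names cars, valiant, by.weight and gear.counts are illustrative, and sorting by weight is just one example of what order() can do.

```r
# mtcars ships with R (datasets package); work on a copy
cars <- mtcars

# Select one row by name and two columns by name
valiant <- cars["Valiant", c("hp", "gear")]

# subset() keeps the rows where the condition is TRUE
guzzlers <- subset(cars, mpg < 15)

# order() returns the row permutation that sorts a column,
# so indexing the rows with it sorts the whole data frame
by.weight <- cars[order(cars$wt), ]

# table() counts how often each value occurs in a column
gear.counts <- table(cars$gear)

print(valiant)
print(rownames(guzzlers))
print(head(by.weight[, c("wt", "mpg")]))
print(gear.counts)
```

Note that order() gives row positions rather than sorted values, which is why it goes inside the square brackets.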
  • 16. Exercise 2 1. What is the median mpg for cars with horsepower ("hp") less than 100? 2. Use the order command to print the state.name sorted by state.area 3. Use the order command and write.csv to save the mtcars dataset sorted by horsepower 4. What is the name of the car with the most horsepower out of all cars with 4 cylinders ("cyl")?
  • 17. The best thing about R… install.packages(" ") 3546 Possibilities. Curated Lists of Packages: • Chemometrics and Computational Physics • Medical Image Analysis • Phylogenetics • Ecological and Environmental Data http://cran.r-project.org/web/views/
  • 18. Example: Rainfall Anomalies • http://jisao.washington.edu/data/doe_prec/ Goals: Make a 'movie' of rainfall anomalies on a map of the Earth and compare data in Boston and Tokyo • This is a long example! • We'll look at a lot of R tricks along the way • Most important: how to solve general "how do I do" problems!
  • 19. Example: Rainfall Anomalies Sub-goal: Just load the data into R. Data is in NetCDF format… …there's a package for that!
      > install.packages("ncdf")
      > library("ncdf")
      We only need to install the package once. We have to load it each time we wish to use it.
  • 20. Example: Rainfall Anomalies Sub-goal: Just load the data into R. Let's just get the data!
      > sfile <- "http://jisao.washington.edu/data/doe_prec/precipanom18511989.nc"
      > download.file(sfile,"/tmp/recip.nc")
      > dat <- open.ncdf("/tmp/recip.nc")
      > dat
      [1] "file /tmp/recip.nc has 3 dimensions:"
      [1] "lat Size: 44"
      [1] "lon Size: 72"
      [1] "time Size: 556"
      [1] "------------------------"
      [1] "file /tmp/recip.nc has 1 variables:"
      [1] "short data[lon,lat,time] Longname:djf,mam,jja,son mm. Missval:32767"
      Notice the use of '/' (this can trip you up in Windows!)
  • 21. Example: Rainfall Anomalies Sub-goal: Just load the data into R. The data is in R, but we have to load the variables.
      > rain.data <- get.var.ncdf(dat, 'data')
      > rain.lat <- get.var.ncdf(dat, 'lat')
      > rain.lon <- get.var.ncdf(dat, 'lon')
      > rain.time <- get.var.ncdf(dat, 'time')
  • 22. Aside: a few more R basics. We can examine the variables in our workspace:
      > ls()
      [1] "dat" "rain.data" "rain.lat" "rain.lon" "rain.time"
      …guesses on how to remove a variable?
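One possible answer to the question above, sketched with a throwaway variable so that nothing from the rainfall example is deleted. The name scratch.value is made up for illustration.

```r
# Create a throwaway variable and confirm that ls() lists it
scratch.value <- 42
was.listed <- "scratch.value" %in% ls()

# rm() removes a variable from the workspace by name
rm(scratch.value)
still.listed <- "scratch.value" %in% ls()

print(was.listed)
print(still.listed)
```

rm(list = ls()) clears the whole workspace in one go, which is handy but easy to regret.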
  • 23. Aside: a few more R basics
      > class(dat)
      [1] "ncdf"
      > class(rain.data)
      [1] "array"
      R is object oriented and each object's class can be queried. Some basic R commands are overloaded: e.g., plot can plot classes "array" and "data.frame". Most analyses can be stored as an object that can be summarized, plotted and contain relevant information.
  • 24. Exercises 3 1. Use the diff command to investigate the spacing of the data points in time 2. Make a new variable called rain.lat.inc that contains the latitude values in increasing order. (Hint: can you reverse the array? Use the seq function to index the array from the end to the beginning.) 3. Use the ls command to verify that this variable now exists and check its object type using class. 4. Make an array containing all points at time 300. 5. Bonus! Figure out how to write a function that reverses an array. (Try ?function)
  • 25. Example: Rainfall Anomalies Sub-goal: Just plot one time point
      > image(rain.data[,,300])
      [Figure: image plot of time point 300; both axes run from 0.0 to 1.0. Annotations: "Axes are strange…", "South", "North"]
  • 26. Example: Rainfall Anomalies Sub-goal: Fix our plot
      > image(rain.lon,rain.lat.inc,rain.data[,,300])
      [Figure: the same image with x axis rain.lon (0 to 350) and y axis rain.lat.inc (-50 to 50)]
  • 27. Example: Rainfall Anomalies Sub-goal: Fix our plot
      revinds <- length(rain.lat):1
      image(x=rain.lon,y=rain.lat.inc,rain.data[,revinds,300],
            xlab="Longitude",ylab="Latitude",
            main="Rainfall Anomalies")
      [Figure: "Rainfall Anomalies", x axis Longitude (0 to 350), y axis Latitude (-50 to 50). Annotation: "Map?"]
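Why reverse both the latitude vector and the data? image() expects its x and y values in increasing order, so when the latitudes are flipped the matching dimension of the data has to be flipped with them. A toy sketch, in which lat, grid, lat.inc and grid.inc are made-up stand-ins for the real rainfall variables:

```r
# Toy grid: 2 longitudes by 4 latitudes, with the latitudes stored
# in decreasing order (assumed here to mimic the rainfall file)
lat <- c(60, 30, 0, -30)
grid <- matrix(1:8, nrow = 2, ncol = 4)   # rows = lon, columns = lat

# Index vector that runs from the last element to the first
revinds <- length(lat):1

# Reverse the latitudes AND the matching dimension of the data,
# so each column still belongs to the same latitude
lat.inc <- lat[revinds]
grid.inc <- grid[, revinds]

print(lat.inc)
print(grid.inc)
```

Reversing only one of the two would draw the map upside down without producing any error.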
  • 28. Package: maps
      > install.packages("maps")
      > library("maps")
      > map("world")
      > map("world2")
  • 29. Example: Rainfall Anomalies Sub-goal: Add map to our plot
      image(x=rain.lon,y=rain.lat.inc,rain.data[,revinds,300],
            xlab="Longitude",ylab="Latitude",
            main="Rainfall Anomalies")
      map("world2",add=TRUE)
      [Figure: "Rainfall Anomalies" with the world map overlaid, x axis Longitude (0 to 350), y axis Latitude (-50 to 50)]
  • 30. Example: Rainfall Anomalies Goal: Make a "movie"
      1. Display all time points one after another (use a for loop)
      2. Make sure we pause after a plot is displayed! (new function: Sys.sleep(time in sec))
      3. Add a title that tells us what time we're looking at (figure out how to convert the time index to the correct year and season)
  • 31. Example: Rainfall Anomalies Goal: Make a "movie"
      seasons = c("Winter","Spring","Summer","Fall")
      year.start = 1855
      for (i in 1:length(rain.time)) {
        Sys.sleep(0.05)
        # (i - 1) makes index 1 the first season of the first year
        this.season <- seasons[((i - 1) %% 4) + 1]
        this.year <- floor((i - 1)/4) + year.start
        image(x=rain.lon, y=rain.lat.inc, rain.data[,revinds,i],
              xlab="Longitude", ylab="Latitude",
              main=paste('Rainfall Anomalies', '\n', this.year, ':', this.season))
        map("world2", add=TRUE)
      }
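The index arithmetic in the title is the easiest part of the loop to get wrong, so it helps to try it on its own. A sketch, assuming the first time index is the winter of the starting year (the variable description lists the seasons as djf, mam, jja, son). The helper index.to.label is hypothetical, and year.start is the value used on the slide.

```r
seasons <- c("Winter", "Spring", "Summer", "Fall")
year.start <- 1855   # value used on the slide

# Hypothetical helper: turn a 1-based time index into "year: season".
# Subtracting 1 first makes indices 1..4 land in the starting year.
index.to.label <- function(i) {
  season <- seasons[((i - 1) %% 4) + 1]
  year <- floor((i - 1) / 4) + year.start
  paste(year, season, sep = ": ")
}

# Vectorised: check the first five indices at once
labels <- index.to.label(1:5)
print(labels)
```

Checking a handful of indices by eye like this is much quicker than spotting a wrong title in a running animation.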
  • 32. Example: Rainfall Anomalies Goal: Compare Anomalies in Boston and Tokyo
      1. Find the lat/lon values of Boston and Tokyo (Google it!)
      2. Find the indices for those values (the which function)
      3. Make a time-series for each location (R provides a ts object)
  • 33. Example: Rainfall Anomalies Sub-Goal: Return the index to the matrix for a lat/lon value
      > rain.lon
      [1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135
      [29] 140 145 150 155 160 165 170 175 180 185 190 195 200 205 210 215 220 225 230 235 240 245 250 255 260 265 270 275
      [57] 280 285 290 295 300 305 310 315 320 325 330 335 340 345 350 355
      Boston: Lon 289, Lat 42. (Annotation on the slide: "Closest Value")
  • 34. Example: Rainfall Anomalies Sub-Goal: Find the index of a given lat/lon value
      boston.lon <- 289; boston.lat <- 42
      tokyo.lon <- 139; tokyo.lat <- 35
      bos.lon.dist <- abs(boston.lon-rain.lon)
      bos.lon.ind = which(bos.lon.dist==min(bos.lon.dist))
      bos.lat.ind = which.min(abs(boston.lat-rain.lat))
      tok.lon.ind = which.min(abs(tokyo.lon-rain.lon))
      tok.lat.ind = which.min(abs(tokyo.lat-rain.lat))
      Get the index of the value that equals the minimum distance
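The nearest-grid-point lookup can be tried without downloading anything. In this sketch rain.lon is rebuilt from the values printed on the earlier slide (0 to 355 in steps of 5), and nearest.index is a hypothetical helper, not part of the original code.

```r
# Longitude grid as printed on the slide: 0, 5, ..., 355
rain.lon <- seq(0, 355, by = 5)

# Hypothetical helper: index of the grid value closest to a target
nearest.index <- function(target, grid) {
  which.min(abs(target - grid))
}

bos.lon.ind <- nearest.index(289, rain.lon)   # Boston
tok.lon.ind <- nearest.index(139, rain.lon)   # Tokyo

print(c(bos.lon.ind, rain.lon[bos.lon.ind]))
print(c(tok.lon.ind, rain.lon[tok.lon.ind]))
```

which.min() returns only the first minimum, while which(d == min(d)) returns every tied index; the two differ only when a target sits exactly halfway between two grid points.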
  • 35. Exercise 4 1. Pick a city and find its indices in the array 2. Use summary to investigate the range of the data in your city 3. Use hist to make a histogram of your city's rainfall data 4. Does this data look normal? Use qqplot to investigate. 5. Figure out how to run a Shapiro-Wilk test and interpret its results
  • 36. Example: Rainfall Anomalies Sub-Goal: Make a time-series object for Boston and Tokyo
      bos.ts <- ts(rain.data[bos.lon.ind,bos.lat.ind,],
                   start=year.start,frequency=4)
      tok.ts <- ts(rain.data[tok.lon.ind,tok.lat.ind,],
                   start=year.start,frequency=4)
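To see what start and frequency do, here is a sketch of a ts object built from made-up numbers instead of the rainfall array. The names values, demo.ts and y1856 are illustrative only.

```r
# Twelve made-up quarterly values (three years of seasons)
set.seed(1)
values <- rnorm(12)

# frequency = 4 means four observations per year;
# start = 1855 puts the first one in the first quarter of 1855
demo.ts <- ts(values, start = 1855, frequency = 4)

# start() and end() report c(year, period within the year)
print(start(demo.ts))
print(end(demo.ts))

# window() cuts out a span by time rather than by index
y1856 <- window(demo.ts, start = c(1856, 1), end = c(1856, 4))
print(y1856)
```

Because the object carries its own time axis, plot() and window() can work in years and seasons instead of raw index numbers.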
  • 37. Time Series Analysis First Step: Plot your data!
      plot(bos.ts,type='l',ylab="Anomalies")
      lines(tok.ts,col="red")
  • 38. Missing Data Strategies
      Ignore: • Easy to do! • Precludes some analysis • Might bias analysis
      Replace with the mean: • Still easy • Analysis still runs • Might bias analysis
      Replace using a model: • Called "imputation" • A good strategy if you have a good model • Repeating this gives estimates of uncertainty • Amelia package in R
  • 39. Missing Data Strategies • Probably makes sense to ignore • Many procedures in R will fail or return NA by default when given NA values!
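The first two strategies can be sketched in base R on a made-up vector. The names x, m.default, m.ignore and x.filled are illustrative; the model-based strategy is left to the Amelia package mentioned on the previous slide.

```r
# Made-up data with one missing value
x <- c(2, 4, NA, 8)

# By default the NA propagates: the mean of x is NA
m.default <- mean(x)

# Strategy 1, ignore: drop the NA when computing the statistic
m.ignore <- mean(x, na.rm = TRUE)

# Strategy 2, replace with the mean: fill the gap so that
# downstream code sees a complete vector
x.filled <- ifelse(is.na(x), m.ignore, x)

print(m.default)
print(m.ignore)
print(x.filled)
```

Filling with the mean keeps the analysis running but shrinks the variance of the data, which is one way it can bias results.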
  • 40. Time Series Analysis. Is there structure in the Rainfall Anomalies? Start with just Boston…
      > lag.plot(bos.ts,lags=9)
      • Lag One: "Plot this season versus the previous season"
      • Lag Four: "Plot this season versus the same season last year"
      • If they're correlated, points will lie along a line
  • 41. Time Series Analysis. Is there structure in the Rainfall Anomalies? Start with just Boston…
      > acf(bos.ts)
      • Look at the correlation coefficients all at once
  • 42. Time Series Analysis. Is there structure in the Rainfall Anomalies? Compare Boston to Tokyo
      > ccf(bos.ts,tok.ts,na.action = na.pass)
      • "Plot this season against last season in Tokyo"
      • Look at the cross-correlation coefficients all at once
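The same calls can be tried on made-up series to see what they return. A sketch: a.ts and b.ts are random stand-ins for the Boston and Tokyo series, with two gaps planted in the first one, and plot = FALSE is used only so that the results can be inspected as objects.

```r
set.seed(42)

# Two made-up quarterly series; the first has two missing values
a.values <- rnorm(40)
a.values[c(5, 17)] <- NA
a.ts <- ts(a.values, frequency = 4)
b.ts <- ts(rnorm(40), frequency = 4)

# By default acf() stops on missing values (na.action = na.fail);
# na.pass lets it compute from the observations that are present
a.acf <- acf(a.ts, na.action = na.pass, plot = FALSE)

# ccf() correlates one series with lagged copies of the other
ab.ccf <- ccf(a.ts, b.ts, na.action = na.pass, plot = FALSE)

print(a.acf)
print(ab.ccf)
```

Both calls return an object of class "acf", so plot(a.acf) or plot(ab.ccf) draws the familiar bar chart later on.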
  • 43. More Time Series Data
      > data(sunspots); plot(sunspots)
      [Figure: time series plot of sunspots; x axis Time (1750 to 1950), y axis sunspots (0 to 250)]
  • 44. Exercise! • Install the forecast package • Use auto.arima to fit a model to the data • Use forecast to plot the model prediction • Can you find a model that gives a better prediction?
  • 45. Thank You! • Doing research in social science? support@help.hmdc.harvard.edu Feedback: bit.ly/cf2012-IQSS