QMC: Undergraduate Workshop, Tutorial on 'R' Software - Yawen Guan, Feb 26, 2018
1. R Basics and Simulation
file:///F/Homework/Feb%20Ugrad%20Presentations/RTutorial_YawenGuan.html[3/6/2018 9:02:11 PM]
R Basics and Simulation
About R
R is a free software environment for statistical computing and graphics.
Provides a wide variety of statistical and graphical techniques
Many classical and modern statistical techniques have been implemented.
A few of these are built into the base R environment, but many are supplied as packages.
A convenient interface is RStudio, an integrated development environment (IDE) for R. It includes a
console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history,
debugging, and workspace management.
Open RStudio
Frequently Used Data Types and R-Objects
Variables are not declared with a data type.
Instead, a variable is assigned an R-object, and the data type of that R-object becomes the data type of the
variable.
Data Type: Numeric, Integer, Character
## Numeric
## Assign a number to variable num
num <- 3.14
print(num)
## [1] 3.14
## simple calculation by calling the variable
print(num + 1)
## [1] 4.14
## Let's check the data type that has been assigned to num
print(class(num))
## [1] "numeric"
## Integer
## Assign the integer part of num to variable num.int
num.int <- as.integer(num)
num.int
## [1] 3
## Let's check the data type
class(num.int)
## [1] "integer"
## Character
## Assign a character string to variable char
char <- "Hello"
char
## [1] "Hello"
## Let's check the data type
class(char)
## [1] "character"
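The type checks and conversions above can be tried together in one short sketch. The object names (val, val_int, val_chr, bad) are illustrative and not part of the tutorial.

```r
# Check and convert basic data types (all object names here are illustrative)
val <- 3.14
is.numeric(val)               # TRUE: val is stored as a double

# as.integer() drops the fractional part (truncates toward zero); it does not round
val_int <- as.integer(val)
class(val_int)

# as.character() turns the number into text
val_chr <- as.character(val)
class(val_chr)

# Text that is not a number cannot be converted: the result is NA, with a warning
bad <- suppressWarnings(as.numeric("Hello"))
is.na(bad)
```

The is.*() functions test a type and the as.*() functions convert between types, which is handy when data read from a file arrives as text.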
R-Objects: Vectors, Matrices, Data Frames
# Create a vector with more than one element
# We use the c() function, which combines the elements into a vector
# create a vector of characters
col <- c('red','green',"yellow")
col
## [1] "red" "green" "yellow"
# create a numeric vector
num <- c(1,2,3)
num
## [1] 1 2 3
# extract elements from vectors
num[1]
## [1] 1
# Create a matrix from vectors
# There are several ways to do this
# Use the cbind() function, which combines vectors as columns
Mcol <- cbind(num,num,num)
Mcol
## num num num
## [1,] 1 1 1
## [2,] 2 2 2
## [3,] 3 3 3
# Use the rbind() function, which combines vectors as rows
Mrow <- rbind(num,num,num)
Mrow
## [,1] [,2] [,3]
## num 1 2 3
## num 1 2 3
## num 1 2 3
# Use the matrix() function to fill in each element (filled column by column by default)
M <- matrix(1:9,nrow=3,ncol=3)
M
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
# Now let's try combining a numeric vector and a character vector into a matrix
Mtry <- rbind(num,col)
class(Mtry) # Do you notice what has been changed here?
## [1] "matrix"
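One way to see what changed is to inspect the type of the combined matrix. The sketch below uses its own illustrative names (num_vec, col_vec, mixed) so it can be run on its own: a matrix can hold only one data type, so the numbers are converted to character strings.

```r
# A matrix holds a single data type, so mixing numbers and text
# converts everything to character (names here are illustrative)
num_vec <- c(1, 2, 3)
col_vec <- c("red", "green", "yellow")
mixed <- rbind(num_vec, col_vec)

typeof(mixed)      # "character"
mixed[1, ]         # the numbers are now the strings "1" "2" "3"

# Arithmetic on the strings fails; convert back to numeric first
as.numeric(mixed[1, ]) + 1
```

A data frame avoids this conversion because each column keeps its own type.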
# extract elements from a matrix
M[1,3]
## [1] 7
# Create a data frame
df <- data.frame(x = col, y = num)
df
## x y
## 1 red 1
## 2 green 2
## 3 yellow 3
# extract element from data frame
df$x
## [1] red green yellow
## Levels: green red yellow
df$y[3]
## [1] 3
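The "Levels" line in the output above shows that the character column was stored as a factor. Whether that happens by default depends on the R version (since R 4.0.0 strings are kept as characters), so the sketch below sets stringsAsFactors explicitly. The names colors, counts, df_factor and df_char are illustrative.

```r
# Control how text columns are stored in a data frame (names are illustrative)
colors <- c("red", "green", "yellow")
counts <- c(1, 2, 3)

df_factor <- data.frame(x = colors, y = counts, stringsAsFactors = TRUE)
df_char   <- data.frame(x = colors, y = counts, stringsAsFactors = FALSE)

class(df_factor$x)   # "factor"
class(df_char$x)     # "character"

# str() gives a compact summary of each column and its type
str(df_char)

# Extract the rows where y is at least 2
df_char[df_char$y >= 2, ]
```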
Calculation with R: Multiplication, Log, Exponential, Power and Some Useful Statistics
# for a scalar
x <- 2
x*x
## [1] 4
# for vector
num
## [1] 1 2 3
num + num
## [1] 2 4 6
y <- c(0,1,2,3,4)
log(y)
## [1] -Inf 0.0000000 0.6931472 1.0986123 1.3862944
# for matrix
M <- matrix(1:9,ncol=3,nrow=3)
exp(M)
## [,1] [,2] [,3]
## [1,] 2.718282 54.59815 1096.633
## [2,] 7.389056 148.41316 2980.958
## [3,] 20.085537 403.42879 8103.084
# for data frame
df
## x y
## 1 red 1
## 2 green 2
## 3 yellow 3
df^2 # why is there a warning message?
## Warning in Ops.factor(left, right): '^' not meaningful for factors
## x y
## [1,] NA 1
## [2,] NA 4
## [3,] NA 9
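The warning appears because the power operator is applied to every column, including the non-numeric one. A minimal sketch of one way around it, assuming we only want to transform the numeric columns (df_demo, is_num and squared are illustrative names):

```r
# Apply arithmetic only to the numeric columns of a data frame
# (df_demo, is_num and squared are illustrative names)
df_demo <- data.frame(x = c("red", "green", "yellow"), y = c(1, 2, 3))

# Option 1: work on a single column
df_demo$y^2

# Option 2: find the numeric columns first, then square just those
is_num <- sapply(df_demo, is.numeric)
is_num
squared <- df_demo[is_num]^2
squared
```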
# Useful statistics
mean(M)
## [1] 5
sum(M)
## [1] 45
Compute Deterministic Function in R: Sine(x), Polynomial x^2 + 3*x
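A minimal sketch of evaluating the two functions named in this heading. The grid from 0 to 2*pi and the object names (x_grid, sine_vals, poly_vals) are assumptions chosen for illustration.

```r
# Evaluate sin(x) and the polynomial x^2 + 3*x on a grid of x values
# (the range 0 to 2*pi and all object names are illustrative choices)
x_grid <- seq(0, 2 * pi, length.out = 100)

sine_vals <- sin(x_grid)             # functions in R work element by element
poly_vals <- x_grid^2 + 3 * x_grid

# Plot each function as a line
plot(x_grid, sine_vals, type = "l", main = "sin(x)", xlab = "x", ylab = "sin(x)")
plot(x_grid, poly_vals, type = "l", main = "x^2 + 3*x", xlab = "x", ylab = "value")
```

Both functions are deterministic: the same x always gives the same value, unlike the random draws in the next section.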
Simulate Random Variables in R
# generate uniform variable between 0,1
u <- runif(10)
# plot to see what it looks like
plot(u)
hist(u)
# generate more data
u <- runif(1000)
plot(u)
hist(u) # Do you see what has changed?
# sample from a vector
# Type help(sample) to see the function arguments
sample(x = 1:100, size = 1, replace = FALSE)
## [1] 35
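The replace argument controls whether a value can be drawn more than once. The following sketch illustrates the difference; the seed and the names perm, draws and too_many are chosen only for this example.

```r
# Sampling with and without replacement (names and seed are illustrative)
set.seed(1)

# Without replacement (the default): each value appears at most once,
# so sampling all 10 values gives a random permutation
perm <- sample(1:10)
perm

# With replacement: values can repeat, so we may draw more than 10 times
draws <- sample(1:10, size = 20, replace = TRUE)
draws

# Asking for more values than exist, without replacement, is an error
too_many <- tryCatch(sample(1:10, size = 20),
                     error = function(e) conditionMessage(e))
too_many
```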
# Normal random variable, a very useful random variable used in statistics
n1 <- rnorm(1)
n1
## [1] -1.251055
n2 <- rnorm(1)
n2
## [1] 0.650681
# Generate a larger sample to see its distribution
n <- rnorm(1000)
plot(n)
hist(n)
plot(density(n), main="Density of n",xlab="n") # remember the shape of the distribution
# set seed to generate the same random number
set.seed(123)
n1 <- rnorm(1)
n1
## [1] -0.5604756
set.seed(123)
n2 <- rnorm(1)
n2
## [1] -0.5604756
Why is Normal Distribution so Useful?
# Let's look at some examples
# If we flip 30 coins and count the number of heads,
# what do you think the distribution of the count will look like?
# simulate 30 coin flips
x <- sample(c("head","tail"),30,replace = TRUE)
x
## [1] "head" "tail" "tail" "head" "tail" "tail" "tail" "head" "tail" "head"
## [11] "tail" "tail" "head" "tail" "head" "head" "head" "tail" "tail" "tail"
## [21] "tail" "tail" "tail" "tail" "tail" "tail" "head" "head" "tail" "tail"
# count the number of heads
x == "head"
## [1] TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
## [12] FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
sum( x == "head" )
## [1] 10
# repeat this 1000 times
headcount <- c() # create an empty vector
for (i in 1:1000){
x <- sample(c("head","tail"),30,replace = TRUE)
headcount[i] <- sum( x == "head" )
}
hist(headcount,main="Head Count in 30 Coin Flips") # plot the distribution, what do you see?
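The same simulation can be written without an explicit loop. The sketch below shows two alternatives, assuming a fair coin: replicate() repeats an expression, and rbinom() draws the head count directly from a binomial distribution. The names headcount_rep and headcount_binom and the seed are illustrative.

```r
# Two loop-free ways to simulate the number of heads in 30 fair coin flips
# (object names and the seed are illustrative)
set.seed(123)

# replicate() evaluates the expression 1000 times and collects the results
headcount_rep <- replicate(
  1000,
  sum(sample(c("head", "tail"), 30, replace = TRUE) == "head")
)

# The head count in 30 fair flips follows a Binomial(30, 0.5) distribution,
# so rbinom() can generate the counts directly
headcount_binom <- rbinom(1000, size = 30, prob = 0.5)

# Both averages should be close to the expected value 30 * 0.5 = 15
mean(headcount_rep)
mean(headcount_binom)

hist(headcount_binom, main = "Head Count in 30 Coin Flips (rbinom)")
```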
# How about we simulate from a different distribution?
# simulate from a uniform distribution
x = runif(30)
sum(x)
## [1] 14.14213
# repeat this 1000 times
sumunif <- c() # create an empty vector
for (i in 1:1000){
x = runif(30)
sumunif[i] <- sum(x)
}
hist(sumunif,main="Sum of Uniform Random Variables") # plot the distribution, what do you see?
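To check how close the histogram is to a normal distribution, we can overlay the normal density that the central limit theorem suggests. A Uniform(0,1) variable has mean 1/2 and variance 1/12, so the sum of 30 of them has mean 15 and variance 30/12. This is a sketch with illustrative names (sums, mu_sum, sd_sum) and an arbitrary seed.

```r
# Compare the simulated sums with the normal curve suggested by the
# central limit theorem (names and seed are illustrative)
set.seed(123)
sums <- replicate(1000, sum(runif(30)))

mu_sum <- 30 * 0.5        # mean of the sum: 30 * 1/2
sd_sum <- sqrt(30 / 12)   # standard deviation of the sum: sqrt(30 * 1/12)

# freq = FALSE puts the histogram on the density scale so the curve is comparable
hist(sums, freq = FALSE, main = "Sum of 30 Uniform Random Variables", xlab = "sum")
curve(dnorm(x, mean = mu_sum, sd = sd_sum), add = TRUE, lwd = 2)

# The sample mean and standard deviation should be close to mu_sum and sd_sum
c(mean(sums), sd(sums))
```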
# The bean machine
# install.packages("animation")
# library(animation)
# balls = 200
# layers = 15
# ani.options(nmax=balls+layers-2)
# quincunx(balls, layers)
# We will illustrate this example during the hands-on session
Writing Your Own Function
h <- function(x)(sin(x)^2+cos(x)^3)^(3/2)
# Define the range of x values
x <- seq(0,10,length.out = 100)
h(x)
## [1] 1.0000000000 0.9924415229 0.9708663012 0.9383916390 0.8996454821
## [6] 0.8600539168 0.8250760835 0.7995255648 0.7870556343 0.7898022779
## [11] 0.8081168229 0.8403199035 0.8824779760 0.9283032379 0.9693440650
## [16] 0.9956215190 0.9967769375 0.9636598362 0.8901501922 0.7749162064
## [21] 0.6227902584 0.4455330451 0.2620251351 0.0988159073 0.0002787684
## [26] NaN NaN NaN NaN NaN
## [31] NaN NaN NaN NaN NaN
## [36] NaN NaN NaN NaN 0.0712840321
## [41] 0.2261336483 0.4079475371 0.5882393238 0.7466649092 0.8700178635
## [46] 0.9520957115 0.9930696121 0.9982195163 0.9762294564 0.9373522986
## [51] 0.8917535712 0.8482638203 0.8136393063 0.7922914495 0.7863422320
## [56] 0.7958337805 0.8189731051 0.8523909547 0.8914739366 0.9308440204
## [61] 0.9650074350 0.9891120153 0.9996831561 0.9951835470 0.9762685521
## [66] 0.9456767896 0.9077819385 0.8679102229 0.8315735142 0.8037642898
## [71] 0.7884053128 0.7879647488 0.8031771984 0.8327963036 0.8733616611
## [76] 0.9190612631 0.9618485403 0.9919789195 0.9990544240 0.9735346297
## [81] 0.9085333538 0.8016159861 0.6562765079 0.4828356883 0.2987209003
## [86] 0.1287563460 0.0103723359 NaN NaN NaN
## [91] NaN NaN NaN NaN NaN
## [96] NaN NaN NaN NaN NaN
plot(x, h(x),type="l")
# notice what happened to some h(x) values
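The NaN values arise where sin(x)^2 + cos(x)^3 is negative, because raising a negative number to the power 3/2 has no real value. The sketch below checks this; h_inner, x_grid, inner and h_vals are illustrative names.

```r
# Why some h(x) values are NaN: the quantity inside the power is negative
# (h_inner, x_grid, inner and h_vals are illustrative names)
h_inner <- function(x) sin(x)^2 + cos(x)^3

h_inner(3)           # negative near x = 3
h_inner(3)^(3/2)     # NaN: a negative base with a fractional power

# Check over the whole grid: NaN occurs exactly where the inner value is negative
x_grid <- seq(0, 10, length.out = 100)
inner  <- h_inner(x_grid)
h_vals <- inner^(3/2)
identical(is.nan(h_vals), inner < 0)

# Count how many grid points are affected
sum(is.nan(h_vals))
```

plot() skips the NaN points, which is why the curve has gaps.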
Use functions written by others
# install.packages("mcsm")
library("mcsm")
# see what functions are in the mcsm package
ls("package:mcsm")
# see how to use a particular function
help(mcsm)
# some packages also come with very neat examples
demo(package = "mcsm")
demo(Chapter.2)
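Calling library() on a package that is not installed stops with an error. A small sketch of a more defensive pattern, using base R's requireNamespace() to check first (the variable pkg is illustrative):

```r
# Load a package only if it is installed, otherwise print a hint
# (the variable name pkg is illustrative)
pkg <- "mcsm"

if (requireNamespace(pkg, quietly = TRUE)) {
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
  # list the first few objects the package provides
  print(head(ls(paste0("package:", pkg))))
} else {
  message("Package '", pkg, "' is not installed; run install.packages(\"",
          pkg, "\") first.")
}
```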