1. Dr. E. N. SATHISHKUMAR,
Guest Lecturer,
Department of Computer Science,
Periyar University,
Salem -11.
2. Introduction
R (the language) was created in the early 1990s, by Ross Ihaka and
Robert Gentleman.
It is based upon the S language that was developed at Bell
Laboratories in the 1970s.
It is a high-level language, like C# or Java.
R is an interpreted language (sometimes called a scripting language),
which means that your code doesn’t need to be compiled before you
run it.
R supports a mixture of programming paradigms: at its core it is an
imperative language, but it also supports object-oriented and functional
programming.
3. Getting started
Where to get R?
The newest version of R and its documentation can be downloaded
from http://www.r-project.org.
Download, Packages: Select CRAN
Set your Mirror: India (Indian Institute of Technology Madras)
Select http://ftp.iitm.ac.in/cran/
Select Download R for Windows
Select base.
Select Download R 3.4.2 for Windows
Execute R-3.4.2-win.exe with administrator privileges. Once the
program is installed, run R by clicking on its icon.
4. Choosing an IDE
If you use R under Windows or Mac OS X, then a graphical user
interface (GUI) is available to you.
Some of the best GUIs are:
Emacs + ESS
Eclipse/Architect
RStudio
Revolution-R
Live-R
Tinn-R
5. A Scientific Calculator
R is at heart a supercharged scientific calculator, so you can type commands
directly into the R Console.
> 5+5
[1] 10
> 4-7
[1] -3
> 7*3
[1] 21
> 16/31
[1] 0.516129
> log2(32)
[1] 5
6. Variable Assignment
We assign values to variables with the assignment operator "=".
Just typing the variable by itself at the prompt will print out the value.
We should note that another form of assignment operator "<-" is also
in use.
> X = 2
> X
[1] 2
> X <- 5
> X
[1] 5
> X * X
[1] 25
7. Comments
All text after the pound sign "#" within the same line is considered a
comment.
> X = 2 # this is a comment
> X
[1] 2
> # 5 is assigned to variable X
> X <- 5
> X
[1] 5
8. Getting Help
R provides extensive documentation. To get help on a particular
topic, call help() with the topic name.
For example,
> help("if")
starting httpd help server ... done
The help content opens immediately in the web browser.
9. Basic Data Types
There are several basic R data types that are of frequent
occurrence in routine R calculations.
Numeric
Integer
Complex
Logical
Character
Factor
10. Numeric
Decimal values are called numerics in R. It is the default
computational data type.
If we assign a decimal value to a variable x as follows, x will be
of numeric type.
> x = 10.5 # assign a decimal value
> x # print the value of x
[1] 10.5
> class(x) # print the class name of x
[1] "numeric"
11. Numeric
Furthermore, even if we assign a whole number to a variable k, it is still
saved as a numeric value.
> k = 1
> k # print the value of k
[1] 1
> class(k) # print the class name of k
[1] "numeric"
The fact that k is not an integer can be confirmed with the is.integer
function.
> is.integer(k) # is k an integer?
[1] FALSE
12. Integer
In order to create an integer variable in R, we invoke the as.integer
function.
For example,
> y = as.integer(3)
> y # print the value of y
[1] 3
> class(y) # print the class name of y
[1] "integer"
> is.integer(y) # is y an integer?
[1] TRUE
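As a complement to as.integer, here is a minimal sketch of the L suffix, which is standard R syntax for writing an integer literal directly. The variable names y2 and k2 are illustrative only and do not come from the slides.

```r
# An integer literal is written with the suffix L
y2 <- 3L
class(y2)        # "integer"
is.integer(y2)   # TRUE

# as.integer() drops the fractional part (truncation toward zero)
k2 <- as.integer(3.99)
k2               # 3
```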
13. Complex
A complex value in R is defined via the pure imaginary value i.
For example,
> z = 1 + 2i # create a complex number
> z # print the value of z
[1] 1+2i
> class(z) # print the class name of z
[1] "complex"
The following does not return the imaginary unit, because -1 is not a
complex value. R returns NaN and issues a warning.
> sqrt(-1) # square root of -1
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
14. Complex
Instead, we have to use the complex value -1 + 0i.
For example,
> sqrt(-1+0i) # square root of -1+0i
[1] 0+1i
An alternative is to coerce -1 into a complex value.
> sqrt(as.complex(-1))
[1] 0+1i
15. Logical
A logical value is often created via comparison between variables.
For example,
> x = 1; y = 2 # sample values
> z = x > y # is x larger than y?
> z # print the logical value
[1] FALSE
> class(z) # print the class name of z
[1] "logical"
16. Logical
The standard logical operators are "&", "|" and "!".
For example,
> u = TRUE; v = FALSE
> u & v # u AND v
[1] FALSE
> u | v # u OR v
[1] TRUE
> !u # negation of u
[1] FALSE
17. Character
A character object is used to represent string values in R. We
convert objects into character values with the as.character() function.
For example,
> x = as.character(3.14)
> x # print the character string
[1] "3.14"
> class(x) # print the class name of x
[1] "character"
> x = as.character("hai")
> x # print the character string
[1] "hai"
> class(x) # print the class name of x
[1] "character"
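A few base R functions are commonly used with character values. The sketch below is illustrative; the variable names first, last and full are assumptions and are not part of the slides.

```r
first <- "Mary"
last  <- "Sue"

# paste() joins its arguments, separated by a single space by default
full <- paste(first, last)
full                               # "Mary Sue"

nchar(full)                        # 8 characters, including the space
substr(full, start = 1, stop = 4)  # "Mary"
toupper(full)                      # "MARY SUE"
```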
18. Factor
The factor data type is used to represent categorical data (i.e. data
whose values are drawn from a fixed set of codes).
For example, to create a vector of length five of type factor, do the
following:
> sex <- c("male","male","female","male","female")
The object sex is a character vector. You need to transform it into a factor.
> sex <- factor(sex)
> sex
[1] male male female male female
Levels: female male
Use the function levels to see the different levels a factor variable has.
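A minimal sketch of levels() and two related base R functions, applied to the same sex factor as above:

```r
sex <- factor(c("male", "male", "female", "male", "female"))

levels(sex)      # "female" "male"  (sorted alphabetically by default)
nlevels(sex)     # 2
table(sex)       # counts per level: female 2, male 3
as.integer(sex)  # underlying integer codes: 2 2 1 2 1
```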
19. Data structures
Before you can perform statistical analysis in R, your data has to
be structured in some coherent way. To store your data R has the
following structures:
Vector
Matrix
Array
Data frame
Time-series
List
20. Vectors
A vector is a sequence of data elements of the same basic type.
Members in a vector are officially called components.
For example, here is a vector containing three numeric values 2, 3, 5.
> c(2, 3, 5)
[1] 2 3 5
Here is a vector of logical values.
> c(TRUE, FALSE, TRUE, FALSE, FALSE)
[1] TRUE FALSE TRUE FALSE FALSE
21. Combining Vectors
Vectors can be combined via the function c.
For example, the following two vectors n and s are combined into a
new vector. The numeric values are coerced into character strings.
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> c(n, s)
[1] "2" "3" "5" "aa" "bb" "cc" "dd" "ee"
22. Vector Arithmetics
Arithmetic operations of vectors are performed member-by-member.
For example, suppose we have the following two vectors a and b.
> a = c(1, 3, 5, 7)
> b = c(1, 2, 4, 8)
If we add a and b together, the sum is a vector whose members
are the sums of the corresponding members of a and b.
> a + b
[1] 2 5 9 15
Similarly, for subtraction, multiplication and division, we get new
vectors via member-wise operations.
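The remaining member-wise operations can be sketched with the same two vectors a and b:

```r
a <- c(1, 3, 5, 7)
b <- c(1, 2, 4, 8)

a - b   # 0 1 1 -1
a * b   # 1 6 20 56
a / b   # 1.000 1.500 1.250 0.875
5 * a   # a single number is applied to every member: 5 15 25 35
```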
23. Vector Recycling Rule
If two vectors are of unequal length, the shorter one will be recycled in
order to match the longer vector.
For example, the following sum is computed by recycling the values of the shorter vector.
> u = c(10, 20, 30)
> v = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
> u + v
[1] 11 22 33 14 25 36 17 28 39
24. Vector Index
We retrieve values in a vector by declaring an index inside a single
square bracket "[ ]" operator.
For example,
> s = c("aa", "bb", "cc", "dd", "ee")
> s[3]
[1] "cc"
25. Vector Negative Index
If the index is negative, it strips the member whose position
has the same absolute value as the negative index.
For example,
> s = c("aa", "bb", "cc", "dd", "ee")
> s[-3]
[1] "aa" "bb" "dd" "ee"
Out-of-Range Index
If an index is out-of-range, a missing value will be reported via the
symbol NA.
> s[10]
[1] NA
26. Numeric Index Vector
A new vector can be sliced from a given vector with a numeric
index vector, which consists of member positions of the original
vector to be retrieved.
For example,
> s = c("aa", "bb", "cc", "dd", "ee")
> s[c(2, 3)]
[1] "bb" "cc"
27. Vector Duplicate Indexes
The index vector allows duplicate values. Hence the following
retrieves a member twice in one operation.
For example,
> s = c("aa", "bb", "cc", "dd", "ee")
> s[c(2, 3, 3)]
[1] "bb" "cc" "cc"
28. Vector Out-of-Order Indexes
The index vector can even be out-of-order. Here is a vector slice
with the order of first and second members reversed.
For example,
> s = c("aa", "bb", "cc", "dd", "ee")
> s[c(2, 1, 3)]
[1] "bb" "aa" "cc"
29. Vector Range Index
To produce a vector slice between two indexes, we can use the
colon operator ":".
For example,
> s = c("aa", "bb", "cc", "dd", "ee")
> s[2:4]
[1] "bb" "cc" "dd"
30. Named Vector Members
We can assign names to vector members.
For example, the following variable v is a character string vector
with two members.
> v = c("Mary", "Sue")
> v
[1] "Mary" "Sue"
We now name the first member as First, and the second as Last.
> names(v) = c("First", "Last")
> v
First Last
"Mary" "Sue"
32. Matrices
A matrix is a collection of data elements arranged in a row-
column layout.
A matrix can be regarded as a generalization of a vector.
As with vectors, all the elements of a matrix must be of the same
data type.
A matrix can be generated in several ways.
Use the function dim
Use the function matrix
33. Matrices
Use the function dim
> x <- 1:8
> dim(x) <- c(2,4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
Use the function matrix
> A = matrix(c(2, 4, 3, 1, 5, 7), nrow=2, ncol=3, byrow=TRUE)
> A
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    1    5    7
The same matrix can be built with positional arguments:
> A <- matrix(c(2, 4, 3, 1, 5, 7), 2, 3, byrow=TRUE)
34. Accessing Matrices
An element at the mth row, nth column of A can be accessed by the
expression A[m, n].
> A[2, 3]
[1] 7
The entire mth row of A can be extracted as A[m, ].
> A[2, ]
[1] 1 5 7
We can also extract more than one row or column at a time.
> A[ ,c(1,3)]
[,1] [,2]
[1,] 2 3
[2,] 1 7
35. Calculations on matrices
We construct the transpose of a matrix by interchanging its columns
and rows with the function t .
> t(A) # transpose of A
[,1] [,2]
[1,] 2 1
[2,] 4 5
[3,] 3 7
We can deconstruct a matrix by applying the c function, which
combines all column vectors into one.
> c(A)
[1] 2 1 4 5 3 7
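Beyond the transpose, a few other common matrix calculations can be sketched with the same matrix A. The name P is an assumption used only for this illustration.

```r
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)

# The * operator works element by element
A * A            # rows: 4 16 9 and 1 25 49

# %*% is matrix multiplication: (2 x 3) times (3 x 2) gives 2 x 2
P <- A %*% t(A)
P                # rows: 29 43 and 43 75

# rbind() appends a row, cbind() would append a column
rbind(A, c(0, 0, 0))   # a 3 x 3 matrix
```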
36. Arrays
In R, arrays are generalizations of vectors and matrices.
A vector is a one-dimensional array and a matrix is a
two-dimensional array.
As with vectors and matrices, all the elements of an array must be
of the same data type.
A one-dimensional array with two elements may be constructed as
follows.
> x = array(c(T,F),dim=c(2))
> print(x)
[1] TRUE FALSE
37. Arrays
A three-dimensional array (3 by 3 by 3) may
be created as follows.
> z = array(1:27,dim=c(3,3,3))
> dim(z)
[1] 3 3 3
> print(z)
, , 1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
, , 2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
[3,] 12 15 18
, , 3
[,1] [,2] [,3]
[1,] 19 22 25
[2,] 20 23 26
[3,] 21 24 27
38. Accessing Arrays
R arrays are accessed in a manner similar to arrays in other
languages: by integer index, starting at 1 (not 0).
For example, the third slice along the third dimension is a 3 by 3 array.
> z[,,3]
[,1] [,2] [,3]
[1,] 19 22 25
[2,] 20 23 26
[3,] 21 24 27
Specifying two of the three dimensions returns an array of one
dimension.
> z[,3,3]
[1] 25 26 27
39. Accessing Arrays
Specifying all three dimensions returns a single element of the 3 by 3 by 3
array.
> z[3,3,3]
[1] 27
More complex subsets of an array can also be extracted.
> z[,c(2,3),c(2,3)]
, , 1
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
, , 2
[,1] [,2]
[1,] 22 25
[2,] 23 26
[3,] 24 27
40. Lists
A list is a collection of R objects.
list() creates a list; unlist() transforms a list into a vector.
The objects in a list do not have to be of the same type or length.
> x <- c(1:4)
> y <- FALSE
> z <- matrix(c(1:4),nrow=2,ncol=2)
> myList <- list(x,y,z)
> myList
[[1]]
[1] 1 2 3 4
[[2]]
[1] FALSE
[[3]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4
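Accessing list members and flattening a list with unlist() can be sketched as follows. The member names numbers, flag and mat, and the variable flat, are assumptions added for illustration; the slides build the list without names.

```r
x <- 1:4
y <- FALSE
z <- matrix(1:4, nrow = 2, ncol = 2)
myList <- list(numbers = x, flag = y, mat = z)

myList[[1]]            # double brackets return the member itself: 1 2 3 4
myList$flag            # members can also be reached by name: FALSE
myList[["mat"]][2, 1]  # row 2, column 1 of the matrix member: 2
class(myList[1])       # single brackets return a sub-list: "list"

# unlist() flattens a list into a (named) vector
flat <- unlist(list(a = 1, b = 2:3))
flat                   # values 1 2 3 with names a, b1, b2
```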
41. Data Frame
A data frame is used for storing tabular data, like a spreadsheet.
It is a list of vectors of equal length.
Most statistical modeling routines in R require a data frame as input.
For example,
> weight = c(150, 135, 210, 140)
> height = c(65, 61, 70, 65)
> gender = c("Fe","Fe","Ma","Fe")
> study = data.frame(weight,height,gender) # make the data frame
> study
weight height gender
1 150 65 Fe
2 135 61 Fe
3 210 70 Ma
4 140 65 Fe
42. Creating a data frame
A data frame may be created directly using data.frame().
For example, the data frame below is created by passing each of the
vectors that compose it as an argument.
> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> patientdata <- data.frame(patientID, age, diabetes, status)
> patientdata
patientID age diabetes status
1 1 25 Type1 Poor
2 2 34 Type2 Improved
3 3 28 Type1 Excellent
4 4 52 Type1 Poor
43. Accessing data frame elements
Use subscript notation or column names to identify the elements of the
patientdata data frame.
> patientdata$age
[1] 25 34 28 52
> patientdata[1:2]
  patientID age
1         1  25
2         2  34
3         3  28
4         4  52
> table(patientdata$diabetes, patientdata$status)
        Excellent Improved Poor
  Type1         1        0    2
  Type2         0        1    0
> patientdata[c("diabetes", "status")]
diabetes status
1 Type1 Poor
2 Type2 Improved
3 Type1 Excellent
4 Type1 Poor
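Rows can be selected as well as columns. The sketch below rebuilds the same patientdata data frame so that it runs on its own; the variable name older and the age threshold of 30 are illustrative assumptions.

```r
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type2", "Type1", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor")
)

patientdata[2, ]                        # the second row, all columns

# A logical condition in the row position keeps only matching rows
patientdata[patientdata$age > 30, c("patientID", "age")]  # patients 2 and 4

# subset() expresses the same filter without repeating the data frame name
older <- subset(patientdata, age > 30)
nrow(older)                             # 2
mean(patientdata$age)                   # 34.75
```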
44. Functions
Most tasks are performed by calling a function in R. All R functions
have three parts:
the body(), the code inside the function.
the formals(), the list of arguments which controls how you
can call the function.
the environment(), the “map” of the location of the function’s
variables.
The general form of a function is given by:
functionname <- function(arg1, arg2,...)
{
Body of function: a collection of valid statements
}
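The three parts of a function can be inspected directly. This is a minimal sketch; add_pair is a hypothetical function defined only for the illustration.

```r
add_pair <- function(x, y = 1) {
  x + y
}

body(add_pair)            # the code inside the function
formals(add_pair)         # the argument list: x (no default) and y = 1
names(formals(add_pair))  # "x" "y"
environment(add_pair)     # where the function was defined

add_pair(3)               # y falls back to its default: 4
add_pair(3, 4)            # 7
```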
45. Functions
Example 1: Creating a function, called f1, which adds a pair of numbers.
f1 <- function(x, y)
{
x+y
}
f1( 3, 4)
[1] 7
46. Functions
Example 2: Creating a function, called readinteger.
readinteger <- function()
{
n <- readline(prompt="Enter an integer: ")
return(as.integer(n))
}
print(readinteger())
Enter an integer: 55
[1] 55
47. Functions
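Example 3: Creating a function, called my.plot, which passes its arguments to plot() through "..." and changes the default plotting symbol.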
x <- rnorm(100)
y <- x + rnorm(100)
plot(x, y)
my.plot <- function(..., pch.new=15)
{
plot(..., pch=pch.new)
}
my.plot(x, y)
48. Control flow
R provides a set of constructs for testing and looping.
These allow you to control the flow of execution of a script, typically
inside a function. Common ones include:
if, else
switch
for
while
repeat
break
next
return
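Of the constructs listed above, switch selects one of several alternatives by name. A minimal sketch, assuming a hypothetical helper function called calc:

```r
calc <- function(op, a, b) {
  switch(op,
    add = a + b,
    sub = a - b,
    mul = a * b,
    stop("unknown operation: ", op)   # unnamed last argument is the default
  )
}

calc("add", 6, 3)   # 9
calc("sub", 6, 3)   # 3
calc("mul", 6, 3)   # 18
```

The final unnamed argument acts as the default branch, so an unrecognised name such as "div" stops with an error here instead of silently returning NULL.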
49. Simple if
Syntax:
if (test_expression) {statement}
Example:
x <- 5
if(x > 0)
{
print("Positive number")
}
Output:
[1] "Positive number"
Example:
x <- 4 == 3
if (x)
{
"4 equals 3"
}
Output:
(nothing is printed, because x is FALSE and the statement is skipped)
51. Nested if...else
Syntax:
if ( test_expression1)
{
statement1
} else if ( test_expression2)
{
statement2
} else if ( test_expression3)
{
statement3
} else
statement4
Only one statement will get
executed depending upon the
test_expressions.
Example:
x <- 0
if (x < 0)
{
print("Negative number")
} else if (x > 0)
{
print("Positive number")
} else
print("Zero")
Output:
[1] "Zero"
52. ifelse()
There is a vector equivalent form of the if...else statement in R, the
ifelse() function.
Syntax:
ifelse(test_expression, x, y)
Example:
> a = c(5,7,2,9)
> ifelse(a %% 2 == 0,"even","odd")
Output:
[1] "odd" "odd" "even" "odd"
53. for
In R, a for loop is used to iterate over the elements of a vector.
Syntax:
for (val in sequence) {statement}
Example:
v <- c("this", "is", "the", "R", "for", "loop")
for(i in v)
{
print(i)
}
Output:
[1] "this"
[1] "is"
[1] "the"
[1] "R"
[1] "for"
[1] "loop"
54. Nested for loops
We can use a for loop within another for loop to iterate over two things
at once (e.g., rows and columns of a matrix).
Example:
for(i in 1:3)
{
for(j in 1:3)
{
print(paste(i,j))
}
}
Output:
[1] "1 1"
[1] "1 2"
[1] "1 3"
[1] "2 1"
[1] "2 2"
[1] "2 3"
[1] "3 1"
[1] "3 2"
[1] "3 3"
55. while
A while loop repeats its statements as long as a test condition remains TRUE.
Syntax:
while (test_expression)
{
statement
}
Example:
i <- 1
while (i < 6)
{
print(i)
i = i+1
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
56. repeat
The easiest loop to master in R is repeat.
All it does is execute the same code over and over until you tell it to
stop.
Syntax:
repeat {statement}
Example:
x <- 1
repeat {
print(x)
x = x+1
if (x == 6){
break
}
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
57. break
A break statement is used inside a loop to stop the iterations and
transfer control to the first statement after the loop.
Example:
x <- 1:5
for (val in x) {
if (val == 3){
break
}
print(val)
}
Output:
[1] 1
[1] 2
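For contrast with break, here is a minimal sketch of next, which skips only the current iteration and then continues the loop. The vector kept is an assumption added to record which values were printed.

```r
x <- 1:5
kept <- integer(0)
for (val in x) {
  if (val == 3) {
    next              # skip the rest of this iteration only
  }
  print(val)          # prints 1, 2, 4 and 5
  kept <- c(kept, val)
}
```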
58. Replication
The rep() function repeats its input several times.
A related function, replicate(), evaluates an expression several times.
rep() repeats the same random number several times, but replicate()
gives a different number each time.
Example:
>rep(runif(1), 5)
[1] 0.04573 0.04573 0.04573 0.04573 0.04573
>replicate(5, runif(1))
[1] 0.5839 0.3689 0.1601 0.9176 0.5388
59. Packages
Packages are collections of R functions, compiled code, data,
documentation, and tests, in a well-defined format.
The directory where packages are stored is called the library.
R comes with a standard set of packages.
Others are available for download and installation.
Once installed, they have to be loaded into the session to be used.
>.libPaths() # get library location
>library() # see all packages installed
>search() # see packages currently loaded
60. Adding Packages
You can expand the types of analyses you do by adding other packages.
To add a package, download and install it.
61. Loading Packages
To load a package that is already installed on your machine, call the
library function with the name of the package you want to load.
For example, the lattice package should already be installed, but it won’t
be loaded automatically. We can load it with library() or require().
>library(lattice)
Other common forms of these calls, using a package named eda as the example:
>library(eda) # load package "eda"
>require(eda) # the same
>library() # list all available packages
>library(lib.loc = .Library) # list all packages in the default library
>library(help = eda) # documentation on package "eda"
62. Importing and Exporting Data
There are many ways to get data in and out.
Most programs (e.g. Excel), as well as humans, know how to deal
with rectangular tables in the form of tab-delimited text files.
Normally, you would start your R session by reading in some data to
be analysed. This can be done with the read.table function. Download
the sample data to your local directory...
> x <- read.table("sample.txt", header = TRUE)
Also: read.delim, read.csv, scan
> write.csv(x, file = "samplenew.csv")
Also: write.table, write, and write.matrix (from the MASS package)
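A round trip through a CSV file can be sketched without any downloaded sample file. This is only an illustration: it writes to a temporary file created with tempfile(), and the data values and the names study, path and study2 are assumptions.

```r
study <- data.frame(weight = c(150, 135, 210, 140),
                    height = c(65, 61, 70, 65))

# Write to a temporary file so that nothing in the working directory changes
path <- tempfile(fileext = ".csv")
write.csv(study, file = path, row.names = FALSE)

# Read the file back; the header row becomes the column names
study2 <- read.csv(path)
names(study2)      # "weight" "height"
study2$weight      # 150 135 210 140

unlink(path)       # remove the temporary file
```

Setting row.names = FALSE keeps write.csv from adding an extra column of row numbers, which would otherwise come back as a column when the file is read.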
63. Frequently used Operators
<- Assign
+ Sum
- Difference
* Multiplication
/ Division
^ Exponent
%% Mod
%*% Matrix product
%/% Integer division
%in% Membership test
| Or
& And
< Less
> Greater
<= Less or =
>= Greater or =
! Not
!= Not equal
== Is equal
64. Frequently used Functions
c Concatenate
cbind, rbind Concatenate vectors by column, by row
min Minimum
max Maximum
length # values
dim # rows, cols
floor Largest integer <= x
which TRUE indices
table Counts
summary Generic stats
sort, order, rank Sort, order, rank a vector
print Show value
cat Print as char
paste c() as char
round Round
apply Repeat over rows, cols
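Several of the functions in this table can be sketched on small inputs. The matrix m and the vector v are assumptions chosen for the illustration.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3, filled column by column

apply(m, 1, sum)   # apply sum over rows (margin 1): 9 12
apply(m, 2, max)   # apply max over columns (margin 2): 2 4 6

v <- c(3, 8, 1, 8)
which(v == 8)      # positions where the condition is TRUE: 2 4
sort(v)            # 1 3 8 8
order(v)           # positions that would sort v: 3 1 2 4
rank(v)            # ties share the average rank: 2.0 3.5 1.0 3.5
floor(2.7)         # 2
paste("n =", length(v))   # "n = 4"
```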
65. Statistical Functions
rnorm, dnorm, pnorm, qnorm Normal distribution random sample, density, cdf and quantiles
lm, glm, anova Model fitting
loess, lowess Smooth curve fitting
sample Resampling (bootstrap, permutation)
.Random.seed Random number generation
mean, median Location statistics
var, cor, cov, mad, range Scale statistics
svd, qr, chol, eigen Linear algebra
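A few of these functions can be combined in a short sketch. The call to set.seed() is not in the table; it is added here as an assumption so that the random sample is reproducible. The simulated relationship (a slope of 2 with small noise) is illustrative only.

```r
set.seed(1)                          # make the random sample reproducible
x <- rnorm(100)                      # 100 draws from the standard normal
y <- 2 * x + rnorm(100, sd = 0.1)    # linear relationship plus small noise

mean(x)        # sample mean, close to 0
var(x)         # sample variance, close to 1

dnorm(0)       # density at 0, about 0.3989
pnorm(0)       # cdf at 0: 0.5
qnorm(0.5)     # median of the standard normal: 0

fit <- lm(y ~ x)   # fit a straight line by least squares
coef(fit)          # intercept near 0, slope near 2
```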
66. Graphical Functions
plot Generic plot eg: scatter
points Add points
lines, abline Add lines
text, mtext Add text
legend Add a legend
axis Add axes
box Add box around all axes
par Plotting parameters (lots!)
colors, palette Use colors