Chameera Dedduwage, B.Sc. (Colombo), Pg. Dip. (Applied Stat., Colombo)
2024 April
Introduction to R
How to Use This Document
This document is not meant to serve as a formal language definition or an
exhaustive guide to R. Rather, its purpose is to provide a firm understanding of the
building blocks of R so that the knowledge can be applied to various use cases.
A direct result of this approach is that many of the slides here have illustrative
examples that the reader should type into the R console.
What is R?
R is the GNU implementation of the S language developed by John Chambers, Rick Becker
and Allan Wilks at AT&T Bell Labs (where the C and C++ languages were born).
The commercial implementation of S is called ‘S-PLUS’, while the copyleft (as opposed to
copyrighted) implementation is known as R. R was developed by Ross Ihaka and Robert
Gentleman at the University of Auckland.
R is an integrated suite of software facilities for data manipulation, calculation and graphical
display. Among other things it has,
● an effective data handling and storage facility,
● a suite of operators for calculations on arrays, in particular matrices,
● a large, coherent, integrated collection of intermediate tools for data analysis,
● graphical facilities for data analysis and display either directly at the computer or on hard-copy, and,
● a well developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user defined recursive functions and input and output facilities. (Indeed most of the system supplied functions are themselves written in the S language.)
Outline of This Presentation
● Objects in R
○ Basic Object Types
○ Extending Basic Objects via Attributes
● Operations
○ Arithmetic, Assignment, Relational, Logical, and Special
○ Filtering Data via Vector Indexing
○ Multidimensional (Homogeneous Data) - Arrays and Matrices
○ Heterogeneous Data - Lists and Data Frames
○ Aggregation
● Flow Control
○ Conditional
○ Repetition
○ Jumps
Part I: Objects in R
Everything is an Object in R
R deals with data stored in memory. Even data from external sources must be
loaded into computer memory before they can be manipulated.
R does not provide direct access to the computer’s memory. Rather, R provides a
number of specialized data structures referred to as objects.
These objects are referred to through symbols or variables. These symbols are
themselves objects and can be manipulated in the same way as any other object.
Indeed, everything in R is an object, including executable code (functions).
This is crucial to understanding and mastering R.
Basic Object Types
The 12 Basic Object Types in R
Vectors (‘atomic vectors’)
Lists (‘recursive vectors’)
Language objects
Expression objects
Function objects
NULL
Built-in objects and special forms
Promise objects
Dot-dot-dot
Environments
Pairlist objects
The “Any” type
Vector Objects
Vectors can be thought of as contiguous cells containing data. They are usually
accessed through indexing operations such as x[5]. Indexing is a bit more involved
in R because it includes filtering as well. More on this later.
Vectors must have their values all of the same mode. Thus any given vector must
be unambiguously either logical, numeric, complex, character or raw.
Numerical literals such as 42, 1e3, (-6.5), as well as character strings such as “Hello,
world” are vectors of length 1. Zero-length vectors are also possible.
R has six basic (‘atomic’) vector types: logical, integer, real, complex, character (in C
aka ‘string’) and raw. In addition, R has list vectors.
The Six Atomic Vector Types in R
Type      typeof     mode       storage.mode
Logical   logical    logical    logical
Integer   integer    numeric    integer
Real      double     numeric    double
Complex   complex    complex    complex
String    character  character  character
Raw       raw        raw        raw
Single numbers, such as 4.2, and strings, such as "four point two" are still vectors, of length 1.
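The table above can be reproduced from the console. The following is a minimal sketch: the sample values and the names `samples` and `type_table` are chosen here for illustration and do not come from the slides.

```r
# One sample value per atomic type (the values themselves are arbitrary).
samples <- list(
  logical   = TRUE,
  integer   = 42L,
  double    = 4.2,
  complex   = 2+3i,
  character = "four point two",
  raw       = as.raw(255)
)

# Apply each inspection function to every sample and collect the answers.
type_table <- data.frame(
  typeof       = sapply(samples, typeof),
  mode         = sapply(samples, mode),
  storage.mode = sapply(samples, storage.mode)
)
print(type_table)

# Every sample is a vector of length 1.
print(sapply(samples, length))
```

Note that integer and double values report the same mode ("numeric") and differ only in typeof and storage.mode.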
List Objects
Lists are ordered collections of elements, each of which can contain any type of R
object. List elements can be heterogeneous. List elements are accessed through
three different indexing operations.
Lists are vectors as well. To distinguish basic vectors from lists, basic vectors are
usually referred to as ‘atomic vectors’, and lists are referred to as ‘recursive vectors’
(since the elements of a list themselves can be lists).
Language Objects
There are three kinds of language object: calls, expressions, and names. Confusingly, R
also has a separate basic object type called "expression".
Using the Language Object to Create Formulas
Unlike arrays and matrices, R provides an
intrinsic way to handle modeling or formulae via
the language object.
class(fo <- y ~ x1*x2) # "formula"
fo
typeof(fo) # R internal : "language"
terms(fo)
environment(fo)
environment(as.formula("y ~ x"))
environment(as.formula("y ~ x", env = new.env()))
Function Objects
In R, functions are also objects and can be manipulated in much the same way as
any other object. Functions (or more precisely, function closures) have three basic
components: a formal argument list, a body and an environment.
Code can also be grouped into blocks. A block is delimited by braces, {}, and,
unlike a function, has only a body. Since a block has no environment of its own,
symbols assigned within it belong to the environment in which the block is evaluated.
Operators in R are functions as well. This feature becomes important in
object-oriented programming (OOP), where operators can be given class-specific behavior.
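The three components of a function, and the claim that operators are functions, can be inspected directly. This is a small sketch; `add_scaled` and `ops` are hypothetical names invented for the illustration.

```r
# A small function of our own; the name and body are illustrative only.
add_scaled <- function(x, y, scale = 2) {
  (x + y) * scale
}

print(formals(add_scaled))      # the formal argument list: x, y, scale
print(body(add_scaled))         # the body, itself a language object
print(environment(add_scaled))  # the environment the function was defined in

# Functions are objects: they can be stored in a list and called from it.
ops <- list(plus = `+`, times = `*`)
print(ops$plus(2, 3))

# An operator is an ordinary function call in disguise.
print(`+`(2, 3))
```

Backticks are needed around `+` and `*` only because these names are not syntactically valid identifiers.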
Built-in Objects (and Special Forms)
These two kinds of object contain the builtin functions of R, i.e., those that are
displayed as .Primitive in code listings (as well as those accessed via the .Internal
function and hence not user-visible as objects). The difference between the two lies
in the argument handling. Builtin functions have all their arguments evaluated and
passed to the internal function, in accordance with call-by-value, whereas special
functions pass the unevaluated arguments to the internal function.
From the R language, these objects are just another kind of function. The
is.primitive function can distinguish them from interpreted functions.
Environment Objects
Environments can be thought of as consisting of two things. A frame, consisting of a
set of symbol-value pairs, and an enclosure, a pointer to an enclosing environment.
When R looks up the value for a symbol the frame is examined and if a matching
symbol is found its value will be returned. If not, the enclosing environment is then
accessed and the process repeated.
Environments are created implicitly by function calls.
Environments form a tree structure in which the enclosures play the role of parents.
The tree of environments is rooted in an empty environment, available through
emptyenv(), which has no parent. It is the direct parent of the environment of the
base package (available through the baseenv() function).
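The frame-then-enclosure lookup described above can be tried with two explicitly created environments. A minimal sketch, in which `outer_env` and `inner_env` are names made up for the example:

```r
outer_env <- new.env()                    # enclosure defaults to the calling environment
inner_env <- new.env(parent = outer_env)  # inner_env is enclosed by outer_env

assign("a", 1, envir = outer_env)
assign("b", 2, envir = inner_env)

# "b" is found in the frame of inner_env itself.
print(get("b", envir = inner_env))
# "a" is not in inner_env's frame, so the lookup continues in its enclosure.
print(get("a", envir = inner_env))
# Restricting the search to the frame alone shows that "a" is not stored there.
print(exists("a", envir = inner_env, inherits = FALSE))

# The enclosure links: inner_env -> outer_env, and baseenv() -> emptyenv().
print(identical(parent.env(inner_env), outer_env))
print(identical(parent.env(baseenv()), emptyenv()))
```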
Promise Objects
Promise objects are part of R’s lazy evaluation mechanism. They contain three slots: a
value, an expression, and an environment.
When a function is called the arguments are matched and then each of the formal
arguments is bound to a promise. The expression that was given for that formal argument
and a pointer to the environment the function was called from are stored in the promise.
Until that argument is accessed there is no value associated with the promise. When the
argument is accessed, the stored expression is evaluated in the stored environment, and
the result is returned. The result is also saved by the promise. The substitute function will
extract the content of the expression slot. This allows the programmer to access either the
value or the expression associated with the promise.
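Lazy evaluation can be observed from interpreted code. The sketch below uses a hypothetical function, `show_promise`, written only for this illustration: it reads the expression slot with substitute(), forces one argument, and never touches the other.

```r
# show_promise is a hypothetical function written for this illustration.
show_promise <- function(used, unused) {
  expr_text <- deparse(substitute(used))  # the expression slot, as text
  value <- used                           # accessing the argument forces the promise
  paste(expr_text, "=", value)
}

# The second argument would raise an error if it were evaluated, but it is
# never accessed inside the function, so its promise is never forced.
result <- show_promise(1 + 2, stop("never evaluated"))
print(result)
```

The call completes without error, which shows that an argument's expression is evaluated only when the argument is first used.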
Pairlist Objects
The use of pairlists is deprecated since generic vectors are usually more efficient to
use. When an internal pairlist is accessed from R it is generally (including when
subsetted) converted to a generic vector.
NULL, Any, and dot-dot-dot (...) Objects
There is a special object called NULL. It is used whenever there is a need to indicate or specify that an
object is absent. It should not be confused with a vector or list of zero length. The NULL object has no type
and no modifiable properties. There is only one NULL object in R, to which all instances refer. To test for
NULL use is.null. You cannot set attributes on NULL.
It is not really possible for an object to be of “Any” type, but it is nevertheless a valid type value. It gets used
in certain (rather rare) circumstances, e.g. as.vector(x, "any"), indicating that type coercion should not be
done.
The ... object type is stored as a type of pairlist. The components of ... can be accessed in the usual pairlist
manner from C code, but ... is not easily accessed as an object in interpreted code, and even the existence
of such an object should typically not be assumed, as that may change in the future. If a function has ... as a
formal argument then any actual arguments that do not match a formal argument are matched with ...
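A short sketch of how ... and NULL behave in interpreted code. The functions `describe_args` and `mean_of` are hypothetical helpers introduced here, not part of R.

```r
# describe_args is a hypothetical function used only for illustration.
describe_args <- function(x, ...) {
  extras <- list(...)   # collect everything that was matched by ...
  c(x = x, n_extra = length(extras))
}
print(describe_args(10, 20, 30))   # one formal argument, two extras

# ... is commonly passed on unchanged to another function.
mean_of <- function(values, ...) mean(values, ...)
print(mean_of(c(1, 2, NA), na.rm = TRUE))

# NULL is a distinct object, not a zero-length list or vector.
print(is.null(NULL))
print(is.null(list()))
print(length(NULL))
```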
Extending Basic Types via Attributes
Attributes of Objects
In R, every object has a few intrinsic properties, such as its length, mode, type,
and storage mode. In addition, objects can carry attributes such as class, dim and names.
Attributes tell R to interpret and handle an object in a specific way.
For example, a list object with a class attribute of “data.frame” will be handled
differently than a plain list. A vector having a class attribute of “factor” will be printed
differently. The names attribute of a list makes it possible to access elements by
name.
Common Attributes: mode, length, and class
By the mode of an object we mean the basic type of its fundamental constituents. This is a special
case of a “property” of an object.
Another property of every object is its length. The functions mode(object) and length(object) can
be used to find out the mode and length of any defined structure.
All objects in R have a class, reported by the function class. For simple vectors this is just the
mode, for example "numeric", "logical", "character" or "list", but "matrix", "array", "factor" and
"data.frame" are other possible values.
A special attribute known as the class of the object is used to allow for an object-oriented style of
programming in R. For example, if an object has class "data.frame", it will be printed in a certain
way, the plot() function will display it graphically in a certain way, and other so-called generic
functions such as summary() will react to it as an argument in a way sensitive to its class.
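The effect of attributes can be seen by attaching them to plain vectors. This is a minimal sketch with variable names chosen for the example; it relies on "Date", a class that ships with base R and stores dates as day counts from 1970-01-01.

```r
x <- 1:6
print(attributes(x))          # NULL: a plain vector carries no attributes
print(c(mode(x), class(x)))   # mode and class of a simple vector

# A names attribute allows access by name.
names(x) <- letters[1:6]
print(x[["c"]])

# A dim attribute makes R treat the same data as a matrix.
m <- 1:6
dim(m) <- c(2, 3)
print(is.matrix(m))
print(m[2, 3])

# A class attribute changes how generic functions such as print() react.
d <- structure(0, class = "Date")
print(unclass(d))   # the underlying number
print(d)            # printed as a calendar date
```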
Part II: Operations
Data Types and Operators
● Atomic Vectors
○ Integer
○ Numeric
○ Complex
○ Character
○ Logical
● List Vectors
● Special Types
○ Matrices
○ Arrays
○ Factors
○ Data frames
● Arithmetic
○ Addition, subtraction, multiplication, exponentiation, division, integer division, modulus
● Assignment
○ x = value, x <- value, value -> x, x <<- value, value ->> x
● Relational
○ <, >, >=, <=, ==, !=
● Logical
○ &&, ||, !, &, |
● Special
○ :, %in%
Scalar Arithmetic
> 2 + 2
[1] 4 # the [1] is the index of the first element shown on the line; the answer is a 1-d vector (of one element)
> 3-2
[1] 1
> 5*2
[1] 10
> 6^2 # six to the power 2
[1] 36
> 6**2 # six to the power 2, identical to Python format
[1] 36
> 5/2 # five divided by 2
[1] 2.5 # returns a decimal
> 5%%2 # five modulus 2
[1] 1 # returns remainder after division
> 5%/%2 # five integer-division 2
[1] 2 # returns quotient of division
> 5 / 0 # division by zero
[1] Inf # returns Inf
> 0 / 0 # zero divided by zero
[1] NaN # returns NaN (Not a Number)
NaN and Inf
# mixed operations: the arithmetic operators +, -, * and / return NaN when an operand is NaN
> Inf + NaN
[1] NaN
> NaN +1 # addition/subtraction of a scalar to NaN returns NaN
[1] NaN
> NaN -1
[1] NaN
> NaN + NaN # same with *
[1] NaN
> NaN - NaN # same with /
[1] NaN
> Inf + 1
[1] Inf
> Inf - 1
[1] Inf
> Inf + Inf # same with *
[1] Inf
# but subtraction and division between Inf returns NaN
> Inf - Inf # same with /
[1] NaN
A Summary of NaN and Inf in R
A op B        B = 0   B = 1   B = (-1)  B = NaN  B = Inf
A = 0     +   0       1       (-1)      NaN      Inf
          -   0       (-1)    1         NaN      (-Inf)
          *   0       0       0         NaN      NaN
          /   NaN     0       0         NaN      0
A = 1     +   1       2       0         NaN      Inf
          -   1       0       2         NaN      (-Inf)
          *   0       1       (-1)      NaN      Inf
          /   Inf     1       (-1)      NaN      0
A = (-1)  +   (-1)    0       (-2)      NaN      Inf
          -   (-1)    (-2)    0         NaN      (-Inf)
          *   0       (-1)    1         NaN      (-Inf)
          /   (-Inf)  (-1)    1         NaN      0
A = NaN   +   NaN     NaN     NaN       NaN      NaN
          -   NaN     NaN     NaN       NaN      NaN
          *   NaN     NaN     NaN       NaN      NaN
          /   NaN     NaN     NaN       NaN      NaN
A = Inf   +   Inf     Inf     Inf       NaN      Inf
          -   Inf     Inf     Inf       NaN      NaN
          *   NaN     Inf     (-Inf)    NaN      Inf
          /   Inf     Inf     (-Inf)    NaN      NaN
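A summary table of this kind can be regenerated rather than memorised. The sketch below uses outer() to apply each operator to every pair of values; the names `vals`, `products` and `quotients` are chosen for the example.

```r
# The five values of A and B used in the summary.
vals <- c(0, 1, -1, NaN, Inf)
names(vals) <- c("0", "1", "(-1)", "NaN", "Inf")

# Rows are A, columns are B; one 5 x 5 table per operator.
for (op in c("+", "-", "*", "/")) {
  cat("\nA", op, "B\n")
  print(outer(vals, vals, op))
}

# Individual tables can also be kept for later inspection.
products  <- outer(vals, vals, "*")
quotients <- outer(vals, vals, "/")
print(products["(-1)", "1"])
print(quotients["1", "0"])
```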
Vectors: Addition & Subtraction
> c(1, 3, 4, 7) # c() stands for ‘combine’ - notice that it’s a simple c - R is case sensitive
[1] 1 3 4 7 # the result is a 1-d vector of 4 elements
> c(1, 3, 4, 7) + c(2, 3, 5, 8) # vector addition, equal length: corresponding elements added
[1] 3 6 9 15
> c(12, 15, 28, 74) + c (2, 8) # unequal addition: smaller vector is recycled & added
[1] 14 23 30 82
> c(15, 18, 21) + 5 # same logic applies to scalars - scalars are treated as one-element vectors
[1] 20 23 26
> c(91, 90, 76, 54, 23) - c(2, 3) # unequal lengths where larger vector length is not a multiple of the smaller vector length - smaller vector is recycled - with a warning
[1] 89 87 74 51 21
Warning message:
In c(91, 90, 76, 54, 23) - c(2, 3) :
longer object length is not a multiple of shorter object length
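Recycling can be made explicit with rep_len(), which repeats a vector until it reaches a requested length. A minimal sketch, with variable names chosen for the example:

```r
long  <- c(12, 15, 28, 74)
short <- c(2, 8)

# What R does implicitly: repeat the shorter vector to the longer length.
recycled <- rep_len(short, length(long))
print(recycled)
print(identical(long + short, long + recycled))

# When the lengths do not divide evenly, the last repetition is cut short.
# The implicit form gives the same values, plus a warning.
uneven <- rep_len(c(2, 3), 5)
print(uneven)
print(c(91, 90, 76, 54, 23) - uneven)
```

Writing the recycling out like this avoids the warning and makes the intent visible to the reader of the code.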
Vectors: Multiplication & Division
> c(1, 3, 5) * 2 # multiplication by scalar
[1] 2 6 10
> c(1, 3, 5) / 3 # division by a scalar
[1] 0.3333333 1.0000000 1.6666667
> c(1, 3, 5) * c(2, 4, 6) # multiplication of two vectors with equal lengths - element-wise multiplication
[1] 2 12 30
> c(1, 3, 5, 7, 9) * c(2, 5) # multiplication of two vectors with unequal lengths - smaller vector is recycled, just like in addition
[1] 2 15 10 35 18
Warning message:
In c(1, 3, 5, 7, 9) * c(2, 5) :
longer object length is not a multiple of shorter object length
> c(1, 3, 5, 7, 9, 11) / c(5, 10) # division by a vector
[1] 0.2 0.3 1.0 0.7 1.8 1.1
Logical Arithmetic
> TRUE
[1] TRUE
> FALSE
[1] FALSE
> TRUE || FALSE # logical OR
[1] TRUE
> TRUE && FALSE # logical AND
[1] FALSE
> ! FALSE # logical NOT
[1] TRUE
> TRUE + 1 # TRUE coerced to the numeric value 1
[1] 2
> TRUE + FALSE # logicals coerced to numeric values (TRUE is 1, FALSE is 0)
[1] 1
> TRUE * FALSE
[1] 0
> TRUE / TRUE
[1] 1
> TRUE/ FALSE # same rules apply: this is 1 / 0
[1] Inf
Logical Comparisons
> 1 && 1 # the && operator will coerce each ‘1’ into TRUE.
[1] TRUE
> 2 = 3 # caveat: = is not equality
Error in 2 = 3 : invalid (do_set) left-hand side to assignment
> 2==3 # == is the equality comparison
[1] FALSE
> 2!=3
[1] TRUE
# this will throw an error, since & evaluates both operands, regardless of the first comparison being sufficient
> (2 > 3) & (5=6)
Error in 5 = 6 : invalid (do_set) left-hand side to assignment
# but the following will not; && will ‘short-circuit’ and return
> (2 > 3) && (5=6)
[1] FALSE
# but if the first comparison is inconclusive, then the second will be evaluated, throwing an error
> (2 < 3) && (5=6)
Error in 5 = 6 : invalid (do_set) left-hand side to assignment
# same goes for the | and || operators
> (2 < 3) | (5=6)
Error in 5 = 6 : invalid (do_set) left-hand side to assignment
> (2 < 3) || (5=6)
[1] TRUE
Missing Value: NA
> b = NA # missing value marker
> b
[1] NA
> class(b)
[1] "logical"
> b + 1
[1] NA
> b - 1
[1] NA
> b + TRUE
[1] NA
> b || TRUE
[1] TRUE
> b && TRUE
[1] NA
> b && FALSE
[1] FALSE
> b || FALSE
[1] NA
# you cannot check equality / inequality of NA
> NA==NA
[1] NA
> NA!=NA
[1] NA
Comparisons involving Inf, NaN, and NA
# Use of is.na() function. NaN==NA returns NA (not TRUE or FALSE), but:
> is.na(NaN)
[1] TRUE
> is.na(Inf)
[1] FALSE
# Use of is.nan() function.
> is.nan(NaN)
[1] TRUE
> is.nan(Inf)
[1] FALSE
> is.na(NA)
[1] TRUE
> Inf < NaN
[1] NA
> Inf == NaN
[1] NA
> Inf == Inf
[1] TRUE
> NaN == NaN
[1] NA
> Inf == NA
[1] NA
> NaN == NA
[1] NA
> x = TRUE; y = FALSE # TRUE & FALSE are boolean literals
> x
[1] TRUE
> x && y # logical AND, short-circuit version
[1] FALSE
> c( x && y, x || y, !x, !y) # logical AND and OR (short-circuit versions) and NOT
[1] FALSE TRUE FALSE TRUE
# simple comparisons
> 2 >3
[1] FALSE
> 3 > 3
[1] FALSE
> 3>=3
[1] TRUE
> 2<=4
[1] TRUE
Logical Vectors
> v = 1:7 # simple sequence
> v
[1] 1 2 3 4 5 6 7
> v > 3 # elementwise comparison of vector with scalar
[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE
> d = c(2,3) # another vector, with non-matching length
> v > d
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
Warning message:
In v > d : longer object length is not a multiple of shorter object
length
> v[v>3] # since v>3 is a 7-element logical vector, we can use it to filter elements
[1] 4 5 6 7 # only the elements for which v>3 is TRUE are fetched
> v[TRUE] # for the sake of demonstration: a single TRUE is recycled to every position
[1] 1 2 3 4 5 6 7
> v[FALSE]
integer(0)
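Logical vectors can be combined with & and | before being used as a filter, and the same vector answers related questions such as "where?" and "how many?". A small sketch, where `keep` is a name chosen for the example:

```r
v <- 1:7

# Combine two element-wise conditions: greater than 3 AND odd.
keep <- v > 3 & v %% 2 == 1
print(keep)

print(v[keep])       # the matching elements
print(which(keep))   # the positions of the matching elements
print(sum(keep))     # how many match (TRUE counts as 1)

# A short logical index is recycled: this selects every other element.
print(v[c(TRUE, FALSE)])
```

Note that the element-wise operators & and | are used here; && and || are intended for single values, such as conditions in flow control.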
Built-in Functions
> exp(1) # R comes with a lot of built-in functions of the form f(...)
[1] 2.718282
> log(2) # function called with one argument, the second argument defaults to e
[1] 0.6931472
> log(2, 10) # second argument is ‘base’, and it’s matched positionally
[1] 0.30103
> log( 2, base = 10) # alternatively, the second argument can be matched by name
[1] 0.30103
> log( x = 2, base = 10) # both arguments matched by name: order doesn’t matter here, e.g. log( base = 10, x = 2 ) is identical
[1] 0.30103
> log( base = 10, 2) # R will first match the named argument, and the unnamed arguments will be matched positionally
[1] 0.30103
# in fact, all operators like +, -, *, ^ are functions, and R calls these functions under-the-hood when operators are used in expressions.
# summary functions
length(x) - number of elements in x
sum(x) - sum of elements in x
mean(x) - mean of elements in x
min(x) - minimum of elements in x
max(x) - maximum of elements in x
range(x) - returns a 2-element vector of c( min(x), max(x) )
var(x) - sample variance of elements in x
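The summary functions listed above can be tried on a small vector. This is a sketch with sample data invented for the example; the last lines check var() against the sample-variance formula (divisor n - 1).

```r
x <- c(2, 4, 6, 8)   # arbitrary sample data

print(length(x))
print(sum(x))
print(mean(x))
print(min(x))
print(max(x))
print(range(x))
print(var(x))

# var() uses the sample variance: squared deviations divided by (n - 1).
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)
print(all.equal(var(x), manual_var))
```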
Numeric Sequences
> 1:10 # basic sequence
[1] 1 2 3 4 5 6 7 8 9 10
> 8:-2 # backward sequence
[1] 8 7 6 5 4 3 2 1 0 -1 -2
> seq( 2, 8) # the seq() function
[1] 2 3 4 5 6 7 8
> seq( from = 2, to = 8) # equivalent to above, named arguments
[1] 2 3 4 5 6 7 8
> seq( from = 2, to = 8, by = 2) # stepping parameter
[1] 2 4 6 8
> seq( from = 2, to = 8, by = 4) # when range is not a multiple of step size, end value may not be included
[1] 2 6
> seq(8) # if only a single number n is given, the result is the sequence from 1 to n
[1] 1 2 3 4 5 6 7 8
> v <- seq(1, 8, 2) # create sequence
> v
[1] 1 3 5 7
> 5 %in% v # the %in% operator
[1] TRUE
> 4 %in% v
[1] FALSE
Character Vectors
> "Hello" # string literals are treated as 1-d character vectors
[1] "Hello"
> 'This is a string, too' # they can be enclosed in single quotes, too: note how the R console delimits strings by double quotes, regardless
[1] "This is a string, too"
> c("a", "b") # you can have character vectors as well: note how c() combines, not concatenates
[1] "a" "b" # note how the result is a 2-element vector
> paste("a", "b") # for concatenation, you need to call the ‘paste’ function
[1] "a b" # now the result is a 1-element vector. Note the space between: this is the default separator for paste
> paste("a", "b", sep = "") # let’s override the default one-space separator with a zero-length string
[1] "ab" # now it’s a proper concatenation
> paste(2) # note how a number is converted into a character vector
[1] "2"
> paste( c(1,2)) # vectors are converted element by element, not concatenated
[1] "1" "2"
Complex & Integer Vectors
> a <- 2+3i # a numeric literal followed by ‘i’ denotes an imaginary number, i being the square root of (-1)
> b <- 5-4i
> a+b
[1] 7-1i
> a-b
[1] -3+7i
> a*b
[1] 22+7i
> a/b
[1] -0.0487805+0.5609756i
> class(a) # the function ‘class’ retrieves the class of this variable
[1] "complex"
> b = 20L # the suffix L tells R that this is of class integer
> class(b)
[1] "integer"
> b = 2.5L # trying to store numeric by force
Warning message:
integer literal 2.5L contains decimal; using numeric value
> class(b)
[1] "numeric"
Symbols (Variables) & Assignment
> assign("v", 1) # basic assignment: the variable name is given as a string
> # notice no output
> v <- 1 # syntactic shortcut
> # notice no output
> v # now type the variable name
[1] 1 # and you see the value
> v = 5 # alternative assignment method
> v # same as <- operator, except in the following case
[1] 5
> sin( x = 5) # assigns the value 5 to the x parameter of sin function
[1] -0.9589243
> sin( x <-5) # creates a new variable x, assigns 5 to it, the whole expression evaluates to 5, which then gives the value
[1] -0.9589243 # the difference is, using = didn’t create a new variable in the workspace, using <- does.
> (5**2 + 57) -> y # right assignment also works
> x = 5; y = 4 # multiple assignment in one line with ‘ ; ’
> y <<- 75; 50->>x # yet another alternative, but this has to do with assigning to a masked variable outside current scope
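The difference between <- and <<- only shows up inside a function. The following sketch uses two hypothetical functions, `add_local` and `add_outer`, invented for this illustration.

```r
total <- 0

# <- inside a function creates a new local variable; the outer one is untouched.
add_local <- function(n) {
  total <- total + n
  total
}

# <<- assigns to the variable found in the enclosing scope instead.
add_outer <- function(n) {
  total <<- total + n
  total
}

print(add_local(5))   # the local copy was updated
print(total)          # the outer variable is unchanged
print(add_outer(5))
print(total)          # the outer variable has now changed
```

Because <<- changes state outside the function, it is usually reserved for deliberate cases such as counters or caches.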
Filtering Data (via Vector Indexing)
Vector Indexing
> v = c(1, 3, 5) # assignment of numeric vector
> v[1] # unlike C, indexing is 1-based
[1] 1 # the first value is returned
> v[0] # there is no element at “zeroth” position
numeric(0) # graceful fallback
> v[1.7] # non-integer indices are truncated
[1] 1
> v = seq(1,7) # basic sequence from 1 to 7
> v
[1] 1 2 3 4 5 6 7
> v[1:3] # ask to return elements 1 to 3
[1] 1 2 3
> k = seq(1, 6, 2) # create a sequence 1 3 5
> k
[1] 1 3 5
> v[k] # returns elements at 1st, 3rd & 5th positions
[1] 1 3 5
> v[c(1,3,5)] # identical to v[k]
[1] 1 3 5
> v[c(1, 1)] # first element repeated twice
[1] 1 1
> v[c(2, 1, 4, 3, 5, 3)] # doesn’t need to be a sequence
[1] 2 1 4 3 5 3
> v[-1] # negative args allowed; asks to drop the first element
[1] 2 3 4 5 6 7
> v[-5] # drop the fifth element
[1] 1 2 3 4 6 7
> v[ c(-1, -3)] # drop elements 1 & 3
[1] 2 4 5 6 7
> v[-1:-3] # drop elements 1 through 3
[1] 4 5 6 7
Vector Indexing, Replacing, Inserting & Sorting
> k # remember k is c(1, 3, 5)
[1] 1 3 5
> v[-k] # drop elements 1, 3 & 5
[1] 2 4 6 7
> v [ -8] # there is no eighth element to drop
[1] 1 2 3 4 5 6 7 # so the entire vector is returned
> v[4] = 4.5; v[5] = 4.5 # change elements 4 & 5
> v
[1] 1.0 2.0 3.0 4.5 4.5 6.0 7.0
> v[8] = 8 # non-existent index adds element
> v
[1] 1.0 2.0 3.0 4.5 4.5 6.0 7.0 8.0
> v[11] = 11 # gaps are filled with NA
> v
[1] 1.0 2.0 3.0 4.5 4.5 6.0 7.0 8.0 NA NA 11.0
> class(v) # however this is numeric NA, not logical NA
[1] "numeric"
> class(v[9])
[1] "numeric"
> v = c(1, 7, 4, 0, 3, 3, 5, 6, 2, 9, 1, 1, 0, 7, 4, 6, 8, NA)
> sort(v)
[1] 0 0 1 1 1 2 3 3 4 4 5 6 6 7 7 8 9 # NAs are dropped by default
> sort(v, na.last = TRUE)
[1] 0 0 1 1 1 2 3 3 4 4 5 6 6 7 7 8 9 NA
> sort(v, na.last = FALSE)
[1] NA 0 0 1 1 1 2 3 3 4 4 5 6 6 7 7 8 9
Four Types of Vector Indices
1. A vector of positive integral quantities. In this case the values in the index vector must lie in
the set {1, 2, …, length(x)}. The corresponding elements of the vector are selected and
concatenated, in that order, in the result. The index vector can be of any length and the result
is of the same length as the index vector.
2. A vector of negative integral quantities. Such an index vector specifies the values to be
excluded rather than included.
3. A logical vector. In this case the index vector is recycled to the same length as the vector from
which elements are to be selected. Values corresponding to TRUE in the index vector are
selected and those corresponding to FALSE are omitted. NA values in the index vector are
included in the result as NA.
4. A vector of character strings. This possibility only applies where an object has a names
attribute to identify its components. In this case a sub-vector of the names vector may be used
in the same way as the positive integral labels
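The four index types can be compared side by side on one named vector. A minimal sketch; the vector `x` and its names are invented for the example.

```r
x <- c(a = 10, b = 20, c = 30, d = 40)

# 1. Positive integers: select (and possibly repeat) elements by position.
print(x[c(2, 4, 2)])

# 2. Negative integers: exclude elements by position.
print(x[-c(1, 3)])

# 3. Logical vector: keep the TRUE positions; a short index is recycled.
print(x[c(TRUE, FALSE)])
# An NA in a logical index yields an NA in the result.
print(x[c(TRUE, NA, FALSE, FALSE)])

# 4. Character strings: select by the names attribute.
print(x[c("d", "a")])
```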
Multidimensional (Homogeneous) Data
Special Types: Arrays, Matrices, and Factors
R provides no intrinsic type for arrays and matrices (unlike MATLAB or
Octave). Instead, we create vectors and ask R to treat them as arrays or matrices
by setting the ‘dim’ attribute. Alternatively, we can use the array() and matrix()
functions to create these objects. Arrays can have any number of dimensions.
Matrices are a special case of arrays having just two dimensions.
Similarly, R has no intrinsic factor type. Instead, we ask R to treat a vector as a
factor by setting its levels and class attributes manually (or by using the factor function).
Arrays
# create 18 random numbers (the values will differ from run to run)
> d = floor(rnorm(18, mean = 10, sd = 3))
> d
[1] 12 14 12 11 9 9 5 13 9 10 3 7 5 8 14 9 10 11
# change into an array
> d = array(d)
> class(d)
[1] "array"
# one dimension, 18 elements
> dim(d)
[1] 18
# change the dimensions to 3x3x2
> dim(d) = c(3, 3, 2)
# check dimensions
> dim(d)
[1] 3 3 2
# now output the array: notice the order in which data are filled
> d
, , 1
     [,1] [,2] [,3]
[1,]   12   11    5
[2,]   14    9   13
[3,]   12    9    9
, , 2
     [,1] [,2] [,3]
[1,]   10    5    9
[2,]    3    8   10
[3,]    7   14   11
Matrices from Vectors
> x = 1:20 # simple sequence
> x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> class(x)
[1] "integer" # the class is the atomic type integer
> dim(x) # dim attribute is not set
NULL
> dim(x) = c(4, 5) # setting dim will let R treat this as a matrix/array
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
> class(x)
[1] "matrix" "array" # a 2-d array is a matrix (R versions before 4.0.0 print only "matrix")
> dim(x) # dimension vector is a 2-element vector
[1] 4 5
> x[1] # first element - no ambiguities
[1] 1
> x[1][1] # unlike C, this doesn’t work as x[row][col]
[1] 1
> x[1,1] # but this does: x[row, col]
[1] 1
> x[1,2] # row one, column 2 is 5, and not 2
[1] 5
> x[1][2] # again, this fails: x[1] has only one element
[1] NA
> x[15] # a single index runs down the rows first, then across the columns
[1] 15 # FORTRAN column-major order
Matrices from Functions
# the matrix function
> z = matrix(1:20, 4, 5)
> z
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
# use of only nrow: ncol is inferred from the length of the data (if both are given and there is less data, the data will recycle)
> p = matrix(1:20, nrow = 4)
> p
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
# use of ncol, completely equivalent
> p = matrix(1:20, ncol = 5)
> p
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
# use byrow to control how data is filled
> p = matrix(1:20, 4, 5, byrow = TRUE)
> p
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
# alternative: use the array function with the dim parameter
> y <- array(1:20, c(4,5))
> class(y)
[1] "matrix" "array" # R versions before 4.0.0 print only "matrix"
> y
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
Matrices & Index Matrices
# the original matrix, p = matrix(1:20, 4, 5)
> p
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
# create an index matrix
> idx = array(c(1:3,3:1), c(3,2))
> idx
     [,1] [,2]
[1,]    1    3
[2,]    2    2
[3,]    3    1
# access the elements: note how each row of idx is used as [row, col]
> p[idx]
[1] 9 6 3
# now set those elements to zero
> p[idx] = 0
> p
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 0 13 17
[2,] 2 0 10 14 18
[3,] 0 7 11 15 19
[4,] 4 8 12 16 20
Matrix Operations
# creating random numbers (the values will differ from run to run)
> rnorm(5)
[1] 0.5414429 -0.5555167 1.7667198 1.1929404 -0.7713971
> floor(rnorm(5, mean = 10, sd = 5))
[1] 6 10 14 18 6
# create a 2x2 random matrix from a normal distribution
> matrix(floor(rnorm(4, mean = 10, sd = 5)), 2, 2) -> p
> p
     [,1] [,2]
[1,]   13    7
[2,]   20   11
# create another 2x2 random matrix from a normal distribution
> matrix(floor(rnorm(4, mean = 10, sd = 5)), 2, 2) -> q
> q
     [,1] [,2]
[1,]    6    4
[2,]    6   18
> p*q # not true multiplication, element-wise multiplication
[,1] [,2]
[1,] 78 28
[2,] 120 198
# outer product
> p %o% q
, , 1, 1
[,1] [,2]
[1,] 78 42
[2,] 120 66
, , 2, 1
[,1] [,2]
[1,] 78 42
[2,] 120 66
, , 1, 2
[,1] [,2]
[1,] 52 28
[2,] 80 44
, , 2, 2
[,1] [,2]
[1,] 234 126
[2,] 360 198
# matrix product
> p %*% q
[,1] [,2]
[1,] 120 178
[2,] 186 278
# matrix transpose
> t(p)
[,1] [,2]
[1,] 13 20
[2,] 7 11
# diagonals of matrices p and q
> diag(p)
[1] 13 11
> diag(q)
[1] 6 18
# diag() with a vector gives a diagonal matrix
> diag( c(2, 3, 4))
[,1] [,2] [,3]
[1,] 2 0 0
[2,] 0 3 0
[3,] 0 0 4
# diag() with a scalar gives an identity matrix
> diag(3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
# cross-product of two matrices
> t(p) %*% q
[,1] [,2]
[1,] 198 412
[2,] 108 226
> crossprod(p, q)
[,1] [,2]
[1,] 198 412
[2,] 108 226
# e.g take the following system of eqns.
# x + y + z = 2
# 6x - 4y + 5z = 31
# 5x + 2y + 2z = 13
# this can be written as follows:
# M x = b where M is the coefficient matrix, x is the column vector [x y z]'
> m = matrix( c(1, 6, 5, 1, -4, 2, 1, 5, 2), nrow = 3)
> b = c(2, 31, 13)
# solving for x involves finding the inverse of m, m^(-1)
> solve(m)
[,1] [,2] [,3]
[1,] -0.6666667 1.850372e-17 0.33333333
[2,] 0.4814815 -1.111111e-01 0.03703704
[3,] 1.1851852 1.111111e-01 -0.37037037
# but we can always directly solve for x as follows:
> solve(m, b)
[1] 3 -2 1
# determinant
> det(m)
[1] 27
# eigenvalues and eigenvectors
> eigen(m)
eigen() decomposition
$values
[1] -5.6445744 5.5123299 -0.8677555
$vectors
[,1] [,2] [,3]
[1,] 0.1204044 -0.2968715 -0.5511522
[2,] -0.9768516 -0.5842714 0.2262813
[3,] 0.1768158 -0.7553107 0.8031363
# for more methods, e.g. rref(), install pracma
> install.packages("pracma")
> library(pracma)
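The results of solve(), det() and eigen() for the system above can be cross-checked against each other. A minimal sketch using only base R; all.equal() is used instead of == because floating-point results are rarely exact.

```r
# Coefficient matrix and right-hand side of the system used above.
m <- matrix(c(1, 6, 5, 1, -4, 2, 1, 5, 2), nrow = 3)
b <- c(2, 31, 13)

# Solve M x = b directly and substitute the solution back in.
x <- solve(m, b)
print(x)
print(all.equal(as.vector(m %*% x), b))

# The inverse times the matrix should give the identity matrix.
print(all.equal(solve(m) %*% m, diag(3)))

# The product of the eigenvalues equals the determinant.
print(all.equal(prod(eigen(m)$values), det(m)))
```

Using solve(m, b) is generally preferable to computing solve(m) %*% b, since it avoids forming the inverse explicitly.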
Special Type: Factors
Unlike formulas, R provides no intrinsic way to
handle factors. This is done by associating two
vectors of equal length or reinterpreting an
existing symbol via its class attribute.
# given two vectors of equal length, one with responses and the
# other with factor levels, R can apply summary functions at each
# factor level using the factor() and tapply() functions.
> incomes <- c(50, 53, 80, 35, 47, 92, 44, 62, 61, 30)
> depts <- c("H", "H", "M", "M", "A", "S", "M", "S", "H", "A")
> dfact <- factor(depts)
> dfact
[1] H H M M A S M S H A
Levels: A H M S
> tapply(incomes, dfact, mean)
       A        H        M        S
38.50000 54.66667 53.00000 77.00000
# the first and second arguments of tapply() must be of equal length.
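Looking inside a factor shows how it is built from an ordinary vector plus attributes. This sketch reuses the department codes from the example above and assumes nothing beyond base R.

```r
depts <- c("H", "H", "M", "M", "A", "S", "M", "S", "H", "A")
dfact <- factor(depts)

# The distinct values, sorted, become the levels.
print(levels(dfact))

# Internally each element is stored as an integer code into the levels.
print(as.integer(dfact))

# The levels and the class are carried as attributes of that integer vector.
print(attributes(dfact))

# table() counts the observations at each level.
dept_counts <- table(dfact)
print(dept_counts)
```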
Heterogeneous Data
Handling Heterogeneous Data: Lists and Data Frames
Most of the time, the data to be analysed will not be typed into the R console:
rather, they will be read from an external data source, like a disk file or a repository.
While R does not have a native type to read tabular data, it does provide lists. Lists
can contain heterogeneous values. Based on lists, a new class is built, “data
frames”, which will serve as the containers for external data.
Lists
# the presence of one character element coerces all elements to character; atomic vectors can’t store heterogeneous data
> emp <- c("Sam", 34L, 85.5, 132000, "HR")
> emp
[1] "Sam" "34" "85.5" "132000" "HR"
# the correct approach is to use a list
> emp <- list("Sam", 34L, 85.5, 132000, "HR")
> emp
[[1]]
[1] "Sam"
[[2]]
[1] 34
[[3]]
[1] 85.5
[[4]]
[1] 132000
[[5]]
[1] "HR"
> emp[1]
[[1]]
[1] "Sam"
> emp[1][1]
[[1]]
[1] "Sam"
> emp[1][2]
[[1]]
NULL
> emp[6]
[[1]]
NULL
Lists with Sub-lists
> emp <- list("Sam", 34L, 85.5, 132000, "HR", list("ann", "beth"))
> emp
[[1]]
[1] "Sam"
[[2]]
[1] 34
[[3]]
[1] 85.5
[[4]]
[1] 132000
[[5]]
[1] "HR"
[[6]]
[[6]][[1]]
[1] "ann"
[[6]][[2]]
[1] "beth"
# difference between [ ] and [[ ]]
> emp[6] # 6th element by position: operator [ ] returns a list containing that element
[[1]]
[[1]][[1]]
[1] "ann"
[[1]][[2]]
[1] "beth"
> emp[[6]] # 6th item in list ‘emp’ is also a list: note operator [[ ]]
[[1]]
[1] "ann"
[[2]]
[1] "beth"
> emp[[6]][1] # note how a 1-item list is returned, not an atomic vector
[[1]]
[1] "ann"
> emp[[6]][[1]] # note how an atomic vector is returned
[1] "ann"
Lists with Field Names
> names(emp)
NULL
> names(emp) = c("fname", "age", "perf", "salary", "dept", "children")
> emp
$fname
[1] "Sam"
$age
[1] 34
$perf
[1] 85.5
$salary
[1] 132000
$dept
[1] "HR"
$children
$children[[1]]
[1] "ann"
$children[[2]]
[1] "beth"
> emp$fname
[1] "Sam"
> emp$children
[[1]]
[1] "ann"
[[2]]
[1] "beth"
> emp$children[1]
[[1]]
[1] "ann"
> emp$c # partial matching: ‘c’ uniquely abbreviates ‘children’
[[1]]
[1] "ann"
[[2]]
[1] "beth"
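Names can also be supplied when the list is created, and the $ and [[ ]] operators differ in how strictly they match them. A minimal sketch follows; the variable field is hypothetical, and the list is a shortened version of emp.

```r
# Names can be given when the list is created.
emp <- list(fname = "Sam", age = 34L, dept = "HR",
            children = list("ann", "beth"))

emp$age              # 34, access by name with $
emp[["age"]]         # 34, the same element via [[ ]]

# $ matches names partially; [[ ]] matches exactly unless told otherwise.
emp$ch                        # the 'children' sub-list
emp[["ch"]]                   # NULL: no element is named exactly "ch"
emp[["ch", exact = FALSE]]    # the 'children' sub-list again

# A name held in a variable needs [[ ]], since $ takes its argument literally.
field <- "dept"
emp[[field]]         # "HR"
```

Partial matching is convenient at the console, but exact names are safer in scripts, where a later-added element could make an abbreviation ambiguous.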
Addition / Deletion with Lists
# create a list with two elements
> emp <- list("Sam", 23)
> emp
[[1]]
[1] "Sam"
[[2]]
[1] 23
# now add another element
> emp <- c(emp, "HR")
> emp
[[1]]
[1] "Sam"
[[2]]
[1] 23
[[3]]
[1] "HR"
# access the third element
> emp[3]
[[1]]
[1] "HR"
# remove the third element with a negative index
> emp = emp[-3]
> emp
[[1]]
[1] "Sam"
[[2]]
[1] 23
# now add another element at position 4
> emp[4] = 4
> emp
[[1]]
[1] "Sam"
[[2]]
[1] 23
[[3]]
NULL # note missing element at position 3
[[4]]
[1] 4
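Besides c() and negative indices, base R offers a few other ways to add and delete list elements. The sketch below shows append(), deletion by assigning NULL, and addition by name; the element names dept and manager are hypothetical.

```r
emp <- list("Sam", 23)

# append() inserts at a chosen position without leaving gaps.
emp <- append(emp, list("HR"), after = 1)
# emp is now list("Sam", "HR", 23)

# Assigning NULL deletes an element and shifts the rest down.
emp[[2]] <- NULL
length(emp)          # 2

# Assigning to a new name adds a named element at the end.
emp$dept <- "HR"
names(emp)           # "" "" "dept"

# To store an actual NULL as an element, wrap it in list().
emp["manager"] <- list(NULL)
length(emp)          # 4
```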
Data Frames
# there are two basic ways to look at data frames. One: as a
# combination of several vectors, each of which represents one
# variable. Each vector corresponds to a column, each element
# corresponds to a row (observation).
> emp.names = c("Sam", "Joe", "Ann")
> emp.codes = c(12, 23, 45)
> emp.salaries = c(45000, 32000, 85000)
> emps = data.frame(emp.codes, emp.names, emp.salaries)
> emps
  emp.codes emp.names emp.salaries
1        12       Sam        45000
2        23       Joe        32000
3        45       Ann        85000
# getting the dimension and column name info
> dim(emps)
[1] 3 3
> names(emps)
[1] "emp.codes" "emp.names" "emp.salaries"
# The other way to look at data frames is as a combination of lists,
# each of which holds heterogeneous info about a single record.
> sam = list("Sam", 12, 45000)
> joe = list("Joe", 23, 32000)
> ann = list("Ann", 45, 85000)
> emps2 = data.frame(rbind(sam, joe, ann))
> emps2 # note that default names have been given to columns
     X1 X2    X3
sam Sam 12 45000
joe Joe 23 32000
ann Ann 45 85000
> names(emps2) = c("emp.names", "emp.codes", "emp.salaries")
> emps2
    emp.names emp.codes emp.salaries
sam       Sam        12        45000
joe       Joe        23        32000
ann       Ann        45        85000
> rownames(emps2) # unlike in the previous case, rows have names
[1] "sam" "joe" "ann"
> rownames(emps2) = NULL # let's remove them
> rownames(emps2)
[1] "1" "2" "3"
> emps2
  emp.names emp.codes emp.salaries
1       Sam        12        45000
2       Joe        23        32000
3       Ann        45        85000
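One caveat of the row-wise construction is worth checking: because rbind() is applied to lists, each column of emps2 is itself a list rather than an atomic vector, so functions such as mean() do not work on it directly. The following sketch shows one possible way to convert the columns; emps3 is a hypothetical name introduced here.

```r
sam <- list("Sam", 12, 45000)
joe <- list("Joe", 23, 32000)
ann <- list("Ann", 45, 85000)

# rbind() on lists yields a matrix of mode "list", so every column of the
# resulting data frame is itself a list rather than an atomic vector.
emps2 <- data.frame(rbind(sam, joe, ann))
sapply(emps2, class)         # all three columns report "list"

# One way to obtain atomic columns: unlist each column.
emps3 <- data.frame(lapply(emps2, unlist, use.names = FALSE),
                    stringsAsFactors = FALSE)
names(emps3) <- c("emp.names", "emp.codes", "emp.salaries")
sapply(emps3, class)
mean(emps3$emp.salaries)     # 54000
```

Building the data frame column by column, as in the first approach, avoids the conversion step altogether.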
Data Wrangling
# select a range of rows or columns based on order
# filter a subset of rows by condition on columns
# summary statistics for columns, by groups in columns
# change the row or column order
# append, insert, delete rows or columns
# transform the data type of a column
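Each of the tasks listed above can be done with base R indexing alone. The following is a minimal sketch on a hypothetical employee table; the column names and values are assumptions made for illustration.

```r
# Hypothetical employee data, in the spirit of the earlier slides.
emps <- data.frame(
  code   = c(12, 23, 45, 51),
  name   = c("Sam", "Joe", "Ann", "Raj"),
  dept   = c("HR", "IT", "IT", "HR"),
  salary = c(45000, 32000, 85000, 50000),
  stringsAsFactors = FALSE
)

# select a range of rows or columns based on order
emps[1:2, ]                      # first two rows
emps[, 2:3]                      # second and third columns

# filter a subset of rows by condition on columns
emps[emps$salary > 40000, ]

# summary statistics for columns, by groups in columns
tapply(emps$salary, emps$dept, mean)   # HR 47500, IT 58500

# change the row or column order
emps[order(emps$salary, decreasing = TRUE), ]
emps[, c("name", "code", "dept", "salary")]

# append, insert, delete rows or columns
emps$bonus <- emps$salary * 0.1                      # append a column
new_row <- data.frame(code = 60, name = "Lee", dept = "IT",
                      salary = 40000, bonus = 4000,
                      stringsAsFactors = FALSE)
emps <- rbind(emps, new_row)                         # append a row
emps$bonus <- NULL                                   # delete a column
emps <- emps[-1, ]                                   # delete the first row

# transform the data type of a column
emps$code <- as.integer(emps$code)
```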
Aggregation
Aggregation: the xapply() family of functions
● lapply(X, FUN, …) - applies the function FUN to each element of the vector (or list) X and always
returns a list.
● sapply(X, FUN, …, simplify = TRUE, USE.NAMES = TRUE) - works just like lapply, but will simplify the
output if possible, i.e., instead of returning a list like lapply, it will return a vector or matrix if the
results are simplifiable.
● vapply(X, FUN, FUN.VALUE, …, USE.NAMES = TRUE) - similar to sapply, but requires us to specify, via
FUN.VALUE, the type and length of the value that each call to FUN is expected to return.
● tapply(X, INDEX, FUN, …) - applies FUN to the groups of X defined by the levels of the factor (or
factors) INDEX.
● mapply(FUN, …, MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) - ‘multivariate’ apply. It calls
FUN on the first elements of each argument in …, then on the second elements, and so on, which
vectorizes a function that does not usually accept vectors as arguments.
● apply(X, MARGIN, FUN) - finally, there is the general apply() function which works on arrays (and
matrices). MARGIN specifies the dimension over which FUN is applied: 1 for rows, 2 for columns.
● Note: these functions are part of base R and remain fully supported. The purrr package offers an
alternative set of mapping functions with a more uniform interface.
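The differences between these functions are easiest to see side by side. The following is a minimal sketch with small hypothetical inputs (scores, incomes, region, and m are names made up for this example).

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20))   # hypothetical data

lapply(scores, mean)                 # list: a = 2, b = 15
sapply(scores, mean)                 # named numeric vector: a = 2, b = 15
vapply(scores, mean, numeric(1))     # as sapply, but checked against FUN.VALUE

# tapply: group the first vector by the levels of the second
incomes <- c(40, 60, 50, 70)
region  <- factor(c("N", "S", "N", "S"))
tapply(incomes, region, mean)        # N = 45, S = 65

# mapply: rep() is called pairwise as rep(1, 3), rep(2, 2), rep(3, 1)
mapply(rep, 1:3, 3:1)

# apply: MARGIN = 1 works row by row, MARGIN = 2 column by column
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)                     # 9 12
apply(m, 2, sum)                     # 3 7 11
```

vapply() is generally the safer choice inside functions, because a result of unexpected type or length raises an error instead of silently changing the shape of the output.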
Aggregation: sweep(), by() and aggregate()
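A minimal sketch of the three functions named in this title, using a small hypothetical matrix and data frame (m, emps, agg, and top are names assumed for this example):

```r
# sweep(): remove a summary statistic from each column of a matrix
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
centred <- sweep(m, 2, colMeans(m))      # subtract column means (default FUN is "-")
colMeans(centred)                        # 0 0 0

# Hypothetical data frame for the grouped summaries below
emps <- data.frame(dept   = c("HR", "IT", "IT", "HR"),
                   salary = c(45000, 32000, 85000, 50000),
                   stringsAsFactors = FALSE)

# aggregate(): one summary row per group, returned as a data frame
agg <- aggregate(salary ~ dept, data = emps, FUN = mean)
agg

# by(): apply a function to the sub-data-frame of each group
top <- by(emps, emps$dept, function(d) max(d$salary))
top
```

aggregate() is convenient when the result should itself be a data frame, while by() passes whole sub-data-frames to the function and so allows summaries that involve several columns at once.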
Part III: Flow Control
Overview of Flow Control in R
● Grouping
● Conditional
○ The if/else structure
○ The ifelse() function
○ The switch() function
● Repetition
○ The while loop
○ The for/in loop
○ The repeat loop
○ The ‘foreach’ package
● Jump
○ Break
○ Next
Groups (Grouped Expressions)
● Commands may be grouped together in braces, {expr_1; …; expr_m}, in which
case the value of the group is the result of the last expression evaluated in the
group.
● Since such a group is also an expression, it may, for example, itself be included
in parentheses and used as part of an even larger expression, and so on.
● Groups are important in conditionals and repetitions because their
bodies are often grouped expressions.
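The value of a group can be observed by assigning it. The sketch below uses hypothetical variable names (total, a, b, doubled, x).

```r
# The value of a braced group is the value of its last expression.
total <- {
  a <- 2
  b <- 3
  a + b
}
total                    # 5

# Braces do not create a new scope: a and b now exist alongside total.
exists("a")              # TRUE

# A group is itself an expression and can sit inside a larger one.
doubled <- 2 * { x <- 10; x + 1 }
doubled                  # 22
```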
Conditional
● The if/else construct
○ Syntax: if (expr_1) expr_2 else expr_3
○ Here, expr_1 must evaluate to a single logical value, and the entire expression evaluates to
the value of either expr_2 or expr_3.
● The ifelse() function
○ This is a vectorized version of the if/else construct
○ This has the form ifelse(condition, a, b) and returns a vector of the same length as condition,
with elements a[i] if condition[i] is true, otherwise b[i] (where a and b are recycled as necessary).
● The switch() function
○ Syntax: switch(EXPR, …)
○ If EXPR evaluates to an integer i, the i-th of the remaining arguments is evaluated and returned.
If EXPR evaluates to a character string, the argument whose name matches it exactly is used
instead, and an unnamed argument acts as the default.
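The three conditional forms can be compared in a short session. This is a minimal sketch; the function describe() and the department codes are hypothetical.

```r
x <- 7

# if/else is an expression: its value can be assigned.
parity <- if (x %% 2 == 0) "even" else "odd"
parity                                   # "odd"

# ifelse() tests every element of a vector; the scalar "neg" is recycled.
v <- c(-2, 0, 3)
signs <- ifelse(v < 0, "neg", "non-neg")
signs                                    # "neg" "non-neg" "non-neg"

# switch() with an integer selects an argument by position ...
switch(2, "first", "second", "third")    # "second"

# ... and with a character string selects by name; the unnamed
# final argument is the default.
describe <- function(dept) {             # hypothetical helper
  switch(dept,
         HR = "Human Resources",
         IT = "Information Technology",
         "Unknown department")
}
describe("IT")                           # "Information Technology"
describe("XX")                           # "Unknown department"
```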
Repetition
● The unconditional loop: repeat expr
○ No conditions - infinite loop by default
○ Need a ‘break’ statement to break out of the loop
● The condition-controlled loop: while (condition) expr
○ expr is evaluated as long as the condition evaluates to true
○ Both ‘break’ and ‘next’ are accommodated.
● The counter-controlled loop: for (name in expr_1) expr_2
○ name is the loop variable and expr_1 is a vector expression (often a sequence).
○ expr_2 is often a grouped expression with its sub-expressions written in terms of the dummy
name. It is repeatedly evaluated as name ranges through the values in the vector result of expr_1.
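A minimal sketch of the three loop forms, with hypothetical counters and vectors:

```r
# repeat: loops until an explicit break
n <- 0
repeat {
  n <- n + 1
  if (n >= 3) break
}
n                          # 3

# while: the condition is tested before every pass
countdown <- 3
while (countdown > 0) {
  countdown <- countdown - 1
}
countdown                  # 0

# for: the loop variable takes each value of the vector in turn
squares <- numeric(0)
for (i in 1:4) {
  squares <- c(squares, i^2)
}
squares                    # 1 4 9 16

# seq_along() is a safer index source than 1:length(x) when x may be empty
x <- c("a", "b")
for (k in seq_along(x)) print(x[k])
```

For simple element-wise work such as the squares above, the vectorized form (1:4)^2 gives the same result without a loop.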
The foreach package
● The package ‘foreach’ provides the parallel counterpart to the for/in loop.
● The foreach() function takes one or more named iteration arguments (for example, i = 1:3)
and returns an object of class ‘foreach’.
● The special %do% and %dopar% binary operators take a ‘foreach’ object as the
first operand and an expression (often a grouped expression) as the second operand.
● %do% evaluates sequentially, while %dopar% evaluates in parallel, provided that a
parallel backend has been registered.
● When no iteration variable is needed, the shortcut ‘times()’ can be
used in place of foreach() to evaluate an expression a given number of times.
● For more info, refer to the documentation.
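A minimal sketch of sequential use follows. It assumes the ‘foreach’ package is installed, and the guard simply skips the example if it is not. The variable names are hypothetical, and %dopar% is not shown because it needs a separately registered parallel backend.

```r
# Assumes the 'foreach' package is installed; the guard skips the example otherwise.
if (requireNamespace("foreach", quietly = TRUE)) {
  library(foreach)

  # Unlike for, foreach returns the results: a list by default.
  roots <- foreach(i = 1:3) %do% { sqrt(i) }

  # .combine collapses the results, here into a numeric vector with c().
  squares <- foreach(i = 1:3, .combine = c) %do% { i^2 }
  squares                      # 1 4 9

  # times(n) repeats an expression when no loop variable is needed.
  ones <- times(3) %do% { 1 }
}
```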
Jumps
● The break statement
○ Unconditionally breaks from a loop
○ The usual way to exit a ‘repeat’ loop, which has no condition of its own
● The next statement
○ Skips evaluating the rest of the grouped expression
○ Forces the next iteration
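Both jump statements can be seen in one loop. This is a minimal sketch; the vector kept is a hypothetical accumulator.

```r
# next skips the rest of the current pass; break leaves the loop entirely.
kept <- integer(0)
for (i in 1:10) {
  if (i %% 2 == 0) next    # skip even numbers
  if (i > 7) break         # stop once i exceeds 7
  kept <- c(kept, i)
}
kept                       # 1 3 5 7
```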

Introduction to R - by Chameera Dedduwage

  • 1. Chameera Dedduwage, B.Sc. (Colombo), Pg. Dip. (Applied Stat., Colombo) 2024 April Introduction to R
  • 2. This document is not meant to serve as a formal language declaration or an exhaustive guide to R. Rather, its purpose is to provide a firm understanding of the building blocks of R so that the knowledge can be applied to various use cases. A direct result of this approach is that much of the slides here will have illustrative examples that a user must type into the R console. How to Use This Document
  • 3. What is R? R is the GNU implementation of the S language developed by John Chambers, Rick Becker and Allan Wilks at the AT&T Bell Labs (where the C and C++ languages were born). The commercial implementation of S is called ‘S-PLUS’, while the copyleft (as opposed to copyrighted) implementation is known as R. R was developed by Ross Ihaka and Robert Gentleman, at University of Auckland. R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Among other things it has, ● an effective data handling and storage facility, ● a suite of operators for calculations on arrays, in particular matrices, ● a large, coherent, integrated collection of intermediate tools for data analysis, ● graphical facilities for data analysis and display either directly at the computer or on hard-copy, and, ● a well developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user defined recursive functions and input and output facilities. (Indeed most of the system supplied functions are themselves written in the S language.)
  • 4. ● Objects in R ○ Basic Object Types ○ Extending Basic Objects via Attributes ● Operations ○ Arithmetic, Assignment, Relational, Logical, and Special ○ Filtering Data via Vector Indexing ○ Multidimensional (Homogeneous Data) - Arrays and Matrices ○ Heterogeneous Data - Lists and Data Frames ○ Aggregation ● Flow Control ○ Conditional ○ Repetition ○ Jumps Outline of This Presentation
  • 6. Everything is an Object in R R deals with data stored in memory. Even data from external sources must be loaded into computer memory before they can be manipulated. R does not provide direct access to the computer’s memory. Rather, R provides a number of specialized data structures referred to as objects. These objects are referred to through symbols or variables. Furthermore, these symbols are themselves objects and can be manipulated in the same way as any other object. Furthermore, everything in R is an object, including executable code (functions). This is crucial to understanding and mastering R.
  • 8. Vectors (‘atomic vectors’) Lists (‘recursive vectors’) Language objects Expression objects Function objects NULL The 12 Basic Object Types in R Built In objects and special forms Promise objects Dot-dot-dot Environments Pairlist objects The “Any” type
  • 9. Vector Objects Vectors can be thought of as contiguous cells containing data. They are usually accessed through indexing operations such as x[5]. Indexing is a bit more involved in R because it includes filtering as well. More on this later. Vectors must have their values all of the same mode. Thus any given vector must be unambiguously either logical, numeric, complex, character or raw. Numerical literals such as 42, 1e3, (-6.5), as well as character strings such as “Hello, world” are vectors of length 1. Zero-length vectors are also possible. R has six basic (‘atomic’) vector types: logical, integer, real, complex, character (in C aka ‘string’) and raw. In addition, R has list vectors.
  • 10. The Six Atomic Vector Types in R Type typeof mode storage.mode Logical logical logical logical Integer integer numeric integer Real double numeric double Complex complex complex complex String character character character Raw raw raw raw Single numbers, such as 4.2, and strings, such as "four point two" are still vectors, of length 1.
  • 11. List Objects Lists are ordered collections of elements, each of which can contain any type of R object. List elements can be heterogeneous. List elements are accessed through three different indexing operations. Lists are vectors as well. To distinguish basic vectors from lists, basic vectors are usually referred to as ‘atomic vectors’, and lists are referred to as ‘recursive vectors’ (since the elements of a list themselves can be lists).
  • 12. There are three language objects: calls, expressions, and names. Confusingly, R has another object type called "expression". Unlike arrays and matrices, this provides an intrinsic way to handle modeling or formulae. Language Objects
  • 13. Using the Language Object to Create Formulas Unlike arrays and matrices, R provides an intrinsic way to handle modeling or formulae via the language object. class(fo <- y ~ x1*x2) # "formula" fo typeof(fo) # R internal : "language" terms(fo) environment(fo) environment(as.formula("y ~ x")) environment(as.formula("y ~ x", env = new.env()))
  • 14. Function Objects In R, functions are also objects and can be manipulated in much the same way as any other object. Functions (or more precisely, function closures) have three basic components: a formal argument list, a body and an environment. It is possible to have closures as well. Closures are delimited by braces, {} , and unlike functions, only have a body. Since they lack an environment, symbols declared within a closure belong to the parent environment. Operators in R are functions as well. This is an important feature that will become important in OOP.
  • 15. Built-in Objects (and Special Forms) These two kinds of object contain the builtin functions of R, i.e., those that are displayed as .Primitive in code listings (as well as those accessed via the .Internal function and hence not user-visible as objects). The difference between the two lies in the argument handling. Builtin functions have all their arguments evaluated and passed to the internal function, in accordance with call-by-value, whereas special functions pass the unevaluated arguments to the internal function. From the R language, these objects are just another kind of function. The is.primitive function can distinguish them from interpreted functions.
  • 16. Environment Objects Environments can be thought of as consisting of two things. A frame, consisting of a set of symbol-value pairs, and an enclosure, a pointer to an enclosing environment. When R looks up the value for a symbol the frame is examined and if a matching symbol is found its value will be returned. If not, the enclosing environment is then accessed and the process repeated. Environments are created implicitly by function calls. Environments form a tree structure in which the enclosures play the role of parents. The tree of environments is rooted in an empty environment, available through emptyenv(), which has no parent. It is the direct parent of the environment of the base package (available through the baseenv() function).
  • 17. Promise objects are part of R’s lazy evaluation mechanism. They contain three slots: a value, an expression, and an environment. When a function is called the arguments are matched and then each of the formal arguments is bound to a promise. The expression that was given for that formal argument and a pointer to the environment the function was called from are stored in the promise. Until that argument is accessed there is no value associated with the promise. When the argument is accessed, the stored expression is evaluated in the stored environment, and the result is returned. The result is also saved by the promise. The substitute function will extract the content of the expression slot. This allows the programmer to access either the value or the expression associated with the promise. Promise Objects
  • 18. Pairlist Objects The use of pairlists is deprecated since generic vectors are usually more efficient to use. When an internal pairlist is accessed from R it is generally (including when subsetted) converted to a generic vector.
  • 19. NULL, Any, and dot-dot-dot (...) Objects There is a special object called NULL. It is used whenever there is a need to indicate or specify that an object is absent. It should not be confused with a vector or list of zero length. The NULL object has no type and no modifiable properties. There is only one NULL object in R, to which all instances refer. To test for NULL use is.null. You cannot set attributes on NULL. It is not really possible for an object to be of “Any” type, but it is nevertheless a valid type value. It gets used in certain (rather rare) circumstances, e.g. as.vector(x, "any"), indicating that type coercion should not be done. The ... object type is stored as a type of pairlist. The components of ... can be accessed in the usual pairlist manner from C code, but ... is not easily accessed as an object in interpreted code, and even the existence of such an object should typically not be assumed, as that may change in the future. If a function has ... as a formal argument then any actual arguments that do not match a formal argument are matched with …
  • 20. Extending Basic Types via Attributes
  • 21. In R, every object has at least a few attributes. These include the length, mode, class, type, and storage mode. In addition, there are others such as dim and names. Attributes tell R to interpret and handle an object in a specific way. For example, a list object with a class attribute of “data.frame” will be handled differently than a list. A vector having a class attribute of “factor” will be printed differently. The names attribute of a list will make it possible to access elements by name. Attributes of Objects
  • 22. Common Attributes: mode, length, and class By the mode of an object we mean the basic type of its fundamental constituents. This is a special case of a “property” of an object. Another property of every object is its length. The functions mode(object) and length(object) can be used to find out the mode and length of any defined structure. All objects in R have a class, reported by the function class. For simple vectors this is just the mode, for example "numeric", "logical", "character" or "list", but "matrix", "array", "factor" and "data.frame" are other possible values. A special attribute known as the class of the object is used to allow for an object-oriented style of programming in R. For example, if an object has class "data.frame", it will be printed in a certain way, the plot() function will display it graphically in a certain way, and other so-called generic functions such as summary() will react to it as an argument in a way sensitive to its class.
  • 24. ● Atomic Vectors ○ Integer ○ Numeric ○ Complex ○ Character ○ Logical ● List Vectors ● Special Types ○ Matrices ○ Arrays ○ Factors ○ Data frames Data Types and Operators ● Arithmetic ○ Addition, subtraction, multiplication, exponentiation, division, integer division, modulus ● Assignment ○ x = value, x <- value, value -> x, x <<- value, value ->> x ● Relational ○ <, >, >=, <=, ==, != ● Logical ○ &&, ||, !, &, | ● Special ○ :, %in%
  • 25. > 5%%2 # five modulus 2 [1] 2 # returns remainder after division > 5%/%2 # five integer-division 2 [1] 2 # returns quotient of division > 5 / 0 # division by zero [1] Inf # returns Inf > 0 / 0 # zero divided by zero [1] NaN # returns NaN (Not a Number) > 2 + 2 [1] 4 # the [1] before the answer indicates that the answer is a 1-d vector (of one element) > 3-2 [1] -1 > 5*2 [1] 10 > 6^2 # six to the power 2 [1] 36 > 6**2 # six to the power 2, identical to Python format [1] 36 > 5/2 # five divided by 2 [1] 2.5 # returns a decimal Scalar Arithmetic
  • 26. NaN and Inf # mixed operations: all mathematical operators return NaN > Inf + NaN [1] NaN > NaN +1 # addition/subtraction of a scalar to NaN returns NaN [1] NaN > NaN -1 [1] NaN > NaN + NaN # same with * [1] NaN > NaN - NaN # same with / [1] NaN > Inf + 1 [1] Inf > Inf - 1 [1] Inf > Inf + Inf # same with * [1] Inf # but subtraction and division between Inf returns NaN > Inf - Inf # same with / [1] NaN
  • 27. A Summary of NaN and Inf in R
    Each cell lists the results of A + B, A - B, A * B and A / B, in that order.
    A op B   | B = 0              | B = 1              | B = (-1)             | B = NaN            | B = Inf
    A = 0    | 0, 0, 0, NaN       | 1, -1, 0, 0        | -1, 1, 0, 0          | NaN, NaN, NaN, NaN | Inf, -Inf, NaN, 0
    A = 1    | 1, 1, 0, Inf       | 2, 0, 1, 1         | 0, 2, -1, -1         | NaN, NaN, NaN, NaN | Inf, -Inf, Inf, 0
    A = (-1) | -1, -1, 0, -Inf    | 0, -2, -1, -1      | -2, 0, 1, 1          | NaN, NaN, NaN, NaN | Inf, -Inf, -Inf, 0
    A = NaN  | NaN, NaN, NaN, NaN | NaN, NaN, NaN, NaN | NaN, NaN, NaN, NaN   | NaN, NaN, NaN, NaN | NaN, NaN, NaN, NaN
    A = Inf  | Inf, Inf, NaN, Inf | Inf, Inf, Inf, Inf | Inf, Inf, -Inf, -Inf | NaN, NaN, NaN, NaN | Inf, NaN, Inf, NaN
  • 28. Vectors: Addition & Subtraction > c(1, 3, 4, 7) # c() stands for ‘combine’ - notice that it’s a simple c - R is case sensitive [1] 1 3 4 7 # the result is a 1-d vector of 4 elements > c(1, 3, 4, 7) + c(2, 3, 5, 8) # vector addition, equal length: corresponding elements added [1] 3 6 9 15 > c(12, 15, 28, 74) + c (2, 8) # unequal addition: smaller vector is recycled & added [1] 14 23 30 82 > c(15, 18, 21) + 5 # same logic applies to scalars - scalars are treated as one-element vectors [1] 20 23 26 > c(91, 90, 76, 54, 23) - c(2, 3) # unequal lengths where larger vector length is not a multiple of the smaller vector length - smaller vector is recycled - with a warning [1] 89 87 74 51 21 Warning message: In c(91, 90, 76, 54, 23) - c(2, 3) : longer object length is not a multiple of shorter object length
  • 29. Vectors: Multiplication & Division > c(1, 3, 5) * 2 # multiplication by scalar [1] 2 6 10 > c(1, 3, 5) / 3 # division by a scalar [1] 0.3333333 1.0000000 1.6666667 > c(1, 3, 5) * c(2, 4, 6) # multiplication of two vectors with equal lengths - element-wise multiplication [1] 2 12 30 > c(1, 3, 5, 7, 9) * c(2, 5) # multiplication of two vectors with unequal lengths - smaller vector is recycled, just like in addition [1] 2 15 10 35 18 Warning message: In c(1, 3, 5, 7, 9) * c(2, 5) : longer object length is not a multiple of shorter object length > c(1, 3, 5, 7, 9, 11) / c(5, 10) # division by a vector [1] 0.2 0.3 1.0 0.7 1.8 1.1
  • 30. Logical Arithmetic
    > TRUE
    [1] TRUE
    > FALSE
    [1] FALSE
    > TRUE || FALSE    # logical OR
    [1] TRUE
    > TRUE && FALSE    # logical AND
    [1] FALSE
    > ! FALSE          # logical NOT
    [1] TRUE
    > TRUE + 1         # TRUE coerced to a numeric value
    [1] 2
    > TRUE + FALSE     # logicals coerced to numeric values
    [1] 1
    > TRUE * FALSE
    [1] 0
    > TRUE / TRUE
    [1] 1
    > TRUE / FALSE     # same rules apply
    [1] Inf
    > 1 && 1           # the && operator will coerce the ‘1’ into TRUE
    [1] TRUE
  • 31. Logical Comparisons
    > 2 = 3     # caveat: = is not equality
    Error in 2 = 3 : invalid (do_set) left-hand side to assignment
    > 2==3      # == is the equality comparison
    [1] FALSE
    > 2!=3
    [1] TRUE
    # this will throw an error, since & evaluates both operands, regardless of the first comparison being sufficient
    > (2 > 3) & (5=6)
    Error in 5 = 6 : invalid (do_set) left-hand side to assignment
    # but the following will not; && will ‘short-circuit’ and return
    > (2 > 3) && (5=6)
    [1] FALSE
    # but if the first comparison is inconclusive, then the second will be evaluated, throwing an error
    > (2 < 3) && (5=6)
    Error in 5 = 6 : invalid (do_set) left-hand side to assignment
    # same goes for the | and || operators
    > (2 < 3) | (5=6)
    Error in 5 = 6 : invalid (do_set) left-hand side to assignment
    > (2 < 3) || (5=6)
    [1] TRUE
  • 32. Missing Value: NA > b = NA # missing value marker > b [1] NA > class(b) [1] "logical" > b + 1 [1] NA > b - 1 [1] NA > b + TRUE [1] NA > b || TRUE [1] TRUE > b && TRUE [1] NA > b && FALSE [1] FALSE > b || FALSE [1] NA # you cannot check equality / inequality of NA > NA==NA [1] NA > NA!=NA [1] NA
  • 33. Comparisons involving Inf, NaN, and NA
    > Inf < NaN
    [1] NA
    > Inf == NaN
    [1] NA
    > Inf == Inf
    [1] TRUE
    > NaN == NaN
    [1] NA
    > Inf == NA
    [1] NA
    > NaN == NA
    [1] NA
    # Use of is.na() function. NaN == NA returns NA (neither TRUE nor FALSE), but:
    > is.na(NA)
    [1] TRUE
    > is.na(NaN)
    [1] TRUE
    > is.na(Inf)
    [1] FALSE
    # Use of is.nan() function.
    > is.nan(NaN)
    [1] TRUE
    > is.nan(Inf)
    [1] FALSE
  • 34. > x = TRUE; y = FALSE # TRUE & FALSE are boolean literals > x [1] TRUE > x && y # logical AND, shortcut version [1] FALSE > c( x && y, x || y, !x, !y) # logical AND, OR, NOT, shortcut versions [1] FALSE TRUE FALSE TRUE # simple comparisons > 2 >3 [1] FALSE > 3 > 3 [1] FALSE > 3>=3 [1] TRUE > 2<=4 [1] TRUE Logical Vectors > v = 1:7 # simple sequence > v [1] 1 2 3 4 5 6 7 > v > 3 # elementwise comparison of vector with scalar [1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE > d = c(2,3) # another vector, with non-matching length > v > d [1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE Warning message: In v > d : longer object length is not a multiple of shorter object length > v[v>3] # since v>3 is a 7-element boolean vector, we can use it to filter elements [1] 4 5 6 7 # only the elements for which v>3 is TRUE are fetched > v[TRUE] # for the sake of demonstration [1] 1 2 3 4 5 6 7 > v[FALSE] integer(0)
  • 35. > exp(1) # R comes with a lot of built-in functions of the form f(...) [1] 2.718282 > log(2) # function called with one argument, the second argument defaults to e [1] 0.6931472 > log(2, 10) # second argument is ‘base’, and it’s matched positionally [1] 0.30103 > log( 2, base = 10) # alternatively, the second argument can be matched by name [1] 0.30103 > log( x = 2, base = 10) # both arguments matched by name: order doesn’t matter here, e.g. log( base = 10, x = 2 ) is identical [1] 0.30103 Built-in Functions > log( base = 10, 2) # R will first match the named argument, and the unnamed arguments will be matched positionally [1] 0.30103 # in fact, all operators like +, -, *, ^ are functions, and R calls these functions under-the-hood when operators are used in expressions. # summary functions length(x) - number of elements in x sum(x) - sum of elements in x mean (x) - mean of elements in x min(x) - minimum of elements in x max(x) - maximum of elements in x range(x) - returns a 2-element vector of c( min(x), max(x) ) var(x) - sample variance of elements in x
  • 36. Numeric Sequences > 1:10 # basic sequence [1] 1 2 3 4 5 6 7 8 9 10 > 8:-2 # backward sequence [1] 8 7 6 5 4 3 2 1 0 -1 -2 > seq( 2, 8) # the seq() function [1] 2 3 4 5 6 7 8 > seq( from = 2, to = 8) # equivalent to above, named arguments [1] 2 3 4 5 6 7 8 > seq( from = 2, to = 8, by = 2) # stepping parameter [1] 2 4 6 8 > seq( from = 2, to = 8, by = 4) # when range is not a multiple of step size, end value may not be included [1] 2 6 > seq(8) # if only one argument is given, it’s matched with ‘to’ parameter, and ‘from’ defaults to 1 [1] 1 2 3 4 5 6 7 8 > v <- seq(1, 8, 2) # create sequence > v [1] 1 3 5 7 > 5 %in% v # the %in% operator [1] TRUE > 4 %in% v [1] FALSE
  • 37. Character Vectors
    > "Hello"                     # string literals are treated as 1-d character vectors
    [1] "Hello"
    > 'This is a string, too'     # they can be enclosed in single quotes, too: note how the R console delimits strings by double quotes, regardless
    [1] "This is a string, too"
    > c("a", "b")                 # you can have character vectors as well: note how c() combines, not concatenates
    [1] "a" "b"                   # note how the result is a 2-element vector
    > paste("a", "b")             # for concatenation, you need to call the ‘paste’ function
    [1] "a b"                     # now the result is a 1-element vector. Note the space between: this is the default separator for paste
    > paste("a", "b", sep = "")   # let’s override the default one-space separator with a zero-length string
    [1] "ab"                      # now it’s a proper concatenation
    > paste(2)                    # note how a scalar is converted into a character vector
    [1] "2"
    > paste(c(1,2))               # vectors are converted element by element, not concatenated
    [1] "1" "2"
  • 38. Complex & Integer Vectors > a <- 2+3i # the symbol ‘i’ when placed after a numeric denotes a complex number 0+1i, i being the square root of (-1) > b <- 5-4i > a+b [1] 7-1i > a-b [1] -3+7i > a*b [1] 22+7i > a/b [1] -0.0487805+0.5609756i > class(a) # the function ‘class’ retrieves what storage class this variable is [1] "complex" > b = 20L # the suffix L tells R that this is of class integer > class(b) [1] "integer" > b = 2.5L # trying to store numeric by force Warning message: integer literal 2.5L contains decimal; using numeric value > class(b) [1] "numeric"
  • 39. Symbols (Variables) & Assignment
    > assign("v", 1)       # basic assignment: the variable name is given as a string
    >                      # notice no output
    > v <- 1               # syntactic shortcut
    >                      # notice no output
    > v                    # now type the variable name
    [1] 1                  # and you see the value
    > v = 5                # alternative assignment method
    > v                    # same as <- operator, except in the following case
    [1] 5
    > sin( x = 5)          # assigns the value 5 to the x parameter of the sin function
    [1] -0.9589243
    > sin( x <-5)          # creates a new variable x, assigns 5 to it; the whole expression evaluates to 5, which is then passed to sin
    [1] -0.9589243
    # the difference is that using = didn’t create a new variable in the workspace, while using <- does.
    > (5**2 + 57) -> y     # right assignment also works
    > x = 5; y = 4         # multiple assignments in one line with ‘ ; ’
    > y <<- 75; 50->>x     # yet another alternative: <<- and ->> assign to a variable in an enclosing scope rather than the current one
  • 40. Filtering Data (via Vector Indexing)
  • 41. Vector Indexing
    > v = c(1, 3, 5)           # assignment of numeric vector
    > v[1]                     # unlike C, indexing is 1-based
    [1] 1                      # the first value is returned
    > v[0]                     # there is no element at “zeroth” position
    numeric(0)                 # graceful fallback
    > v[1.7]                   # non-integer indices are truncated towards zero
    [1] 1
    > v = seq(1,7)             # basic sequence from 1 to 7
    > v
    [1] 1 2 3 4 5 6 7
    > v[1:3]                   # ask to return elements 1 to 3
    [1] 1 2 3
    > k = seq(1, 6, 2)         # create a sequence 1 3 5
    > k
    [1] 1 3 5
    > v[k]                     # returns elements at 1st, 3rd & 5th positions
    [1] 1 3 5
    > v[c(1,3,5)]              # identical to v[k]
    [1] 1 3 5
    > v[c(1, 1)]               # first element repeated twice
    [1] 1 1
    > v[c(2, 1, 4, 3, 5, 3)]   # doesn’t need to be a sequence
    [1] 2 1 4 3 5 3
    > v[-1]                    # negative args allowed; asks to drop the first element
    [1] 2 3 4 5 6 7
    > v[-5]                    # drop the fifth element
    [1] 1 2 3 4 6 7
    > v[ c(-1, -3)]            # drop elements 1 & 3
    [1] 2 4 5 6 7
    > v[-1:-3]                 # drop elements 1 through 3
    [1] 4 5 6 7
  • 42. Vector Indexing, Replacing, Inserting & Sorting
    > k                        # remember k is c(1, 3, 5)
    [1] 1 3 5
    > v[-k]                    # drop elements 1, 3 & 5
    [1] 2 4 6 7
    > v[ -8]                   # there is no eighth element to drop
    [1] 1 2 3 4 5 6 7          # so the entire vector is returned
    > v[4] = 4.5; v[5] = 4.5   # change elements 4 & 5
    > v
    [1] 1.0 2.0 3.0 4.5 4.5 6.0 7.0
    > v[8] = 8                 # non-existent index adds element
    > v
    [1] 1.0 2.0 3.0 4.5 4.5 6.0 7.0 8.0
    > v[11] = 11               # gaps are filled with NA
    > v
    [1]  1.0  2.0  3.0  4.5  4.5  6.0  7.0  8.0   NA   NA 11.0
    > class(v)                 # however this is numeric NA, not logical NA
    [1] "numeric"
    > class(v[9])
    [1] "numeric"
    > v = c(1, 7, 4, 0, 3, 3, 5, 6, 2, 9, 1, 1, 0, 7, 4, 6, 8, NA)
    > sort(v)                  # by default, NAs are dropped
    [1] 0 0 1 1 1 2 3 3 4 4 5 6 6 7 7 8 9
    > sort(v, na.last = TRUE)
    [1]  0  0  1  1  1  2  3  3  4  4  5  6  6  7  7  8  9 NA
    > sort(v, na.last = FALSE)
    [1] NA  0  0  1  1  1  2  3  3  4  4  5  6  6  7  7  8  9
  • 43. Four Types of Vector Indices 1. A vector of positive integral quantities. In this case the values in the index vector must lie in the set {1, 2, …, length(x)}. The corresponding elements of the vector are selected and concatenated, in that order, in the result. The index vector can be of any length and the result is of the same length as the index vector. 2. A vector of negative integral quantities. Such an index vector specifies the values to be excluded rather than included. 3. A logical vector. In this case the index vector is recycled to the same length as the vector from which elements are to be selected. Values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted. NA values in the index vector are included in the result as NA. 4. A vector of character strings. This possibility only applies where an object has a names attribute to identify its components. In this case a sub-vector of the names vector may be used in the same way as the positive integral labels
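The four kinds of index vector can be tried side by side on one small named vector. This is a sketch only; the vector `x` and the result names are assumptions introduced for illustration.

```r
# A named numeric vector (illustrative data)
x <- c(a = 10, b = 20, c = 30, d = 40)

pos <- x[c(2, 4)]          # 1. positive integers: elements 2 and 4
neg <- x[-1]               # 2. negative integers: everything except element 1
lgl <- x[c(TRUE, FALSE)]   # 3. logical: index recycled to T F T F, so elements 1 and 3
chr <- x[c("a", "d")]      # 4. character: elements selected through the names attribute

# An NA in a logical index produces an NA in the result
na_idx <- x[c(TRUE, NA, FALSE, TRUE)]   # values 10, NA, 40
```

Note that the character form only works because `x` carries names; on an unnamed vector it would return NA for every requested name.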
  • 45. Special Types: Arrays, Matrices, and Factors R provides no intrinsic way to handle arrays and matrices (unlike MATLAB or OCTAVE). Instead, we create vectors and ask R to treat them as arrays or matrices by setting the ‘dim’ attribute. Alternatively, we can use the array() and matrix() functions to create these objects. Arrays can have any non-zero dimensions. Matrices are a special case of arrays having just two dimensions. Similarly, R has no intrinsic support of factors. This is done by asking R to treat a vector as factors by setting its class manually (or by using the factor function).
  • 46. Arrays
    # create 18 random numbers
    > d = floor(rnorm(18, mean = 10, sd = 3))
    > d
    [1] 12 14 12 11  9  9  5 13  9 10  3  7  5  8 14  9 10 11
    # change into an array
    > d = array(d)
    > class(d)
    [1] "array"
    # one dimension, 18 elements
    > dim(d)
    [1] 18
    # change the dimensions to 3x3x2
    > dim(d) = c(3, 3, 2)
    # check dimensions
    > dim(d)
    [1] 3 3 2
    # now output the array: notice the order in which data are filled
    > d
    , , 1
         [,1] [,2] [,3]
    [1,]   12   11    5
    [2,]   14    9   13
    [3,]   12    9    9
    , , 2
         [,1] [,2] [,3]
    [1,]   10    5    9
    [2,]    3    8   10
    [3,]    7   14   11
  • 47. Matrices from Vectors
    > x = 1:20             # simple sequence
    > x
    [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    > class(x)
    [1] "integer"          # class is the atomic type integer
    > dim(x)               # dim attribute is not set
    NULL
    > dim(x) = c(4, 5)     # setting dim will let R treat this as a matrix/array
    > x
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    9   13   17
    [2,]    2    6   10   14   18
    [3,]    3    7   11   15   19
    [4,]    4    8   12   16   20
    > class(x)
    [1] "matrix" "array"   # a 2-d array is a matrix (R 4.0 and later report both classes)
    > dim(x)               # dimension vector is a 2-element vector
    [1] 4 5
    > x[1]                 # first element - no ambiguities
    [1] 1
    > x[1][1]              # not like C, this doesn’t work as x[row][col]
    [1] 1
    > x[1,1]               # but this does: x[row, col]
    [1] 1
    > x[1,2]               # row one, column 2 is 5, and not 2
    [1] 5
    > x[1][2]              # again, this fails
    [1] NA
    > x[15]                # first increment rows, then column
    [1] 15                 # FORTRAN column-major order
  • 48. Matrices from Functions
    # the matrix function
    > z = matrix(1:20, 4, 5)
    > z
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    9   13   17
    [2,]    2    6   10   14   18
    [3,]    3    7   11   15   19
    [4,]    4    8   12   16   20
    # use of only nrow (if there are fewer data than cells, the data are recycled)
    > p = matrix(1:20, nrow = 4)
    > p
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    9   13   17
    [2,]    2    6   10   14   18
    [3,]    3    7   11   15   19
    [4,]    4    8   12   16   20
    # use of ncol, completely equivalent
    > p = matrix(1:20, ncol = 5)
    > p
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    9   13   17
    [2,]    2    6   10   14   18
    [3,]    3    7   11   15   19
    [4,]    4    8   12   16   20
    # use byrow to control how data is filled
    > p = matrix(1:20, 4, 5, byrow = TRUE)
    > p
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    2    3    4    5
    [2,]    6    7    8    9   10
    [3,]   11   12   13   14   15
    [4,]   16   17   18   19   20
    # alternative: use the array function with the dim parameter
    > y <- array(1:20, c(4,5))
    > class(y)
    [1] "matrix" "array"
    > y
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    9   13   17
    [2,]    2    6   10   14   18
    [3,]    3    7   11   15   19
    [4,]    4    8   12   16   20
  • 49. # the original matrix > p [,1] [,2] [,3] [,4] [,5] [1,] 1 5 9 13 17 [2,] 2 6 10 14 18 [3,] 3 7 11 15 19 [4,] 4 8 12 16 20 # create an index matrix > idx = array(c(1:3,3:1), c(3,2)) > idx [,1] [,2] [1,] 1 3 [2,] 2 2 [3,] 3 1 # access the elements: note how idx is used as [row, col] > p[idx] [1] 9 6 3 Matrices & Index Matrices # now set those elements to zero > p[idx] = 0 > p [,1] [,2] [,3] [,4] [,5] [1,] 1 5 0 13 17 [2,] 2 0 10 14 18 [3,] 0 7 11 15 19 [4,] 4 8 12 16 20
  • 50. # creating random numbers > rnorm(5) [1] 0.5414429 -0.5555167 1.7667198 1.1929404 -0.7713971 > floor(rnorm(5, mean = 10, sd = 5)) [1] 6 10 14 18 6 # create a 2x2 random matrix from a normal distribution > matrix(floor(rnorm(4, mean = 10, sd = 5)), 2, 2) -> p > p [,1] [,2] [1,] 13 7 [2,] 20 11 # create another 2x2 random matrix from a normal distribution > matrix(floor(rnorm(4, mean = 10, sd = 5)), 2, 2) -> q > q [,1] [,2] [1,] 6 4 [2,] 6 18 Matrix Operations > p*q # not true multiplication, element-wise multiplication [,1] [,2] [1,] 78 28 [2,] 120 198 # outer product > p %o% q , , 1, 1 [,1] [,2] [1,] 78 42 [2,] 120 66 , , 2, 1 [,1] [,2] [1,] 78 42 [2,] 120 66 , , 1, 2 [,1] [,2] [1,] 52 28 [2,] 80 44 , , 2, 2 [,1] [,2] [1,] 234 126 [2,] 360 198
  • 51. # matrix product > p %*% q [,1] [,2] [1,] 120 178 [2,] 186 278 # matrix transpose > t(p) [,1] [,2] [1,] 13 20 [2,] 7 11 # diagonals of matrices p and q > diag(p) [1] 13 11 > diag(q) [1] 6 18 Matrix Operations # diag() with a vector gives a diagonal matrix > diag( c(2, 3, 4)) [,1] [,2] [,3] [1,] 2 0 0 [2,] 0 3 0 [3,] 0 0 4 # diag() with a scalar gives an identity matrix > diag(3) [,1] [,2] [,3] [1,] 1 0 0 [2,] 0 1 0 [3,] 0 0 1 # cross-product of two matrices > t(p) %*% q [,1] [,2] [1,] 198 412 [2,] 108 226 > crossprod(p, q) [,1] [,2] [1,] 198 412 [2,] 108 226
  • 52. Matrix Operations
    # e.g. take the following system of equations:
    #    x +  y +  z =  2
    #   6x - 4y + 5z = 31
    #   5x + 2y + 2z = 13
    # this can be written as M x = b, where M is the coefficient matrix and x is the vector [x y z]’
    > m = matrix( c(1, 6, 5, 1, -4, 2, 1, 5, 2), nrow = 3)
    > b = c(2, 31, 13)
    # solving for x involves finding the inverse of m
    > solve(m)
               [,1]          [,2]        [,3]
    [1,] -0.6666667  1.850372e-17  0.33333333
    [2,]  0.4814815 -1.111111e-01  0.03703704
    [3,]  1.1851852  1.111111e-01 -0.37037037
    # but we can always directly solve for x as follows:
    > solve(m, b)
    [1]  3 -2  1
    # determinant
    > det(m)
    [1] 27
    # eigenvalues and eigenvectors
    > eigen(m)
    eigen() decomposition
    $values
    [1] -5.6445744  5.5123299 -0.8677555
    $vectors
               [,1]       [,2]       [,3]
    [1,]  0.1204044 -0.2968715 -0.5511522
    [2,] -0.9768516 -0.5842714  0.2262813
    [3,]  0.1768158 -0.7553107  0.8031363
    # for more methods, e.g. rref(), install pracma
    > install.packages("pracma")
    > library(pracma)
  • 53. Special Type: Factors
    Unlike formulas, R provides no intrinsic way to handle factors. This is done by associating two vectors of equal length or reinterpreting an existing symbol via its class attribute.
    # given two vectors of equal length, one with responses and the other with factor levels, R allows us to apply summary functions at each factor level, using the factor() and tapply() functions.
    > incomes <- c(50, 53, 80, 35, 47, 92, 44, 62, 61, 30)
    > depts <- c("H", "H", "M", "M", "A", "S", "M", "S", "H", "A")
    > dfact <- factor(depts)
    > dfact
    [1] H H M M A S M S H A
    Levels: A H M S
    > tapply(incomes, dfact, mean)
           A        H        M        S
    38.50000 54.66667 53.00000 77.00000
    # both the first & second arguments must be of equal length.
  • 55. Handling Heterogeneous Data: Lists and Data Frames Most of the time, the data to be analysed will not be typed into the R console: rather, they will be read from an external data source, like a disk file or a repository. While R does not have a native type to read tabular data, it does provide lists. Lists can contain heterogeneous values. Based on lists, a new class is built, “data frames”, which will serve as the containers for external data.
  • 56. Lists # presence of one character element forces all elements to characters, can’t use vectors to store heterogeneous data > emp <- c("Sam", 34L, 85.5, 132000, "HR") > emp [1] "Sam" "34" "85.5" "132000" "HR" # the correct approach is to use a list > emp <- list("Sam", 34L, 85.5, 132000, "HR") > emp [[1]] [1] "Sam" [[2]] [1] 34 [[3]] [1] 85.5 [[4]] [1] 132000 [[5]] [1] "HR" > emp[1] [[1]] [1] "Sam" > emp[1][1] [[1]] [1] "Sam" > emp[1][2] [[1]] NULL > emp[6] [[1]] NULL
  • 57. Lists with Sub-lists
    > emp <- list("Sam", 34L, 85.5, 132000, "HR", list("ann", "beth"))
    > emp
    [[1]]
    [1] "Sam"
    [[2]]
    [1] 34
    [[3]]
    [1] 85.5
    [[4]]
    [1] 132000
    [[5]]
    [1] "HR"
    [[6]]
    [[6]][[1]]
    [1] "ann"
    [[6]][[2]]
    [1] "beth"
    # difference between [ ] and [[ ]]
    > emp[6]          # 6th element by position: note operator [ ]
    [[1]]
    [[1]][[1]]
    [1] "ann"
    [[1]][[2]]
    [1] "beth"
    > emp[[6]]        # 6th item in list ‘emp’ is also a list: note operator [[ ]]
    [[1]]
    [1] "ann"
    [[2]]
    [1] "beth"
    > emp[[6]][1]     # note how a 1-item list is returned, not an atomic
    [[1]]
    [1] "ann"
    > emp[[6]][[1]]   # note how an atomic is returned
    [1] "ann"
  • 58. Lists with Field Names
    > names(emp)
    NULL
    > names(emp) = c("fname", "age", "perf", "salary", "dept", "children")
    > emp
    $fname
    [1] "Sam"
    $age
    [1] 34
    $perf
    [1] 85.5
    $salary
    [1] 132000
    $dept
    [1] "HR"
    $children
    $children[[1]]
    [1] "ann"
    $children[[2]]
    [1] "beth"
    > emp$fname
    [1] "Sam"
    > emp$children
    [[1]]
    [1] "ann"
    [[2]]
    [1] "beth"
    > emp$children[1]
    [[1]]
    [1] "ann"
    > emp$c           # notice partial matching of names: ‘c’ uniquely identifies ‘children’
    [[1]]
    [1] "ann"
    [[2]]
    [1] "beth"
  • 59. Addition / Deletion with Lists # create a list with two elements > emp <- list("Sam", 23) > emp [[1]] [1] "Sam" [[2]] [1] 23 # now add another element > emp <- c(emp, "HR") > emp [[1]] [1] "Sam" [[2]] [1] 23 [[3]] [1] "HR" # access the third element > emp[3] [[1]] [1] "HR" # minus the third element > emp = emp[-3] > emp [[1]] [1] "Sam" [[2]] [1] 23 # now add another element at position 4 > emp[4] = 4 > emp [[1]] [1] "Sam" [[2]] [1] 23 [[3]] NULL # note missing element at position 3 [[4]] [1] 4
  • 60. Data Frames
    # there are two basic ways to look at data frames. One: as a
    # combination of several vectors, each of which represents one
    # variable. Each vector corresponds to a column, each element
    # corresponds to a row (observation).
    > emp.names = c("Sam", "Joe", "Ann")
    > emp.codes = c(12, 23, 45)
    > emp.salaries = c(45000, 32000, 85000)
    > emps = data.frame(emp.codes, emp.names, emp.salaries)
    > emps
      emp.codes emp.names emp.salaries
    1        12       Sam        45000
    2        23       Joe        32000
    3        45       Ann        85000
    # getting the dimension and column name info
    > dim(emps)
    [1] 3 3
    > names(emps)
    [1] "emp.codes"    "emp.names"    "emp.salaries"
    # The other way to look at data frames is as combinations of lists,
    # each of which holds heterogeneous info about a single record.
    > sam = list("Sam", 12, 45000)
    > joe = list("Joe", 23, 32000)
    > ann = list("Ann", 45, 85000)
    > emps2 = data.frame(rbind(sam, joe, ann))
    > emps2                      # note that default names have been given to columns
         X1 X2    X3
    sam Sam 12 45000
    joe Joe 23 32000
    ann Ann 45 85000
    > names(emps2) = c("emp.names", "emp.codes", "emp.salaries")
    > emps2
        emp.names emp.codes emp.salaries
    sam       Sam        12        45000
    joe       Joe        23        32000
    ann       Ann        45        85000
    > rownames(emps2)            # unlike in the previous case, rows have names
    [1] "sam" "joe" "ann"
    > rownames(emps2) = NULL     # let’s remove them
    > rownames(emps2)
    [1] "1" "2" "3"
    > emps2
      emp.names emp.codes emp.salaries
    1       Sam        12        45000
    2       Joe        23        32000
    3       Ann        45        85000
  • 61. Data Wrangling # select a range of rows or columns based on order # filter a subset of rows by condition on columns # summary statistics for columns, by groups in columns # change the row or column order # append, insert, delete rows or columns # transform the data type of a column
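Each of the wrangling tasks listed above has a short base-R idiom. The sketch below uses a small made-up data frame; the column names and values are assumptions chosen for illustration, not data from the slides.

```r
# Hypothetical employee table (all values invented for this sketch)
emps <- data.frame(
  code   = c(12, 23, 45, 51),
  name   = c("Sam", "Joe", "Ann", "Raj"),
  dept   = c("HR", "IT", "IT", "HR"),
  salary = c(45000, 32000, 85000, 50000)
)

first_two    <- emps[1:2, c("name", "salary")]     # select rows and columns by position / name
well_paid    <- emps[emps$salary > 40000, ]         # filter rows by a condition on a column
mean_by_dept <- tapply(emps$salary, emps$dept, mean)        # summary statistic by group
by_salary    <- emps[order(emps$salary, decreasing = TRUE), ]  # change the row order

emps$bonus <- emps$salary * 0.1        # append a column
emps$code  <- as.integer(emps$code)    # transform the data type of a column
emps$bonus <- NULL                     # delete a column
```

Here `mean_by_dept` holds 47500 for HR and 58500 for IT, and `by_salary` lists Ann first.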
  • 63. Aggregation: the xapply() family of functions
    ● lapply(X, FUN, …) - operates on a vector or list X and applies the function FUN to each element, returning a list.
    ● sapply(X, FUN, …, simplify = TRUE, USE.NAMES = TRUE) - works just like lapply, but will simplify the output if possible, i.e., instead of returning a list like lapply, it will return a vector (or matrix) if the data is simplifiable.
    ● vapply(X, FUN, FUN.VALUE, …, USE.NAMES = TRUE) - similar to sapply, but requires us to specify, via FUN.VALUE, the type and length of value we expect FUN to return.
    ● tapply(X, INDEX, FUN, …) - similar to sapply, but applies FUN to groups of X defined by the levels of the factor INDEX.
    ● mapply(FUN, …, MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) - ‘multivariate’ apply. Its purpose is to vectorize over the arguments of a function that does not usually accept vectors as arguments.
    ● apply(X, MARGIN, FUN) - finally, there is the general apply() function, which works on arrays (and matrices). MARGIN specifies which dimension to group by.
    ● Note: the xapply() family remains part of base R. For new code, some users prefer the purrr package, which offers a more consistent interface for the same kind of iteration.
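A compact way to see how these functions differ is to run them on the same inputs. The following is a sketch under assumed data; `scores` and `m` are illustrative names only.

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20))   # illustrative list of numeric vectors

l_res <- lapply(scores, mean)               # a list: $a = 2, $b = 15
s_res <- sapply(scores, mean)               # simplified to a named numeric vector
v_res <- vapply(scores, mean, numeric(1))   # same result, but each value must be one numeric

# mapply walks over several vectors in parallel: 1+4, 2+5, 3+6
m_res <- mapply(function(x, y) x + y, 1:3, 4:6)

m <- matrix(1:6, nrow = 2)       # rows are (1, 3, 5) and (2, 4, 6)
row_sums <- apply(m, 1, sum)     # MARGIN = 1 groups by row:    9 12
col_sums <- apply(m, 2, sum)     # MARGIN = 2 groups by column: 3 7 11
```

vapply() is the safer choice inside functions, because it stops with an error if FUN ever returns something other than the declared type and length.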
  • 64. Aggregation: sweep(), by() and aggregate()
  • 65. Part III: Flow Control
  • 66. Overview of Flow Control in R ● Grouping ● Conditional ○ The if/else structure ○ The ifelse() function ○ The switch() function ● Repetition ○ The while loop ○ The for/in loop ○ The repeat loop ○ The ‘foreach’ package ● Jump ○ Break ○ Next
  • 67. Groups (Closures)
    ● Commands may be grouped together in braces, {expr_1; …; expr_m}, in which case the value of the group is the result of the last expression in the group evaluated.
    ● Since such a group is also an expression it may, for example, be itself included in parentheses and used as part of an even larger expression, and so on.
    ● Groups are important in conditionals and repetitions because often their bodies are grouped statements.
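Because a braced group is itself an expression, its value can be assigned or embedded in a larger expression. A minimal sketch (variable names are illustrative assumptions):

```r
# The value of the group is the value of its last expression
y <- {
  a <- 2
  b <- 3
  a * b
}                        # y is 6

# A group wrapped in parentheses, used inside a larger expression
z <- ({ 10; 20 }) + 1    # the group evaluates to 20, so z is 21
```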
  • 68. Conditional
    ● The if/else construct
      ○ Syntax: if (expr_1) expr_2 else expr_3
      ○ Here, expr_1 must evaluate to a single logical value and the entire expression evaluates to either expr_2 or expr_3.
    ● The ifelse() function
      ○ This is a vectorized version of the if/else construct.
      ○ This has the form ifelse(condition, a, b) and returns a vector of the same length as condition, with elements a[i] if condition[i] is true, otherwise b[i] (where a and b are recycled as necessary).
    ● The switch() function
      ○ Syntax: switch(integer_expression, alternative_1, alternative_2, …)
      ○ Evaluates integer_expression and returns the alternative whose position in the list matches its value.
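The three conditional forms can be compared in a few lines. This is a sketch with made-up values; the character-selector form of switch() shown last is an addition not covered on the slide.

```r
x <- 7
# if/else is an expression, so its value can be assigned
parity <- if (x %% 2 == 0) "even" else "odd"       # "odd"

# ifelse() works element by element on a whole vector
v <- c(-2, 0, 5)
signs <- ifelse(v < 0, "neg", "non-neg")           # "neg" "non-neg" "non-neg"

# switch() with an integer selector picks the alternative at that position
choice <- switch(2, "first", "second", "third")    # "second"

# switch() with a character selector matches on argument names instead
by_name <- switch("b", a = "apple", b = "banana")  # "banana"
```

Passing a vector of length greater than one as the condition of a plain if is an error in recent versions of R, which is the main reason to reach for ifelse() on vectors.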
  • 69. Repetition
    ● The unconditional loop: repeat expr
      ○ No conditions - infinite loop by default
      ○ Needs a ‘break’ statement to break out of the loop
    ● The sentinel-controlled loop: while (condition) expr
      ○ expr is evaluated as long as the condition evaluates to true
      ○ Both ‘break’ and ‘next’ are accommodated.
    ● The counter-controlled loop: for (name in expr_1) expr_2
      ○ name is the loop variable and expr_1 is a vector expression (often a sequence).
      ○ expr_2 is often a grouped expression with its sub-expressions written in terms of the dummy name. It is repeatedly evaluated as name ranges through the values in the vector result of expr_1.
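The three loop forms are sketched below on small counting tasks; all variable names are assumptions made for illustration.

```r
# repeat: runs forever unless the body calls break
i <- 0
repeat {
  i <- i + 1
  if (i >= 3) break
}                                  # i ends at 3

# while: the condition is checked before every pass
n <- 10
halvings <- 0
while (n > 1) {
  n <- n / 2                       # 10, 5, 2.5, 1.25, 0.625
  halvings <- halvings + 1
}                                  # halvings ends at 4

# for: the loop variable takes each value of the vector in turn
total <- 0
for (k in seq(2, 8, by = 2)) {
  total <- total + k               # 2 + 4 + 6 + 8
}                                  # total ends at 20
```

In practice many such loops can be replaced by vectorized calls, for example sum(seq(2, 8, by = 2)) for the last one.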
  • 70. The foreach package
    ● The package ‘foreach’ provides the parallel counterpart to the for/in loop.
    ● The foreach() function takes the loop variable(s) and the values to iterate over, and returns an object of class ‘foreach’.
    ● The special %do% and %dopar% binary operators take a ‘foreach’ object as the first operand and a grouped expression as the second operand.
    ● %do% evaluates sequentially, while %dopar% runs in parallel (provided a parallel backend has been registered).
    ● When the body only needs to be repeated a fixed number of times, without a loop variable, the shortcut ‘times()’ can be used for convenience.
    ● For more info, refer to the documentation.
  • 71. Jumps
    ● The break statement
      ○ Unconditionally breaks from a loop
      ○ Only way to break ‘repeat’ loops
    ● The next statement
      ○ Skips evaluating the rest of the grouped expression
      ○ Forces the next iteration
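Both jump statements can be seen in one short loop. A minimal sketch, with `odds` as an illustrative name:

```r
odds <- c()
for (k in 1:10) {
  if (k %% 2 == 0) next    # even number: skip the rest of the body, go to the next k
  if (k > 7) break         # first odd number above 7: leave the loop entirely
  odds <- c(odds, k)
}
# odds now holds 1 3 5 7; the loop stopped when k reached 9
```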