2. REASONS FOR LEARNING R
It's data-centric. It was literally made for interacting with data.
It has tons of battle-tested, high-quality packages for nearly every type of analysis.
Its data visualization capabilities (ahem, ggplot2) are second to none.
It just seems more natural when dealing with data (functional, dplyr, magrittr).
It's interesting from a language perspective.
It can be a good source of ideas for the Python community.
5. Source:
"What are the Most Disliked Programming Languages?" by David Robinson
(https://stackoverflow.blog/2017/10/31/disliked-programming-languages/)
8. Legal characters for variable and function names include:
letters
numbers
underscores (_)
dots (.)
In [1]: str(data.frame)
function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,
fix.empty.names = TRUE, stringsAsFactors = default.stringsAsFactors())
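As a minimal sketch of these naming rules (the variable names below are made up for illustration), a dot is an ordinary name character rather than attribute access as in Python, and base R's make.names() shows how a string that is not a legal name gets repaired:

```r
# Hypothetical names, chosen only to illustrate the naming rules
total.count <- 10      # a dot is just another character in a name
total_count_2 <- 20    # underscores and digits are fine after the first character

# make.names() converts arbitrary strings into syntactically valid names
fixed <- make.names(c("2fast", "my var", "ok_name"))
print(fixed)
```

Names that start with a digit or contain spaces are not legal, which is why make.names() rewrites the first two strings and leaves the third alone.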
9. Indexing, or attribute access, is done with the
dollar sign ($)
double brackets ([[)
In [2]: me <- list(
  first_name = 'Christopher',
  last_name = 'Roach',
  title = 'Software Engineer',
  company = 'LinkedIn',
  start_date = as.Date('2013-06-01')
)
In [3]: # Nice (tidyverse) library for string manipulation
library(glue)
# Use the `$` operator/function to access the values in `me`
print(glue("{me$first_name} {me[[2]]}",
           "has been a {me$title} at {me$company}",
           "since {me$start_date}", .sep = " "))
Christopher Roach has been a Software Engineer at LinkedIn since 2013-06-01
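To make the difference between the accessors concrete, here is a small sketch using only base R. The `person` list is a hypothetical stand-in for `me`. It shows that `$` takes a literal name, `[[` also accepts a name held in a variable or a position, and single brackets return a sub-list rather than the element itself:

```r
# Hypothetical list used only for illustration
person <- list(first_name = "Ada", last_name = "Lovelace")

by_dollar   <- person$first_name        # literal name after `$`
by_name     <- person[["first_name"]]   # same element, name given as a string
field       <- "last_name"
by_variable <- person[[field]]          # `[[` accepts a name stored in a variable
by_position <- person[[2]]              # ...or a 1-based position
sub_list    <- person["first_name"]     # single bracket: a list of length 1

print(by_dollar)
print(by_variable)
print(class(sub_list))
```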
10. ASSIGNMENT OPERATORS: = VS <- VS <<-
The <- operator is a holdover from the APL language, where = was used for testing equality.
<- and = work the same in modern R (since 2001), however...
11. In [4]: str(mean)
function (x, ...)
In [5]: # Evaluate the code in a local environment to avoid polluting the global env
local({
  # Assign the value of the call to runif to the x arg
  mean(x = runif(10))
  print(paste("ls() =>", ls()))
})
[1] "ls() => "
12. In [6]: local({
# Assign the value of the call to runif to y,
# and call mean with the first argument set to y
mean(y <- runif(10))
print(paste("ls() =>", ls()))
})
[1] "ls() => y"
13. The deep assignment operator (<<-) modifies variables in parent environments.
The closest Python analogs would be the nonlocal and global keywords.
In [7]: by_2 <- (function(offset) {
i <- 0
function() {
i <<- i + offset
}
})(2)
for (i in 1:5) { print(by_2()) }
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
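For contrast, the following sketch puts `<-` and `<<-` side by side inside the same closure. The names `make_counter`, `local_only`, and `persistent` are hypothetical. Regular assignment creates a fresh local variable on every call, while deep assignment updates the variable in the enclosing environment:

```r
# Hypothetical helper: returns two functions that share one enclosing `count`
make_counter <- function() {
  count <- 0
  list(
    # `<-` creates a new local `count` on each call; the outer one is untouched
    local_only = function() {
      count <- count + 1
      count
    },
    # `<<-` finds `count` in the enclosing environment and updates it there
    persistent = function() {
      count <<- count + 1
      count
    }
  )
}

counter <- make_counter()
first_local  <- counter$local_only()   # 1
second_local <- counter$local_only()   # still 1
first_deep   <- counter$persistent()   # 1
second_deep  <- counter$persistent()   # 2
print(c(first_local, second_local, first_deep, second_deep))
```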
14. EVERYTHING IS A VECTOR
As a language whose sole purpose in life is to deal with data, R takes a very different view of the
world from most programming languages. In particular, R views all data as plural. In fact, it is
absolutely impossible to have data in R that is just a singular value. In R’s eyes, everything is a
vector.
In [8]: numbers <- 42
length(numbers)
1
In [9]: typeof(numbers)
'double'
In [10]: 42[1]
42
In [11]: 42[2]
<NA>
15. THIS ALLOWS US TO DO SOME REALLY WEIRD THINGS...
In [12]: v <- c(1,3,5,7)
v[2][1][1][1][1][1][1][1][1][1][1][1]
3
16. THIS IS WHY THINGS LIKE THIS WORK AUTOMATICALLY IN R...
In [13]: v %% 3
1 0 2 1
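The same idea extends beyond the modulo operator. As a short base R sketch, reusing the vector `v` from above, arithmetic, comparison, and indexing all operate element-wise, and a shorter vector is recycled to match a longer one:

```r
v <- c(1, 3, 5, 7)

doubled    <- v * 2          # the "scalar" 2 is a length-1 vector, recycled
recycled   <- v + c(10, 20)  # c(10, 20) is recycled to 10 20 10 20
is_big     <- v > 4          # comparisons are element-wise too
big_values <- v[is_big]      # a logical vector can be used as an index

print(doubled)
print(recycled)
print(big_values)

# Even a literal number is a vector of length one
scalar_is_vector <- is.vector(42) && length(42) == 1
```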
17. R IS FUNCTIONAL
First class functions
Higher order functions
Lambdas/Closures
It allows “computing on the language”
Attempts to be pure
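The first three bullets can be sketched with base R alone. The helper names (`square`, `add_one`, `compose`) are made up for illustration. Functions are ordinary values, can be passed to and returned from other functions, and can be written inline as anonymous functions:

```r
# First-class functions: plain values bound to names (hypothetical helpers)
square  <- function(x) x^2
add_one <- function(x) x + 1

# Higher-order function that returns a closure over `f` and `g`
compose <- function(f, g) function(x) g(f(x))
square_then_add_one <- compose(square, add_one)
composed_result <- square_then_add_one(3)   # (3^2) + 1

# Base R higher-order functions taking anonymous functions (lambdas)
doubled <- sapply(1:4, function(x) x * 2)
evens   <- Filter(function(x) x %% 2 == 0, 1:10)
total   <- Reduce(`+`, 1:5)                 # operators are functions too

print(composed_result)
print(doubled)
print(evens)
print(total)
```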
18. In [14]: # Toolset for introspecting the R language
library(pryr)
# Using `local` to prevent a second reference to `x`
local({
x <- list(name = 'Chris')
print(refs(x))
print(address(x))
# Update the variable and check that its address is unchanged
x$name <- 'Christopher'
print(address(x))
# Increase the reference count on the list by binding a second name to it
y <- x
print(refs(x))
# And, now R should switch to copy-on-modify semantics
x$name <- 'Christopher Roach'
print(address(x))
})
[1] 1
[1] "0x7fa2eaf1a848"
[1] "0x7fa2eaf1a848"
[1] 2
[1] "0x7fa2eaf29d18"
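The practical consequence of copy-on-modify can be seen without pryr. In this minimal base R sketch (the `rename` function is hypothetical), modifying an argument inside a function, or modifying a second name bound to the same list, never changes the original:

```r
# Hypothetical function that "modifies" its argument
rename <- function(person) {
  person$name <- 'Christopher'   # changes a copy local to this call
  person
}

x <- list(name = 'Chris')
renamed <- rename(x)

y <- x                           # second name for the same list
y$name <- 'Christopher Roach'    # modifying through `y` leaves `x` alone

print(x$name)
print(renamed$name)
print(y$name)
```

This is the sense in which R "attempts to be pure": callers do not see side effects on the values they pass in.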
19. OOP IN R
R has several different "systems" for OOP
Two styles of OOP systems:
Encapsulated - methods belong to objects or classes
Functional - methods belong to functions called generics
S3 is a functional system used throughout base R
S4 is a functional system like S3, but more formal
RC is an encapsulated system that bypasses R's copy-on-modify semantics
R6 is an encapsulated system similar to RC, but it is lighter weight and avoids some of
RC's issues
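To illustrate what "encapsulated" and "bypasses copy-on-modify" mean in practice, here is a small sketch of an RC class built with setRefClass() from the methods package. The `Account` class and its `deposit` method are hypothetical examples, not part of the original talk:

```r
library(methods)

# Hypothetical RC class: the method lives inside the class definition
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      # Fields are updated in place with `<<-`
      balance <<- balance + amount
      invisible(.self)
    }
  )
)

acct  <- Account$new(balance = 100)
alias <- acct          # a second reference to the same object, not a copy
alias$deposit(50)

print(acct$balance)    # the change made through `alias` is visible here

# An independent copy has to be requested explicitly
snapshot <- acct$copy()
snapshot$deposit(1)
```

Unlike the list example on the copy-on-modify slide, both names refer to one mutable object, which is closer to how Python objects behave.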
20. THE S3 OOP SYSTEM
S3 is an informal system that relies mainly on convention
Methods are the functions that implement the class-specific behavior
Generic functions, or generics for short, are responsible for selecting the correct
method to apply
Method dispatch is based on an object's class attribute.
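A minimal sketch of these conventions, with a hypothetical `area` generic and two made-up classes: the generic calls UseMethod(), methods are named `generic.class`, and dispatch follows the object's class attribute:

```r
# Hypothetical generic: UseMethod() selects a method from the class attribute
area <- function(shape, ...) UseMethod("area")

# Methods follow the generic.class naming convention
area.circle    <- function(shape, ...) pi * shape$r^2
area.rectangle <- function(shape, ...) shape$w * shape$h
# Fallback used when no class-specific method exists
area.default   <- function(shape, ...) stop("don't know how to compute an area")

unit_circle <- structure(list(r = 1), class = "circle")
box         <- structure(list(w = 2, h = 3), class = "rectangle")

circle_area <- area(unit_circle)
box_area    <- area(box)
print(circle_area)
print(box_area)
```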
21. In [15]: # A factor's underlying values must be integers
sizes <- c(1L, 1L, 2L, 2L, 2L, 3L, 4L)
print(sizes)
[1] 1 1 2 2 2 3 4
In [16]: # Set "levels" and "class" to change "sizes" into a factor
attr(sizes, "levels") <- c("S", "M", "L", "XL")
attr(sizes, "class") <- "factor"
# Now, print the new factor vector
print(sizes)
[1] S S M M M L XL
Levels: S M L XL
In [17]: str(print.factor)
function (x, quote = FALSE, max.levels = NULL, width = getOption("width"),
...)
22. OVERRIDING BEHAVIOR IN S3
In [18]: # Cache the old print.factor function and create a new one
old.print.factor <- print.factor
print.factor <- function(...) {
print("We are printing from the new print.factor function...")
old.print.factor(...)
}
# Print the sizes object, which will call our new print.factor function
print(sizes)
# Then reset the print.factor method back to the original one
print.factor <- old.print.factor
[1] "We are printing from the new print.factor function..."
[1] S S M M M L XL
Levels: S M L XL
23. CREATING CLASSES IN S3
In [19]: attr(me, 'class') <- 'employee'
print.employee <- function(e) {
print(glue("{e$first_name} {e$last_name}",
"has been a {e$title} at {e$company}",
"since {e$start_date}", .sep = " "))
}
print(me)
Christopher Roach has been a Software Engineer at LinkedIn since 2013-06-01
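A slightly more conventional variant, sketched here with a hypothetical `person` class so it does not clash with the `employee` example: a constructor function builds the object, and the print method uses the same `(x, ...)` signature as the `print` generic and returns its argument invisibly:

```r
# Hypothetical constructor: keeps the class attribute in one place
new_person <- function(first_name, last_name, title) {
  structure(
    list(first_name = first_name, last_name = last_name, title = title),
    class = "person"
  )
}

# format() builds the string; print() displays it
format.person <- function(x, ...) {
  paste0(x$first_name, " ", x$last_name, " (", x$title, ")")
}
print.person <- function(x, ...) {
  cat(format(x), "\n", sep = "")
  invisible(x)
}

p <- new_person("Christopher", "Roach", "Software Engineer")
print(p)
formatted <- format(p)
```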
25. This allows us to capture the code for each argument.
And, do fun things with it!
In [21]: print_expr <- function(expr, ...) {
expr <- substitute(expr)
params <- list(...)
val <- eval(expr, params)
print(glue("The value of {deparse(expr)} is {val}"))
}
print_expr(x + y, x = 4, y = 12)
The value of x + y is 16
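Captured code is itself an ordinary R object that can be inspected and modified before it is evaluated. As a short base R sketch, quote() captures an expression directly (much as substitute() does for an argument), and the resulting call can be edited like a list:

```r
# Capture the expression without evaluating it
expr <- quote(x + y)
expr_class <- class(expr)                        # a "call" object

# Evaluate it with values supplied in a list
sum_value <- eval(expr, list(x = 4, y = 12))

# The first element of a call is the function being called; swap it out
expr[[1]] <- as.name("*")
new_code      <- deparse(expr)
product_value <- eval(expr, list(x = 4, y = 12))

print(expr_class)
print(sum_value)
print(new_code)
print(product_value)
```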
27. COMBINING PYTHON AND R
Since this is an R notebook, and I want to demonstrate how you can access the power of R from
Python, we need to switch to another notebook. To see how you can incorporate R into your
normal Python and Jupyter workflow, read over this notebook
(https://github.com/croach/pydata_nyc_2017/blob/master/notebooks/rpy2.ipynb).