WiNGS 2014 Workshop 2 R, RStudio, and reproducible research with knitr
1. Workshops in next-generation science at UNC Charlotte 2014
Workshop 2 - R, RStudio, & reproducible research with knitr
2. R, RStudio, & reproducible research with knitr
WiNGS 2014
3. No programming experience necessary
"we wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming..."
John Chambers, Stages in the Evolution of S
4. Why use R?
• Free & open source
• Has a lot of support
– Popular in many domains (finance, business analytics, statistics, biology)
• Many libraries available for biological data analysis through the Bioconductor project
– Such as edgeR (today)
• Now has an easy-to-use, free user interface called RStudio
5. RStudio
• A very nice graphical user interface for R.
• It's free!
• Integrates well with knitr
– tool for writing statistical reports w/ R Markdown
6. R Markdown ".Rmd"
• Lets you write a report that combines results and commands
• Sounds weird, but once you get used to it, it's very powerful
• Catch mistakes before publication
– Ask a friend to run & review your data analysis
7. knitr & R Markdown enable literate programming
• A way to do "literate programming"
– Developed by Donald Knuth, Stanford Computer Science professor
• Literate programming: Write programs that explain what they are doing while they are doing it.
• Practical application: Data Analysis Reports
8. Plan for Today
• Introduce R and RStudio
– Part 1: Functions & plots
– Part 2: Markdown
– Part 3: See how statistical testing works in R
• Differential expression analysis walk-through (may extend into Workshop 3)
• Goal: Get you started!
– Lots of Web resources for further study
10. Start RStudio
• RStudio has panes
– w/ min, max buttons (top right)
• Panes have tabs
Console: where you type commands
Environment: shows variables you've defined
11. Make new project (Part 1)
• Select File > Project > New Project...
• Choose New Directory
15. • Open folder wings2014
• See wings2014.Rproj file
• Tip: After quitting, double-click it to start RStudio with the correct directory settings
16. Enter commands in Console
> symbol is the prompt
• Type commands or expressions at the prompt, then press ENTER
• R evaluates what you type, prints the result
• Returns to the prompt
17. Practice: Try arithmetic expressions
• Add +
• Subtract -
• Multiply *
• Raise to a power ** (same as ^)
• Expressions return values as one-element vectors.
• [1] indicates that the value next to it has this index.
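The arithmetic operators above can be tried directly at the prompt. This is a small sketch with made-up numbers; the values in the comments are what R prints for these exact inputs.

```r
# Each expression returns a one-element numeric vector
2 + 3     # add:      [1] 5
7 - 4     # subtract: [1] 3
6 * 5     # multiply: [1] 30
2 ** 3    # power:    [1] 8  (R reads ** as ^)
2 ^ 3     # power:    [1] 8

# The result really is a vector with one element
length(2 + 3)   # [1] 1
```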
18. Practice: Save results to variables
• Use '=' to assign result to a variable
– Nothing printed
• Type variable name to see what's in it
• Use variables in expressions
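A minimal sketch of the three steps above. The variable names x and y are arbitrary choices for illustration. Many R users write `<-` instead of `=` for assignment; both work at the prompt.

```r
x = 2 + 3     # assignment: nothing is printed
x             # typing the name prints the value: [1] 5

y = x * 10    # variables can be used in expressions
y             # [1] 50
```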
19. Variables refer to objects
• Environment tab shows objects created thus far
• Most of what you do in R involves manipulating objects saved to variable names
– Use objects as inputs to functions
20. R functions
• R has many functions
– math
– plotting
– statistical tests
• Functions take inputs called arguments
• Most functions have many possible arguments
– Usually have reasonable defaults
21. How to use a function in 4 steps
1. Type function name
2. Type "(" open paren
– RStudio types closing paren for you
3. Type arguments
– if more than one argument, insert "," (comma)
4. Type ENTER
sqrt calculates square root
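The four steps might look like this at the prompt. The inputs are arbitrary, and round is used here only as an example of a function that takes two arguments.

```r
sqrt(16)          # one argument: [1] 4

root = sqrt(2)    # save the result to a variable
round(root, 3)    # two arguments, separated by a comma: [1] 1.414
```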
22. Practice: rnorm function
• rnorm creates a vector of numbers randomly sampled from a normal distribution with specified mean, standard deviation
rnorm(10,5,5)
– function name: rnorm
– arguments: sample size (10), mean (5), standard deviation (5)
23. Practice: rnorm function
• Mean and standard deviation are optional
• If you don't specify them, they default to:
– 0 default mean
– 1 default sd
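A short sketch of rnorm with and without the optional arguments. The call to set.seed is an addition for this example so the random numbers repeat from run to run; the workshop slides do not mention it.

```r
set.seed(2014)         # assumption: fix the seed so results are repeatable

a = rnorm(10, 5, 5)    # 10 values, mean 5, standard deviation 5
b = rnorm(10)          # mean and sd left out: defaults are 0 and 1

length(a)              # [1] 10
mean(b)                # near 0, but not exactly 0 (it is a random sample)
```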
24. R tip!
• Use UP arrow key to retrieve previous command
– Saves typing
25. Practice: R allows named arguments
• Order can vary
rnorm(10,mean=5,sd=2)
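One way to convince yourself that the order of named arguments does not matter: reset the random seed before each call and compare the results. set.seed and identical are used here only for the demonstration.

```r
set.seed(1)
a = rnorm(10, mean = 5, sd = 2)

set.seed(1)
b = rnorm(10, sd = 2, mean = 5)   # same named arguments, different order

identical(a, b)                   # [1] TRUE
```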
26. help shows how to use a function
• Type help(rnorm) to list arguments, defaults
• help is a function
– takes other functions as arguments
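A brief sketch of looking up a function's arguments. The `?` shortcut and args are standard R functions but were not on the slide; they are shown here as alternatives.

```r
help(rnorm)    # opens the help page (the Help tab in RStudio)
?rnorm         # shortcut for the same thing

args(rnorm)    # prints the argument list with its defaults
```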
27. Now you know how to...
• Calculate values & see the result
• Save output to variables
• Use Environment tab to view variables
• Use R functions
Next --- plotting!!!
28. R plotting functions
• Many options
– generic x-y plot, scatter plots
– barplots
– dendrograms
– histograms
... and much more
• Highly configurable!
– log or linear scale axes
– different characters or colors for points
... and much more
29. Practice: Generic x-y plot (scatter plot)
• named argument main determines plot title
• Note: Enclose text in quotes
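The plotting command on the original slide was shown as a screenshot, so the following is only a guess at its shape. The data (two random vectors) and the title text are made up for illustration.

```r
set.seed(29)       # assumption: fixed seed so the plot is repeatable
x = rnorm(50)
y = rnorm(50)

# main sets the plot title; the title text goes in quotes
plot(x, y, main = "My first scatter plot")
```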
30. Practice: Try other options
• col - color of points (in quotes)
• pch - point character
– numeric code
– letter (in quotes)
and many more..
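A sketch of the col and pch options, using made-up data. Each call to plot draws a fresh plot, so in RStudio you can page back through them in the Plots tab.

```r
x = 1:10
y = x^2

plot(x, y, main = "Blue points", col = "blue")       # color name in quotes
plot(x, y, main = "Filled circles", pch = 16)        # numeric code
plot(x, y, main = "Letters as points", pch = "x")    # letter in quotes
```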
32. Practice: Adding to a plot (1)
• abline - "a b line"
– add straight line
• Arguments:
– v or h for location of vertical or horizontal line
– a and b for y intercept and slope
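A sketch of abline with made-up data. Note that in abline, a is the y intercept and b is the slope. A plot must already exist before abline can add to it.

```r
x = 1:10
y = 2 * x + 1
plot(x, y, main = "Adding lines with abline")

abline(v = 5)                        # vertical line at x = 5
abline(h = 10)                       # horizontal line at y = 10
abline(a = 1, b = 2, col = "red")    # intercept a = 1, slope b = 2
```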
33. Practice: Adding to a plot (2)
• points – add points to a plot
• Arguments:
– x, y: x & y values for the points
– other options, same as for plot
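A sketch of adding a second group of points to an existing plot. The two groups are invented for the example; xlim and ylim are set so the added points are not cut off.

```r
set.seed(33)                 # assumption: fixed seed for a repeatable plot
x1 = rnorm(20)
y1 = rnorm(20)
x2 = rnorm(20, mean = 2)
y2 = rnorm(20, mean = 2)

# Draw the first group, leaving room for the second
plot(x1, y1, main = "Two groups",
     xlim = range(c(x1, x2)), ylim = range(c(y1, y2)))

# Add the second group with the same options plot accepts
points(x2, y2, col = "red", pch = 16)
```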
34. Take-home: In R you can "script" a plot
• Using plotting commands like points, abline, lines you can add more data to a plot, element by element
• Most plotting commands accept the same options, like
– pch - point character
– col - color
• Learning one plotting command helps you learn many.
37. How to install knitr
• Go to Packages tab
• Not checked?
– Check it
• Not installed?
– Select Tools > Install Packages...
– Enter knitr
– Click Install
• May need to restart RStudio
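The same check can be done from the console. This sketch only tests whether knitr is present and loads it if so; it does not install anything itself. install.packages is the standard console equivalent of the Install Packages menu item.

```r
# TRUE if knitr is installed, FALSE otherwise (does not attach it)
knitr_installed = requireNamespace("knitr", quietly = TRUE)

if (knitr_installed) {
  library(knitr)   # same effect as ticking the box in the Packages tab
  message("knitr version: ", as.character(packageVersion("knitr")))
} else {
  message('knitr is not installed; run install.packages("knitr") first')
}
```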
38. Setup - to enable better coding!
Go to Tools > Global Preferences > Panes
• Top right: console
• Lower right: Environment, History, Files, Plots, Help
• Top left: Source
• Lower left: everything else
39. Practice: Make R Markdown file
• Click "new" file icon
• Choose R Markdown
– Creates an example R Markdown
• Take a moment to scan document
40. R Markdown has plain text with formatting instructions
• Row of "===" makes "Title" a top level heading
41. R Markdown has code chunks
• Code chunk - three back ticks, {r}, ends with three more back ticks
• gray background
42. knitr "knits" code & text
• Makes an HTML document (web page) that combines
– code
– output from code
– your text explanations
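To see the pieces together, this sketch builds a tiny R Markdown file from the console: a title underlined with "=", a line of plain text, and one code chunk. The file name Sketch.Rmd and its contents are invented for illustration. If knitr is installed, knitr::knit runs the chunk and writes a Markdown file. In RStudio the Knit HTML button does this for you and also converts the result to HTML.

```r
fence = paste(rep("`", 3), collapse = "")    # three back ticks

rmd_lines = c(
  "Title",
  "=====",                                   # row of "=" makes a top level heading
  "",
  "Some plain text explaining the analysis.",
  "",
  paste0(fence, "{r}"),                      # start of code chunk
  "x = rnorm(10)",
  "mean(x)",
  fence                                      # end of code chunk
)

rmd_file = file.path(tempdir(), "Sketch.Rmd")   # hypothetical file name
writeLines(rmd_lines, rmd_file)
cat(readLines(rmd_file), sep = "\n")

# Knit only if knitr is available; the output mixes text, code, and results
if (requireNamespace("knitr", quietly = TRUE)) {
  md_file = knitr::knit(rmd_file,
                        output = file.path(tempdir(), "Sketch.md"),
                        quiet = TRUE)
  cat(readLines(md_file), sep = "\n")
}
```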
43. Practice: Knit HTML
• Save the file as "Example.Rmd"
• Click Knit HTML
• Preview appears
• HTML file appears
• Click Example.html in Files tab
– choose View in Web browser
44. knitr makes an HTML document (a Web page)
• Images embedded
• You can email it, save in a Dropbox, etc.
51. Statistical tests in R
• Tests implemented as functions
– Usually return list objects
• List is
– object that contains other objects of many types
• Previously, you saw vectors
– Output of rnorm command
– Vectors are like lists that only contain one type of object (e.g., numbers only)
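A small sketch of the difference between a vector and a list. The list contents (name, values, passed) are invented for illustration.

```r
v = rnorm(5)                      # a vector: numbers only

info = list(name = "sample A",    # a list: can mix text, numbers, TRUE/FALSE
            values = v,
            passed = TRUE)

is.list(info)     # [1] TRUE
info$values       # use $ to pull one item out of a list by name

c(1, "a")         # a vector holds one type, so 1 becomes the text "1"
```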
52. Practice: Start a new section
• Heading, smaller than title heading
• Make new code chunk
• Make new vectors
• Run t.test
53. Tip: Markdown help
• Using R Markdown opens Web page w/ more info
• Markdown Quick Reference shows Markdown codes in Help tab
54. Practice: Run the code
• t.test output is in result
• result is a list
• Cursor inside chunk
• Type CTRL-ENTER
– or click Run
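The code in the workshop chunk was shown as a screenshot, so this is a sketch of what such a chunk might contain. The vector names control and treatment, and their means, are assumptions; only t.test and the variable name result come from the slides.

```r
set.seed(54)                               # assumption: repeatable random data
control = rnorm(10, mean = 5, sd = 1)      # hypothetical control values
treatment = rnorm(10, mean = 6, sd = 1)    # hypothetical treatment values

result = t.test(control, treatment)        # output saved to result

is.list(result)     # [1] TRUE
names(result)       # the items stored in the list
result$p.value      # pull out one item by name
```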
58. Goals
• Show you how to structure a data analysis
– Useful framework you can use in many settings
• Give you an example differential gene expression analysis for RNA-Seq
– Use it as a starting point for other projects
– Tip: Review edgeR user guide for other example data analyses
59. Structure of the data analysis
• Introduction
– explain the experimental design
– state questions (no more than 3, ideally 2)
• Analysis
– describe steps of analysis, with results
– explain judgment calls, like P value cutoffs
• Conclusion
– answer the original questions
• State limitations of the analysis
• Session info including software versions used
Adapted from Jeff Leek's Data Analysis, Coursera
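For the session info item, base R has a built-in function. A sketch of what could go in the last code chunk of a report:

```r
info = sessionInfo()
info                 # prints R version, platform, and loaded packages

R.version.string     # just the R version, as one line of text
```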
60. Practice: Setup
• Go to https://bitbucket.org/lorainelab/tomatopollen
62. Move to Desktop
• Subfolders correspond to analysis chunks
– See README.md for details
• Open DifferentialExpression
Folder name suffix based on repo version
64. Review of the experiment
• Tomato plants subjected to chronic mild heat stress & control
– Greenhouse C
– Greenhouse B
• Mature pollen grains harvested in batches over eight weeks, ~10 plants per batch
– One treatment sample, one control sample per collection
• RNA extracted, sent to UCLA for sequencing
– 10 libraries, 5 treatments, 5 controls, 69 base paired end sequencing
Next: Step-by-step walk-through of R Markdown