Reproducible Research in R
with Docker
Daniel Nüst | University of Münster | @nordholmen
MünsteR Meetup, Sep 2017
https://www.meetup.com/Munster-R-Users-Group/events/241108949/
Agenda
Motivation
Docker & Rocker
containerit
2
Why should I care about reproducible research?
(an opinionated view)
Improve quality of your work today
Existence of your work tomorrow: journal requirements 2020+
Societal challenges… Who knows the Oxford Dictionaries word of the year 2016?
https://en.oxforddictionaries.com/word-of-the-year/word-of-the-year-2016
3
J. Leek’s tidypvals
https://simplystatistics.org/2017/07/26/announcing-the-tidypvals-package/
4
“Tradition” of notebooks in lab work, e.g. in chemistry (analog and digital)
Open Notebook Science (https://en.wikipedia.org/wiki/Open_notebook_science)
No comparable tradition and education in younger and mostly digital geostatistics, GIS, ...
https://twitter.com/wellcometrust/status/496323565239955456
5
Lab notebooks
https://www.google.de/search?q=chemistry+lab+notebook&safe=off&tbm=isch
https://en.wikipedia.org/wiki/File:Studies_of_the_Arm_showing_the_Movements_made_by_the_Biceps.jpg
R Markdown a.k.a. .Rmd
http://rmarkdown.rstudio.com/
Based on Markdown
https://daringfireball.net/projects/markdown/syntax
6
7
#1: reproducibility helps to avoid disaster
#2: reproducibility makes it easier to write papers
#3: reproducibility helps reviewers see it your way
#4: reproducibility enables continuity of your work
#5: reproducibility helps to build your reputation
8
ERC creation process
❏ Submit workspace to publication platform
❏ Publication platform…
❏ extracts metadata
❏ executes analysis
❏ checks output vs. upload (syntax)
❏ captures runtime environment
(manifest + image)
9
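A rough sketch of what "manifest + image" could mean with the plain Docker CLI: the Dockerfile is kept as the readable manifest, and the built image is exported as a tarball. This is not o2r's actual implementation. The image name user2017demo is borrowed from the demo later in this deck; the erc directory and the run helper are assumptions made for illustration. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Illustration only: "erc" and the run() helper are hypothetical names.
# run() prints each command; it executes it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

mkdir -p erc

# Manifest: the Dockerfile is the human-readable recipe of the environment.
run cp Dockerfile erc/Dockerfile

# Image: the built binary snapshot, exported to a single archive file.
run docker save -o erc/image.tar user2017demo

# Record the image ID so the archive can be matched to the image later.
run docker inspect --format '{{.Id}}' user2017demo
```

The archive can later be restored on another machine with docker load -i erc/image.tar.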
10
Slide by Docker inventor & Docker, Inc. CTO Solomon Hykes, DockerCon 2014
https://docs.docker.com/engine/docker-overview/
11
science, data science, research, reproducibility, replication
package & separate applications and their dependencies for cloud infrastructures
https://docs.docker.com/engine/docker-overview/
12
Icons by synonymsof (CC-BY)
Docker basics
[Diagram] Dockerfile (ENV, RUN, CMD) → build → Docker Image → run → Docker Container
Docker CLI: build, run, pause, stop/kill, start, logs, cp, exec, rm, stats
Docker Engine
Docker Registry
14
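The build/run cycle from the Docker basics slide, as a minimal sketch. The Dockerfile content, the tag basics-demo, the TZ value, and the run helper are assumptions for illustration; only the base image rocker/r-ver:3.4.1 and install2.r appear elsewhere in this deck. Commands are printed by default and executed only with DRY_RUN=0 (which needs a running Docker Engine).

```shell
#!/bin/sh
# Illustration only: "basics-demo" and the Dockerfile below are hypothetical.
workdir=$(mktemp -d)

# A Dockerfile is the recipe for an image:
# base image (FROM), environment (ENV), build steps (RUN), default command (CMD).
cat > "$workdir/Dockerfile" <<'EOF'
FROM rocker/r-ver:3.4.1
ENV TZ=Europe/Berlin
RUN install2.r sp
CMD ["R", "--version"]
EOF

# run() prints each command; it executes it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

run docker build -t basics-demo "$workdir"   # Dockerfile -> image
run docker run --rm basics-demo              # image -> container, runs CMD
run docker images basics-demo                # list the image that was built
```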
Docker for Data Science
(all the Docker advantages… write once, biz ops, cloud, etc.)
Reproducibility through controlled working environment
Project separation + don’t clutter dev machine
Environment (re)creation, documentation
Adopt good practices on the way
Easy collaboration
Easy transition from testing to production
15
https://hub.docker.com/r/rocker/rstudio/
Base containers (r-base, r-devel, r-ver, ..)
Use case containers (r-devel-ubsan-clang, ..)
Stacks (tidyverse, geospatial, ..)
docker run -it -p 8787:8787 rocker/rstudio
http://localhost:8787/ (rstudio/rstudio)
Rocker: https://github.com/rocker-org
rocker/r-ver and other base images
https://github.com/rocker-org/rocker#base-docker-containers
16
rocker/geospatial and other use cases
https://github.com/rocker-org/rocker#versioned-stack-builds-on-r-ver
17
rocker/geospatial
https://hub.docker.com/r/rocker/geospatial/
Opinionated view, not based on CRAN task views
18
https://hub.docker.com/r/rocker/rstudio/
docker run --rm -it -p 8787:8787 rocker/rstudio
http://localhost:8787/ (rstudio/rstudio)
Great example: https://github.com/benmarwick/1989-excavation-report-Madjebebe
docker run --rm -it -p 8787:8787 benmarwick/mjb1989excavationpaper
http://localhost:8787/ (rstudio/rstudio)
19
RStudio Desktop vs. rocker/rstudio
No functional difference: the “Desktop” version is just a lightweight browser wrapper
(https://rpubs.com/jmcphers/rstudio-architecture)
$ docker run -d -p 8787:8787 rocker/rstudio
$ docker ps
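A sketch of the full lifecycle of such a detached RStudio container, using only standard Docker CLI subcommands. The container name rstudio-demo, the password placeholder, and the run helper are assumptions. Passing -e PASSWORD=... is also an assumption: newer rocker/rstudio images may reject the default rstudio/rstudio login shown in these slides. Commands are printed by default and executed only with DRY_RUN=0.

```shell
#!/bin/sh
# Illustration only: "rstudio-demo" and "change-me" are placeholders.
# run() prints each command; it executes it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# -d detaches, --rm removes the container once it stops,
# -p maps host port 8787 to container port 8787.
run docker run -d --rm --name rstudio-demo -p 8787:8787 \
  -e PASSWORD=change-me rocker/rstudio

run docker ps --filter name=rstudio-demo   # is it running?
run docker logs rstudio-demo               # what did the server print?
run docker stop rstudio-demo               # stop (and, due to --rm, remove) it
```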
20
https://bioconductor.org/help/docker/
21
https://hub.docker.com/u/bioconductor/
22
containerit
https://github.com/o2r-project/containerit
http://o2r.info/2017/05/30/containerit-package/
Packaging interactive session
23
> library(containerit); library("gstat"); library("sp")
> data(meuse)
> coordinates(meuse) = ~x+y
> data(meuse.grid)
> gridded(meuse.grid) = ~x+y
> v <- variogram(log(zinc)~1, meuse)
> m <- fit.variogram(v, vgm(1, "Sph", 300, 1))
> plot(v, model = m)
> dockerfile_object <- dockerfile()
INFO [2017-07-05 11:20:54] Trying to determine system
requirements for the package(s) 'sp, gstat, zoo,
futile.logger, xts, lambda.r, spacetime, futile.options,
FNN, intervals, lattice' from sysreq online DB
INFO [2017-07-05 11:21:03] Adding CRAN packages: sp,
gstat, zoo, futile.logger, xts, lambda.r, spacetime,
futile.options, FNN, intervals, lattice
INFO [2017-07-05 11:21:03] Created Dockerfile-Object
based on sessionInfo
> print(dockerfile_object)
FROM rocker/r-ver:3.4.1
LABEL maintainer="daniel"
RUN ["install2.r", "-r 'https://cloud.r-project.org'", "sp",
"gstat", "zoo", "futile.logger", "xts", "lambda.r",
"spacetime", "futile.options", "FNN", "intervals",
"lattice"]
WORKDIR /payload/
CMD ["R"]
> str(dockerfile_object, max.level = 2)
Formal class 'Dockerfile' [..] with 4 slots
..@ image :Formal class 'From' [..] with 2 slots
..@ maintainer :Formal class 'Label' [..] with 2 slots
..@ instructions :List of 2
..@ cmd :Formal class 'Cmd' [..] with 2 slots
Packaging a script w/ sysreqs dependency resolving
library(rgdal); require(maptools)
nc <- rgdal::readOGR(system.file("shapes/",
package="maptools"), "sids", verbose = FALSE)
proj4string(nc) <- CRS("+proj=longlat +datum=NAD27")
plot(nc)
summary(nc)
> scriptCmd <- CMD_Rscript("demo.R")
> dockerfile_object <- dockerfile(
from = "~/Documents/2017_useR/demo.R",
cmd = scriptCmd)
# curl https://sysreqs.r-hub.io/pkg/rgdal,sp,lattice/linux-x86_64-debian-gcc
# ["libgdal-dev", "libproj-dev", "gdal-bin"]
> print(dockerfile_object)
FROM rocker/r-ver:3.4.0
LABEL maintainer="daniel"
RUN export DEBIAN_FRONTEND=noninteractive; apt-get -y update \
 && apt-get install -y gdal-bin \
	libgdal-dev \
	libproj-dev
RUN ["install2.r", "-r 'https://cloud.r-project.org'", "rgdal", "sp", "lattice"]
WORKDIR /payload/
COPY [".", "."]
CMD ["R", "--vanilla", "-f", "containerit_1a977e2dcdea.R"]
24
Running the container
> write(dockerfile_object)
INFO [2017-07-06 10:10:05] Writing dockerfile to
/home/daniel/Documents/2017_useR/Dockerfile
$ docker build -t user2017demo .
Sending build context to Docker daemon 6.054MB
Step 1/7 : FROM rocker/r-ver:3.4.1
3.4.1: Pulling from rocker/r-ver
c75480ad9aaf: Pull complete
[...]
The following additional packages will be installed:
[...]
* installing *source* package ‘foreign’ ...
[...]
Successfully built e30936ac8687
Successfully tagged user2017demo:latest
25
$ docker run -it user2017demo
R version 3.4.1 (2017-06-30) -- "Single Candle"
Copyright (C) 2017 The R Foundation for Statistical
Computing
Platform: x86_64-pc-linux-gnu (64-bit)
[..]
> library(rgdal); require(maptools)
Loading required package: sp
> nc <- rgdal::readOGR(system.file("shapes/",
package="maptools"), "sids", verbose = FALSE)
[...]
> summary(nc)
Object of class SpatialPolygonsDataFrame
Coordinates:
min max
x -84.32385 -75.45698
y 33.88199 36.58965
Is projected: FALSE
[...]
Running container with data in plain R with harbor
https://github.com/nuest/containerit/blob/79f8832975e00c84cdcc665df0c2846d834e27c5/demo/fullstack.R
> write.csv(file = "dataset.csv", x = cars)
> dataset <- read.csv("dataset.csv")
> model <- lm(log(dist) ~ log(speed),
data = dataset)
> summary(model)
> cmd <- CMD_Rscript("script.R")
> df <- containerit::dockerfile(from = workspace,
cmd = cmd,
r_version = "3.3.3",
copy = "script_dir")
> write(df)
/tmp/Rtmpeachap/
├── dataset.csv
├── Dockerfile
└── script.R
> harbor::docker_cmd(harbor::localhost, "build",
arg = workspace,
docker_opts = c("-t", "fullstack-r-demo"),
capture_text = TRUE
)
> harbor::docker_run(image = "fullstack-r-demo")
R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
[...]
> dataset <- read.csv("dataset.csv")
> model <- lm(log(dist) ~ log(speed), data = dataset)
> summary(model)
Call:
lm(formula = log(dist) ~ log(speed), data = dataset)
Residuals:
Min 1Q Median 3Q Max
-1.00215 -0.24578 -0.02898 0.20717 0.88289
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.7297 0.3758 -1.941 0.0581 .
26
Running container with data in plain R with harbor
https://github.com/nuest/containerit/blob/79f8832975e00c84cdcc665df0c2846d834e27c5/demo/fullstack.R
> write.csv(file = "dataset.csv", x = cars)
> dataset <- read.csv("dataset.csv")
> model <- lm(log(dist) ~ log(speed),
data = dataset)
> summary(model)
> cmd <- CMD_Rscript("script.R")
> df <- containerit::dockerfile(from = workspace,
cmd = cmd,
r_version = "3.3.3",
copy = "script_dir")
> write(df)
/tmp/Rtmpeachap/
├── dataset.csv
├── Dockerfile
└── script.R
> harbor::docker_cmd(harbor::localhost, "build",
arg = workspace,
docker_opts = c("-t", "fullstack-r-demo"),
capture_text = TRUE
)
> harbor::docker_run(image = "fullstack-r-demo")
27
More
Labels for metadata
devtools session information (install from git under dev.)
Custom base images
Docker vs. R
http://bit.ly/docker-r
Boettiger, Carl. 2015. “An Introduction to Docker for Reproducible Research, with Examples from the R Environment.” ACM SIGOPS Operating Systems Review 49 (January): 71–79. doi:10.1145/2723872.2723882
28
Limitations
No shell, no fun
Windows :-(
image size
Versioning
How to access files/plots from a container?
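Two common ways to get files and plots out of a container, sketched with standard Docker CLI options: a bind mount (-v) and docker cp. The image user2017demo and its /payload/ working directory come from the demo earlier in this deck; the out directory, the container name demo-run, the R one-liner, and the run helper are assumptions for illustration. Commands are printed by default and executed only with DRY_RUN=0.

```shell
#!/bin/sh
# Illustration only: "out" and "demo-run" are hypothetical names.
# run() prints each command; it executes it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

mkdir -p out

# Option 1: bind mount. Files written to /payload/out inside the
# container show up in ./out on the host.
run docker run --rm -v "$(pwd)/out:/payload/out" user2017demo \
  R --vanilla -e 'pdf("/payload/out/plot.pdf"); plot(cars); dev.off()'

# Option 2: run a named container, then copy its working directory out.
run docker run --name demo-run user2017demo
run docker cp demo-run:/payload/. ./out/
run docker rm demo-run
```

The bind mount is usually the simpler choice during development; docker cp also works on containers that have already exited.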
29
Summary
Docker is a great tool for data science, reproducible research, consulting, …
Be “tidy” outside of your R Markdown
containerit makes Docker easier
(DRY, less copy&paste, best practices, automatic system dependencies)
Benefits from Rocker (MRAN by default, …), harbor, ...
Alternatives / potential for combination: package management locally (packrat, pkgsnap, switchr/GRANBase) or remotely (MRAN timemachine/checkpoint), or install specific versions from
30
https://github.com/o2r-project/containerit/projects/1
Outlook containerit
Support o2r’s ERC creation service
Get feedback
Singularity
OCI/acbuild
CRAN
Docker + R paper for RJournal?
Package rplumber / jug web apps
Versioned system libs (sf::sf_extSoftVersion())
31
Thanks!
What are your questions?
32
@o2r_project
github.com/o2r-project
o2r.info

Mais conteúdo relacionado

Mais procurados

MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelTakahiro Inoue
 
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Spark 计算模型
Spark 计算模型Spark 计算模型
Spark 计算模型wang xing
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with RAbhirup Mallik
 
R workshop xx -- Parallel Computing with R
R workshop xx -- Parallel Computing with R R workshop xx -- Parallel Computing with R
R workshop xx -- Parallel Computing with R Vivian S. Zhang
 
JDO 2019: Kubernetes logging techniques with a touch of LogSense - Marcin Stożek
JDO 2019: Kubernetes logging techniques with a touch of LogSense - Marcin StożekJDO 2019: Kubernetes logging techniques with a touch of LogSense - Marcin Stożek
JDO 2019: Kubernetes logging techniques with a touch of LogSense - Marcin StożekPROIDEA
 
Hadoop本 輪読会 1章〜2章
Hadoop本 輪読会 1章〜2章Hadoop本 輪読会 1章〜2章
Hadoop本 輪読会 1章〜2章moai kids
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in RSamuel Bosch
 
Upgrading To The New Map Reduce API
Upgrading To The New Map Reduce APIUpgrading To The New Map Reduce API
Upgrading To The New Map Reduce APITom Croucher
 
Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)Sean Krail
 
Pwrake: Distributed Workflow Engine for e-Science - RubyConfX
Pwrake: Distributed Workflow Engine for e-Science - RubyConfXPwrake: Distributed Workflow Engine for e-Science - RubyConfX
Pwrake: Distributed Workflow Engine for e-Science - RubyConfXMasahiro Tanaka
 
"Metrics: Where and How", Vsevolod Polyakov
"Metrics: Where and How", Vsevolod Polyakov"Metrics: Where and How", Vsevolod Polyakov
"Metrics: Where and How", Vsevolod PolyakovYulia Shcherbachova
 
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...InfluxData
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Bulk Exporting from Cassandra - Carlo Cabanilla
Bulk Exporting from Cassandra - Carlo CabanillaBulk Exporting from Cassandra - Carlo Cabanilla
Bulk Exporting from Cassandra - Carlo CabanillaDatadog
 
Parallel K means clustering using CUDA
Parallel K means clustering using CUDAParallel K means clustering using CUDA
Parallel K means clustering using CUDAprithan
 
Valerii Vasylkov Erlang. measurements and benefits.
Valerii Vasylkov Erlang. measurements and benefits.Valerii Vasylkov Erlang. measurements and benefits.
Valerii Vasylkov Erlang. measurements and benefits.Аліна Шепшелей
 
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and Telegraf
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and TelegrafObtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and Telegraf
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and TelegrafInfluxData
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAprithan
 

Mais procurados (20)

mesos-devoxx14
mesos-devoxx14mesos-devoxx14
mesos-devoxx14
 
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
 
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
 
Spark 计算模型
Spark 计算模型Spark 计算模型
Spark 计算模型
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with R
 
R workshop xx -- Parallel Computing with R
R workshop xx -- Parallel Computing with R R workshop xx -- Parallel Computing with R
R workshop xx -- Parallel Computing with R
 
JDO 2019: Kubernetes logging techniques with a touch of LogSense - Marcin Stożek
JDO 2019: Kubernetes logging techniques with a touch of LogSense - Marcin StożekJDO 2019: Kubernetes logging techniques with a touch of LogSense - Marcin Stożek
JDO 2019: Kubernetes logging techniques with a touch of LogSense - Marcin Stożek
 
Hadoop本 輪読会 1章〜2章
Hadoop本 輪読会 1章〜2章Hadoop本 輪読会 1章〜2章
Hadoop本 輪読会 1章〜2章
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in R
 
Upgrading To The New Map Reduce API
Upgrading To The New Map Reduce APIUpgrading To The New Map Reduce API
Upgrading To The New Map Reduce API
 
Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)
 
Pwrake: Distributed Workflow Engine for e-Science - RubyConfX
Pwrake: Distributed Workflow Engine for e-Science - RubyConfXPwrake: Distributed Workflow Engine for e-Science - RubyConfX
Pwrake: Distributed Workflow Engine for e-Science - RubyConfX
 
"Metrics: Where and How", Vsevolod Polyakov
"Metrics: Where and How", Vsevolod Polyakov"Metrics: Where and How", Vsevolod Polyakov
"Metrics: Where and How", Vsevolod Polyakov
 
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
 
Bulk Exporting from Cassandra - Carlo Cabanilla
Bulk Exporting from Cassandra - Carlo CabanillaBulk Exporting from Cassandra - Carlo Cabanilla
Bulk Exporting from Cassandra - Carlo Cabanilla
 
Parallel K means clustering using CUDA
Parallel K means clustering using CUDAParallel K means clustering using CUDA
Parallel K means clustering using CUDA
 
Valerii Vasylkov Erlang. measurements and benefits.
Valerii Vasylkov Erlang. measurements and benefits.Valerii Vasylkov Erlang. measurements and benefits.
Valerii Vasylkov Erlang. measurements and benefits.
 
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and Telegraf
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and TelegrafObtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and Telegraf
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and Telegraf
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 

Semelhante a RR & Docker @ MuensteR Meetup (Sep 2017)

Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016
Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016
Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016Zabbix
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packagesAjay Ohri
 
R the unsung hero of Big Data
R the unsung hero of Big DataR the unsung hero of Big Data
R the unsung hero of Big DataDhafer Malouche
 
Containerd Project Update: FOSDEM 2018
Containerd Project Update: FOSDEM 2018Containerd Project Update: FOSDEM 2018
Containerd Project Update: FOSDEM 2018Phil Estes
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RYanchang Zhao
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache SparkIndicThreads
 
Lecture1_R.pdf
Lecture1_R.pdfLecture1_R.pdf
Lecture1_R.pdfBusyBird2
 
Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Massimo Schenone
 
Extending lifespan with Hadoop and R
Extending lifespan with Hadoop and RExtending lifespan with Hadoop and R
Extending lifespan with Hadoop and RRadek Maciaszek
 
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Threestackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick ThreeNETWAYS
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25thSneha Challa
 
Through the firewall with miniCRAN
Through the firewall with miniCRANThrough the firewall with miniCRAN
Through the firewall with miniCRANRevolution Analytics
 
Incrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern AutomationIncrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern AutomationSean Chittenden
 
Getting Started with PostGIS geographic database - Lasma Sietinsone, EDINA
Getting Started with PostGIS geographic database - Lasma Sietinsone, EDINAGetting Started with PostGIS geographic database - Lasma Sietinsone, EDINA
Getting Started with PostGIS geographic database - Lasma Sietinsone, EDINAJISC GECO
 
4Developers 2018: Pyt(h)on vs słoń: aktualny stan przetwarzania dużych danych...
4Developers 2018: Pyt(h)on vs słoń: aktualny stan przetwarzania dużych danych...4Developers 2018: Pyt(h)on vs słoń: aktualny stan przetwarzania dużych danych...
4Developers 2018: Pyt(h)on vs słoń: aktualny stan przetwarzania dużych danych...PROIDEA
 
Docker @ Data Science Meetup
Docker @ Data Science MeetupDocker @ Data Science Meetup
Docker @ Data Science MeetupDaniel Nüst
 

Semelhante a RR & Docker @ MuensteR Meetup (Sep 2017) (20)

Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016
Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016
Erik Skytthe - Monitoring Mesos, Docker, Containers with Zabbix | ZabConf2016
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packages
 
R the unsung hero of Big Data
R the unsung hero of Big DataR the unsung hero of Big Data
R the unsung hero of Big Data
 
Containerd Project Update: FOSDEM 2018
Containerd Project Update: FOSDEM 2018Containerd Project Update: FOSDEM 2018
Containerd Project Update: FOSDEM 2018
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Lecture1_R.pdf
Lecture1_R.pdfLecture1_R.pdf
Lecture1_R.pdf
 
Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?
 
Extending lifespan with Hadoop and R
Extending lifespan with Hadoop and RExtending lifespan with Hadoop and R
Extending lifespan with Hadoop and R
 
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Threestackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25th
 
Through the firewall with miniCRAN
Through the firewall with miniCRANThrough the firewall with miniCRAN
Through the firewall with miniCRAN
 
Incrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern AutomationIncrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern Automation
 
Getting Started with PostGIS geographic database - Lasma Sietinsone, EDINA
Getting Started with PostGIS geographic database - Lasma Sietinsone, EDINAGetting Started with PostGIS geographic database - Lasma Sietinsone, EDINA
Getting Started with PostGIS geographic database - Lasma Sietinsone, EDINA
 
Getting started with PostGIS geographic database
Getting started with PostGIS geographic databaseGetting started with PostGIS geographic database
Getting started with PostGIS geographic database
 
4Developers 2018: Pyt(h)on vs słoń: aktualny stan przetwarzania dużych danych...
4Developers 2018: Pyt(h)on vs słoń: aktualny stan przetwarzania dużych danych...4Developers 2018: Pyt(h)on vs słoń: aktualny stan przetwarzania dużych danych...
4Developers 2018: Pyt(h)on vs słoń: aktualny stan przetwarzania dużych danych...
 
Docker @ Data Science Meetup
Docker @ Data Science MeetupDocker @ Data Science Meetup
Docker @ Data Science Meetup
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
 
Lecture1 r
Lecture1 rLecture1 r
Lecture1 r
 

Mais de Daniel Nüst

Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...Daniel Nüst
 
Docker @ FOSS4G 2016, Bonn
Docker @ FOSS4G 2016, BonnDocker @ FOSS4G 2016, Bonn
Docker @ FOSS4G 2016, BonnDaniel Nüst
 
Frameworks for geoprocessing on the web with R
Frameworks for geoprocessing on the web with RFrameworks for geoprocessing on the web with R
Frameworks for geoprocessing on the web with RDaniel Nüst
 
Agile 2015 a-geo-label-for-the-sensor-web
Agile 2015 a-geo-label-for-the-sensor-webAgile 2015 a-geo-label-for-the-sensor-web
Agile 2015 a-geo-label-for-the-sensor-webDaniel Nüst
 
Visualising Interpolations of Mobile Sensor Observations
Visualising Interpolations of Mobile Sensor ObservationsVisualising Interpolations of Mobile Sensor Observations
Visualising Interpolations of Mobile Sensor ObservationsDaniel Nüst
 
WPS Application Patterns
WPS Application PatternsWPS Application Patterns
WPS Application PatternsDaniel Nüst
 
JavaScript Client Libraries for the (Former) Long Tail of OGC Standards
JavaScript Client Libraries for the (Former) Long Tail of OGC StandardsJavaScript Client Libraries for the (Former) Long Tail of OGC Standards
JavaScript Client Libraries for the (Former) Long Tail of OGC StandardsDaniel Nüst
 
Open Source and GitHub for Teaching with Software Development Projects
Open Source and GitHub for Teaching with Software Development ProjectsOpen Source and GitHub for Teaching with Software Development Projects
Open Source and GitHub for Teaching with Software Development ProjectsDaniel Nüst
 
5 Star Open Geoprocessing
5 Star Open Geoprocessing5 Star Open Geoprocessing
5 Star Open GeoprocessingDaniel Nüst
 
The 52°North Web Processing Service
The 52°North Web Processing ServiceThe 52°North Web Processing Service
The 52°North Web Processing ServiceDaniel Nüst
 
Linked data and rdf
Linked  data and rdfLinked  data and rdf
Linked data and rdfDaniel Nüst
 
OGC SOS for Your Data
OGC SOS for Your DataOGC SOS for Your Data
OGC SOS for Your DataDaniel Nüst
 
sos4R - Accessing SensorWeb Data from R
sos4R - Accessing SensorWeb Data from Rsos4R - Accessing SensorWeb Data from R
sos4R - Accessing SensorWeb Data from RDaniel Nüst
 
Connecting R to the Sensor Web
Connecting R to the Sensor WebConnecting R to the Sensor Web
Connecting R to the Sensor WebDaniel Nüst
 
sos4R - 52° North Innovation Price Presentation
sos4R - 52° North Innovation Price Presentationsos4R - 52° North Innovation Price Presentation
sos4R - 52° North Innovation Price PresentationDaniel Nüst
 
Visualizing the Availability of Temporally Structured Sensor Data
Visualizing the Availability of Temporally Structured Sensor DataVisualizing the Availability of Temporally Structured Sensor Data
Visualizing the Availability of Temporally Structured Sensor DataDaniel Nüst
 

Mais de Daniel Nüst (18)

Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...
 
Docker @ FOSS4G 2016, Bonn
Docker @ FOSS4G 2016, BonnDocker @ FOSS4G 2016, Bonn
Docker @ FOSS4G 2016, Bonn
 
Atlas Zukünfte
Atlas ZukünfteAtlas Zukünfte
Atlas Zukünfte
 
Frameworks for geoprocessing on the web with R
Frameworks for geoprocessing on the web with RFrameworks for geoprocessing on the web with R
Frameworks for geoprocessing on the web with R
 
Agile 2015 a-geo-label-for-the-sensor-web
Agile 2015 a-geo-label-for-the-sensor-webAgile 2015 a-geo-label-for-the-sensor-web
Agile 2015 a-geo-label-for-the-sensor-web
 
Visualising Interpolations of Mobile Sensor Observations
Visualising Interpolations of Mobile Sensor ObservationsVisualising Interpolations of Mobile Sensor Observations
Visualising Interpolations of Mobile Sensor Observations
 
WPS Application Patterns
WPS Application PatternsWPS Application Patterns
WPS Application Patterns
 
JavaScript Client Libraries for the (Former) Long Tail of OGC Standards
JavaScript Client Libraries for the (Former) Long Tail of OGC StandardsJavaScript Client Libraries for the (Former) Long Tail of OGC Standards
JavaScript Client Libraries for the (Former) Long Tail of OGC Standards
 
Open Source and GitHub for Teaching with Software Development Projects
Open Source and GitHub for Teaching with Software Development ProjectsOpen Source and GitHub for Teaching with Software Development Projects
Open Source and GitHub for Teaching with Software Development Projects
 
5 Star Open Geoprocessing
5 Star Open Geoprocessing5 Star Open Geoprocessing
5 Star Open Geoprocessing
 
The 52°North Web Processing Service
The 52°North Web Processing ServiceThe 52°North Web Processing Service
The 52°North Web Processing Service
 
Linked data and rdf
Linked  data and rdfLinked  data and rdf
Linked data and rdf
 
OGC SOS for Your Data
OGC SOS for Your DataOGC SOS for Your Data
OGC SOS for Your Data
 
sos4R - Accessing SensorWeb Data from R
sos4R - Accessing SensorWeb Data from Rsos4R - Accessing SensorWeb Data from R
sos4R - Accessing SensorWeb Data from R
 
Connecting R to the Sensor Web
Connecting R to the Sensor WebConnecting R to the Sensor Web
Connecting R to the Sensor Web
 
sos4R @ OGC TC
sos4R @ OGC TCsos4R @ OGC TC
sos4R @ OGC TC
 
sos4R - 52° North Innovation Price Presentation
sos4R - 52° North Innovation Price Presentationsos4R - 52° North Innovation Price Presentation
sos4R - 52° North Innovation Price Presentation
 
Visualizing the Availability of Temporally Structured Sensor Data
Visualizing the Availability of Temporally Structured Sensor DataVisualizing the Availability of Temporally Structured Sensor Data
Visualizing the Availability of Temporally Structured Sensor Data
 


RR & Docker @ MuensteR Meetup (Sep 2017)

  • 1. Reproducible Research in R with Docker Daniel Nüst | University of Münster | @nordholmen MünsteR Meetup, Sep 2017 https://www.meetup.com/Munster-R-Users-Group/events/241108949/
  • 3. Why should I care about reproducible research? (an opinionated view) Improve quality of your work today Existence of your work tomorrow: journal requirements 2020+ Societal challenges… Who knows the Oxford Dictionaries word of the year 2016? https://en.oxforddictionaries.com/word-of-the-year/word-of-the-year-2016 3
  • 5. “Tradition” of notebooks in lab work, e.g. in chemistry (analog and digital) Open Notebook Science (https://en.wikipedia.org/wiki/Open_notebook_science) No comparable tradition and education in younger and mostly digital geostatistics, GIS, ... https://twitter.com/wellcometrust/status/496323565239955456 5 Lab notebooks https://www.google.de/search?q=chemistry+lab+notebook&safe=off&tbm=isch https://en.wikipedia.org/wiki/File:Studies_of_the_Arm_showing_the_Movements_made_by_the_Biceps.jpg
  • 6. R Markdown a.k.a. .Rmd http://rmarkdown.rstudio.com/ Based on Mark DOWN https://daringfireball.net/projects/markdown/syntax 6
  • 7. 7 #1: reproducibility helps to avoid disaster #2: reproducibility makes it easier to write papers #3: reproducibility helps reviewers see it your way #4: reproducibility enables continuity of your work #5: reproducibility helps to build your reputation
  • 8. 8
  • 9. ERC creation process ❏ Submit workspace to publication platform ❏ Publication platform… ❏ extracts metadata ❏ executes analysis ❏ check output vs. upload (syntax) ❏ capture runtime environment (manifest + image) 9
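The "check output vs. upload" step on the slide above could, for example, be approximated by comparing checksums of the uploaded files with those produced by re-running the analysis. The following is a minimal sketch in base R under that assumption; `compare_outputs()` and the directory names are hypothetical and not taken from the o2r platform, whose actual check may work differently.

```r
# Hypothetical helper (not part of o2r or containerit): compare files uploaded
# by the author with files produced by re-executing the analysis, via MD5.
compare_outputs <- function(uploaded_dir, reproduced_dir) {
  files <- list.files(uploaded_dir)
  uploaded <- tools::md5sum(file.path(uploaded_dir, files))
  reproduced <- tools::md5sum(file.path(reproduced_dir, files))
  data.frame(
    file = files,
    # A file missing from the reproduction yields NA and counts as different.
    identical = unname(!is.na(reproduced) & uploaded == reproduced),
    stringsAsFactors = FALSE
  )
}

# Small demonstration with two temporary directories.
uploaded_dir <- file.path(tempdir(), "uploaded")
reproduced_dir <- file.path(tempdir(), "reproduced")
dir.create(uploaded_dir, showWarnings = FALSE)
dir.create(reproduced_dir, showWarnings = FALSE)
writeLines("a,b\n1,2", file.path(uploaded_dir, "result.csv"))
writeLines("a,b\n1,2", file.path(reproduced_dir, "result.csv"))
writeLines("plot v1", file.path(uploaded_dir, "notes.txt"))
writeLines("plot v2", file.path(reproduced_dir, "notes.txt"))

result <- compare_outputs(uploaded_dir, reproduced_dir)
print(result)
```

A byte-level comparison like this is strict: outputs that embed timestamps or random seeds will differ even when the analysis is scientifically equivalent, so a real service would likely need a more tolerant comparison.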
  • 10. 10 Slide by Docker inventor & Docker, Inc. CTO Solomon Hykes, DockerCon 2014
  • 12. https://docs.docker.com/engine/docker-overview/ 12 science data science research reproducibility replication package & separate applications and their dependencies for cloud infrastructures Icons by synonymsof (CC-BY)
  • 14. 14 Docker for Data Science (all the Docker advantages… write once, biz ops, cloud, etc.) Reproducibility through controlled working environment Project separation + don’t clutter dev machine Environment (re)creation, documentation Adopt good practices on the way Easy collaboration Easy transition from testing to production
  • 15. 15 https://hub.docker.com/r/rocker/rstudio/ Base containers (r-base, r-devel, r-ver, ..) Use case containers (r-devel-ubsan-clang, ..) Stacks (tidyverse, geospatial, ..) docker run -it -p 8787:8787 rocker/rstudio http://localhost:8787/ (rstudio/rstudio) Rocker: https://github.com/rocker-org
  • 16. rocker/r-ver and other base images https://github.com/rocker-org/rocker#base-docker-containers 16
  • 17. rocker/geospatial and other use cases https://github.com/rocker-org/rocker#versioned-stack-builds-on-r-ver 17
  • 19. https://hub.docker.com/r/rocker/rstudio/ docker run --rm -it -p 8787:8787 rocker/rstudio http://localhost:8787/ (rstudio/rstudio) Great example: https://github.com/benmarwick/1989-excavation-report-Madjebebe docker run --rm -it -p 8787:8787 benmarwick/mjb1989excavationpaper http://localhost:8787/ (rstudio/rstudio) 19
  • 20. RStudio Desktop vs. rocker/rstudio No functional difference, “Desktop” version is just a lightweight browser wrapper (https://rpubs.com/jmcphers/rstudio-architecture) $ docker run -d -p 8787:8787 rocker/rstudio $ docker ps 20
  • 23. Packaging interactive session 23 > library(containerit); library("gstat"); library("sp") > data(meuse) > coordinates(meuse) = ~x+y > data(meuse.grid) > gridded(meuse.grid) = ~x+y > v <- variogram(log(zinc)~1, meuse) > m <- fit.variogram(v, vgm(1, "Sph", 300, 1)) > plot(v, model = m) > dockerfile_object <- dockerfile() INFO [2017-07-05 11:20:54] Trying to determine system requirements for the package(s) 'sp, gstat, zoo, futile.logger, xts, lambda.r, spacetime, futile.options, FNN, intervals, lattice' from sysreq online DB INFO [2017-07-05 11:21:03] Adding CRAN packages: sp, gstat, zoo, futile.logger, xts, lambda.r, spacetime, futile.options, FNN, intervals, lattice INFO [2017-07-05 11:21:03] Created Dockerfile-Object based on sessionInfo > print(dockerfile_object) FROM rocker/r-ver:3.4.1 LABEL maintainer="daniel" RUN ["install2.r", "-r 'https://cloud.r-project.org'", "sp", "gstat", "zoo", "futile.logger", "xts", "lambda.r", "spacetime", "futile.options", "FNN", "intervals", "lattice"] WORKDIR /payload/ CMD ["R"] > str(dockerfile_object, max.level = 2) Formal class 'Dockerfile' [..] with 4 slots ..@ image :Formal class 'From' [..] with 2 slots ..@ maintainer :Formal class 'Label' [..] with 2 slots ..@ instructions :List of 2 ..@ cmd :Formal class 'Cmd' [..] with 2 slots
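To make the idea behind the slide above more concrete, here is a minimal sketch in base R of how a Dockerfile similar to the printed one could be rendered from an R version and a list of package names. It is not containerit's implementation: `render_dockerfile()` and `session_packages()` are hypothetical helpers, and the sketch ignores system requirements and the CRAN mirror option shown in the slide's output.

```r
# Hypothetical helper, NOT containerit's code: render a minimal Dockerfile
# from an R version string and a non-empty vector of CRAN package names.
render_dockerfile <- function(r_version, packages, maintainer = "unknown") {
  quoted <- paste0('"', packages, '"', collapse = ", ")
  c(
    paste0("FROM rocker/r-ver:", r_version),
    paste0('LABEL maintainer="', maintainer, '"'),
    paste0('RUN ["install2.r", ', quoted, "]"),
    "WORKDIR /payload/",
    'CMD ["R"]'
  )
}

# Hypothetical helper: names of non-base packages attached in this session
# (NULL if none are attached).
session_packages <- function(info = sessionInfo()) {
  names(info$otherPkgs)
}

dockerfile_lines <- render_dockerfile("3.4.1", c("sp", "gstat"),
                                      maintainer = "daniel")
writeLines(dockerfile_lines)
```

In an interactive session, `session_packages()` could supply the `packages` argument after the analysis has run, which mirrors the "based on sessionInfo" message in the log above.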
  • 24. Packaging a script w/ sysreqs dependency resolving library(rgdal); require(maptools) nc <- rgdal::readOGR(system.file("shapes/", package="maptools"), "sids", verbose = FALSE) proj4string(nc) <- CRS("+proj=longlat +datum=NAD27") plot(nc) summary(nc) > scriptCmd <- CMD_Rscript("demo.R") > dockerfile_object <- dockerfile( from = "~/Documents/2017_useR/demo.R", cmd = scriptCmd) # curl https://sysreqs.r-hub.io/pkg/rgdal,sp,lattice/linux-x86_64-debian-gcc # ["libgdal-dev", "libproj-dev", "gdal-bin"] > print(dockerfile_object) FROM rocker/r-ver:3.4.0 LABEL maintainer="daniel" RUN export DEBIAN_FRONTEND=noninteractive; apt-get -y update && apt-get install -y gdal-bin libgdal-dev libproj-dev RUN ["install2.r", "-r 'https://cloud.r-project.org'", "rgdal", "sp", "lattice"] WORKDIR /payload/ COPY [".", "."] CMD ["R", "--vanilla", "-f", "containerit_1a977e2dcdea.R"] 24
  • 25. Running the container > write(dockerfile_object) INFO [2017-07-06 10:10:05] Writing dockerfile to /home/daniel/Documents/2017_useR/Dockerfile $ docker build -t user2017demo . Sending build context to Docker daemon 6.054MB Step 1/7 : FROM rocker/r-ver:3.4.1 3.4.1: Pulling from rocker/r-ver c75480ad9aaf: Pull complete [...] The following additional packages will be installed: [...] * installing *source* package ‘foreign’ ... [...] Successfully built e30936ac8687 Successfully tagged user2017demo:latest 25 $ docker run -it user2017demo R version 3.4.1 (2017-06-30) -- "Single Candle" Copyright (C) 2017 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit) [..] > library(rgdal); require(maptools) Loading required package: sp > nc <- rgdal::readOGR(system.file("shapes/", package="maptools"), "sids", verbose = FALSE) [...] > summary(nc) Object of class SpatialPolygonsDataFrame Coordinates: min max x -84.32385 -75.45698 y 33.88199 36.58965 Is projected: FALSE [...]
  • 26. Running container with data in plain R with harbor https://github.com/nuest/containerit/blob/79f8832975e00c84cdcc665df0c2846d834e27c5/demo/fullstack.R > write.csv(file = "dataset.csv", x = cars) > dataset <- read.csv("dataset.csv") > model <- lm(log(dist) ~ log(speed), data = dataset) > summary(model) > cmd <- CMD_Rscript("script.R") > df <- containerit::dockerfile(from = workspace, cmd = cmd, r_version = "3.3.3", copy = "script_dir") > write(df) /tmp/Rtmpeachap/ ├── dataset.csv ├── Dockerfile └── script.R > harbor::docker_cmd(harbor::localhost, "build", arg = workspace, docker_opts = c("-t", "fullstack-r-demo"), capture_text = TRUE ) > harbor::docker_run(image = "fullstack-r-demo") R version 3.3.3 (2017-03-06) -- "Another Canoe" Copyright (C) 2017 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit) [...] > dataset <- read.csv("dataset.csv") > model <- lm(log(dist) ~ log(speed), data = dataset) > summary(model) Call: lm(formula = log(dist) ~ log(speed), data = dataset) Residuals: Min 1Q Median 3Q Max -1.00215 -0.24578 -0.02898 0.20717 0.88289 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.7297 0.3758 -1.941 0.0581 . 26
  • 27. Running container with data in plain R with harbor https://github.com/nuest/containerit/blob/79f8832975e00c84cdcc665df0c2846d834e27c5/demo/fullstack.R > write.csv(file = "dataset.csv", x = cars) > dataset <- read.csv("dataset.csv") > model <- lm(log(dist) ~ log(speed), data = dataset) > summary(model) > cmd <- CMD_Rscript("script.R") > df <- containerit::dockerfile(from = workspace, cmd = cmd, r_version = "3.3.3", copy = "script_dir") > write(df) /tmp/Rtmpeachap/ ├── dataset.csv ├── Dockerfile └── script.R > harbor::docker_cmd(harbor::localhost, "build", arg = workspace, docker_opts = c("-t", "fullstack-r-demo"), capture_text = TRUE ) > harbor::docker_run(image = "fullstack-r-demo") 27
  • 28. More Labels for metadata devtools session information (install from git under dev.) Custom base images Docker vs. R http://bit.ly/docker-r Boettiger, Carl. 2015. “An Introduction to Docker for Reproducible Research, with Examples from the R Environment.” ACM SIGOPS Operating Systems Review 49 (January): 71–79. doi:10.1145/2723872.2723882 28
  • 29. Limitations No shell, no fun Windows :-( image size Versioning How to access files/plots from a container? 29
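On the question of how to access files and plots from a container: one common option is a bind mount, where a host directory is mapped into the container with Docker's `-v` flag so that anything written there remains available after the container exits. The sketch below only assembles the arguments in base R; `docker_run_args()` and the container path `/payload/output` are assumptions for illustration, and the image name is reused from the earlier demo.

```r
# Hypothetical helper: build the argument vector for `docker run` with a
# bind mount from a host directory to a directory inside the container.
docker_run_args <- function(image, host_dir,
                            container_dir = "/payload/output") {
  host_path <- normalizePath(host_dir, mustWork = FALSE)
  c("run", "--rm",
    "-v", paste0(host_path, ":", container_dir),
    image)
}

run_args <- docker_run_args("user2017demo", tempdir())
cat("docker", run_args, "\n")

# To actually start the container (requires a local Docker installation):
# system2("docker", run_args)
```

The analysis script inside the container would then have to write its plots and result files to the mounted directory for them to show up on the host.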
  • 30. Summary Docker is a great tool for data science, reproducible research, consulting, … Be “tidy” outside of your R Markdown containerit makes Docker easier (DRY, less copy&paste, best practices, automatic system dependencies) Benefits from Rocker (MRAN by default, …), harbor, ... Alternatives / potential for combination: package management locally (packrat, pkgsnap, switchr/GRANBase) or remotely (MRAN timemachine/checkpoint), or install specific versions from 30
  • 31. https://github.com/o2r-project/containerit/projects/1 Outlook containerit Support o2r’s ERC creation service Get feedback Singularity OCI/acbuild CRAN Docker + R paper for RJournal? Package rplumber / jug web apps Versioned system libs (sf::sf_extSoftVersion()) 31
  • 32. Thanks! What are your questions? 32 @o2r_project github.com/o2r-project o2r.info

Editor's Notes

  1. Competitive advantage
  2. > True @wellcomelibrary? MT @Libroantiguo Marie Curie's experimental notebook - after almost 100yrs, still radioactive. http://www.openculture.com/2015/07/marie-curies-research-papers-are-still-radioactive-100-years-later.html
  3. Markdown is a lightweight markup language with plain text formatting syntax. It is designed so that it can be converted to HTML and many other formats using a tool by the same name. Markdown is often used to format readme files, for writing messages in online discussion forums, and to create rich text using a plain text editor.
  4. Who is a researcher?
  5. The ERC provides a well-structured container for the needs of journals (ERC as the item under review), archives (suitable metadata and packaging formats), and researchers (literally everything needed to re-do an analysis is there). It relies on Docker to define and store the runtime environment. ERCs should be simple enough to be created manually and absorb best practices for organizing digital workspaces. Key points: “Bundle”; nested containers (BagIt, Docker); librarian-ready; reproducibility range of 5 to 10 years (still worth integrating, target users are not science historians); desktop-size data and algorithms, closed and complete; “Geo-stuff” and R for the “last 10 %”; remain understandable for scientists.
  6. house vs. apartment
  7. house vs. apartment
  8. Images including views (protmetcore, etc.)
  9. Dockerizing R; Dockerizing Research and Development Environments; Running Tests; Dockerizing Documents; Control Docker Containers from R; R and Docker for Complex Web Applications
  10. YES, you could do this manually, but the moment that other container solutions are supported, things become more interesting! It also helps with repetitive tasks.