SlideShare uma empresa Scribd logo
1 de 44
Reproducible
research
C. Tobin Magle, PhD
06-26-2017
10:00 – 11:30 a.m.
Morgan Library
Computer Classroom 175
Outline
• What is reproducible research?
• Why would I do that?
• How? (in R Studio)
• Automation
• Git
• R Markdown
Hypothesis Data
Experimental
design
ResultsArticle
The research cycle
1. Technological advances:
• Huge, complex digital datasets
• Ability to share
2. Human Error:
• Poor Reporting
• Flawed analyses
Complications
Hypothesis
Raw
data
Experimental
design
Tidy
Data
ResultsArticle
Processing/
Cleaning
Analysis
Open Data
Code
The research cycle
Reproducible research
is the practice of distributing all data,
software source code, and tools required
to reproduce the results discussed in a
research publication.
https://www.ctspedia.org/do/view/CTSpedia/ReproducibleResearchStandards
Reproducible research
=
Data (with metadata)
+
Code/Software
Reproducible research
=
Transparency
Hypothesis
Raw
data
Experimental
design
Tidy
Data
ResultsArticle
Cleaning
Analysis
Open Data
Code
Reproducible
Research
The research cycle
Replication vs. Reproducibility
• Replication: Same conclusion new study (gold standard)
• “Again, and Again, and Again …” BR Jasny et. al. Science, 2011. 334(6060) pp. 1225 DOI: 10.1126/science.334.6060.1225
• Replication isn’t always feasible: too big, too costly, too time
consuming, one time event, rare samples
• Reproducibility: Same results from same data and code (minimum
standard for validity)
• “Reproducible Research in Computational Science”. RD Peng Science, 2011. 334 (6060) pp. 1226-1227 DOI: 10.1126/science.1213847
Reproducibility spectrum
“Reproducible Research in Computational Science”. RD Peng Science, 2011. 334 (6060) pp. 1226-1227 DOI: 10.1126/science.1213847
Using markdown and
version control in R Studio:
Documenting analysis
• Optimal: the instructions should be an automated script
file (ie, “code”)
• Minimum: Written instructions that allow for the complete
reproduction of your analysis
Raw
Data
Processed
Data
Results
Cleaning/
processing
Analysis
Research is repetitive
• Replication
• Same assay, different samples
• Longitudinal experiments
http://xkcd.com/242/
Doing things by hand is…
• Slow
• Hard to document
• Hard to repeat
Exercise 1: Making graphs
• Download height graph XLS: http://bit.ly/2ooMyaO
• Describe how to make a bar graph in excel
Automation
• Fast
• Easy to document
• Easy to repeat
• Write scripts or save log files
Example: Making graphs
• Use a script: http://bit.ly/2oXOZUH
Details to record for processing/analysis
• What software was used? (R Studio, script)
• Does it support log files/scripts? (yes!)
• What version # and settings were used? (R version 3.3.2)
• What else does the software need to run?
• Computer architecture
• OS/Software/tool/add ons (libraries/packages)
• External databases
In R: the sessionInfo() command
Intuitive
version
control
http://phdcomics.com/comics.php?f=1531
Version Control
• A system that records changes to a
file or set of files over time so that you
can recall specific versions later
• https://git-scm.com/doc
• Saves EVERYTHING
• Records who made what change
• Identifies conflicting changes
• Not just for code
Make an empty GitHub
• Github.com
Tell R Studio where git is
*if you don’t’ know where git is installed, try
• Mac terminal: which git
• PC: where git.exe
Make a new version control project
*auto-recognizes version control if you select an existing directory
Select git, select options
Clone
Remote
Local
Add files
• Check boxes to add files to
the “staging area”
• Gets them ready to be added
(committed) to the repository
Staging area vs. repository
Commit files
• Click “commit”
• Write a commit message
• Click “commit”
• Adds the changes to the
repository
Staging area vs. repository
Exercise 2
• Make some changes to the script file in your repo
• add the changes by clicking the checkbox
• Create a new file – don’t click the checkbox next to it
• Commit the changes
• Push the changes to the remote
• Does the new file appear in your remote git repository
Push to remote
Push
Literate programming
=
human readable (text)
+
machine readable (code)
R markdown
R Markdown
• Open
• Write
• Embed
• Render
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
Install knitr and markdown packages
• Tools > install packages
• Enter the package name
(will autocomplete)
• Knitr
• Rmarkdown
• TeX (if you want to knit to
PDF)
• OR
install.packages("knitr”)
Open/Create a markdown document
• Make your own Markdown
doc
• Download graph markdown
doc http://bit.ly/2p89drX
Write: useful syntax
• Plain text
• *italics* -> italics
• **bold** -> bold
• #Header -> Header (more # decreases size)
• Can also draw:
• Insert pictures
• Ordered and unordered list
• Tables
Embed code
• Inline – Use variables in
the human readable text
• `r 2 + 2`
• Code chunks - Include
working code that
generates output
• ```{r}
• #Code goes here
• ```
• Display Options –
Render
• Won’t render unless the code runs
with no errors
• You know it should be reproducible
• Render using the knit function
• Output Formats
• Knit HTML
• Knit PDF – requires TeX
• Knit Word
Exercise 3
• Open a new markdown document
• Use the formatting commands from the R Markdown cheat
sheet to create a bulleted list that looks like the content of this
slide.
• https://www.rstudio.com/wp-
content/uploads/2015/02/rmarkdown-cheatsheet.pdf
SessionInfo()
Scripts
R markdown
Hypothesis
Raw
data
Experimental
design
Tidy
Data
ResultsArticle
Cleaning
Analysis
Open Data
Code
The research cycle
Version
Control
+
Scripts
Reproducible research checklist
• Think about the entire pipeline: are all the pieces reproducible?
• Is your cleaning/analysis process automated?– guarantees reproducibility
• Are you doing things “by hand”? editing tables/figures; splitting/reformatting data
• Does your software support log files or scripts?
• If no, do you have a detailed description of your process?
• Are you using version control?
• Are you keeping track of your software?
• Computer architecture;
• OS/Software/tool/add ons (libraries/packages)/external databases
• version numbers for everything (when available)
• Are you saving the right files?: if it’s not reproducible, it’s not worth saving
• Save the data and the code
• Data + Code = Output
• Are your reports human and machine readable?
Adapted from: https://github.com/DataScienceSpecialization/courses/blob/master/05_ReproducibleResearch/Checklist/Reproducible%20Research%20Checklist.pdf
Need help?
• Email: tobin.magle@colostate.edu
• Data Management Services website:
http://lib.colostate.edu/services/data-management
• Software Carpentry Reproducible Scientific Analysis Lesson:
http://swcarpentry.github.io/r-novice-gapminder/
• Git with R tutorial: http://happygitwithr.com/
• R markdown cheatsheet: https://www.rstudio.com/wp-
content/uploads/2015/02/rmarkdown-cheatsheet.pdf

Mais conteúdo relacionado

Mais procurados

Data Management for librarians
Data Management for librariansData Management for librarians
Data Management for librariansC. Tobin Magle
 
Data wrangling with dplyr
Data wrangling with dplyrData wrangling with dplyr
Data wrangling with dplyrC. Tobin Magle
 
Scalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee EdlefsenScalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee EdlefsenRevolution Analytics
 
Converting Metadata to Linked Data
Converting Metadata to Linked DataConverting Metadata to Linked Data
Converting Metadata to Linked DataKaren Estlund
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysisthinrhino
 
useR! 2012 Talk
useR! 2012 TalkuseR! 2012 Talk
useR! 2012 Talkrtelmore
 
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioOpen Knowledge Belgium
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archiveLewis Crawford
 
Dancing in R
Dancing in RDancing in R
Dancing in RSimon Roy
 
Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...తేజ దండిభట్ల
 
PharoDAYS 2015: Pharo Status - by Markus Denker
PharoDAYS 2015: Pharo Status - by Markus DenkerPharoDAYS 2015: Pharo Status - by Markus Denker
PharoDAYS 2015: Pharo Status - by Markus DenkerPharo
 
OpenRefine Class Tutorial
OpenRefine Class TutorialOpenRefine Class Tutorial
OpenRefine Class TutorialAshwin Dinoriya
 
R Introduction
R IntroductionR Introduction
R Introductionschamber
 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialMuhammad Saleem
 
Indexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netIndexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netStephen Lorello
 

Mais procurados (20)

Data Management for librarians
Data Management for librariansData Management for librarians
Data Management for librarians
 
Data wrangling with dplyr
Data wrangling with dplyrData wrangling with dplyr
Data wrangling with dplyr
 
Scalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee EdlefsenScalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee Edlefsen
 
Converting Metadata to Linked Data
Converting Metadata to Linked DataConverting Metadata to Linked Data
Converting Metadata to Linked Data
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
useR! 2012 Talk
useR! 2012 TalkuseR! 2012 Talk
useR! 2012 Talk
 
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
 
Unit 3
Unit 3Unit 3
Unit 3
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archive
 
Dancing in R
Dancing in RDancing in R
Dancing in R
 
Arakno
AraknoArakno
Arakno
 
Unit 5
Unit 5Unit 5
Unit 5
 
Getting Started with R
Getting Started with RGetting Started with R
Getting Started with R
 
Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...
 
PharoDAYS 2015: Pharo Status - by Markus Denker
PharoDAYS 2015: Pharo Status - by Markus DenkerPharoDAYS 2015: Pharo Status - by Markus Denker
PharoDAYS 2015: Pharo Status - by Markus Denker
 
Ld4 l triannon
Ld4 l triannonLd4 l triannon
Ld4 l triannon
 
OpenRefine Class Tutorial
OpenRefine Class TutorialOpenRefine Class Tutorial
OpenRefine Class Tutorial
 
R Introduction
R IntroductionR Introduction
R Introduction
 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 Tutorial
 
Indexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netIndexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .net
 

Semelhante a Reproducible research

Reproducible research: practice
Reproducible research: practiceReproducible research: practice
Reproducible research: practiceC. Tobin Magle
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in RSamuel Bosch
 
Reproducible Research in R and R Studio
Reproducible Research in R and R StudioReproducible Research in R and R Studio
Reproducible Research in R and R StudioSusan Johnston
 
Reproducibility and automation of machine learning process
Reproducibility and automation of machine learning processReproducibility and automation of machine learning process
Reproducibility and automation of machine learning processDenis Dus
 
C++ Windows Forms L01 - Intro
C++ Windows Forms L01 - IntroC++ Windows Forms L01 - Intro
C++ Windows Forms L01 - IntroMohammad Shaker
 
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...Josh Levy-Kramer
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performancePiotr Przymus
 
Enter Cookbook: refactoring under a microscope
Enter Cookbook: refactoring under a microscopeEnter Cookbook: refactoring under a microscope
Enter Cookbook: refactoring under a microscopeKamil Samigullin
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nltieleman
 
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...Rainer Stropek
 
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software DevelopmentAlexis Seigneurin
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nlbartzon
 
Que nos espera a los ALM Dudes para el 2013?
Que nos espera a los ALM Dudes para el 2013?Que nos espera a los ALM Dudes para el 2013?
Que nos espera a los ALM Dudes para el 2013?Bruno Capuano
 
Data Visualizations with ggplot2
Data Visualizations with ggplot2Data Visualizations with ggplot2
Data Visualizations with ggplot2Ryan Harrington
 
IS - section 1 - modifiedFinal information system.pptx
IS - section 1 - modifiedFinal information system.pptxIS - section 1 - modifiedFinal information system.pptx
IS - section 1 - modifiedFinal information system.pptxNaglaaAbdelhady
 

Semelhante a Reproducible research (20)

Reproducible research: practice
Reproducible research: practiceReproducible research: practice
Reproducible research: practice
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in R
 
1
11
1
 
Reproducible Research in R and R Studio
Reproducible Research in R and R StudioReproducible Research in R and R Studio
Reproducible Research in R and R Studio
 
Reproducibility and automation of machine learning process
Reproducibility and automation of machine learning processReproducibility and automation of machine learning process
Reproducibility and automation of machine learning process
 
C++ Windows Forms L01 - Intro
C++ Windows Forms L01 - IntroC++ Windows Forms L01 - Intro
C++ Windows Forms L01 - Intro
 
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
 
Top 10 python ide
Top 10 python ideTop 10 python ide
Top 10 python ide
 
Enter Cookbook: refactoring under a microscope
Enter Cookbook: refactoring under a microscopeEnter Cookbook: refactoring under a microscope
Enter Cookbook: refactoring under a microscope
 
Tech talks#6: Code Refactoring
Tech talks#6: Code RefactoringTech talks#6: Code Refactoring
Tech talks#6: Code Refactoring
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
 
US23_Daniels.pptx
US23_Daniels.pptxUS23_Daniels.pptx
US23_Daniels.pptx
 
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
 
Architecture presentation 4
Architecture presentation 4Architecture presentation 4
Architecture presentation 4
 
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
 
Que nos espera a los ALM Dudes para el 2013?
Que nos espera a los ALM Dudes para el 2013?Que nos espera a los ALM Dudes para el 2013?
Que nos espera a los ALM Dudes para el 2013?
 
Data Visualizations with ggplot2
Data Visualizations with ggplot2Data Visualizations with ggplot2
Data Visualizations with ggplot2
 
IS - section 1 - modifiedFinal information system.pptx
IS - section 1 - modifiedFinal information system.pptxIS - section 1 - modifiedFinal information system.pptx
IS - section 1 - modifiedFinal information system.pptx
 

Mais de C. Tobin Magle

Data and donuts: Data Visualization using R
Data and donuts: Data Visualization using RData and donuts: Data Visualization using R
Data and donuts: Data Visualization using RC. Tobin Magle
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data ManagementC. Tobin Magle
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSFC. Tobin Magle
 
Data Management Services at the Morgan Library
Data Management Services at the Morgan LibraryData Management Services at the Morgan Library
Data Management Services at the Morgan LibraryC. Tobin Magle
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planC. Tobin Magle
 
Data and Donuts: The Impact of Data Management
Data and Donuts: The Impact of Data ManagementData and Donuts: The Impact of Data Management
Data and Donuts: The Impact of Data ManagementC. Tobin Magle
 
Bringing bioinformatics into the library
Bringing bioinformatics into the libraryBringing bioinformatics into the library
Bringing bioinformatics into the libraryC. Tobin Magle
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theoryC. Tobin Magle
 
CU Anschutz Health Science Library Data Services
CU Anschutz Health Science Library Data ServicesCU Anschutz Health Science Library Data Services
CU Anschutz Health Science Library Data ServicesC. Tobin Magle
 
Magle data curation in libraries
Magle data curation in librariesMagle data curation in libraries
Magle data curation in librariesC. Tobin Magle
 

Mais de C. Tobin Magle (11)

Data and donuts: Data Visualization using R
Data and donuts: Data Visualization using RData and donuts: Data Visualization using R
Data and donuts: Data Visualization using R
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data Management
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSF
 
Data Management Services at the Morgan Library
Data Management Services at the Morgan LibraryData Management Services at the Morgan Library
Data Management Services at the Morgan Library
 
Open access day
Open access dayOpen access day
Open access day
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management plan
 
Data and Donuts: The Impact of Data Management
Data and Donuts: The Impact of Data ManagementData and Donuts: The Impact of Data Management
Data and Donuts: The Impact of Data Management
 
Bringing bioinformatics into the library
Bringing bioinformatics into the libraryBringing bioinformatics into the library
Bringing bioinformatics into the library
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
 
CU Anschutz Health Science Library Data Services
CU Anschutz Health Science Library Data ServicesCU Anschutz Health Science Library Data Services
CU Anschutz Health Science Library Data Services
 
Magle data curation in libraries
Magle data curation in librariesMagle data curation in libraries
Magle data curation in libraries
 

Último

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 

Último (20)

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

Reproducible research

  • 1. Reproducible research C. Tobin Magle, PhD 06-26-2017 10:00 – 11:30 a.m. Morgan Library Computer Classroom 175
  • 2. Outline • What is reproducible research? • Why would I do that? • How? (in R Studio) • Automation • Git • R Markdown
  • 3. Hypothesis Data Experimental design ResultsArticle The research cycle 1. Technological advances: • Huge, complex digital datasets • Ability to share 2. Human Error: • Poor Reporting • Flawed analyses Complications
  • 5. Reproducible research is the practice of distributing all data, software source code, and tools required to reproduce the results discussed in a research publication. https://www.ctspedia.org/do/view/CTSpedia/ReproducibleResearchStandards
  • 6. Reproducible research = Data (with metadata) + Code/Software
  • 9. Replication vs. Reproducibility • Replication: Same conclusion new study (gold standard) • “Again, and Again, and Again …” BR Jasny et. al. Science, 2011. 334(6060) pp. 1225 DOI: 10.1126/science.334.6060.1225 • Replication isn’t always feasible: too big, too costly, too time consuming, one time event, rare samples • Reproducibility: Same results from same data and code (minimum standard for validity) • “Reproducible Research in Computational Science”. RD Peng Science, 2011. 334 (6060) pp. 1226-1227 DOI: 10.1126/science.1213847
  • 10. Reproducibility spectrum “Reproducible Research in Computational Science”. RD Peng Science, 2011. 334 (6060) pp. 1226-1227 DOI: 10.1126/science.1213847
  • 11. Using markdown and version control in R Studio:
  • 12. Documenting analysis • Optimal: the instructions should be an automated script file (ie, “code”) • Minimum: Written instructions that allow for the complete reproduction of your analysis Raw Data Processed Data Results Cleaning/ processing Analysis
  • 13. Research is repetitive • Replication • Same assay, different samples • Longitudinal experiments http://xkcd.com/242/
  • 14. Doing things by hand is… • Slow • Hard to document • Hard to repeat
  • 15. Exercise 1: Making graphs • Download height graph XLS: http://bit.ly/2ooMyaO • Describe how to make a bar graph in excel
  • 16. Automation • Fast • Easy to document • Easy to repeat • Write scripts or save log files
  • 17. Example: Making graphs • Use a script: http://bit.ly/2oXOZUH
  • 18. Details to record for processing/analysis • What software was used? (R Studio, script) • Does it support log files/scripts? (yes!) • What version # and settings were used? (R version 3.3.2) • What else does the software need to run? • Computer architecture • OS/Software/tool/add ons (libraries/packages) • External databases
  • 19. In R: the sessionInfo() command
  • 21. Version Control • A system that records changes to a file or set of files over time so that you can recall specific versions later • https://git-scm.com/doc • Saves EVERYTHING • Records who made what change • Identifies conflicting changes • Not just for code
  • 22. Make an empty GitHub • Github.com
  • 23. Tell R Studio where git is *if you don’t’ know where git is installed, try • Mac terminal: which git • PC: where git.exe
  • 24. Make a new version control project *auto-recognizes version control if you select an existing directory
  • 27. Add files • Check boxes to add files to the “staging area” • Gets them ready to be added (committed) to the repository
  • 28. Staging area vs. repository
  • 29. Commit files • Click “commit” • Write a commit message • Click “commit” • Adds the changes to the repository
  • 30. Staging area vs. repository
  • 31. Exercise 2 • Make some changes to the script file in your repo • add the changes by clicking the checkbox • Create a new file – don’t click the checkbox next to it • Commit the changes • Push the changes to the remote • Does the new file appear in your remote git repository
  • 33. Push
  • 34. Literate programming = human readable (text) + machine readable (code) R markdown
  • 35. R Markdown • Open • Write • Embed • Render https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
  • 36. Install knitr and markdown packages • Tools > install packages • Enter the package name (will autocomplete) • Knitr • Rmarkdown • TeX (if you want to knit to PDF) • OR install.packages("knitr”)
  • 37. Open/Create a markdown document • Make your own Markdown doc • Download graph markdown doc http://bit.ly/2p89drX
  • 38. Write: useful syntax • Plain text • *italics* -> italics • **bold** -> bold • #Header -> Header (more # decreases size) • Can also draw: • Insert pictures • Ordered and unordered list • Tables
  • 39. Embed code • Inline – Use variables in the human readable text • `r 2 + 2` • Code chunks - Include working code that generates output • ```{r} • #Code goes here • ``` • Display Options –
  • 40. Render • Won’t render unless the code runs with no errors • You know it should be reproducible • Render using the knit function • Output Formats • Knit HTML • Knit PDF – requires TeX • Knit Word
  • 41. Exercise 3 • Open a new markdown document • Use the formatting commands from the R Markdown cheat sheet to create a bulleted list that looks like the content of this slide. • https://www.rstudio.com/wp- content/uploads/2015/02/rmarkdown-cheatsheet.pdf
  • 43. Reproducible research checklist • Think about the entire pipeline: are all the pieces reproducible? • Is your cleaning/analysis process automated?– guarantees reproducibility • Are you doing things “by hand”? editing tables/figures; splitting/reformatting data • Does your software support log files or scripts? • If no, do you have a detailed description of your process? • Are you using version control? • Are you keeping track of your software? • Computer architecture; • OS/Software/tool/add ons (libraries/packages)/external databases • version numbers for everything (when available) • Are you saving the right files?: if it’s not reproducible, it’s not worth saving • Save the data and the code • Data + Code = Output • Are your reports human and machine readable? Adapted from: https://github.com/DataScienceSpecialization/courses/blob/master/05_ReproducibleResearch/Checklist/Reproducible%20Research%20Checklist.pdf
  • 44. Need help? • Email: tobin.magle@colostate.edu • Data Management Services website: http://lib.colostate.edu/services/data-management • Software Carpentry Reproducible Scientific Analysis Lesson: http://swcarpentry.github.io/r-novice-gapminder/ • Git with R tutorial: http://happygitwithr.com/ • R markdown cheatsheet: https://www.rstudio.com/wp- content/uploads/2015/02/rmarkdown-cheatsheet.pdf