Bioinformatic Alchemy 101
  Transmuting dark script
  matter into reusable tools

         Ross Lazarus
           BakerIDI
                          1
Context: bioinformatic analyses
   Big data; complex analyses
   Repeatable, automated pipelines
   Reproducibility real goal
   Reproducibility is hard




                                 2
Frameworks
   Eg VGL
   Local SOPs for biologists
   Tools, canned workflows
   Minimise opportunities for error
   Maximise reproducibility


                                 3
In real life
   90/10 rule
   Need to tweak SOPs
   Trivial 'disposable' scripts
   Not documented or curated
   Not reliably available to re-run
   “Dark script matter”
                                  4
Dark Script Matter
   Outside usual VCS/pipelines
   Manual =/= reproducible
   Necessary evil?
   Platform extensions complex
   Eg Galaxy – hours of work


                              5
Plan
   Context: Reproducible analyses
   Frameworks vs Dark Scripts
   Alchemy: script to Galaxy tool
   Demonstration
   Summary
   Conclusions
                                     6
Galaxy Tool Factory
   An installable Galaxy tool
   Runs scripts: Python,R,Perl,sh
   Generates new Galaxy tools
   Tool code wraps the script
   Minutes – not hours
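
As a rough illustration of what "tool code wraps the script" might mean, the Python sketch below runs a user script through its interpreter, passing the input and output paths as positional arguments. The function name `run_wrapped_script` and the interpreter table are hypothetical; the code the Tool Factory actually generates is not shown in these slides.

```python
import subprocess
import sys

# Hypothetical mapping from script language to interpreter command.
# sys.executable stands in for "python" so the sketch runs wherever Python does.
INTERPRETERS = {
    "python": [sys.executable],
    "R": ["Rscript"],
    "perl": ["perl"],
    "sh": ["sh"],
}


def run_wrapped_script(language, script_path, input_path, output_path):
    """Run a script as: <interpreter> <script> <input> <output>.

    Returns 0 on success; raises RuntimeError carrying stderr otherwise.
    """
    command = INTERPRETERS[language] + [script_path, input_path, output_path]
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"script failed ({result.returncode}): {result.stderr}")
    return result.returncode
```

Raising on a non-zero exit code is a design choice in this sketch: a failed script should surface as a failed job rather than an empty output.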

                               7
Galaxy Tool Shed
   Separate server
   Stores/serves Galaxy tools
   Admin can install to Galaxy
   Mercurial VCS archives
   Explicit tool versioning
   Sharing and reproducibility
                             8
Demo 1: Install the Tool Factory
Demo 2: Create a new tool
Demo 3: Quick install and test
Prepare script
   Python; R; Perl; Sh
   Parse CL params – 1=in, 2=out
   Typically workflow transformations
   Arbitrary complexity
   Simple example
   Write transpose of a tabular file

                                   14
Prepare/upload test data
   SMALL sample input
   Becomes functional test case

    h1 h2   h3  h4
    r11 r12 r13 r14
    r21 r22 r23 r24
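
To make the functional-test idea concrete, the Python sketch below holds the sample input from this slide, assuming its columns are tab-separated, together with the output a transpose tool should produce for it. The helper `outputs_match` is hypothetical and only stands in for the comparison Galaxy performs when it runs a tool's functional test.

```python
# Sample input from the slide, as tab-separated text (assumed delimiter).
SAMPLE_INPUT = (
    "h1\th2\th3\th4\n"
    "r11\tr12\tr13\tr14\n"
    "r21\tr22\tr23\tr24\n"
)

# Expected result of transposing it: rows become columns.
EXPECTED_OUTPUT = (
    "h1\tr11\tr21\n"
    "h2\tr12\tr22\n"
    "h3\tr13\tr23\n"
    "h4\tr14\tr24\n"
)


def transpose_text(text):
    """Transpose tab-separated text held in a string."""
    rows = [line.split("\t") for line in text.splitlines()]
    columns = zip(*rows)  # assumes every row has the same number of fields
    return "".join("\t".join(column) + "\n" for column in columns)


def outputs_match(actual, expected):
    """Hypothetical stand-in for a functional test's output comparison."""
    return actual == expected


print(outputs_match(transpose_text(SAMPLE_INPUT), EXPECTED_OUTPUT))
```

Keeping the sample this small means the expected output can be checked by eye.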




                                   15
# R: transpose a tabular input file and write it as
# a tabular output file
ourargs = commandArgs(TRUE)
inf = ourargs[1]   # parameter 1: input path
outf = ourargs[2]  # parameter 2: output path
inp = read.table(inf, header=FALSE, row.names=NULL, sep='\t')
outp = t(inp)
write.table(outp, outf, quote=FALSE, sep='\t', row.names=FALSE, col.names=FALSE)
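
Since the Tool Factory also accepts Python scripts, the same transformation might look like the sketch below. It follows the deck's convention that the first command-line parameter is the input path and the second is the output path. The function name `transpose_file` and the usage message are illustrative choices, not part of the Tool Factory.

```python
import sys


def transpose_file(input_path, output_path):
    """Read a tab-separated file and write its transpose, also tab-separated."""
    with open(input_path) as source:
        rows = [line.rstrip("\n").split("\t") for line in source]
    # zip(*rows) turns rows into columns; it assumes every row has the
    # same number of fields and truncates to the shortest row otherwise.
    columns = zip(*rows)
    with open(output_path, "w") as target:
        for column in columns:
            target.write("\t".join(column) + "\n")


if __name__ == "__main__":
    # Parameter 1 is the input path, parameter 2 the output path.
    if len(sys.argv) == 3:
        transpose_file(sys.argv[1], sys.argv[2])
    else:
        print("usage: transpose.py INPUT OUTPUT", file=sys.stderr)
```

Fields are written without quoting, which mirrors `quote=FALSE` in the R version.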


                                                               16
Demo part 1
As an admin, test run the code




                                 17
Use Redo button; Generate
   When working right
   Use Redo to save retyping
   Select Generate option
   Provide tool ID, help text
   Execute
   Expect a toolfactory.gz in history
   Copy link (floppy disk icon)
                                     18
What's in the toolshed.gz?
   A gzip'd Mercurial tool repository (!)
   Auto generated tool XML file
   Auto generated tool Python wrapper
   Functional test case - the sample data
   Familiar Galaxy tool for all users
   Executes your script over their data
   Interoperably inside Galaxy
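
One way to check what a generated archive contains before uploading it is to list its members. The sketch below assumes the download is a gzip-compressed tar file, which the slides do not state explicitly. The function name and the path in the usage comment are placeholders.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the sorted member names of a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(archive.getnames())


# Usage (placeholder path, not a real file name from the slides):
# for name in list_archive_members("downloaded_tool.gz"):
#     print(name)
```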

                                             19
Upload TS gzip to new repository

    Upload to any tool shed
    Create new repo; sensible name!
    Choose Upload files to new repo
    Paste URL (floppy disk save icon)
    New tool ready to install



                                   20
Install and Test New Tool
   Back to Galaxy admin interface
   Browse local tool shed
   Choose new tool
   Install to local Galaxy
   Try it out
   Run functional test

                                     21
Summary
   GTF = script to tool in minutes
   Integrated with Galaxy and TS
   Simple workflow components
   If needed, generate simple tool
   Then add parameters manually



                                      22
Tool Factory Operation Guide
   Inputs
      Script (Python, R, Perl, sh)
      Upload/paste sample input for functional test
   Galaxy: Tool Factory tool form
      Paste script; test run; check outputs; rerun/fix
      Generate TS gzip; copy download link for pasting
   Tool Shed
      Create new repository
      Upload files – paste TS gzip link and upload
   Galaxy admin page
      Install new tool from toolshed
      Test; functional test

                                                              23
GALAXY
http://usegalaxy.org
                       24
Galaxy Tool Factory
Generate a new Galaxy tool
 From a Python, R, Perl or bash script
 Using a Galaxy tool
 Via a Tool Shed

 # transpose a tabular input file and write as a tabular output file
 ourargs = commandArgs(TRUE)
 inf = ourargs[1]
 outf = ourargs[2]
 inp = read.table(inf, header=FALSE, row.names=NULL, sep='\t')
 outp = t(inp)
 write.table(outp, outf, quote=FALSE, sep='\t', row.names=FALSE, col.names=FALSE)




                                                                    25

Toolfactory foam mar21_2013
