This presentation provides some technical details on the function of the Galaxy toolshed. It was prepared for a group (Biobix at UGent), during my previous job.
4. Toolshed: get your own?...
● Code is part of the main distribution
● ./run_tool_shed.sh
● Very easy to have it run locally...
5. Code is shared through hg
Galaxy main code on Bitbucket
[Diagram: hg links the Galaxy server and the Toolshed server]
6. Toolshed: run your own?
● The Toolshed is a completely separate process from Galaxy
● It uses its own PostgreSQL database: you need to create a new user account
● The Toolshed's files need to be stored separately, next to the Galaxy root
7. Sharing a tool is basically simple
All you have to share is (if it's a simple script):
tool_conf.xml
tool.pl
This can be distributed using the Tool Shed.
Dependencies have to be installed separately.
8. Sharing through the toolshed
Galaxy is moving towards installing everything through the Tool Shed; see
shed_tool_conf.xml:
<?xml version="1.0"?>
<toolbox tool_path="/shed_tools">
  <section id="textutil" name="Text Manipulation" version="">
    <tool file="/shed_tools/toolshed.g2.bx.psu.edu/repos/bjoern-gruening/sed_wrapper/e850a63e5aed/sed_wrapper/sed.xml"
          guid="toolshed.g2.bx.psu.edu/repos/bjoern-gruening/sed_wrapper/sed_stream_editor/0.0.1">
      <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
      <repository_name>sed_wrapper</repository_name>
      <repository_owner>bjoern-gruening</repository_owner>
      <installed_changeset_revision>e850a63e5aed</installed_changeset_revision>
      <id>toolshed.g2.bx.psu.edu/repos/bjoern-gruening/sed_wrapper/sed_stream_editor/0.0.1</id>
      <version>0.0.1</version>
    </tool>
  </section>
</toolbox>
9. Tasks of the toolshed
● Communicates with any Galaxy that wants to install a tool from it (the Galaxy admin who accepts the tool needs to add your Toolshed)
● Periodically runs functional tests on the tools
● Allows people to update the tools
● Lets people co-develop tools
10. Philosophy: task of a tool
Some functionality is encoded redundantly in tools.
An example is visualising data: some tools call R, others call GNUplot.
I really think that the preferred output of Galaxy should be text. A versatile, powerful visualisation tool can then draw graphs as
needed from that output.
(PNG, PDF and other visual formats are supported.)
BTW: the 2 different repository types (unrestricted and tool dependency definition) comply with this view.
11. My original aim...
[Diagram: developers (Dev 1, Dev 2, Dev 3), a Tool Shed (BITS?), a test Galaxy and a production Galaxy, with updates from the official galaxy-dist]
12. 'Official' advice
● Run Galaxy and the Toolshed locally
● Develop your tool in your local Galaxy
● If everything runs, wrap it up as a .tar
● Upload everything to the Toolshed of your choice
● Test the download from the Toolshed in a test Galaxy
● Debug...
Do not use the Toolshed as a development environment.
14. All code is shared through hg
Galaxy main code on Bitbucket
[Diagram: hg links the Galaxy server and the Toolshed server; the Toolshed server holds one hg repository per tool (FancyTool, SuperTool, PowerTool), created from your uploaded .tar balls]
22. What is mercurial?
1. keep track of the changes YOU make to your files, scripts, folders,...
joachim@joachim-laptop:~/Projects/hgprojects$ hg log
changeset: 2:726fa53bcd7d
tag: tip
user: Joachim Jacob <joachim.jacob@gmail.com>
date: Fri Nov 16 11:24:09 2012 +0100
summary: Third change, playing with copy and remove
changeset: 1:744894cb4ee6
user: Joachim Jacob <joachim.jacob@gmail.com>
date: Fri Nov 16 11:09:49 2012 +0100
summary: I have added a small change to hello.txt
changeset: 0:b84e0105967f
user: Joachim Jacob <joachim.jacob@gmail.com>
date: Fri Nov 16 11:08:01 2012 +0100
23. What is mercurial?
You can go back to a previous revision (e.g. hg update 2).
You can then make changes to the files (creating multiple heads).
[Diagram: two “head” changesets]
24. What is mercurial?
You can go back to a previous revision.
You can then make changes to the files.
joachim@joachim-laptop:~/Projects/hgprojects$ hg update 1
1 files updated, 0 files merged, 3 files removed, 0 files unresolved
joachim@joachim-laptop:~/Projects/hgprojects$ nano hello.txt
joachim@joachim-laptop:~/Projects/hgprojects$ hg commit -m "Bug fix"
created new head
joachim@joachim-laptop:~/Projects/hgprojects$ hg summary
parent: 3:2d1d80bd0124 tip
Bug fix
branch: default
commit: (clean)
update: 1 new changesets, 2 branch heads (merge)
25. What is mercurial?
When you are done with a change, you can merge the heads
back together into one tip.
joachim@joachim-laptop:~/Projects/hgprojects$ hg merge
merging hello.txt and another.txt to another.txt
merging hello.txt and mvtest.txt to mvtest.txt
1 files updated, 2 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
[Diagram: “merge”]
26. What is mercurial?
joachim@joachim-laptop:~/Projects/hgprojects$ hg commit -m 'Commit the bug fix permanently'
[Diagram: “commit”]
In case of conflicts, use 'hg resolve --list' to view the conflicting
files. Fix them by hand, then mark them as resolved with 'hg resolve --mark'.
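The idea of heads can be made concrete with a toy model: treat the history as a graph in which each revision lists its parents; a head is any revision that no other revision has as a parent. This Python sketch is only an illustration (it is not how Mercurial stores its data); revisions 0 to 3 follow the session above, and revision 4 stands for the merge commit.

```python
def heads(parents):
    """Return the revisions that no other revision names as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(rev for rev in parents if rev not in has_child)


# revision number -> tuple of parent revisions
history = {0: (), 1: (0,), 2: (1,)}
history[3] = (1,)        # 'hg update 1', edit, commit: "created new head"
print(heads(history))    # [2, 3]: two heads, as 'hg summary' reports
history[4] = (3, 2)      # 'hg merge' + 'hg commit' records a changeset with two parents
print(heads(history))    # [4]: a single tip again
```

A merge commit is simply a changeset with two parents, which is why it removes both former heads at once.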
27. What is mercurial?
2. clone your repository into a new directory (e.g. to work on
another feature).
[Diagram: “clone”]
28. What is mercurial?
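You can compare two different repositories with 'hg incoming', which lists the changesets you do not have yet.
To bring those changes into your repository, use 'hg pull' (followed by 'hg merge' if this creates a second head).
[Diagram: “incoming”]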
29. What is mercurial?
[Diagram: “pull” copies the incoming changesets into your repository]
30. What is mercurial?
[Diagram: “merge” joins the pulled head with your own]
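To make incoming and pull concrete, here is a toy Python model in which a repository is just a list of changeset IDs. The IDs are hypothetical; real Mercurial identifies changesets by their hashes, which is what makes two independent clones comparable at all.

```python
def incoming(local, remote):
    """Changesets that exist in `remote` but not yet in `local`, in remote order."""
    known = set(local)
    return [node for node in remote if node not in known]


def pull(local, remote):
    """Local history extended with the incoming changesets (no merge yet)."""
    return local + incoming(local, remote)


mine = ["c0", "c1", "c3"]      # hypothetical IDs: your clone, with your own change c3
theirs = ["c0", "c1", "c2"]    # a colleague's clone, with their change c2
print(incoming(mine, theirs))  # ['c2']
print(pull(mine, theirs))      # ['c0', 'c1', 'c3', 'c2']
```

After the pull, c2 and c3 both descend from c1, so the repository has two heads and 'hg merge' is the next step.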
32. What is mercurial?
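So, in your working directory, either
you change or add files yourself,
or Mercurial does this for you (during a merge; an uncommitted merge can be discarded with 'hg update --clean .', and 'hg rollback' undoes the last commit or pull).
Both need to be followed by a commit.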
33. What is mercurial?
3. Share changes with other users.
34. Sharing in Mercurial
The repositories might be located
- in local directories: hg pull /path/to/directory
- on your intranet (hg serve): hg pull http://10.10.10.100:8000
- on the internet: hg clone http://joachim@toolshed.bits.vib.be/repos/joachim/clcaligner
You can also export a commit (hg export), send it by email, and import it (hg import).
You can also set up a repository online on Bitbucket to push to.
38. Getting ready for Galaxy development
How I develop for Galaxy:
[Diagram: template → set tool name → upload to the Toolshed → hg clone into the Dev Galaxy → hg push back to the Toolshed]
39. Getting ready for Galaxy development
And the last step:
[Diagram: the same workflow, ending with installing the tool from the Toolshed in Galaxy.bits.vib.be]
40. How I develop for Galaxy:
- you need a personal Galaxy (hg clone …)
- you might use a Toolshed repository
1. Get a template: a
tarball with these files:
● README
● tool_data_table_conf.xml.sample
● tool_dependencies.xml
● tool_indices.loc.sample
● tool_wrapper_template.pl
● tool_wrapper.xml
41. 2. Rename the files:
- replace 'tool' with your tool name
[galaxy@joagal razers]$ ls
razers3_wrapper.xml README
tool_data_table_conf.xml.sample
tool_indices.loc.sample
tool_wrapper_template.pl
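The renaming rule can be written down as a small function. This Python sketch assumes (our assumption, consistent with the listing above) that tool_dependencies.xml and tool_data_table_conf.xml.sample must keep their literal names because Galaxy and the Tool Shed look for them by name, while every other file starting with "tool_" gets the tool name instead; "razers3" is just the example tool from this listing.

```python
# Assumption: Galaxy and the Tool Shed look for these two names literally, so they stay.
FIXED_NAMES = {"tool_dependencies.xml", "tool_data_table_conf.xml.sample"}


def renamed(filename, tool_name):
    """New name for a template file: a leading 'tool' becomes the tool name."""
    if filename in FIXED_NAMES or not filename.startswith("tool_"):
        return filename
    return tool_name + filename[len("tool"):]


template = ["README", "tool_data_table_conf.xml.sample", "tool_dependencies.xml",
            "tool_indices.loc.sample", "tool_wrapper_template.pl", "tool_wrapper.xml"]
for name in template:
    print(f"{name} -> {renamed(name, 'razers3')}")
```

To apply it to an unpacked template, one could loop over the directory with pathlib and call Path.rename for every name that changes.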
42. 3. Edit the wrapper.xml: the <tool> section.
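As a reminder of what the <tool> section has to contain, this Python sketch builds an empty skeleton with xml.etree.ElementTree: a <tool> root with id, name and version, plus the usual <description>, <command>, <inputs>, <outputs> and <help> tag sets. The id, name, version and command line are hypothetical placeholders for the RazerS3 wrapper, and the interpreter attribute reflects the Galaxy releases of that time.

```python
import xml.etree.ElementTree as ET


def tool_skeleton(tool_id, name, version, command, interpreter="perl"):
    """Build a bare <tool> element with the usual top-level tag sets, left empty."""
    tool = ET.Element("tool", id=tool_id, name=name, version=version)
    ET.SubElement(tool, "description").text = "one-line description for the tool panel"
    ET.SubElement(tool, "command", interpreter=interpreter).text = command
    ET.SubElement(tool, "inputs")    # <param> elements go here
    ET.SubElement(tool, "outputs")   # <data> elements go here
    ET.SubElement(tool, "help").text = "What the tool does and how to use it."
    return tool


# Hypothetical values for the RazerS3 wrapper used in these slides.
skeleton = tool_skeleton("razers3_wrapper", "RazerS 3", "0.1.0",
                         "razers3_wrapper.pl $input $output")
if hasattr(ET, "indent"):            # ElementTree.indent exists from Python 3.9
    ET.indent(skeleton)
print(ET.tostring(skeleton, encoding="unicode"))
```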
43. 4. Pack everything into a tarball again and upload it to the test
Toolshed as a new repository
46. 5. hg clone your repository into a folder of your development
Galaxy.
[galaxy@joagal GalaxyHangar]$ hg clone http://joachim@192.168.10.23:9009/repos/joachim/fastqseqlen
destination directory: fastqseqlen
requesting all changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 2 changes to 2 files
updating to branch default
resolving manifests
getting README
getting fastqseqlen.xml
2 files updated, 0 files merged, 0 files removed, 0 files unresolved
[galaxy@joagal GalaxyHangar]$ cd fastqseqlen/
[galaxy@joagal fastqseqlen]$ ls
fastqseqlen.xml README
[galaxy@joagal fastqseqlen]$
[galaxy@joagal fastqseqlen]$ hg summary
parent: 0:3f22736718ef tip
Uploaded files
branch: default
commit: (clean)
update: (current)
[galaxy@joagal fastqseqlen]$
48. 6. Link the complete directory to a directory under
$GALAXY_HOME/tools/ and make Galaxy aware of it by
modifying tool_conf.xml
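The tool_conf.xml change can also be scripted. This Python sketch adds a <tool file="..."/> entry to a section (creating the section if needed) and skips the entry if it is already present. The section id "mytools", its label and the file path are hypothetical; note that ElementTree drops XML comments when re-serialising, so for a hand-maintained tool_conf.xml, editing by hand may be preferable.

```python
import xml.etree.ElementTree as ET


def register_tool(tool_conf_xml, section_id, section_name, tool_file):
    """Return tool_conf XML text with a <tool file="..."/> entry in the given section."""
    root = ET.fromstring(tool_conf_xml)
    section = root.find(f"section[@id='{section_id}']")
    if section is None:                                    # create the section if needed
        section = ET.SubElement(root, "section", id=section_id, name=section_name)
    if section.find(f"tool[@file='{tool_file}']") is None:  # do not add the tool twice
        ET.SubElement(section, "tool", file=tool_file)
    return ET.tostring(root, encoding="unicode")


# Hypothetical tool_conf.xml content and paths (relative to the tools/ directory).
conf = ('<toolbox><section id="textutil" name="Text Manipulation">'
        '<tool file="filters/sorter.xml"/></section></toolbox>')
updated = register_tool(conf, "mytools", "My Tools", "mytools/fastqseqlen/fastqseqlen.xml")
print(updated)
```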
49. 7. (Re)start your Galaxy
$ ./run.sh --reload
And check whether the tool loads.
50. 8. Get your tool's parameters to
display correctly:
Fill in the rest of the tool's XML file.
Also add the .loc file (which contains the paths
to your reference data), if needed.
(When modifying the XML, you have to restart
Galaxy to see the changes: kill Galaxy and run
./run.sh --reload again.)
51. 9. Fun! Start developing your tool
Development happens in the
development Galaxy,
committing changes from time to
time (optionally pushing to the
Toolshed)
Starting Galaxy tools development
$ hg commit -m "Alpha version of RazerS3 wrapper"
$ hg push --debug
$ hg commit -m "Some small changes"
$ hg push --debug
52. Mercurial credentials should be stored in ~/.hgrc (Mercurial.ini
on Windows)
[ui]
username = joachim <joachim.jacob@vib.be>
verbose = True
[extensions]
hgext.graphlog =
[auth]
bb.prefix = http://192.168.10.26:9009/repos/joachim/razers
bb.username = joachim
bb.password = ********
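Mercurial reads this file itself; the sketch below only illustrates how the [auth] section is organised: keys are grouped by the part before the dot ("bb" here), and the group whose prefix matches the URL being pushed to supplies the username and password. It is a simplified Python model using configparser (hgrc syntax is close to, but not exactly, INI) and it ignores refinements such as wildcard prefixes and the schemes setting.

```python
import configparser

# The ~/.hgrc from the slide; the password is a placeholder.
HGRC = """
[ui]
username = joachim <joachim.jacob@vib.be>
verbose = True

[extensions]
hgext.graphlog =

[auth]
bb.prefix = http://192.168.10.26:9009/repos/joachim/razers
bb.username = joachim
bb.password = ********
"""


def auth_for(url, hgrc_text):
    """Return the [auth] group (as a dict) whose prefix best matches `url`, or None."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(hgrc_text)
    if not config.has_section("auth"):
        return None
    groups = {}
    for key, value in config.items("auth"):
        group, _, field = key.partition(".")          # "bb.username" -> ("bb", "username")
        groups.setdefault(group, {})[field] = value
    bare_url = url.split("://", 1)[-1]                 # a prefix may omit the scheme
    matches = [
        fields for fields in groups.values()
        if fields.get("prefix")
        and (url.startswith(fields["prefix"]) or bare_url.startswith(fields["prefix"]))
    ]
    # Simplification: the longest matching prefix wins.
    return max(matches, key=lambda fields: len(fields["prefix"]), default=None)


print(auth_for("http://192.168.10.26:9009/repos/joachim/razers", HGRC)["username"])
```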
53. When development is ready...
Push the last changes to the Galaxy test Toolshed.
Export the repository from the Galaxy test Toolshed and import it into the BITS
Toolshed. Install it in Galaxy.bits.vib.be.
56. Galaxy manages scripts (tools)
1. Galaxy knows the location of tools, as this is set
in one or more XML files
2. The tool referenced by an XML file can be
- a script that does all calculations by itself
(e.g. bash script, python script,...)
- a script that does calculations by using
3rd party libraries (e.g. R)
- a script that does calculations by calling a 3rd party binary
57. 4 different XML files
● integrated_tool_panel.xml - layout of the tool panel
● shed_tool_conf.xml - tools installed from the Tool Shed
● tool_conf.xml - tools shipped with the installation, or your own
● migrated_tools_conf.xml - tools removed from tool_conf.xml upon updating (migrated to the Tool Shed)
Note: these XML files only came into use with the latest update!
58. Galaxy installation directory
● Galaxy is installed as the user galaxy in
/home/galaxy/galaxy-dist
● Installation and version control of this directory are done by
Mercurial (configuration in the .hg directory; the .hgignore file lists
files Mercurial should not track)
● Installation for production required some changes:
PostgreSQL database, Apache serving static content, network
settings, running Galaxy as a daemon in the background
http://wiki.g2.bx.psu.edu/Admin/Get%20Galaxy
59. Galaxy installation directory
● Galaxy is installed on Linux as the user galaxy
in /home/galaxy/galaxy-dist
● Important locations under this directory:
- universe_wsgi.ini → general config file
- *.xml → 'embedding' of tools and types
- tools/ → location of the scripts
- database/ → location of the datasets
http://wiki.g2.bx.psu.edu/Admin/Get%20Galaxy
64. The tool XML points to a script
./tools/fasta_tools/fasta_compute_length.py :
#!/usr/bin/env python
"""
Uses fasta_to_len converter code.
"""
import sys
from galaxy.datatypes.converters.fasta_to_len import compute_fasta_length
compute_fasta_length( sys.argv[1], sys.argv[2], sys.argv[3])
In this case the calculation takes place in Python itself. Sometimes, however,
3rd-party libraries need to be installed.
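For comparison, a script that "does all calculations by itself" can be very small. This standalone Python sketch computes FASTA sequence lengths and writes them as a two-column tabular file; it is not Galaxy's compute_fasta_length (which has extra options) and needs no Galaxy code on the path. The function names are ours.

```python
def fasta_lengths(lines):
    """Yield (sequence id, length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:            # emit the previous record
                yield name, length
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0   # id = first word of header
        elif line:
            length += len(line)             # sequences may span several lines
    if name is not None:                    # emit the last record
        yield name, length


def write_lengths(in_path, out_path):
    """Write a two-column tabular file: sequence id, length."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for name, length in fasta_lengths(src):
            dst.write(f"{name}\t{length}\n")


print(list(fasta_lengths([">chr1 test", "ACGT", "AC", ">chr2", "GGG"])))  # [('chr1', 6), ('chr2', 3)]
```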
65. The tool XML points to a binary
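#!/usr/bin/env python
"""
Runs BWA on single-end or paired-end data.
Produces a SAM file containing the mappings.
Works with BWA version 0.5.9.
usage: bwa_wrapper.py [options]
See below for options
"""
import optparse, os, shutil, subprocess, sys, tempfile  # used by the rest of the wrapper

def stop_err( msg ):
    sys.stderr.write( '%s\n' % msg )
    sys.exit( 1 )  # non-zero exit status signals failure

def check_is_double_encoded( fastq ):
    # check that first read is bases, not one base followed by numbers
    bases = [ 'A', 'C', 'G', 'T', 'a', 'c', 'g', 't', 'N' ]
    nums = [ '0', '1', '2', '3' ]
    with open( fastq ) as handle:
        for line in handle:
            if not line.strip() or line.startswith( '@' ):
                continue
            # (the slide is cut off here; the lines below are an assumed completion)
            seq = line.strip()
            if any( b in nums for b in seq ):
                return False  # colour-space digits: not double-encoded
            if all( b in bases for b in seq ):
                return True   # written as bases: double-encoded
            stop_err( 'First read is neither base-space nor colour-space' )
    stop_err( 'No read found in %s' % fastq )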
66. Options for building interfaces
Overview of the tags on
http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax
The parameters that build the interface are placed within
<inputs> </inputs> tags.
The tags you use in the <inputs> section define much of the syntax
to use in other tag sets, such as <outputs> and <command>.
BASIC USE
<param name="param_name" type="text" value="default"
label="Explanation of the parameter" help="help"/>
67. Select a dataset from history
If the type of the input is "data", a drop-down list of history items appears.
The accepted format should be given as format="format".
<param name="input" type="data"
format="tabular" label="Dataset"/>
68. Choose from a list
<param name="detection_thresh" type="select"
multiple="true" label="Detection thresholds">
<option value="0.001">0.001</option>
<option value="0.002">0.002</option>
<option value="0.003">0.003</option>
<option value="0.004">0.004</option>
</param>
69. Select reference data
<param name="indices" type="select" label="Select a reference genome">
    <options from_data_table="bwa_indexes">
        <filter type="sort_by" column="2" />
        <validator type="no_options" message="No indexes are available" />
    </options> <!-- note: <options>, not <option> -->
</param>
For some tools, indexed data can be made
available (e.g. BLAST, NGS mappers, …). To pass
indexed sets, reference them in
tool_data_table_conf.xml: its entries point to
./tool_data/<toolname>.loc files.
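To see what the <options from_data_table> element above roughly does with a .loc file, here is a Python sketch: skip comment and blank lines, split each line on tabs, sort on column 2 (as the sort_by filter requests) and offer (label, value) pairs; an empty result corresponds to the "no_options" validator firing. The column layout (value, dbkey, display name, path) and the sample entries are assumptions for illustration, and Galaxy's own implementation takes the columns from tool_data_table_conf.xml.

```python
def loc_options(lines, label_column=2, value_column=0):
    """Return (label, value) pairs from tab-separated .loc lines, sorted by label."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                                     # skip blanks and comments
        rows.append(line.rstrip("\n").split("\t"))
    rows.sort(key=lambda fields: fields[label_column])   # like <filter type="sort_by" column="2"/>
    return [(fields[label_column], fields[value_column]) for fields in rows]


# Hypothetical bwa_index.loc content: value, dbkey, display name, path.
sample = [
    "# value\tdbkey\tname\tpath",
    "mm10\tmm10\tMouse (mm10)\t/data/bwa/mm10/mm10.fa",
    "hg19\thg19\tHuman (hg19)\t/data/bwa/hg19/hg19.fa",
]
print(loc_options(sample))
```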
75. How to integrate a tool?
You have: a script that accepts parameters and writes the results to a text file.
To do:
1. put your script in ~/galaxy-dist/tools/mytools/
2. in that directory, create a mytool.xml file pointing to that script, with all tag sets set correctly.
3. in ~/galaxy-dist/tool_conf.xml, add a line referencing your tool XML file
4. restart Galaxy: # service galaxyd restart
(4'. optional: change the location of your tool in integrated_tool_panel.xml and restart again)
5. There's the magic. Enjoy your tool!
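Here is a minimal sketch of the kind of script meant in step 1: it takes its input and output paths (and one option) on the command line and writes a small tabular result. The script name mytool.py, the "count matching lines" task and the --pattern option are all hypothetical.

```python
#!/usr/bin/env python
"""Hypothetical mytool.py: count the lines of a text file that contain a pattern."""
import argparse


def count_matches(lines, pattern):
    """Number of lines containing `pattern` (an empty pattern matches every line)."""
    return sum(1 for line in lines if pattern in line)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Count lines that contain a pattern.")
    parser.add_argument("input", help="input dataset path, filled in by Galaxy")
    parser.add_argument("output", help="output dataset path, filled in by Galaxy")
    parser.add_argument("--pattern", default="", help="substring to look for")
    args = parser.parse_args(argv)
    with open(args.input) as handle:
        total = count_matches(handle, args.pattern)
    # The tool's result is whatever the script writes to the output path it was given.
    with open(args.output, "w") as out:
        out.write("pattern\tmatching_lines\n")
        out.write(f"{args.pattern}\t{total}\n")


if __name__ == "__main__":
    main()
```

In mytool.xml the matching command could then read `<command interpreter="python">mytool.py $input $output --pattern "$pattern"</command>`, with input, output and pattern declared in the inputs and outputs tag sets (names again hypothetical).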
81. Tool_dependencies.xml
1. define a dependency as a Toolshed repository
containing a tool dependency definition,
2. or write the instructions to install the dependency directly in
tool_dependencies.xml, and make it available system-wide.
Galaxy aims to be platform-independent, so this is A HELL
OF A JOB.
http://wiki.galaxyproject.org/ToolShedToolFeatures#Automatic_third-party_tool_dependency_installa
85. Lesson 1
It pays off to use and build on repositories started by
others.
86. The problem is the testing
1. build your tool and make it work in your Galaxy
2. define your dependencies
3. search the (test) Toolshed for repositories you can
use: tool dependency definitions ("just installing
packages, without providing an interface").
4. put them as requirements in your tool.xml
5. for the ones you do not find, decide whether to create
a separate tool dependency definition and integrate it
OR
5'. add them to your tool_dependencies.xml file.
6'. Upload (or update) the repository on a Toolshed
7'. Fire up a test Galaxy, and plug the tool in to see
whether it works.
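Before firing up a test Galaxy, a quick consistency check can catch a common mistake: a package required in the tool XML that tool_dependencies.xml does not define. This Python sketch compares (name, version) pairs from <requirement type="package"> elements with <package> elements. The element layout follows the Galaxy conventions of the time as we understand them, and all package names, versions and the repository owner are hypothetical.

```python
import xml.etree.ElementTree as ET

TOOL_XML = """<tool id="mytool" name="My tool" version="0.1.0">
  <requirements>
    <requirement type="package" version="0.1.18">samtools</requirement>
    <requirement type="package" version="2.0.5">bowtie2</requirement>
  </requirements>
</tool>"""

TOOL_DEPENDENCIES_XML = """<?xml version="1.0"?>
<tool_dependency>
  <package name="samtools" version="0.1.18">
    <repository name="package_samtools_0_1_18" owner="devteam" />
  </package>
</tool_dependency>"""


def missing_dependencies(tool_xml, dependencies_xml):
    """List (name, version) package requirements that tool_dependencies.xml does not define."""
    wanted = {
        (req.text.strip(), req.get("version"))
        for req in ET.fromstring(tool_xml).iter("requirement")
        if req.get("type") == "package"
    }
    defined = {
        (pkg.get("name"), pkg.get("version"))
        for pkg in ET.fromstring(dependencies_xml).iter("package")
    }
    return sorted(wanted - defined)


print(missing_dependencies(TOOL_XML, TOOL_DEPENDENCIES_XML))  # [('bowtie2', '2.0.5')]
```

This only checks that the two files agree; whether the packages actually install still has to be tested in a clean Galaxy.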
87. The problem is the testing
You might consider a virtual test machine, e.g. in
VirtualBox.
1. install your OS
2. fetch Galaxy
3. prepare universe_wsgi.ini (admin users, locations, ...)
4. plug in your repository
5. SNAPSHOT your machine
6. install your tool through the graphical (web) interface
7. figure out what went wrong
7'. update the repository
7''. and restore the snapshot
8. iterate until SUCCESS!
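The snapshot steps (5 and 7'') can be driven from a script with VirtualBox's VBoxManage command line. This Python sketch only builds the command lists and prints them by default (a dry run); pass dry_run=False to execute them. The VM name "galaxy-test" and snapshot name "clean-galaxy" are hypothetical, and the poweroff call fails if the VM is already off, which stops the run because check=True raises an error.

```python
import subprocess


def take_snapshot(vm, snapshot):
    """Command for step 5: snapshot the freshly prepared test machine."""
    return [["VBoxManage", "snapshot", vm, "take", snapshot]]


def reset_to_snapshot(vm, snapshot):
    """Commands for step 7'': power off, restore the clean snapshot, boot again."""
    return [
        ["VBoxManage", "controlvm", vm, "poweroff"],
        ["VBoxManage", "snapshot", vm, "restore", snapshot],
        ["VBoxManage", "startvm", vm, "--type", "headless"],
    ]


def run(commands, dry_run=True):
    """Print the commands (default) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Hypothetical VM and snapshot names.
run(take_snapshot("galaxy-test", "clean-galaxy"))
run(reset_to_snapshot("galaxy-test", "clean-galaxy"))
```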
88. Tool dependencies
Dependencies
IGENOMES (http://support.illumina.com/sequencing/sequencing_software/igenome.ilmn)
gtf file:
$IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf
reference whole genome sequence:
$IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/
reference chromosome sequences:
$IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/Chromosomes/
PhiX control sequences:
$IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/AbundantSequences/phix.fa
TopHat2 (Bowtie2) and STAR indexes:
$IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index
$IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex
Chr size file:
$IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/ChromInfo.txt
Binaries
STAR (https://code.google.com/p/rna-star/)
TOPHAT2 (http://tophat.cbcb.umd.edu/)
BLASTP (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) or USEARCH (http://www.drive5.com/usearch/download.html)
R (http://www.r-project.org/)
SAMTOOLS (http://sourceforge.net/projects/samtools/files/samtools/)
GATK (http://www.broadinstitute.org/gatk/download)
PICARD (http://sourceforge.net/projects/picard/files/picard-tools/)
SQLITE3 (http://www.sqlite.org/download.html)
Custom Ensembl SQLite DB
tables included:
coord_system
exon_transcript
intergene (made by the intergenic TIScalling script based on gene)
transcript
exon
gene
data
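With this many dependencies, a quick pre-flight check saves time on a fresh test machine. This Python sketch verifies that the iGenomes paths listed above exist under $IGENOMES_ROOT and that the command-line binaries are on the PATH. The executable names are assumptions (e.g. "tophat2", "blastp"; USEARCH would replace blastp), and GATK and Picard are not checked because they ship as .jar files.

```python
import os
import shutil

# Paths relative to $IGENOMES_ROOT, copied from the list above.
REFERENCE_FILES = {
    "gtf": "Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf",
    "whole_genome": "Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta",
    "chromosomes": "Mus_musculus/Ensembl/GRCm38/Sequence/Chromosomes",
    "phix": "Mus_musculus/Ensembl/GRCm38/Sequence/AbundantSequences/phix.fa",
    "bowtie2_index": "Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index",
    "star_index": "Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex",
    "chrom_sizes": "Mus_musculus/Ensembl/GRCm38/Annotation/Genes/ChromInfo.txt",
}

# Assumed executable names; GATK and Picard (.jar files) are not checked here.
BINARIES = ["STAR", "tophat2", "blastp", "R", "samtools", "sqlite3"]


def missing_references(igenomes_root):
    """Keys of REFERENCE_FILES whose path does not exist under igenomes_root."""
    return sorted(key for key, rel in REFERENCE_FILES.items()
                  if not os.path.exists(os.path.join(igenomes_root, rel)))


def missing_binaries():
    """Executables from BINARIES that cannot be found on the PATH."""
    return [name for name in BINARIES if shutil.which(name) is None]


if __name__ == "__main__":
    root = os.environ.get("IGENOMES_ROOT", "")
    print("missing reference data:", missing_references(root))
    print("missing binaries:", missing_binaries())
```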