Presentation of Taverna from the UKOLN DevCSI "Workflow Tools" event in Bath, 2010-11-30
PDF version: http://www.slideshare.net/soilandreyes/taverna-workflow-management-system-2010-1130-bath-workflow-tools
http://taverna.org.uk/
http://www.ukoln.ac.uk/events/devcsi/workflow_tools/programme/index.html
http://devcsi.ukoln.ac.uk/
Taverna workflow management system (2010 11-30 Bath Workflow Tools) PPTX
1. Stian Soiland-Reyes
myGrid, School of Computer Science
University of Manchester, UK
UKOLN DevCSI: Workflow Tools
Bath, 2010-11-30
http://taverna.org.uk/
2. myGrid
http://taverna.org.uk/ http://mygrid.org.uk/
What is myGrid?
An e-Science Collaboration Since 2001
Not a grid!
Numerous partners involved:
University of Manchester
University of Southampton
University of Oxford
EMBL-EBI
Provides sustainable, production-quality software
Supported by OMII-UK, EPSRC and BBSRC
Mixture of developers, bioinformaticians and
researchers
Software | Services | Content | Skills | Community
6. Manual: disadvantages
• Scale of analysis task overwhelms researchers – lots of data
• User bias and premature filtering of datasets – cherry picking
• Hypothesis-driven approach to data analysis
• Constant changes in data – problems with re-analysis of data
• Implicit methodologies (hyper-linking through web pages)
• Error proliferation from any of the listed issues – notably human error
7. Web services and workflows
Web services
Technology and standards for exposing code and
data resources that can be programmatically
consumed by a remote third party
Description of how to interact with the service:
parameters, documentation
Workflows
General technique for describing and executing a
process
Describe what you want to do and which
services to run
8. Taverna workflows
A set of (local and remote) services to analyze or manage data
Nested workflows are also services
Data links connect services
i.e. output from service A is input to services B and C
Describes the desired dataflow instead of process coordination
Automatic iteration
Can customize list handling and control links
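The dataflow idea on this slide can be illustrated with a minimal Python sketch. It is not Taverna code or the Taverna API: the three service functions, the `LINKS` table and the gene names are hypothetical stand-ins. The sketch only shows that execution order can follow from the data links (A feeds B and C), and that a service written for a single value can be applied automatically to each item of a list.

```python
# Hypothetical stand-ins for services; in a real workflow these
# could be remote web services or local scripts.
def service_a(gene_id):
    return gene_id.upper()


def service_b(value):
    return f"pathways({value})"


def service_c(value):
    return len(value)


SERVICES = {"A": service_a, "B": service_b, "C": service_c}

# Data links: each service is mapped to the source of its input.
# "input" stands for the workflow input port.
LINKS = {"A": "input", "B": "A", "C": "A"}


def call(service, data):
    """Apply a single-value service, iterating implicitly over lists."""
    if isinstance(data, list):
        return [call(service, item) for item in data]
    return service(data)


def run(workflow_input):
    """Compute every service output by following the data links."""
    results = {"input": workflow_input}

    def resolve(name):
        # A service runs once the data it depends on is available.
        if name not in results:
            results[name] = call(SERVICES[name], resolve(LINKS[name]))
        return results[name]

    for name in SERVICES:
        resolve(name)
    return results


outputs = run(["brca1", "tp53"])
print(outputs["B"])
print(outputs["C"])
```

No explicit ordering or loop over the input list is written by the workflow author here; both follow from the links and the shape of the data. This sketch handles single-input services only and does not model list handling across several inputs or control links.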
[Figure: example Taverna workflow diagram. Workflow input ports include chromosome_name, start_position and end_position. Node labels include genes_in_qtl, mmusculus_gene_ensembl, Get_pathways, get_pathways_by_genes1, Merge_pathways, Merge_gene_pathways, create_report, gene_descriptions, pathway_descriptions and report, plus many merge, split and remove-duplicates steps.]
9. What types of services?
Public/private/secured WSDL/SOAP web services
RESTful web services
Spreadsheet import
Command line tools (local/ssh)
Inline scripts (Beanshell, R)
Java APIs
Customizations:
BioMart, BioMoby / SADI
Soaplab
Grid services (Globus, EGEE gLite, caGrid)
… your tool (Plugin tutorial on wiki)
10. Which services?
Taverna is general and can connect to standard
web services for any domain
Bioinformatics:
From professional third-party organisations
providing robust & open data/analysis services
... to under-the-desk web services for one particular
purpose, run by PhD students
http://biocatalogue.org/ - 1730 services from 130
providers – crowd-sourced and quality-monitored
17. Extensible UI and engine
Plugins can provide new “perspectives”
e.g. BioCatalogue, myExperiment
Provide service-specific customization
BioMart interface replicates web site
Adding new functionality
Looping, branching, dynamic service resolution
New service types
Design helpers, “Find matching service”
18. Taverna 3 “Next-gen”
Under development for 2011
Interactive, component-centric and data-centric
workflow design
Pre-packaged workflow components
Searching for workflow components from
BioCatalogue and myExperiment
New myGrid workflow components library
24. Taverna on the cloud
Use-case:
SNP analysis and annotation of
genomes sequenced from
breeds of cows in Africa – why are
some of them resistant to X?
Amazon EC2 with Taverna Server and local
services
Custom (built-in-a-week) Ruby on Rails web
interface
Runs through 31 chromosomes in 6.5 hours
using 10 instances - $26
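To put the quoted run in perspective, the figures above can be turned into rough unit costs. The short Python calculation below uses only the numbers on the slide; it assumes that all 10 instances ran for the full 6.5 hours, which the slide does not state, so the results are approximate.

```python
# Figures quoted on the slide.
chromosomes = 31
wall_clock_hours = 6.5
instances = 10
total_cost_usd = 26.0

# Assumption: every instance ran for the whole wall-clock time.
instance_hours = instances * wall_clock_hours
cost_per_instance_hour = total_cost_usd / instance_hours
cost_per_chromosome = total_cost_usd / chromosomes

print(f"{instance_hours:.1f} instance-hours")
print(f"${cost_per_instance_hour:.2f} per instance-hour")
print(f"${cost_per_chromosome:.2f} per chromosome")
```

Under that assumption the run used 65 instance-hours, at about $0.40 per instance-hour and roughly $0.84 per chromosome.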
26. Open source, open development
The Taverna suite of tools is open source
and free to use
Large user community, active mailing lists
Lead developers: myGrid in Manchester
Contributors from across the world
PAL programme
myGrid provides training, tutorials and
documentation