SADI for GMOD is a collection of ready-made SADI services for accessing sequence feature data in RDF form. The services were developed as an add-on for the GMOD (Generic Model Organism Database) project, which is a popular toolkit for building model organism databases and their associated websites (e.g. FlyBase).
SADI for GMOD: Semantic Web Services for Model Organism Databases
1. SADI for GMOD: Semantic Web Services for Model Organism Databases
Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson
James Hogg Research Centre, Heart + Lung Institute
University of British Columbia
http://code.google.com/p/sadi/wiki/SADIforGMOD
3. Background: Model Organism Databases
• several organisms are studied extensively by biologists: e.g. yeast, mouse, fruit fly
• each model organism has its own database:
o sequences (DNA, RNA, protein)
o sequence features (e.g. genes)
o research publications
o experimental results
o biochemical pathways
o phenotype images
o evolutionary trees (for closely related species)
All images were obtained from Wikipedia and are in the public domain.
4. Background: Sequence Features
• sequence features (a.k.a. sequence annotations) are regions of a DNA or protein sequence with a certain type (e.g. 'gene')
• in genome browsers, different types of sequence annotations are displayed in separate tracks
[Figure: genome browser view showing position on the DNA sequence, with separate promoter, gene, and transcript tracks. Source: Lincoln Stein, http://www.sequenceontology.org/gff3.shtml]
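The idea of a sequence feature as a typed region, and of tracks as groups of features of one type, can be sketched in a few lines of Python. This is only an illustrative model: the class name `SequenceFeature`, the helper `group_into_tracks`, and all coordinates and feature names below are hypothetical and do not come from any GMOD component.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceFeature:
    """A typed region of a reference sequence (illustrative model only)."""
    seq_id: str        # reference sequence, e.g. a chromosome name
    start: int         # first position of the region
    end: int           # last position of the region
    feature_type: str  # e.g. "gene", "promoter", "transcript"
    name: str


def group_into_tracks(features):
    """Group features by type, the way a genome browser lays out its tracks."""
    tracks = defaultdict(list)
    # Sort by position so each track lists its features left to right.
    for feature in sorted(features, key=lambda f: (f.start, f.end)):
        tracks[feature.feature_type].append(feature)
    return dict(tracks)


# Hypothetical example data, not taken from a real genome.
features = [
    SequenceFeature("chr1", 1300, 9000, "transcript", "tx1"),
    SequenceFeature("chr1", 12000, 15000, "gene", "geneB"),
    SequenceFeature("chr1", 1000, 9000, "gene", "geneA"),
    SequenceFeature("chr1", 800, 1000, "promoter", "promA"),
]

tracks = group_into_tracks(features)
for track_name, members in tracks.items():
    print(track_name, [m.name for m in members])
```

Each key of `tracks` corresponds to one browser track, and the values are the features drawn on it.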
5. Background: Sequence Features
Many types of biological data are represented as sequence features:
• promoters
• chromosome bands
• genes
• transcripts
• CDSs
• proteins
• protein domains
• transposons
• non-coding RNAs
• ESTs
• many more...
autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/
6. Background: Distributed Annotation System (DAS)
[Figure: a client sends an HTTP GET request to each of four DAS servers; each server responds with DAS XML.]
autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/
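As a rough illustration of the request pattern on this slide, the sketch below builds one HTTP GET URL per DAS server for the same genomic segment. It assumes the DAS 1.x request layout (`PREFIX/das/DSN/features?segment=REF:start,stop`); the server addresses and the data source name `dmel` are hypothetical placeholders, and the helper `das_features_url` is not part of any DAS library.

```python
from urllib.parse import urlencode


def das_features_url(server_prefix, data_source, reference, start, stop):
    """Build a DAS 'features' request URL (assumed DAS 1.x layout)."""
    # Keep ':' and ',' unescaped so the segment reads REF:start,stop.
    query = urlencode({"segment": f"{reference}:{start},{stop}"}, safe=":,")
    return f"{server_prefix.rstrip('/')}/das/{data_source}/features?{query}"


# Hypothetical servers; a DAS client would GET each URL and merge the XML.
servers = ["http://das.example.org", "http://annotations.example.net/"]
urls = [das_features_url(s, "dmel", "4", 26994, 32391) for s in servers]

for url in urls:
    print(url)
```

A DAS client repeats this request against every server it knows about and overlays the returned annotations, which is why dedicated client software is needed.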
7. Background: Limitations of the Distributed Annotation System (DAS)
• integrating data from DAS servers requires specialized software (“DAS clients”)
• other types of data (e.g. biochemical pathways, experimental results) cannot be automatically integrated with sequence feature data
• most bioinformatics analysis software (e.g. BLAST) does not speak DAS
9. SADI for GMOD: Semantic Web Services for Model Organism Databases
SADI (Semantic Automated Discovery and Integration)
• Standard for Web services that consume/generate RDF
• Motivation: automated integration of bioinformatics data and software
GMOD (Generic Model Organism Database)
• Toolkit for building a model organism database and website
• Collection of related open source projects: e.g. Chado, GBrowse, Pathway Tools
• Many sites use GMOD components: FlyBase, BeetleBase, dictyBase, etc.
10. SADI in a Nutshell
• to invoke a SADI service:
o HTTP POST an RDF document to the service URL
o e.g. $ curl --data @input.rdf http://sadiframework.org/examples/hello
• to get service metadata:
o HTTP GET on service URL
o returns an RDF document with service name, description, etc.
o e.g. $ curl http://sadiframework.org/examples/hello
• structure of input/output data is described in OWL
o service provider specifies one input OWL class and one output OWL class
• strengths of SADI
o no framework-specific messaging formats or ontologies
o supports batch processing of inputs
o supports long-running services (asynchronous services)
more info: http://sadiframework.org/
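The two interactions above (POST to invoke, GET for metadata) can be sketched with Python's standard library. This is a minimal sketch, not an official SADI client: the helper names are hypothetical, the `application/rdf+xml` content type is an assumption about what a given service accepts, and the input document is an empty placeholder rather than a valid service input.

```python
import urllib.request

SERVICE_URL = "http://sadiframework.org/examples/hello"  # URL from the slide


def build_invoke_request(service_url, rdf_document,
                         content_type="application/rdf+xml"):
    """Invocation: HTTP POST an RDF document to the service URL."""
    return urllib.request.Request(
        service_url,
        data=rdf_document,  # bytes of the input RDF
        headers={"Content-Type": content_type, "Accept": content_type},
        method="POST",
    )


def build_metadata_request(service_url):
    """Metadata: a plain HTTP GET on the same service URL."""
    return urllib.request.Request(
        service_url, headers={"Accept": "application/rdf+xml"}
    )


def send(request, timeout=30):
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Placeholder input: an empty RDF/XML document (assumption, for illustration).
placeholder_rdf = (
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
)
invoke_request = build_invoke_request(SERVICE_URL, placeholder_rdf)
metadata_request = build_metadata_request(SERVICE_URL)

print(invoke_request.get_method(), invoke_request.full_url)
print(metadata_request.get_method(), metadata_request.full_url)
```

Calling `send(invoke_request)` or `send(metadata_request)` would perform the actual network call; the requests are only constructed here so the sketch runs offline.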
11. SADI for GMOD Services
• SADI services for accessing sequence feature data
• implemented as Perl CGI scripts
Service Name                    | Input               | Relationship              | Output
get_feature_info                | database identifier | is about                  | feature description
get_features_overlapping_region | genomic coordinates | overlaps                  | collection of feature descriptions
get_sequence_for_region         | genomic coordinates | is represented by         | DNA, RNA, or amino acid sequence
get_child_features              | feature description | has part / derives into   | collection of feature descriptions
get_parent_features             | feature description | is part of / derives from | collection of feature descriptions
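The service table can also be read as a small lookup structure: each service has an input type, a relationship, and an output type. The sketch below encodes the five rows as data and finds the services that accept a given input. The names `ServiceSignature`, `SERVICES`, and `services_accepting` are hypothetical and exist only for this illustration; real SADI services describe their inputs and outputs as OWL classes rather than plain strings.

```python
from typing import NamedTuple


class ServiceSignature(NamedTuple):
    input_type: str
    relationship: str
    output_type: str


# The five rows of the service table, transcribed as data.
SERVICES = {
    "get_feature_info": ServiceSignature(
        "database identifier", "is about", "feature description"),
    "get_features_overlapping_region": ServiceSignature(
        "genomic coordinates", "overlaps",
        "collection of feature descriptions"),
    "get_sequence_for_region": ServiceSignature(
        "genomic coordinates", "is represented by",
        "DNA, RNA, or amino acid sequence"),
    "get_child_features": ServiceSignature(
        "feature description", "has part / derives into",
        "collection of feature descriptions"),
    "get_parent_features": ServiceSignature(
        "feature description", "is part of / derives from",
        "collection of feature descriptions"),
}


def services_accepting(input_type):
    """Return the names of services whose input matches the given type."""
    return sorted(
        name for name, sig in SERVICES.items() if sig.input_type == input_type
    )


print(services_accepting("genomic coordinates"))
print(services_accepting("feature description"))
```

Because `get_feature_info` outputs a feature description and two other services take one as input, the table also shows how services could be chained.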
12. SADI for GMOD: Structure of Service Input/Output RDF
(prefix declarations for sio, SO, range, and strand are not shown on the slide)

Input RDF (N3):

    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix GeneID: <http://lsrn.org/GeneID:> .

    GeneID:49962
        a lsrn:GeneID_Record;
        sio:SIO_000008 [                  # p = 'has attribute'
            a lsrn:GeneID_Identifier;
            sio:SIO_000300 "49962"        # p = 'has value'
        ] .

HTTP POST to get_feature_info

Output RDF (N3):

    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix GeneID: <http://lsrn.org/GeneID:> .
    @prefix FlyBase: <http://flybase.org/cgi-bin/sadi.gmod/feature?id=> .
    @prefix GenBank: <http://lsrn.org/GB:> .

    # p = 'is about'
    GeneID:49962 sio:SIO_000332 FlyBase:FBgn0040037 .

    # feature
    FlyBase:FBgn0040037
        a SO:SO_0000704;                  # o = 'gene'
        range:position [
            a range:RangedSequencePosition;
            sio:SIO_000053                # p = 'has proper part'
                [ a range:StartPosition; sio:SIO_000300 26994 ];
            sio:SIO_000053                # p = 'has proper part'
                [ a range:EndPosition; sio:SIO_000300 32391 ];
            range:in_relation_to _:minus_strand_seq
        ] .

    _:minus_strand_seq
        sio:SIO_000011 [                  # p = 'represents'
            a strand:MinusStrand;
            sio:SIO_000093 GenBank:AE014135   # p = 'is proper part of'
        ] .

    # reference feature (chromosome)
    FlyBase:4                             # chromosome 4
        a SO:SO_0000105 .                 # o = 'chromosome arm'
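To make the shape of the get_feature_info input concrete, the sketch below generates the input document for an arbitrary Gene ID by filling in a template. It is a minimal illustration under stated assumptions: the helper `gene_id_input_n3` is hypothetical, and the `sio` prefix URI (`http://semanticscience.org/resource/`) is not shown on the slide and is assumed here to be the SIO namespace.

```python
# Template mirroring the input RDF on the slide. The sio prefix declaration
# is an assumption; the slide does not show it.
INPUT_TEMPLATE = """\
@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
@prefix GeneID: <http://lsrn.org/GeneID:> .
@prefix sio: <http://semanticscience.org/resource/> .

GeneID:{gene_id}
    a lsrn:GeneID_Record ;
    sio:SIO_000008 [
        a lsrn:GeneID_Identifier ;
        sio:SIO_000300 "{gene_id}"
    ] .
"""


def gene_id_input_n3(gene_id):
    """Return an N3 input document for the given numeric Gene ID."""
    gene_id = str(gene_id)
    # Accept only plain ASCII digits so nothing unexpected enters the RDF.
    if not (gene_id.isascii() and gene_id.isdigit()):
        raise ValueError(f"not a numeric Gene ID: {gene_id!r}")
    return INPUT_TEMPLATE.format(gene_id=gene_id)


document = gene_id_input_n3(49962)
print(document)
```

The resulting text is what would be sent in the HTTP POST body; a production client would more likely build the graph with an RDF library than with string templating.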
16. Acknowledgements
Team
Mark Wilkinson: Principal Investigator
Luke McCarthy: Lead Programmer, SADI & SHARE
Edward Kawas: Perl Programmer, SADI
Funding
Microsoft Research
http://sadiframework.org/
17. SADI Training Course
“Web Publishing of Scientific Data and Services”
October 22nd-23rd, 2011
University of British Columbia (next door!)
Learn how to:
=> semantically describe service functionality in OWL
=> publish Semantic Web services using the SADI framework
More info: http://sadiframework.org/training