12th July, 2016
Connecting life sciences data at the European Bioinformatics Institute
Tony Burdett
Technical Co-ordinator – Samples, Phenotypes and Ontologies Team
www.ebi.ac.uk
Bioinformatics is the science of storing, retrieving and analysing large amounts of biological information.
What is EMBL-EBI?
• Europe’s home for biological data services, research and training
• A trusted data provider for the life sciences
• Part of the European Molecular Biology Laboratory, an intergovernmental research organisation
• International: 570 members of staff from 57 nations
• Home of the ELIXIR Technical Hub
OUR MISSION
To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
Big data, big demand
• ~18.5 million requests to EMBL-EBI websites every day
• 60 petabytes of EMBL-EBI storage capacity
• EMBL-EBI handles 9.2 million jobs on average per month
• Scientists at over 5 million unique sites use EMBL-EBI websites
Atlas: what happens where
From molecules to medicine
Biology is changing:
• Lower-cost sequencing
• More data produced
• New types of data
• Emphasis on systems biology
Bioinformatics enables new applications:
• molecular medicine
• agriculture
• food
• environmental sciences
Data resources at EMBL-EBI
• Genes, genomes & variation: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal
• Gene, protein & metabolite expression: RNAcentral, ArrayExpress, Expression Atlas, MetaboLights, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, SureChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples
• Literature & ontologies: Europe PubMed Central, BioStudies, Gene Ontology, Experimental Factor Ontology
Database interactions
• Collaborative community facilitates social, scientific and technical interactions
• Right: internal interactions between data resources as determined by the exchange of data.
• Width of each internal arc is weighted according to the number of different data types exchanged.
Biology 101 – Central Dogma
Dhorspool at en.wikipedia [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
Sadly, it’s not *quite* that simple…
User:Dhorspool [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
Nope, not that simple either…
[Figure: genome, proteome and metabolome data produced by many technologies (tissue samples, CE-MS, antibody arrays, LC-MS/MS spectra, miRNA and mRNA arrays), feeding into pathways, protein interactions and drug targets]
Connections between Databases
[Figure: links between Gene Expression Atlas, Ensembl (genes, RNA transcripts and proteins via identifiers.org/ensembl), UniProt, Gene Ontology (GO BP, GO MF, GO CC), organism/taxon, ChEMBL (assays and targets), BioModels (SBML models, reactions, species, compartments), ChEBI, Reactome pathways, SBO and PDB structures. Example predicates: sio:'is attribute of' (sio:SIO_000011) linking a discretized differential gene expression ratio (sio:SIO_001078) to a gene; uniprot:classifiedWith, uniprot:organism, uniprot:transcribedFrom, uniprot:translatedTo, chembl:hasTarget, bq:is, bq:isVersionOf, bq:isHomologTo, bq:hasPart, bq:occursIn, and rdfs:seeAlso (not currently linking to identifiers.org, but soon). Relationships within BioModels are documented at https://github.com/sarala/ricordo-rdfconverter/wiki/SBML-RDF-Schema]
We get REALLY good at doing this…
http://www.ebi.ac.uk/rdf
How do we turn data into Linked Data? (Example from the Gene Expression Atlas)
Relational Data to RDF graph conversion
• Give “things” URIs
• Type “things” with ontologies
• Link “things” to other related “things”
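The three conversion steps can be sketched in Python over one hypothetical relational row. This is an illustration under stated assumptions, not the Atlas conversion pipeline: the `EX` namespace, the row layout and the `row_to_triples` helper are invented for the example, the identifiers.org Ensembl URI pattern is assumed, and the SIO terms (SIO_001078, SIO_000011) are the ones named in the database-connections diagram.

```python
# Minimal sketch: relational row -> RDF triples, serialised as N-Triples.
# Namespaces marked "assumed" or "hypothetical" are illustrative only.
from typing import Dict, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ENSEMBL = "http://identifiers.org/ensembl/"            # assumed gene URI pattern
EX = "http://example.org/atlas/"                        # hypothetical namespace for results
SIO = "http://semanticscience.org/resource/"
SO_GENE = "http://purl.obolibrary.org/obo/SO_0000704"   # Sequence Ontology 'gene'

Triple = Tuple[str, str, str]


def row_to_triples(row: Dict[str, str]) -> List[Triple]:
    """Convert one (hypothetical) expression-result row into RDF triples."""
    gene = ENSEMBL + row["gene_id"]                 # 1. give things URIs
    result = EX + "result/" + row["result_id"]
    return [
        (gene, RDF_TYPE, SO_GENE),                  # 2. type things with ontologies
        (result, RDF_TYPE, SIO + "SIO_001078"),     #    discretized differential gene expression ratio
        (result, SIO + "SIO_000011", gene),         # 3. link things: sio:'is attribute of'
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


rows = [{"result_id": "r1", "gene_id": "ENSMUSG00000001467"}]
triples = [t for row in rows for t in row_to_triples(row)]
print(to_ntriples(triples))
```

Keeping the URI minting, typing and linking in one function per row makes each step easy to audit against the source schema.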
Modeling data vs biology
• Typing and semantics are the main strength of RDF, so we focused on this aspect
• There are a lot of ontologies for the life sciences
• However, most of them model biology
• What does an Ensembl entry represent? Is an Ensembl identifier really an instance of a Sequence Ontology Gene class?
ensembl:ENSMUSG00000001467 rdf:type so:'protein coding gene'
Database Entry or Real World Entity?
• Practically it makes sense to treat database entries as
proxies for the real world entity they represent
• The alternative introduces a layer of indirection that would only make linking resources harder
• It means we can use biologically meaningful relationships
• But this may or may not work for all use cases
ensembl:ENSMUSG00000001467 rdf:type so:'protein coding gene'
ensembl:ENSMUST00000001507 rdf:type so:'transcript'
ensembl:ENSMUST00000001507 so:'transcribed from' ensembl:ENSMUSG00000001467
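Treating entries as proxies means a biological question can be asked of the data directly. The Python sketch below stores the Ensembl gene/transcript statements as plain tuples, with labels written in underscore form (`so:protein_coding_gene`, `so:transcribed_from`) purely for readability; the helper names are hypothetical.

```python
# Minimal sketch: Ensembl entries used directly as proxies for genes/transcripts.
# Compact names stand in for full IRIs; underscore labels are for readability.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

GRAPH: List[Triple] = [
    ("ensembl:ENSMUSG00000001467", "rdf:type", "so:protein_coding_gene"),
    ("ensembl:ENSMUST00000001507", "rdf:type", "so:transcript"),
    ("ensembl:ENSMUST00000001507", "so:transcribed_from", "ensembl:ENSMUSG00000001467"),
]


def objects(subject: str, predicate: str) -> Set[str]:
    """All objects linked from `subject` by `predicate`."""
    return {o for s, p, o in GRAPH if s == subject and p == predicate}


def instances_of(rdf_class: str) -> Set[str]:
    """All entries typed with the given ontology class."""
    return {s for s, p, o in GRAPH if p == "rdf:type" and o == rdf_class}


# "Which gene is each transcript transcribed from?", asked of database entries directly.
for transcript in sorted(instances_of("so:transcript")):
    for gene in sorted(objects(transcript, "so:transcribed_from")):
        print(transcript, "transcribed from", gene)
```

With an extra layer of indirection (entry "describes" gene), every such lookup would need one more hop on each side.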
Knowledge representation challenges
• The semantics of our data is complex
• The provenance models are even more complex
• The relationships are hard to define
• Balancing use-cases with representation is a major challenge
• The harder you try to get representation correct, the harder it is for users to query
• Performance drops off for simple queries
Connecting Gene and Protein in EBI RDF
EBI RDF Platform
Successes
• Novel queries possible over EBI datasets
• Production-quality RDF releases
• Community of users
• Highly available public SPARQL endpoints
• 500+ users (10-50 million hits per month)
• A lot of interest from industry
• Catalyst for new RDF efforts
Lessons
• Public SPARQL endpoints are problematic
• Query federation is not performant
• Inference support is limited
• Not scalable for all EBI data, e.g. Variation, ENA
• Lack of expertise in service teams
• Too much overhead to get started quickly in this space
Ontologies for life sciences
[Figure: ontologies spanning genotype to phenotype, covering sequence, proteins, gene products, transcripts, pathways, cell types, BRENDA tissue / enzyme source, development, anatomy, phenotype and the Plasmodium life cycle]
-Sequence types and features
-Genetic Context
-Molecule role
-Molecular Function
-Biological process
-Cellular component
-Protein covalent bond
-Protein domain
-UniProt taxonomy
-Pathway ontology
-Event (INOH pathway ontology)
-Systems Biology
-Protein-protein interaction
-Arabidopsis development
-Cereal plant development
-Plant growth and developmental stage
-C. elegans development
-Drosophila development (FBdv)
-Human developmental anatomy, abstract version
-Human developmental anatomy, timed version
-Mosquito gross anatomy
-Mouse adult gross anatomy
-Mouse gross anatomy and development
-C. elegans gross anatomy
-Arabidopsis gross anatomy
-Cereal plant gross anatomy
-Drosophila gross anatomy
-Dictyostelium discoideum anatomy
-Fungal gross anatomy (FAO)
-Plant structure
-Maize gross anatomy
-Medaka fish anatomy and development
-Zebrafish anatomy and development
-NCI Thesaurus
-Mouse pathology
-Human disease
-Cereal plant trait
-PATO (attribute and value)
-Mammalian phenotype
-Human phenotype
-Habronattus courtship
-Loggerhead nesting
-Animal natural history and life history
-eVOC (Expressed Sequence Annotation for Humans)
Ontologies as Graphs
• OWL ontologies aren’t graphs, but…
… can be represented as an RDF graph
… people want to use them as graphs
• Plenty of RDF databases around
• But incomplete w.r.t. OWL semantics
• SPARQL is an acquired taste
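One reason OWL is not directly a graph: an axiom such as "left ventricle is part of some heart" is serialised in RDF as an anonymous `owl:Restriction` node rather than a single edge. A minimal Python sketch of collapsing the common `subClassOf ... some ...` pattern into direct edges follows; the `ex:` term names are hypothetical and readable, not real UBERON identifiers. Every other OWL construct is simply dropped, which is the kind of semantic loss a graph view trades for convenience.

```python
# Minimal sketch: collapse OWL "C subClassOf (p some D)" restrictions, serialised in
# RDF via a blank node, into direct C -p-> D edges. Other OWL constructs are dropped.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical, human-readable term names (not real UBERON identifiers).
TRIPLES: List[Triple] = [
    ("ex:left_ventricle", "rdfs:subClassOf", "ex:cardiac_chamber"),
    ("ex:left_ventricle", "rdfs:subClassOf", "_:b0"),
    ("_:b0", "rdf:type", "owl:Restriction"),
    ("_:b0", "owl:onProperty", "ex:part_of"),
    ("_:b0", "owl:someValuesFrom", "ex:heart"),
]


def is_blank(node: str) -> bool:
    return node.startswith("_:")


def graph_edges(triples: List[Triple]) -> List[Triple]:
    # Index the properties of every blank node (candidate restrictions).
    blanks: Dict[str, Dict[str, str]] = {}
    for s, p, o in triples:
        if is_blank(s):
            blanks.setdefault(s, {})[p] = o

    edges: List[Triple] = []
    for s, p, o in triples:
        if p != "rdfs:subClassOf" or is_blank(s):
            continue
        if not is_blank(o):
            edges.append((s, "SUBCLASSOF", o))      # named superclass: plain is-a edge
            continue
        r = blanks.get(o, {})
        if (r.get("rdf:type") == "owl:Restriction"
                and "owl:onProperty" in r and "owl:someValuesFrom" in r):
            edges.append((s, r["owl:onProperty"], r["owl:someValuesFrom"]))
    return edges


for edge in graph_edges(TRIPLES):
    print(edge)
```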
Ontology repository use-cases
• Search for ontology terms
• labels, synonyms, descriptions
• Querying the structure
• Get parent/child terms
• Querying transitive closure
• Get ancestor/descendant terms
• Querying across relations
• Partonomy or development stages
• We can satisfy these requirements with Neo4j
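Parent/child lookups and transitive closure reduce to neighbour lookups plus a breadth-first traversal over typed edges. The Python sketch below uses an invented mini-anatomy (term names and edges are illustrative, not taken from UBERON); the `relations` argument selects which edge types to follow, e.g. only `is_a`, or `is_a` plus `part_of` for partonomy.

```python
# Minimal sketch: parent/child lookups and transitive closure over typed edges.
# The mini-anatomy below is invented for illustration, not taken from UBERON.
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple

EDGES: List[Tuple[str, str, str]] = [   # (child, relation, parent)
    ("left ventricle", "part_of", "heart"),
    ("right ventricle", "part_of", "heart"),
    ("heart", "is_a", "organ"),
    ("heart", "part_of", "cardiovascular system"),
    ("organ", "is_a", "anatomical structure"),
]


def parents(term: str, relations: Iterable[str] = ("is_a",)) -> Set[str]:
    rels = set(relations)
    return {p for c, r, p in EDGES if c == term and r in rels}


def children(term: str, relations: Iterable[str] = ("is_a",)) -> Set[str]:
    rels = set(relations)
    return {c for c, r, p in EDGES if p == term and r in rels}


def closure(term: str, step: Callable[[str], Set[str]]) -> Set[str]:
    """Breadth-first search: every term reachable from `term` via `step`."""
    seen: Set[str] = set()
    queue = deque([term])
    while queue:
        for nxt in step(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def ancestors(term: str, relations=("is_a", "part_of")) -> Set[str]:
    return closure(term, lambda t: parents(t, relations))


def descendants(term: str, relations=("is_a", "part_of")) -> Set[str]:
    return closure(term, lambda t: children(t, relations))


print(sorted(ancestors("left ventricle")))
# ['anatomical structure', 'cardiovascular system', 'heart', 'organ']
```

A graph database performs the same traversal natively, which is what makes these use cases a good fit for Neo4j.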
OWL to Neo4j schema
Label every node by type (e.g. class, property or individual) and by ontology id
Label every relationship by name
Include an additional index for "special relations" like partonomy and subsets
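The schema rules can be sketched as a small Python generator of parameterised Cypher statements. Assumptions, not the production loader: the `Class`/`Property`/`Individual` labels, the ontology id used as a second label, the `SUBCLASSOF` relationship type, and mirroring "special" relations such as `part_of` as an extra `RelatedTree` relationship are all modelled on the relationship names in the query on the next slide. Node labels and relationship types cannot be Cypher parameters, so they are interpolated (backtick-quoted) while property values stay parameterised.

```python
# Minimal sketch (assumed schema, not the production loader): emit parameterised
# Cypher for ontology entities and relations, following the slide's conventions.
from typing import Dict, List, Tuple

Statement = Tuple[str, Dict[str, str]]

ENTITY_KINDS = {"Class", "Property", "Individual"}
SPECIAL_RELATIONS = {"part_of"}          # assumption: also indexed for tree queries


def node_statement(ontology_id: str, iri: str, label: str, kind: str = "Class") -> Statement:
    """Label a node by entity kind and by ontology id; properties go in as parameters."""
    if kind not in ENTITY_KINDS:
        raise ValueError(f"unknown entity kind: {kind}")
    # Labels cannot be Cypher parameters, so they are interpolated and backtick-quoted.
    query = (f"MERGE (n:{kind}:`{ontology_id}` {{iri: $iri}}) "
             "SET n.label = $label, n.ontology_name = $ontology")
    return query, {"iri": iri, "label": label, "ontology": ontology_id}


def edge_statements(ontology_id: str, child_iri: str, relation: str,
                    parent_iri: str) -> List[Statement]:
    """Name each relationship after its relation; mirror special ones as RelatedTree."""
    rel_type = "SUBCLASSOF" if relation == "subClassOf" else relation.upper()
    match = (f"MATCH (c:`{ontology_id}` {{iri: $child}}), "
             f"(p:`{ontology_id}` {{iri: $parent}}) ")
    params = {"child": child_iri, "parent": parent_iri}
    statements = [(match + f"MERGE (c)-[:`{rel_type}`]->(p)", params)]
    if relation in SPECIAL_RELATIONS:
        statements.append((match + "MERGE (c)-[:RelatedTree]->(p)", params))
    return statements


query, params = node_statement("uberon", "http://purl.obolibrary.org/obo/UBERON_0000948", "heart")
print(query)
```

Each (query, parameters) pair could then be executed with, for example, the official Neo4j Python driver's `session.run(query, parameters)`.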
Powerful yet simple queries
• Get the transitive closure for "heart" following parent and partonomy relations from the UBERON anatomy ontology
// {0}: ontology name, {1}: IRI of the start term (legacy {param} syntax; Neo4j 4.0+ uses $param)
MATCH path =
  (n:Class)-[r:SUBCLASSOF|RelatedTree*]->(parent)
  <-[r2:SUBCLASSOF|RelatedTree]-(sibling:Class)
WHERE n.ontology_name = {0}
  AND n.iri = {1}
RETURN path
Final thoughts – Neo4j and JSON-LD?
• A lot of frameworks now make it trivial to produce good
APIs
• What’s currently missing is how to integrate data from two or
more independent APIs
• Hard to crawl independent datasets for connections without
a human to interpret semantics
• Still a need to express a schema alongside the data
• W3C standards like RDF/RDFS/SKOS/OWL provide the basic vocabularies and semantics for expressing data schemas
• JSON-LD is bridging the gap from JSON to RDF
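To illustrate how JSON-LD bridges that gap, here is a deliberately tiny Python sketch that reads a flat JSON-LD object and emits N-Triples by expanding keys and compact IRIs through its `@context`. It is not a JSON-LD processor (no nesting, lists, language tags or remote contexts); the `http://example.org/vocab/` predicate and the identifiers.org prefix are assumptions for the example. A real application would use a conformant library such as PyLD.

```python
# Deliberately tiny sketch, NOT a JSON-LD processor: a flat JSON-LD object is
# turned into N-Triples by expanding its keys and compact IRIs via "@context".
import json
from typing import Any, Dict, List

DOC = json.loads("""
{
  "@context": {
    "ensembl": "http://identifiers.org/ensembl/",
    "name": "http://www.w3.org/2000/01/rdf-schema#label",
    "transcribedFrom": {"@id": "http://example.org/vocab/transcribedFrom", "@type": "@id"}
  },
  "@id": "ensembl:ENSMUST00000001507",
  "name": "example transcript",
  "transcribedFrom": "ensembl:ENSMUSG00000001467"
}
""")


def expand_iri(value: str, context: Dict[str, Any]) -> str:
    """Expand 'prefix:suffix' when the prefix maps to a plain IRI in the context."""
    prefix, sep, suffix = value.partition(":")
    base = context.get(prefix)
    return base + suffix if sep and isinstance(base, str) else value


def to_ntriples(node: Dict[str, Any]) -> List[str]:
    ctx = node["@context"]
    subject = expand_iri(node["@id"], ctx)
    lines: List[str] = []
    for key, value in node.items():
        if key.startswith("@"):
            continue
        term = ctx.get(key)
        if term is None:                       # JSON-LD drops keys with no mapping
            continue
        if isinstance(term, dict):
            predicate, is_iri = term["@id"], term.get("@type") == "@id"
        else:
            predicate, is_iri = term, False
        # IRIs go in angle brackets; otherwise a plain string literal (strings only).
        obj = f"<{expand_iri(value, ctx)}>" if is_iri else json.dumps(value)
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines


print("\n".join(to_ntriples(DOC)))
```

The JSON payload stays ordinary JSON for API clients, while the `@context` carries the schema needed to link it with other datasets.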
Acknowledgements
• Samples, Phenotypes and Ontologies Team
• Simon Jupp, Olga Vrousgou, Thomas Liener, Dani Welter, Catherine Leroy, Sira Sarntivijai, Ilinca Tudose, Helen Parkinson
• Funding
• European Molecular Biology Laboratory (EMBL)
• European Union projects: DIACHRON, BioMedBridges, CORBEL and Excelerate
Questions?

Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
 
Schema, Google & The Future of the Web
Schema, Google & The Future of the WebSchema, Google & The Future of the Web
Schema, Google & The Future of the Web
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
Elegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property GraphsElegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property Graphs
 
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
 
Graph for Good: Empowering your NGO
Graph for Good: Empowering your NGOGraph for Good: Empowering your NGO
Graph for Good: Empowering your NGO
 

Último

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Último (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Connecting life sciences data at the European Bioinformatics Institute

  • 1. 12th July, 2016 Connecting life sciences data at the European Bioinformatics Institute Tony Burdett Technical Co-ordinator – Samples, Phenotypes and Ontologies Team www.ebi.ac.uk
  • 2. Bioinformatics is the science of storing, retrieving and analysing large amounts of biological information.
  • 3. What is EMBL-EBI? • Europe’s home for biological data services, research and training • A trusted data provider for the life sciences • Part of the European Molecular Biology Laboratory, an intergovernmental research organisation • International: 570 members of staff from 57 nations • Home of the ELIXIR Technical hub.
  • 4. OUR MISSION To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
  • 5. Big data, big demand • ~18.5 million requests to EMBL-EBI websites every day • 60 petabytes of EMBL-EBI storage capacity • EMBL-EBI handles 9.2 million jobs per month on average • Scientists at over 5 million unique sites use EMBL-EBI websites
  • 6. From molecules to medicine [Figure: Atlas, what happens where] Biology is changing: • Lower-cost sequencing • More data produced • New types of data • Emphasis on systems biology. Bioinformatics enables new applications: • molecular medicine • agriculture • food • environmental sciences
  • 7. Data resources at EMBL-EBI • Genes, genomes & variation: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal • Gene, protein & metabolite expression: RNAcentral, ArrayExpress, Expression Atlas, MetaboLights, PRIDE • Protein sequences, families & motifs: InterPro, Pfam, UniProt • Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank • Chemical biology: ChEMBL, SureChEMBL, ChEBI • Reactions, interactions & pathways: IntAct, Reactome, MetaboLights • Systems: BioModels, Enzyme Portal, BioSamples • Literature & ontologies: Europe PubMed Central, BioStudies, Gene Ontology, Experimental Factor Ontology
  • 8. Database interactions • Collaborative community facilitates social, scientific and technical interactions • Right: internal interactions between data resources as determined by the exchange of data. • Width of each internal arc weighted according to the number of different data types exchanged.
  • 9. Biology 101 – Central Dogma Dhorspool at en.wikipedia [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
  • 10. Sadly, it’s not *quite* that simple… User:Dhorspool [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
  • 11. Nope, not that simple either… [Figure: data spanning the genome, proteome and metabolome of a tissue, measured with CE-MS, antibody arrays, LC-MS/MS (an annotated spectrum of intensity against m/z), miRNA arrays and mRNA arrays, together with pathways, protein interactions and drug targets]
  • 12. Connections between Databases [Diagram: links between Ensembl genes and RNA transcripts (via identifiers.org/ensembl), UniProt proteins, Gene Expression Atlas (discretized differential gene expression ratio, sio:SIO_001078, attached with sio:'is attribute of', sio:SIO_000011), Gene Ontology (GO BP, GO MF, GO CC), organisms/taxa, ChEMBL assays and targets, BioModels (SBML model, reaction, species, compartment), ChEBI, Reactome pathways, SBO and PDB structures. Predicates shown include rdfs:seeAlso, uniprot:classifiedWith, uniprot:organism, uniprot:transcribedFrom, uniprot:translatedTo, chembl:hasTarget, bq:is, bq:isVersionOf, bq:isHomologTo, bq:hasPart and bq:occursIn. UniProt rdfs:seeAlso links do not yet point to identifiers.org. Relationships within BioModels are documented at https://github.com/sarala/ricordo-rdfconverter/wiki/SBML-RDF-Schema]
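A minimal sketch, in plain Python, of why shared identifiers matter in this diagram: if two independently produced datasets name the same gene with the same identifiers.org URI, the union of their triples is already one connected graph. The gene and transcript IDs come from the slides; the `ex:` node, the `connected` helper and the exact triples are hypothetical illustrations.

```python
from collections import deque

# Gene and transcript named with the identifiers.org path form used in the diagram
GENE = "http://identifiers.org/ensembl/ENSMUSG00000001467"
TRANSCRIPT = "http://identifiers.org/ensembl/ENSMUST00000001507"

# Hypothetical triples from two independently produced datasets
atlas = [
    ("ex:atlas/ratio1", "rdf:type", "sio:SIO_001078"),  # differential expression ratio
    ("ex:atlas/ratio1", "sio:SIO_000011", GENE),         # 'is attribute of'
]
ensembl = [
    (TRANSCRIPT, "so:transcribed_from", GENE),
]

def connected(triples, start, goal):
    """Breadth-first search over triples treated as undirected edges.
    rdf:type triples are skipped so that sharing a class does not count as a link."""
    adjacency = {}
    for s, p, o in triples:
        if p == "rdf:type":
            continue
        adjacency.setdefault(s, set()).add(o)
        adjacency.setdefault(o, set()).add(s)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False

print(connected(atlas, "ex:atlas/ratio1", TRANSCRIPT))            # False
print(connected(atlas + ensembl, "ex:atlas/ratio1", TRANSCRIPT))  # True
```

No mapping step is needed to join the two sets; the shared URI does the work.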
  • 13. We get REALLY good at doing this…
  • 14. We get REALLY good at doing this…
  • 16. How do we turn data into Linked Data? (Example from the Gene Expression Atlas) Relational data to RDF graph conversion: • Give “things” URIs • Type “things” with ontologies • Link “things” to other related “things”
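The three conversion steps can be sketched as a small Python function that turns relational rows into N-Triples lines. This is an illustration under assumptions, not the Atlas converter: the row layout, the accession `E-0001`, the `http://example.org/` base URI, the class and property IRIs, and the `to_ntriples` helper are all hypothetical; only the `rdf:type` IRI and the identifiers.org prefix are real.

```python
# Placeholder IRIs: only RDF_TYPE and the identifiers.org prefix are real
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ENSEMBL = "http://identifiers.org/ensembl/"
BASE = "http://example.org/atlas/"                    # hypothetical base URI
EXPERIMENT = "http://example.org/onto/Experiment"     # stand-in for an ontology class
MEASURES = "http://example.org/onto/measuresGene"     # stand-in for a linking property

# Hypothetical relational rows: (experiment accession, Ensembl gene ID)
rows = [("E-0001", "ENSMUSG00000001467")]

def to_ntriples(rows):
    """Convert each row into N-Triples lines."""
    lines = []
    for accession, gene_id in rows:
        subject = f"<{BASE}experiment/{accession}>"                     # 1. give things URIs
        lines.append(f"{subject} <{RDF_TYPE}> <{EXPERIMENT}> .")         # 2. type them
        lines.append(f"{subject} <{MEASURES}> <{ENSEMBL}{gene_id}> .")  # 3. link them
    return lines

print("\n".join(to_ntriples(rows)))
```

Linking to the identifiers.org URI rather than minting a local gene URI is what lets these triples join up with other datasets.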
  • 17. Modeling data vs biology • Typing and semantics are the main strengths of RDF, so we focused on this aspect • There are a lot of ontologies for the life sciences • However, most of them model biology, not data • What does an Ensembl entry represent? Is an Ensembl identifier really an instance of a Sequence Ontology gene class? ensembl:ENSMUSG00000001467 rdf:type so:'protein coding gene'
  • 18. Database Entry or Real World Entity? • Practically, it makes sense to treat database entries as proxies for the real-world entities they represent • The alternative introduces a layer of indirection that would only make linking resources harder • It means we can use biologically meaningful relationships • But this may not work for all use cases. Example: ensembl:ENSMUSG00000001467 rdf:type so:'protein coding gene' . ensembl:ENSMUST00000001507 rdf:type so:'transcript' . ensembl:ENSMUST00000001507 so:'transcribed from' ensembl:ENSMUSG00000001467 .
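A small sketch of the proxy approach: Ensembl entries are typed directly with Sequence Ontology classes and linked with the SO "transcribed from" relation, so a biological question ("which transcripts come from this gene?") becomes a lookup over database identifiers. The compact names `so:protein_coding_gene`, `so:transcript` and `so:transcribed_from` are readable stand-ins for the real SO IRIs, and `transcripts_of` is a hypothetical helper.

```python
# Database entries used directly as proxies for the biological entities
triples = {
    ("ensembl:ENSMUSG00000001467", "rdf:type", "so:protein_coding_gene"),
    ("ensembl:ENSMUST00000001507", "rdf:type", "so:transcript"),
    ("ensembl:ENSMUST00000001507", "so:transcribed_from", "ensembl:ENSMUSG00000001467"),
}

def transcripts_of(gene, triples):
    """Subjects typed as so:transcript that are transcribed from `gene`."""
    transcripts = {s for s, p, o in triples if p == "rdf:type" and o == "so:transcript"}
    return sorted(s for s, p, o in triples
                  if p == "so:transcribed_from" and o == gene and s in transcripts)

print(transcripts_of("ensembl:ENSMUSG00000001467", triples))
# ['ensembl:ENSMUST00000001507']
```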
  • 19. Knowledge representation challenges • The semantics of our data is complex • The provenance models are even more complex • The relationships are hard to define • Balancing use cases with representation is a major challenge • The harder you try to get the representation correct, the harder it is for users to query • Performance drops off for simple queries
  • 20. Connecting Gene and Protein in EBI RDF
  • 21. EBI RDF Platform. Successes: • Novel queries possible over EBI datasets • Production-quality RDF releases • Community of users • Highly available public SPARQL endpoints • 500+ users (10-50 million hits per month) • A lot of interest from industry • Catalyst for new RDF efforts. Lessons: • Public SPARQL endpoints are problematic • Query federation is not performant • Inference support is limited • Not scalable for all EBI data, e.g. Variation, ENA • Lack of expertise in service teams • Too much overhead to get started quickly in this space
  • 22. Ontologies for life sciences [Figure: ontologies arranged from genotype to phenotype, covering sequence, proteins, gene products, transcripts, pathways, cell types, BRENDA tissue/enzyme source, development, anatomy, phenotype and the Plasmodium life cycle] Examples: • Sequence types and features • Genetic context • Molecule role • Molecular function • Biological process • Cellular component • Protein covalent bond • Protein domain • UniProt taxonomy • Pathway ontology • Event (INOH pathway ontology) • Systems biology • Protein-protein interaction • Arabidopsis development • Cereal plant development • Plant growth and developmental stage • C. elegans development • Drosophila development • Human developmental anatomy (abstract and timed versions) • Mosquito gross anatomy • Mouse adult gross anatomy • Mouse gross anatomy and development • C. elegans gross anatomy • Arabidopsis gross anatomy • Cereal plant gross anatomy • Drosophila gross anatomy • Dictyostelium discoideum anatomy • Fungal gross anatomy • Plant structure • Maize gross anatomy • Medaka fish anatomy and development • Zebrafish anatomy and development • NCI Thesaurus • Mouse pathology • Human disease • Cereal plant trait • PATO (attribute and value) • Mammalian phenotype • Human phenotype • Habronattus courtship • Loggerhead nesting • Animal natural history and life history • eVOC (Expressed Sequence Annotation for Humans)
  • 23. Ontologies as Graphs • OWL ontologies aren’t graphs, but they can be represented as RDF graphs, and people want to use them as graphs • There are plenty of RDF databases around • But they are incomplete with respect to OWL semantics • SPARQL is an acquired taste
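One reason OWL ontologies are not simply graphs: an axiom such as "heart SubClassOf part_of some circulatory system" is serialised in RDF through a blank node (an `owl:Restriction`), so there is no direct heart-to-circulatory-system edge to traverse. The sketch below, using a toy hypothetical ontology with compact names, flattens such existential restrictions into direct labelled edges. This is the kind of simplification a graph view needs, and it deliberately discards the OWL semantics.

```python
# Toy RDF triples for: heart SubClassOf organ,
#                      heart SubClassOf (part_of some circulatory_system)
triples = [
    ("ex:heart", "rdfs:subClassOf", "ex:organ"),
    ("ex:heart", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "ex:part_of"),
    ("_:r1", "owl:someValuesFrom", "ex:circulatory_system"),
]

def flatten(triples):
    """Turn rdfs:subClassOf links into direct edges, replacing existential
    restrictions (blank nodes with owl:onProperty + owl:someValuesFrom)
    by an edge labelled with the restricted property."""
    blank = {}
    for s, p, o in triples:
        if s.startswith("_:"):
            blank.setdefault(s, {})[p] = o
    edges = []
    for s, p, o in triples:
        if p != "rdfs:subClassOf" or s.startswith("_:"):
            continue
        if o.startswith("_:"):
            r = blank.get(o, {})
            if "owl:onProperty" in r and "owl:someValuesFrom" in r:
                edges.append((s, r["owl:onProperty"], r["owl:someValuesFrom"]))
        else:
            edges.append((s, "SUBCLASSOF", o))
    return edges

print(flatten(triples))
# [('ex:heart', 'SUBCLASSOF', 'ex:organ'), ('ex:heart', 'ex:part_of', 'ex:circulatory_system')]
```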
  • 24. Ontology repository use cases • Search for ontology terms: labels, synonyms, descriptions • Querying the structure: get parent/child terms • Querying the transitive closure: get ancestor/descendant terms • Querying across relations: partonomy or development stages • We can satisfy these requirements with Neo4j
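The four use cases can be illustrated over a toy in-memory ontology. This is a sketch with hypothetical terms and function names, not the repository's API. Relations are stored as (child, relation, parent) edges, so the same closure function serves plain is-a hierarchies and, by widening the relation set, queries across partonomy.

```python
from collections import deque

# Toy ontology (hypothetical terms)
terms = {
    "ex:heart": {"label": "heart", "synonyms": ["cor"], "description": "hollow muscular organ"},
    "ex:organ": {"label": "organ", "synonyms": [], "description": ""},
    "ex:anatomical_structure": {"label": "anatomical structure", "synonyms": [], "description": ""},
    "ex:circulatory_system": {"label": "circulatory system", "synonyms": [], "description": ""},
    "ex:anatomical_system": {"label": "anatomical system", "synonyms": [], "description": ""},
}
edges = [  # (child, relation, parent)
    ("ex:heart", "is_a", "ex:organ"),
    ("ex:organ", "is_a", "ex:anatomical_structure"),
    ("ex:heart", "part_of", "ex:circulatory_system"),
    ("ex:circulatory_system", "is_a", "ex:anatomical_system"),
]

def search(text):
    """Terms whose label, synonyms or description contain `text` (case-insensitive)."""
    text = text.lower()
    return sorted(t for t, d in terms.items()
                  if any(text in s.lower() for s in [d["label"], d["description"], *d["synonyms"]]))

def parents(term, relations=("is_a",)):
    return sorted(p for c, r, p in edges if c == term and r in relations)

def children(term, relations=("is_a",)):
    return sorted(c for c, r, p in edges if p == term and r in relations)

def closure(term, step, relations=("is_a",)):
    """Transitive closure, breadth-first; `step` is `parents` or `children`."""
    seen, queue = set(), deque([term])
    while queue:
        for nxt in step(queue.popleft(), relations):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)

print(search("cor"))                                          # ['ex:heart']
print(closure("ex:heart", parents))                           # is-a ancestors only
print(closure("ex:heart", parents, ("is_a", "part_of")))      # across partonomy too
print(closure("ex:anatomical_structure", children))           # descendants
```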
  • 25. OWL to Neo4j schema • Label every node by type (e.g. class, property or individual) and ontology ID • Label every relation by name • Include an additional index for “special relations” such as partonomy and subsets
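A sketch of that mapping as Python that emits Cypher statements. Assumptions, all hypothetical: node labels are the entity type plus the ontology ID, relationship types are the upper-cased, underscore-joined relation name, and "special" relations are additionally written as a `RelatedTree` edge, the relationship type traversed by the query on slide 26. The toy terms and helper names are illustrative only.

```python
# Hypothetical choice of which relations also get a RelatedTree edge
SPECIAL_RELATIONS = {"part of"}

def rel_type(name):
    """Relation name -> Cypher relationship type, e.g. 'part of' -> 'PART_OF'."""
    return name.upper().replace(" ", "_")

def node_statement(iri, entity_type, ontology_id, label):
    """One node per OWL entity, labelled by entity type and by ontology ID.
    Values are inlined for readability; a real loader should use query parameters."""
    return (f"CREATE (:{entity_type}:{ontology_id} "
            f"{{iri: '{iri}', label: '{label}', ontology_name: '{ontology_id}'}})")

def edge_statements(child, relation, parent):
    """An edge typed by the relation name, plus a RelatedTree edge for special relations."""
    match = f"MATCH (a {{iri: '{child}'}}), (b {{iri: '{parent}'}}) "
    statements = [match + f"CREATE (a)-[:{rel_type(relation)}]->(b)"]
    if relation in SPECIAL_RELATIONS:
        statements.append(match + f"CREATE (a)-[:RelatedTree {{label: '{relation}'}}]->(b)")
    return statements

print(node_statement("ex:heart", "Class", "uberon", "heart"))
for statement in edge_statements("ex:heart", "SubClassOf", "ex:organ"):
    print(statement)
for statement in edge_statements("ex:heart", "part of", "ex:circulatory_system"):
    print(statement)
```

Duplicating special relations as an extra edge type keeps tree queries to a single relationship-type alternation instead of enumerating every partonomy-like relation.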
  • 26. Powerful yet simple queries • Get the transitive closure for “heart” from the UBERON anatomy ontology, following parent (SUBCLASSOF) and partonomy (RelatedTree) relations; the final hop also returns the siblings of each ancestor:
      MATCH path = (n:Class)-[r:SUBCLASSOF|RelatedTree*]->(parent)<-[r2:SUBCLASSOF|RelatedTree]-(sibling:Class)
      WHERE n.ontology_name = $ontology_name AND n.iri = $iri
      RETURN path
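To make the pattern's meaning concrete, here is a rough in-memory Python analogue over a toy graph. It is not an exact reproduction of Cypher's matching semantics, and the terms and the `tree_view` function are hypothetical. It follows SUBCLASSOF or RelatedTree edges upwards any number of hops to collect ancestors, then, like the final hop, gathers the direct children of each ancestor: the data a tree view needs.

```python
from collections import deque

TREE_RELATIONS = {"SUBCLASSOF", "RelatedTree"}
edges = [  # (child, relationship type, parent); toy data
    ("ex:heart", "SUBCLASSOF", "ex:organ"),
    ("ex:heart", "RelatedTree", "ex:circulatory_system"),
    ("ex:lung", "SUBCLASSOF", "ex:organ"),
    ("ex:blood_vessel", "RelatedTree", "ex:circulatory_system"),
]

def tree_view(start):
    """Map each ancestor of `start` (reached via TREE_RELATIONS, any depth)
    to the sorted list of its direct children via the same relations."""
    up, down = {}, {}
    for child, rel, parent in edges:
        if rel in TREE_RELATIONS:
            up.setdefault(child, set()).add(parent)
            down.setdefault(parent, set()).add(child)
    ancestors, queue = set(), deque([start])
    while queue:
        for parent in up.get(queue.popleft(), ()):
            if parent not in ancestors:
                ancestors.add(parent)
                queue.append(parent)
    return {a: sorted(down.get(a, ())) for a in sorted(ancestors)}

print(tree_view("ex:heart"))
# {'ex:circulatory_system': ['ex:blood_vessel', 'ex:heart'], 'ex:organ': ['ex:heart', 'ex:lung']}
```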
  • 27. Final thoughts – Neo4j and JSON-LD? • A lot of frameworks now make it trivial to produce good APIs • What’s currently missing is a way to integrate data from two or more independent APIs • It is hard to crawl independent datasets for connections without a human to interpret the semantics • There is still a need to express a schema alongside the data • W3C standards such as RDF/RDFS/SKOS/OWL provide the basic vocabularies and semantics for expressing data schemas • JSON-LD is bridging the gap from JSON to RDF
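A minimal illustration of the JSON-LD idea: an ordinary JSON API response gains an `@context` that maps its keys to IRIs, so a client can turn it into RDF-style triples without a human interpreting the field names. The document, the `example.org` IRIs and the `to_triples` helper are hypothetical. The helper handles only flat objects with simple term-to-IRI mappings; real processing should use a conforming JSON-LD processor.

```python
import json

doc = json.loads("""
{
  "@context": {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "partOf": {"@id": "http://example.org/onto/part_of", "@type": "@id"}
  },
  "@id": "http://example.org/term/heart",
  "label": "heart",
  "partOf": "http://example.org/term/circulatory_system"
}
""")

def to_triples(doc):
    """Expand each key defined in @context to its IRI; keys missing from the
    context are dropped, as JSON-LD expansion does for undefined terms.
    This sketch does not distinguish IRI values ("@type": "@id") from literals."""
    context, subject = doc["@context"], doc["@id"]
    triples = []
    for key, value in doc.items():
        if key.startswith("@") or key not in context:
            continue
        term = context[key]
        iri = term["@id"] if isinstance(term, dict) else term
        triples.append((subject, iri, value))
    return triples

for triple in to_triples(doc):
    print(triple)
```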
  • 28. Acknowledgements • Samples, Phenotypes and Ontologies team • Simon Jupp, Olga Vrousgou, Thomas Liener, Dani Welter, Catherine Leroy, Sira Sarntivijai, Ilinca Tudose, Helen Parkinson • Funding • European Molecular Biology Laboratory (EMBL) • European Union projects: DIACHRON, BioMedBridges, CORBEL and Excelerate