12th June, 2016
BioHackathon 2016 Symposium, Japan
Facilitating Semantic Alignment of EBI Resources
Simon Jupp
Ontology Project Lead
Samples, Phenotypes and Ontologies Team
www.ebi.ac.uk
SPOT team - Adding value with ontologies
Data Exploration and Cleanup
Data structuring
Ontology Annotation
Data cleaning and mapping
Ontology building
Structured data
Data Enrichment Services
• Building an interoperability toolkit for Europe (Elixir)
• Micro-service architecture
• Technology-agnostic
• Pushing boundaries of ontology “embedding”
New ontology lookup service!
Building an ontology toolkit
Data Exploration and Cleanup
Data structuring
Ontology Annotation
Data cleaning and mapping
Ontology building
Webulous
OxO mapping service
Building metadata rich resources
• Ontology markup of experimental variables/samples
• Focus on Phenotype/Disease annotation
• Linking common to rare disease
ArrayExpress
Gene Expression Atlas
[Bar chart: EFO mapped coverage, axis 0 to 100; bar values 89, 77, 78, 100, 99]
OpenTargets Data Mapping Process
Source                       Data type                             Ontology / vocabulary
Reactome                     Metabolic pathways                    DOID
GWAS catalog                 Common Disease (GWAS)                 EFO
Atlas                        Expression                            EFO
Uniprot                      Rare Disease (Expert-reviewed OMIM)   OMIM + own controlled vocab
European Variation Archive   Rare Disease                          OMIM + Orphanet + SNOMED + Genetic Alliance + HPO
ChEMBL                       Bioactivity data                      ATC classification (14 terms)
EuropePMC                    Literature Mining                     UMLS
IMPC                         Mouse Models                          MPO + HPO
Cancer Gene Census           Somatic Mutations                     own controlled vocab + NCIT
Acquire → Clean → Map to Ontology → Curate → Add new terms → Iterate
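The acquire, clean, map, curate loop can be sketched in a few lines. This is a minimal illustration only, not the Open Targets pipeline: the lookup table, the `EX:` term identifiers, and the function names are all hypothetical placeholders, and real mapping would use fuzzier matching than an exact lookup.

```python
import re

# Hypothetical lookup table: normalised label -> ontology term ID.
# The IDs are placeholders, not real EFO identifiers.
TERM_INDEX = {
    "asthma": "EX:0001",
    "type 2 diabetes": "EX:0002",
}


def clean(label: str) -> str:
    """Normalise a label: lowercase, punctuation to spaces, collapse whitespace."""
    label = re.sub(r"[^\w\s]", " ", label.lower())
    return re.sub(r"\s+", " ", label).strip()


def map_labels(labels, index):
    """Split labels into mapped (label -> term ID) and unmapped (for curators)."""
    mapped, to_curate = {}, []
    for raw in labels:
        term = index.get(clean(raw))
        if term is None:
            to_curate.append(raw)
        else:
            mapped[raw] = term
    return mapped, to_curate


def coverage(mapped, to_curate):
    """Percentage of input labels that received an ontology term."""
    total = len(mapped) + len(to_curate)
    return 100.0 * len(mapped) / total if total else 0.0


# Acquire, clean, and map.
acquired = ["Asthma", "Type-2  diabetes", "rare thing"]
mapped, to_curate = map_labels(acquired, TERM_INDEX)

# Curate / add new terms: a curator resolves the leftover label, then iterate.
TERM_INDEX[clean("rare thing")] = "EX:0003"
mapped_again, still_unmapped = map_labels(acquired, TERM_INDEX)
```

Keeping the unmapped labels as an explicit queue is what makes the loop iterative: each curation pass grows the index, so the next run of the same input maps more of it.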
Experimental Factor Ontology – Data Driven Application Ontology
• EFO is an application ontology, built in OWL for use in production services
• Imports from ~10 ontologies, which isolates us from external churn
• Cross-referenced to 25 additional ontologies
• Continuous integration build process, reasoning, manual error checking, multi-editor environment
Chemical Entities of Biological Interest (ChEBI)
Gene Ontology
Cell Type
Anatomy
Phenotype
Disease
Ontologies Data
Managing data evolution in production
Ontology
Annotation
Provenance: who, when, context
Disease
Anatomy
Cell types
Gene function
(GO, HP, MP, UBERON, DO, ORDO)
Phenotype
…
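One way to picture an annotation that carries its provenance (who, when, context) is as a small immutable record. The sketch below is an assumption for illustration: the field names, the example IRIs, and the `latest` helper are hypothetical and do not describe the team's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Annotation:
    """Links a data value to an ontology term, with provenance."""
    value: str          # the data value being annotated
    term_iri: str       # ontology term the value maps to
    annotator: str      # who
    created: datetime   # when
    context: str        # context, e.g. the dataset and field


def latest(annotations, value, context):
    """Return the most recent annotation for a value in a context, or None."""
    matches = [a for a in annotations
               if a.value == value and a.context == context]
    return max(matches, key=lambda a: a.created, default=None)


# Hypothetical history: the same value re-annotated after the ontology changed.
history = [
    Annotation("liver", "http://example.org/onto/T1", "curator_a",
               datetime(2015, 1, 10, tzinfo=timezone.utc),
               "dataset-1/organism part"),
    Annotation("liver", "http://example.org/onto/T2", "curator_b",
               datetime(2016, 3, 2, tzinfo=timezone.utc),
               "dataset-1/organism part"),
]
current = latest(history, "liver", "dataset-1/organism part")
```

Appending a new record instead of overwriting the old one keeps the full history, so a mapping can be audited or rolled back when an ontology term is replaced.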
Ontologies in applications
Smarter searching
Data visualisation
Data analysis
Data integration
Open Targets
Which other diseases are associated with PDE4D?
View diseases grouped in therapeutic areas or organised in a tree
View more information about PDE4D
Filter by therapeutic area
BioSolr
“BioSolr aims to significantly advance the state of the art with regards to indexing and querying biomedical data with freely available open source software”
flaxsearch/BioSolr
Solr documents with ontology annotation
Enriched Solr with ontology content (synonyms, structure, relations)
Solr/Elastic plugin
Query expansion and hierarchical faceting
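Query expansion over an ontology can be illustrated without any search engine: a search term is widened to include its descendants and their synonyms. The sketch below is a toy under stated assumptions, not the BioSolr plugin API: the hierarchy, the synonym table, and the function names are hypothetical. The output uses the ordinary `field:"phrase" OR ...` form of Lucene query syntax.

```python
# Hypothetical toy hierarchy: parent term -> direct child terms.
CHILDREN = {
    "disease": ["respiratory disease", "metabolic disease"],
    "respiratory disease": ["asthma"],
    "metabolic disease": ["diabetes"],
}
# Hypothetical synonym table: term -> alternative labels.
SYNONYMS = {"asthma": ["bronchial asthma"]}


def expand(term, children=CHILDREN, synonyms=SYNONYMS):
    """Return the term, its descendants, and their synonyms (depth-first)."""
    seen, stack, out = set(), [term], []
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        out.append(current)
        for synonym in synonyms.get(current, []):
            if synonym not in seen:
                seen.add(synonym)
                out.append(synonym)
        # Reverse so children are visited in their listed order.
        stack.extend(reversed(children.get(current, [])))
    return out


def to_query(field, terms):
    """Build an OR query over quoted phrases for one field."""
    return " OR ".join(f'{field}:"{t}"' for t in terms)


query = to_query("disease_label", expand("respiratory disease"))
```

With this expansion, a search for "respiratory disease" also matches documents annotated only with "asthma", which is the behaviour that ontology-aware search is meant to provide.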
Making it all FAIR
Data resources at EMBL-EBI
Genes, genomes & variation
RNA Central
ArrayExpress
Expression Atlas
Metabolights
PRIDE
InterPro Pfam UniProt
ChEMBL SureChEMBL ChEBI
Molecular structures
Protein Data Bank in Europe
Electron Microscopy Data Bank
European Nucleotide Archive
European Variation Archive
European Genome-phenome Archive
Gene, protein & metabolite expression
Protein sequences, families & motifs
Chemical biology
Reactions, interactions & pathways
IntAct Reactome MetaboLights
Systems
BioModels Enzyme Portal BioSamples
Ensembl
Ensembl Genomes
GWAS Catalog
Metagenomics portal
Europe PubMed Central
BioStudies
Gene Ontology
Experimental Factor Ontology
Literature & ontologies
Product of previous biohackathons
EBI RDF Platform
Successes
• Novel queries possible over EBI datasets
• Production quality RDF releases
• Community of users
• Highly available public SPARQL endpoints
• 500+ users (10-50 million hits per month)
• A lot of interest from industry
• Catalyst for new RDF efforts
Lessons
• Public SPARQL endpoints problematic
• Query federation not performant
• Inference support limited
• Not scalable for all EBI data, e.g. Variation, ENA
• Lack of expertise in service teams
• Too much overhead to get started quickly in this space
Challenges for RDF at EMBL-EBI
• Most EBI resources publish data in forms that support common use cases (pre-integrated)
• Individual teams do the hard work so you don’t have to
• RDF representation not optimised for performance
• Barrier to building real (killer) applications
• Technology not mature enough / developer frameworks lacking
• Doing RDF shouldn’t mandate a technology choice anyway
• RDF not yet a “core” activity for EMBL-EBI
Where we are going next with RDF
• Virtualised infrastructure for RDF
• Simpler cloud deployment
• Building a single EBI RDF cache
• Simpler to manage
• More interesting queries
• Exploring cheaper paths to RDF
• RDF from REST + JSON-LD
• Via Wikidata
• RDFa and schema.org (bioschemas)
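The "RDF from REST + JSON-LD" idea is that an ordinary JSON response becomes readable as RDF once a JSON-LD `@context` maps its keys to IRIs. The sketch below shows that step under stated assumptions: the record shape, the `example.org` base IRI, and the `to_jsonld` helper are hypothetical, and only the two RDFS property IRIs are real vocabulary terms.

```python
import json

# Hypothetical JSON payload, shaped like something a REST API might return.
rest_record = {"id": "T1", "label": "asthma", "parent": "T0"}

# Context: maps plain JSON keys to IRIs so a JSON-LD processor can
# read the same document as RDF. "@type": "@id" marks the value as an IRI.
CONTEXT = {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "parent": {
        "@id": "http://www.w3.org/2000/01/rdf-schema#subClassOf",
        "@type": "@id",
    },
}
BASE = "http://example.org/term/"  # placeholder namespace


def to_jsonld(record, context=CONTEXT, base=BASE):
    """Wrap a REST record as JSON-LD, turning local IDs into full IRIs."""
    return {
        "@context": context,
        "@id": base + record["id"],
        "label": record["label"],
        "parent": base + record["parent"],
    }


doc = to_jsonld(rest_record)
serialised = json.dumps(doc, indent=2)
```

The appeal of this path is that existing JSON clients can ignore the `@context` entirely, while RDF tooling can interpret the same payload as triples.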
Acknowledgements
• Samples, Phenotypes and Ontologies Team
• Olga Vrousgou, Thomas Liener, Dani Welter, Catherine Leroy, Sira Sarntivijai, Ilinca Tudose, Tony Burdett, Helen Parkinson
• Funding
• European Molecular Biology Laboratory (EMBL)
• European Union projects: DIACHRON, BioMedBridges, CORBEL and Excelerate
Topics of interest for the hackathon
• Ontology Mapping
• Disease (rare, common, phenotypes)
• Data annotation (automated, machine learning, text mining)
• Virtualised RDF data deployment
• RDF on the fly
• RDF over Mongo, Neo4j, Solr, Elastic
• REST + JSON-LD
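"RDF on the fly" over a document store can be pictured as a generator that turns records into N-Triples lines as they are read, with no separate triple store. This is a rough sketch under stated assumptions: the document shape (including the `_id` key), the `example.org` base IRI, and the function names are hypothetical, and no particular database driver is involved.

```python
def nt_literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def triples(records, base, predicates):
    """Lazily yield one N-Triples line per mapped field of each record."""
    for rec in records:
        subject = f"<{base}{rec['_id']}>"
        for key, predicate in predicates.items():
            if key in rec:
                yield f"{subject} <{predicate}> {nt_literal(str(rec[key]))} ."


# Hypothetical documents, shaped like rows from a document store.
docs = [{"_id": "s1", "name": 'liver "biopsy"'}, {"_id": "s2"}]
# Field name -> predicate IRI; only mapped fields are exported.
PREDICATES = {"name": "http://www.w3.org/2000/01/rdf-schema#label"}

lines = list(triples(docs, "http://example.org/sample/", PREDICATES))
```

Because `triples` is a generator, it can be fed a database cursor and streamed to a client without holding the whole export in memory.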