Making Linked Data SPARQL with the InterMine Biological Data Warehouse
9th International SWAT4LS Conference
5-8 December 2016, Amsterdam
Justin Clark-Casey, Software Engineer @InterMine
Daniela Butano, Software Engineer @InterMine
Today’s Talk
● InterMine
● MOLD (Model Organism Linked Database)
● Providing RDF and SPARQL from all Mines: The challenges ahead
What is InterMine?
● A biological data warehouse.
● Initially for Drosophila.
● But with a flexible and extensible data model.
● Now used as infrastructure by many model organism database (MOD) and other life sciences projects.
● Open-source, continuous development for over 15 years.
● 7 software engineers, 1 biologist, 1 PI.
InterMine Buildtime Architecture
InterMine Runtime Architecture
Extracting Data
Getting data via a script (Python example)
from intermine.webservice import Service

# Connect to the FlyMine web service.
service = Service("http://www.flymine.org/flymine/service")

# Query genes named "zerknullt" and list their proteins' UniProt names.
query = service.new_query("Gene")
query.add_view("name", "proteins.uniprotName")
query.add_constraint("name", "=", "zerknullt", code="A")

for row in query.rows():
    print(row["name"], row["proteins.uniprotName"])
Results
zerknullt ZEN1_DROME
zerknullt B4LZ31_DROVI
Great, but...
These export mechanisms have served us well and continue to do so.
But...
● Querying requires a bespoke language (InterMine PathQuery).
● Exported data may require transformation.
● Whole biological objects are only available through a human-oriented view.
A core aim of InterMine is to make its data provision FAIR, and we are always looking for ways to facilitate this.
MOLD
● Created by the Dumontier Lab at Stanford.
● Model Organism Linked Database
● Create a Linked Open Data (LOD) resource of model organism data.
○ With links to ontologies and other LOD (e.g. Bio2RDF).
● Publish tools to access and explore the data.
InterMine RDFization Process
Example of Generated RDF
What next?
● Incorporate and extend MOLD components to allow any mine operator to
○ Generate and publish RDF dumps.
○ Make biological objects available as RDF resources.
○ Provide a SPARQL endpoint.
○ Explore emerging approaches such as Triple Pattern Fragments.
● Mine operators may not be software engineers
○ Software and processes need to be consumable.
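As a rough illustration of the "generate and publish RDF dumps" goal, the rows returned by the Python query shown earlier could be written out as N-Triples. This is only a sketch and not the MOLD or InterMine-RDFizer implementation: the `http://example.org/...` namespaces, the `name` and `hasProtein` predicates, and the `rows_to_ntriples` helper are all hypothetical, and the rows are hard-coded from the results listed earlier rather than fetched from FlyMine.

```python
from urllib.parse import quote

# Hypothetical namespaces, used only for illustration.
RESOURCE_NS = "http://example.org/mine/resource/"
VOCAB_NS = "http://example.org/mine/vocab/"


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(rows):
    """Yield N-Triples lines for each (gene name, protein name) row."""
    for row in rows:
        gene = "<%sgene/%s>" % (RESOURCE_NS, quote(row["name"], safe=""))
        protein = "<%sprotein/%s>" % (
            RESOURCE_NS,
            quote(row["proteins.uniprotName"], safe=""),
        )
        yield '%s <%sname> "%s" .' % (gene, VOCAB_NS, escape_literal(row["name"]))
        yield "%s <%shasProtein> %s ." % (gene, VOCAB_NS, protein)


# Rows copied from the results listed earlier in the talk.
rows = [
    {"name": "zerknullt", "proteins.uniprotName": "ZEN1_DROME"},
    {"name": "zerknullt", "proteins.uniprotName": "B4LZ31_DROVI"},
]

# dict.fromkeys drops repeated triples while keeping their order.
triples = list(dict.fromkeys(rows_to_ntriples(rows)))
for line in triples:
    print(line)
```

Identifying the gene by its name is a simplification made for this sketch; a real dump would need the stable identifiers discussed in the next slide.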
Data Challenge : Stable URIs
● InterMine navigation URIs do not have a stable ID
○ http://www.flymine.org/flymine/report.do?id=1007741
● InterMine ‘shareable’ URIs are better but still have issues
○ http://www.flymine.org/flymine/portal.do?class=Gene&externalids=FBgn0004053
● Persistence in the face of
○ Name changes
○ Scientific changes
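The difference between the two URI forms can be made concrete with a small sketch. It assumes only the `portal.do` pattern shown on this slide; the helper names `shareable_uri` and `external_id_of` are hypothetical and are not part of any InterMine API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def shareable_uri(mine_base, object_class, external_id):
    """Build a portal.do URI from a class name and an external identifier."""
    query = urlencode({"class": object_class, "externalids": external_id})
    return "%s/portal.do?%s" % (mine_base.rstrip("/"), query)


def external_id_of(uri):
    """Return the external identifier carried by a URI, or None if absent."""
    values = parse_qs(urlsplit(uri).query).get("externalids")
    return values[0] if values else None


uri = shareable_uri("http://www.flymine.org/flymine", "Gene", "FBgn0004053")
print(uri)
print(external_id_of(uri))

# The report.do form carries only the ID that the slide describes as not
# stable, so no external identifier can be recovered from it.
print(external_id_of("http://www.flymine.org/flymine/report.do?id=1007741"))
```

Even the shareable form depends on the external identifier itself staying valid, which is where the persistence concerns about name and scientific changes come in.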
Data Challenge : Ontologies
● As of now, InterMine has a data model with no attached ontologies.
○ Sequence Ontology is a partial exception.
● InterMine-RDFizer generates a vocabulary automatically for the data model.
● But we want to emit RDF that uses existing ontologies
○ Gene Ontology
○ FALDO
○ etc.
● Issues
○ Need a mechanism to attach arbitrary ontologies to the core data model and any extensions.
○ Which ontologies?
○ How do we facilitate user selection?
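One possible shape for such a mechanism is a lookup table from data model names to ontology term URIs, falling back to the automatically generated vocabulary when no mapping exists. The sketch below is an assumption for illustration, not the InterMine-RDFizer design: the `ONTOLOGY_TERMS` table, the `term_uri` helper, and the `http://example.org/mine/vocab/` namespace are hypothetical. The one mapped term, SO:0000704, is the Sequence Ontology term for "gene".

```python
# Hypothetical namespace standing in for the automatically generated vocabulary.
GENERATED_VOCAB = "http://example.org/mine/vocab/"

# Hypothetical operator-supplied mapping from data model names to ontology terms.
# SO:0000704 is the Sequence Ontology term for "gene".
ONTOLOGY_TERMS = {
    "Gene": "http://purl.obolibrary.org/obo/SO_0000704",
}


def term_uri(model_name, mapping=ONTOLOGY_TERMS):
    """Return the mapped ontology term, or a generated vocabulary URI as fallback."""
    return mapping.get(model_name, GENERATED_VOCAB + model_name)


print(term_uri("Gene"))     # mapped to an existing ontology term
print(term_uri("Protein"))  # no mapping, so the generated vocabulary is used
```

Keeping the mapping as plain data is one way a mine operator who is not a software engineer could select ontologies without touching code.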
Tech Challenge : Performance
● Other projects (e.g. MODs) rely on us.
● Questions around SPARQL performance.
● Proposed Solution: Adapt MOLD’s Dockerization approach with a separate triplestore for data.
○ Pros:
■ Easier deployment.
■ Performance issues can be contained.
■ Decoupled iteration.
○ Cons:
■ Multiple systems.
■ Maturity of Docker?
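With the triplestore running as a separate service, SPARQL clients would address it directly instead of the mine's web application. The sketch below builds a "query via GET" request URL as defined by the SPARQL 1.1 Protocol, without sending it. The endpoint address is a hypothetical placeholder, and the `LIMIT` clause is just one simple way a client can bound the cost of a query.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical address of a triplestore container running beside the mine.
ENDPOINT = "http://localhost:8890/sparql"

# A deliberately small query: any ten triples from the store.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()


def sparql_get_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' URL for the endpoint."""
    return "%s?%s" % (endpoint, urlencode({"query": query}))


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```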
Maxime Déraspe Department of Molecular Medicine, Université Laval, Québec, CA
Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, US
Gail Binkley Department of Genetics, Stanford University, Stanford, US
Daniela Butano Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
Justin Clark-Casey Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
Kalpana Karra Department of Genetics, Stanford University, Stanford, US
Julie Sullivan Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
J. Michael Cherry Department of Genetics, Stanford University, Stanford, US
Jacques Corbeil Department of Molecular Medicine, Université Laval, Québec, CA
Gos Micklem Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
Michel Dumontier Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, US
THANK YOU!
Justin Clark-Casey
justincc@intermine.org
@justincc
Daniela Butano
daniela@intermine.org
MOLD
http://mo-ld.org/
InterMine
http://intermine.org
@intermineorg
Presentation licensed under Creative Commons Attribution 4.0 International
goo.gl/IsjPzh

Forensic limnology of diatoms by Sanjai.pptx
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 

Making Linked Data SPARQL with InterMine Biological Data Warehouse

  • 1. Making Linked Data SPARQL with the InterMine Biological Data Warehouse 9th International SWAT4LS Conference 5-8 December 2016 Amsterdam Justin Clark-Casey, Software Engineer @InterMine Daniela Butano, Software Engineer @InterMine
  • 2. Today’s Talk ● InterMine ● MOLD (Model Organism Linked Database) ● Providing RDF and SPARQL from all Mines: The challenges ahead
  • 3. What is InterMine? ● A biological data warehouse. ● Initially for Drosophila. ● But with a flexible and extensible data model. ● Now used as infrastructure by many model organism database (MOD) and other life sciences projects. ● Open-source, continuous development for over 15 years. ● 7 software engineers, 1 biologist, 1 PI.
  • 13. Extracting Data: getting data via a script (Python example)
        from intermine.webservice import Service
        service = Service("http://www.flymine.org/flymine/service")
        query = service.new_query("Gene")
        query.add_view("name", "proteins.uniprotName")
        query.add_constraint("name", "=", "zerknullt", code="A")
        for row in query.rows():
            print(row["name"], row["proteins.uniprotName"])
        Results:
        zerknullt ZEN1_DROME
        zerknullt B4LZ31_DROVI
  • 14. Great, but... These export mechanisms have served us well and continue to do so. But... ● Querying requires use of a bespoke language (InterMine PathQuery). ● Exported data may require transformation. ● Whole biological objects only have a human-readable view. A core aim of InterMine is to make its data provision FAIR, and we are always looking for ways to facilitate this...
  • 15. MOLD ● Model Organism Linked Database. ● Created by the Dumontier Lab at Stanford. ● Aims to create a Linked Open Data (LOD) resource of model organism data. ○ With links to ontologies and other LOD (e.g. Bio2RDF). ● Publishes tools to access and explore the data.
  • 21. What next? ● Incorporate and extend MOLD components to allow any mine operator to ○ Generate and publish RDF dumps. ○ Make biological objects available as RDF resources. ○ Provide a SPARQL endpoint. ○ Explore emerging approaches such as Triple Pattern Fragments. ● Mine operators may not be software engineers ○ Software and processes need to be consumable.
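To make the "Provide a SPARQL endpoint" goal concrete, here is a minimal sketch of how the PathQuery from slide 13 might look as a SPARQL request. The endpoint URL, the `im:` vocabulary namespace and the predicate names are all assumptions made for illustration; the slides do not specify them. The code only builds the request URL in the way the SPARQL 1.1 Protocol describes for GET (a URL-encoded `query` parameter) and does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and vocabulary: neither appears in the slides.
ENDPOINT = "http://example.org/flymine/sparql"
VOCAB = "http://example.org/flymine/vocab#"


def gene_proteins_query(gene_name):
    """Return a SPARQL SELECT that mirrors the slide-13 PathQuery."""
    # Escape backslashes and quotes so the name is a valid string literal.
    safe = gene_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX im: <" + VOCAB + ">\n"
        "SELECT ?name ?uniprotName WHERE {\n"
        "  ?gene a im:Gene ;\n"
        "        im:name ?name ;\n"
        "        im:proteins ?protein .\n"
        "  ?protein im:uniprotName ?uniprotName .\n"
        '  FILTER (?name = "' + safe + '")\n'
        "}"
    )


def sparql_get_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, gene_proteins_query("zerknullt"))
print(url)

# Round-trip check: decoding the URL should give back the query text unchanged.
received = parse_qs(urlsplit(url).query)["query"][0]
print(received == gene_proteins_query("zerknullt"))
```

Compared with PathQuery, the query text here is a standard language that generic Linked Data clients can already produce, which is the point the "Great, but..." slide makes.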
  • 22. Data Challenge: Stable URIs ● InterMine navigation URIs do not contain a stable ID ○ http://www.flymine.org/flymine/report.do?id=1007741 ● InterMine ‘shareable’ URIs are better but still have issues ○ http://www.flymine.org/flymine/portal.do?class=Gene&externalids=FBgn0004053 ● URIs need to persist in the face of ○ Name changes ○ Scientific changes
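The difference between the two URI styles on slide 22 can be sketched in a few lines. The helper names below are hypothetical and are not part of any InterMine client library; the URL shapes are copied from the slide. The first helper builds a 'shareable' URI from a class name and an external identifier, and the second recognises the `report.do?id=...` form, which the slide describes as lacking a stable ID.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def shareable_uri(mine_base, class_name, external_id):
    """Build a 'shareable' portal URI from a class and an external identifier."""
    params = urlencode({"class": class_name, "externalids": external_id})
    return mine_base + "/portal.do?" + params


def is_internal_report_uri(uri):
    """Return True for report.do?id=... URIs, which carry an internal id."""
    parts = urlsplit(uri)
    return parts.path.endswith("/report.do") and "id" in parse_qs(parts.query)


base = "http://www.flymine.org/flymine"
print(shareable_uri(base, "Gene", "FBgn0004053"))
print(is_internal_report_uri(base + "/report.do?id=1007741"))
```

A shareable URI still depends on the external identifier and the class name staying the same, which is one way to read the "still have issues" remark on the slide.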
  • 23. Data Challenge : Ontologies ● As of now, InterMine has a data model with no attached ontologies. ○ Sequence Ontology is a partial exception. ● InterMine-RDFizer generates a vocabulary automatically for the data model. ● But we want to emit RDF that uses existing ontologies ○ Gene Ontology ○ FALDO ○ etc. ● Issues ○ Need a mechanism to attach arbitrary ontologies to the core data model and any extensions. ○ Which ontologies? ○ How do we facilitate user selection?
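One possible shape for the "mechanism to attach arbitrary ontologies" mentioned on slide 23 is a lookup table from data model class names to ontology term URIs, with the automatically generated vocabulary as the fallback. The sketch below is an assumption, not how InterMine-RDFizer works: the vocabulary namespace, the subject URI and the mapping table are hypothetical. The only real term used is SO:0000704, the Sequence Ontology term for gene. Output is plain N-Triples built with string formatting so that no RDF library is needed.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical namespace for the automatically generated vocabulary.
GENERATED_VOCAB = "http://example.org/intermine/vocab#"

# Hypothetical operator-supplied mapping from model classes to ontology terms.
# SO:0000704 is the Sequence Ontology term for "gene".
CLASS_TO_ONTOLOGY = {
    "Gene": "http://purl.obolibrary.org/obo/SO_0000704",
}


def class_uri(class_name):
    """Prefer a mapped ontology term; fall back to the generated vocabulary."""
    return CLASS_TO_ONTOLOGY.get(class_name, GENERATED_VOCAB + class_name)


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(subject_uri, class_name, label):
    """Return the type and label triples for one object as N-Triples lines."""
    return [
        f"<{subject_uri}> <{RDF_TYPE}> <{class_uri(class_name)}> .",
        f'<{subject_uri}> <{RDFS_LABEL}> "{escape_literal(label)}" .',
    ]


# Hypothetical subject URI for the gene used in the slide-13 example.
for line in to_ntriples(
    "http://example.org/flymine/gene/FBgn0004053", "Gene", "zerknullt"
):
    print(line)
```

Keeping the mapping in a plain table means a mine operator could edit it without touching code, which fits the slide-21 concern that operators may not be software engineers.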
  • 24. Tech Challenge : Performance ● Other projects (e.g. MODs) rely on us. ● Questions around SPARQL performance. ● Proposed Solution: Adapt MOLD’s Dockerization approach with a separate triplestore for data. ○ Pros: ■ Easier deployment. ■ Performance issues can be contained. ■ Decoupled iteration. ○ Cons: ■ Multiple systems. ■ Maturity of Docker?
  • 25.
        Maxime Déraspe: Department of Molecular Medicine, Université Laval, Québec, CA; Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, US
        Gail Binkley: Department of Genetics, Stanford University, Stanford, US
        Daniela Butano: Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
        Justin Clark-Casey: Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
        Kalpana Karra: Department of Genetics, Stanford University, Stanford, US
        Julie Sullivan: Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
        J. Michael Cherry: Department of Genetics, Stanford University, Stanford, US
        Jacques Corbeil: Department of Molecular Medicine, Université Laval, Québec, CA
        Gos Micklem: Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
        Michel Dumontier: Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, US