Big data ontology_summit_feb2012

•Download as PPTX, PDF•

1 like•422 views

Barry Smith

The strategy of annotation
Databases describe data using multiple heterogeneous
labels
If we can annotate (tag) these labels using terms from
common controlled vocabularies, then a virtual arms-
length integration can be achieved, providing
• immediate benefits for search and retrieval
• a starting point for the creation of net-centric
reference data
• potential longer term benefits for reasoning
with no need to modify existing systems, code or data

See Ceusters et al. Proceedings of DILS 2004.
http://ontology.buffalo.edu/bio/LinkSuite.pdf
2

String searches yield partial results, rest
on manual effort and on familiarity
with existing database contents
Ontologies facilitate grouping of annotations

brain 20
hindbrain 15
rhombomere 10

Query ‘brain’ without ontology 20
Query ‘brain’ with ontology 45

Examples of where this method works
• Reference Genome Annotation Project
http://www.geneontology.org/GO.refgenome.shtml
• Human resources data in large organizations
http://www.youtube.com/watch?v=OzW3Gc_yA9A
• Military intelligence data
Salmen et al. in http://ceur-ws.org/Vol-808/
Other potential areas of application:
• Crime • Public health
• Insurance • Finance

4

But normally the method does not work

Semantic technology (OWL, …) seeks to break
down data silos
Unfortunately it is now so easy to create
ontologies that myriad incompatible ontologies
are being created in ad hoc ways leading to the
creation of new, semantic silos
The Semantic Web framework as currently
conceived and governed by the W3C (modeled
on html) yields minimal standardization
The more semantic technology is
successful, they more we fail to achieve
our goals
5

Reasons for this effect
• Just as it’s easier to build a new database, so it’s
easier to build a new ontology for each new
project
• You will not get paid for reusing existing
ontologies (Let a million ontologies bloom)
• There are no ‘good’ ontologies, anyway (just
arbitrary choices of terms and relations …)
• Information technology (hardware) changes
constantly, not worth the effort of getting things
right
6

How to do it right?
• how create an incremental, evolutionary
process, where what is good survives, and what is
bad fails
• create a scenario in which people will find it
profitable to reuse ontologies, terminologies and
coding systems which have been tried and tested
• silo effects will be avoided and results of
investment in Semantic Technology will cumulate
effectively
7

Uses of ‘ontology’ in PubMed abstracts

8

By far the most successful: GO (Gene Ontology)

9

GO provides a controlled vocabulary of terms for
use in annotating (tagging) biological data

• multi-species, multi-disciplinary, open source
• built and maintained by domain experts
• contributing to the cumulativity of scientific results
obtained by distinct research communities
• natural language and logical definitions for all
terms to support consistent human application and
computational exploitation
• rigorous governance process
• feedback loop connects users to editors
10

How to do it right
• ontologies should mimic the methodology used
by the GO (following the principles of the OBO
Foundry: http://obofoundry.org)
• ontologies in the same field should be
developed in coordinated fashion to ensure
that there is exactly one ontology for each
subdomain
• ontologies should be developed incrementally
in a way that builds on successful user testing at
every stage
11

What's hot

Modern tools for sharing and synthesizing neuroimaging resultsKrzysztof Gorgolewski

Reproducibility and replicability: a practical approachKrzysztof Gorgolewski

Explorations in bioinformaticsDouglas Joubert

Reproducible research: theoryC. Tobin Magle

Towards open and reproducible neuroscience in the age of big dataKrzysztof Gorgolewski

Pharos: A Torch to Use in Your Journey in the Dark GenomeRajarshi Guha

Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...sesrdm

Share and Reuse: how data sharing can take your research to the next levelKrzysztof Gorgolewski

Open Source Pharma: From philosophy to real time experienceOpen Source Pharma

Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014aceas13tern

Log Analysis to Understand Medical Professionals' Image Searching BehaviourInstitute of Information Systems (HES-SO)

What's hot (11)

Modern tools for sharing and synthesizing neuroimaging results

Reproducibility and replicability: a practical approach

Explorations in bioinformatics

Reproducible research: theory

Towards open and reproducible neuroscience in the age of big data

Pharos: A Torch to Use in Your Journey in the Dark Genome

Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...

Share and Reuse: how data sharing can take your research to the next level

Open Source Pharma: From philosophy to real time experience

Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014

Log Analysis to Understand Medical Professionals' Image Searching Behaviour

Viewers also liked

Ontology Engineering for Big DataKouji Kozaki

AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...IJwest

ontology meets big data:immutabilityChris Partridge

Linking Big Data to Rich Process DescriptionsChristoph Lange

Ontology based metadata schema for digital library projects in ChinaAIMS (Agricultural Information Management Standards)

Horizontal integration of warfighter intelligence dataBarry Smith

Tutorial what is_an_ontology_ncbo_march_2012Barry Smith

Imagenes De Amistad ErikithaPte

Towards Joint Doctrine for Military InformaticsBarry Smith

Ontologies for big dataYu Lin

Towards an Ontology of PhilosophyBarry Smith

The Role of Ontology in the Era of Big Military DataBarry Smith

IAO-Intel: An Ontology of Information Artifacts in the Intelligence DomainBarry Smith

Horizontal integration of warfighter intelligence dataBarry Smith

Semantics for Big Data Integration and AnalysisCraig Knoblock

Ontology of PokerBarry Smith

Ontology Tutorial: Semantic Technology for Intelligence, Defense and SecurityBarry Smith

Viewers also liked (17)

Ontology Engineering for Big Data

AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...

ontology meets big data:immutability

Linking Big Data to Rich Process Descriptions

Ontology based metadata schema for digital library projects in China

Horizontal integration of warfighter intelligence data

Tutorial what is_an_ontology_ncbo_march_2012

Imagenes De Amistad

Towards Joint Doctrine for Military Informatics

Ontologies for big data

Towards an Ontology of Philosophy

The Role of Ontology in the Era of Big Military Data

IAO-Intel: An Ontology of Information Artifacts in the Intelligence Domain

Horizontal integration of warfighter intelligence data

Semantics for Big Data Integration and Analysis

Ontology of Poker

Ontology Tutorial: Semantic Technology for Intelligence, Defense and Security

Similar to Big data ontology_summit_feb2012

Ontological realism as a strategy for integrating ontologiesBarry Smith

Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Amit Sheth

Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn

Open interoperability standards, tools and services at EMBL-EBIPistoia Alliance

OOR--Open-Ontology-Repository--jun2010Peter Yim

Web Apollo Tutorial for Medfly Research CommunityMonica Munoz-Torres

SciBiteDr. Haxel Consult

Open hpi semweb-06-part2Nadine Ludwig

Semantics as a service at EMBL-EBISimon Jupp

BioPortal: ontologies and integrated data resourcesat the click of a mouseINRAE (MISTEA) and University of Montpellier (LIRMM)

Tragedy of the Data Commons (ODSC-East, 2021)James Hendler

Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Amit Sheth

BioAssay Express: Creating and exploiting assay metadataPhilip Cheung

Big Data Standards - Workshop, ExpBio, Boston, 2015Susanna-Assunta Sansone

Ontology for the Financial Services IndustryBarry Smith

Experiences in building an ontology driven image database for ...Carla Lima

Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalMauro Dragoni

How to create a taxonomy for management buy-inMary Chitty

Being Reproducible: SSBSS Summer School 2017Carole Goble

CINECA webinar slides: Making cohort data FAIRCINECAProject

Similar to Big data ontology_summit_feb2012 (20)

Ontological realism as a strategy for integrating ontologies

Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...

Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...

Open interoperability standards, tools and services at EMBL-EBI

OOR--Open-Ontology-Repository--jun2010

Web Apollo Tutorial for Medfly Research Community

SciBite

Open hpi semweb-06-part2

Semantics as a service at EMBL-EBI

BioPortal: ontologies and integrated data resourcesat the click of a mouse

Tragedy of the Data Commons (ODSC-East, 2021)

Semantics for Bioinformatics: What, Why and How of Search, Integration and An...

BioAssay Express: Creating and exploiting assay metadata

Big Data Standards - Workshop, ExpBio, Boston, 2015

Ontology for the Financial Services Industry

Experiences in building an ontology driven image database for ...

Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval

How to create a taxonomy for management buy-in

Being Reproducible: SSBSS Summer School 2017

CINECA webinar slides: Making cohort data FAIR

More from Barry Smith

An application of Basic Formal Ontology to the Ontology of Services and Commo...Barry Smith

Ways of Worldmarking: The Ontology of the EruvBarry Smith

The Division of Deontic LaborBarry Smith

Ontology of Aging (August 2014)Barry Smith

Meaningful UseBarry Smith

The Fifth Cycle of PhilosophyBarry Smith

Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Barry Smith

Enhancing the Quality of ImmPort DataBarry Smith

The Philosophome: An Exercise in the Ontology of the HumanitiesBarry Smith

Science of Emerging Social MediaBarry Smith

Ethics, Informatics and ObamacareBarry Smith

e‐Human Beings: The contribution of internet ranking systems to the developme...Barry Smith

Ontology of aging and deathBarry Smith

Ontology in-buffalo-2013Barry Smith

ImmPort strategies to enhance discoverability of clinical trial dataBarry Smith

Ontology of Documents (2005)Barry Smith

Ontology and the National Cancer Institute Thesaurus (2005)Barry Smith

Introduction to the Logic of DefinitionsBarry Smith

Ontology in Buffalo -- Big Data 2013Barry Smith

How to Do Things With DocumentsBarry Smith

More from Barry Smith (20)

An application of Basic Formal Ontology to the Ontology of Services and Commo...

Ways of Worldmarking: The Ontology of the Eruv

The Division of Deontic Labor

Ontology of Aging (August 2014)

Meaningful Use

The Fifth Cycle of Philosophy

Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...

Enhancing the Quality of ImmPort Data

The Philosophome: An Exercise in the Ontology of the Humanities

Science of Emerging Social Media

Ethics, Informatics and Obamacare

e‐Human Beings: The contribution of internet ranking systems to the developme...

Ontology of aging and death

Ontology in-buffalo-2013

ImmPort strategies to enhance discoverability of clinical trial data

Ontology of Documents (2005)

Ontology and the National Cancer Institute Thesaurus (2005)

Introduction to the Logic of Definitions

Ontology in Buffalo -- Big Data 2013

How to Do Things With Documents

Big data ontology_summit_feb2012

1. Big Data that might benefit from ontology technology, but why this usually fails Barry Smith National Center for Ontological Research 1

2. The strategy of annotation Databases describe data using multiple heterogeneous labels If we can annotate (tag) these labels using terms from common controlled vocabularies, then a virtual arms- length integration can be achieved, providing • immediate benefits for search and retrieval • a starting point for the creation of net-centric reference data • potential longer term benefits for reasoning with no need to modify existing systems, code or data See Ceusters et al. Proceedings of DILS 2004. http://ontology.buffalo.edu/bio/LinkSuite.pdf 2

3. String searches yield partial results, rest on manual effort and on familiarity with existing database contents Ontologies facilitate grouping of annotations brain 20 hindbrain 15 rhombomere 10 Query ‘brain’ without ontology 20 Query ‘brain’ with ontology 45

4. Examples of where this method works • Reference Genome Annotation Project http://www.geneontology.org/GO.refgenome.shtml • Human resources data in large organizations http://www.youtube.com/watch?v=OzW3Gc_yA9A • Military intelligence data Salmen et al. in http://ceur-ws.org/Vol-808/ Other potential areas of application: • Crime • Public health • Insurance • Finance 4

5. But normally the method does not work Semantic technology (OWL, …) seeks to break down data silos Unfortunately it is now so easy to create ontologies that myriad incompatible ontologies are being created in ad hoc ways leading to the creation of new, semantic silos The Semantic Web framework as currently conceived and governed by the W3C (modeled on html) yields minimal standardization The more semantic technology is successful, they more we fail to achieve our goals 5

6. Reasons for this effect • Just as it’s easier to build a new database, so it’s easier to build a new ontology for each new project • You will not get paid for reusing existing ontologies (Let a million ontologies bloom) • There are no ‘good’ ontologies, anyway (just arbitrary choices of terms and relations …) • Information technology (hardware) changes constantly, not worth the effort of getting things right 6

7. How to do it right? • how create an incremental, evolutionary process, where what is good survives, and what is bad fails • create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested • silo effects will be avoided and results of investment in Semantic Technology will cumulate effectively 7

8. Uses of ‘ontology’ in PubMed abstracts 8

9. By far the most successful: GO (Gene Ontology) 9

10. GO provides a controlled vocabulary of terms for use in annotating (tagging) biological data • multi-species, multi-disciplinary, open source • built and maintained by domain experts • contributing to the cumulativity of scientific results obtained by distinct research communities • natural language and logical definitions for all terms to support consistent human application and computational exploitation • rigorous governance process • feedback loop connects users to editors 10

11. How to do it right • ontologies should mimic the methodology used by the GO (following the principles of the OBO Foundry: http://obofoundry.org) • ontologies in the same field should be developed in coordinated fashion to ensure that there is exactly one ontology for each subdomain • ontologies should be developed incrementally in a way that builds on successful user testing at every stage 11

Big data ontology_summit_feb2012

Recommended

Recommended

More Related Content

What's hot

What's hot (11)

Viewers also liked

Viewers also liked (17)

Similar to Big data ontology_summit_feb2012

Similar to Big data ontology_summit_feb2012 (20)

More from Barry Smith

More from Barry Smith (20)

Big data ontology_summit_feb2012