SlideShare uma empresa Scribd logo
1 de 1
Baixar para ler offline
Data integration framework for discovery and validation:
            Smart merging of experimental and public data across
                          ontologies and taxonomies
                                                     Erich Gombocz1), Robert Stanley1), Chuck Rockey1), Toshiro Nishimura1)
1
    IO Informatics, 2550 Ninth Street, Suite 114, Berkeley, CA 94710, U.S.A. [Correspondence: egombocz@io-informatics.com]


Summary
While a variety of data integration frameworks have been proposed and been used for quite some time,
most of them commonly lack the ability to generate coherent, meaningful datasets across different
disciplines. This report introduces the use of semantic technologies to apply an integrated systems biology
approach using data relationships and merged ontologies to better make sense of findings from genomics,
proteomics, metabolomics and clinical endpoints. This approach remedies many obstacles on the path
towards a functional integration.

Such an undertaking requires the ability to merge datasets independently of their existing taxonomies,
ontologies or semantic standards and to consolidate vocabularies for data classes, terms and their
relationships. “Smart” merging also needs to account for hierarchical adjustments into super- or
subclasses of the new, merged ontology tree whenever needed. The latter specifically applies when
integrating in-house experimental data with public domain reference sources with very specific
taxonomies and hierarchies. In this work, we present examples of toxicity biomarker studies across
different sample types and –OMICs data. To meaningfully integrate such data requires a semantic
representation of the resulting knowledgebase. During data import and relationship mapping, one or
multiple thesauri are applied in the background to auto-generate a coherent network from all sources.

Through use of a combination of experimental observations and publicly available reference information,
the merged network is able to provide insights about complex relationships and interactions by reduction         Fig.1: Diagrammatic overview: Definitions, requirements, example and use cases case for merging of ontologies
to query-driven sub-networks and their subsequent analysis. In this example, we were able to qualify
potential biomarkers within their multi-response activity across tissues and toxicants. The strength and
applicability of such a methodology for rational drug discovery and application in personalized medicine
as well as its impact towards a better understanding of complex biological processes are discussed.


Challenges
         General data coherence: different source data are typically categorized in different taxonomies;
          this results in normalization issues, term inconsistencies and property disparities across large,
          complex experimental data sets
         Pharmacodynamic correlations are not necessarily functionally linked within the biochemical
          network; despite resulting from the same drug perturbation, the observed -OMICs expression
          changes may represent very different biological processes
         In many cases, data relationships are not a priori contained in the data sets
         Different incompatible semantic standards make merging towards a common ontology extremely
          difficult; this also impacts applicability of query and reasoning tools
         Relationship consolidation and class hierarchy validation or adjustment may be required to make
          inferencing and reasoning meaningful
         Lack of intuitive, science-driven tools makes researchers shy away from network data analysis
          approaches in general.
         Scalability and performance issues confine most network approaches to relatively small datasets        Fig.2: Mapping and managing nomenclature via Sentient Thesaurus Manager™(left); a common knowledgebase
                                                                                                                 from genes, proteins, metabolites, tissue analytics and clinical disease codes in Sentient Knowledge Explorer™

Methodology
         Use import mapping to consolidate relationships; separate numeric from non-numeric content
          during class generation to account for scaling via numerical properties,
          Apply controlled vocabularies when importing or merging, using one or multiple thesauri and
          provide authoring capabilities for groups and representative terms for both, classes and
          relationships
         Allow for proper handling of non well-formed RDF
         Provide a facility to import from or directly connect to via URIs
         Exercise a multi-functional query tool supporting graphical and SPARQL queries to reduce
          network complexity
         Facilitate output of the resulting knowledgebase with or without its graphical representation to
          RDF, N3 or triple store backend

Approach
         For output from queries to relational databases (in tsv, csv, xml, fqml) or from devices or            Fig.3: From web query on experimental observations (left) to a multi-source, multi-standard merged ontology and
          databases using PSI-MI, an interactive graphical/textual import mapper for relationships is            network analysis using public reference resources: Drillout to NCBI, UniProt, IntAct, BioGrid, KEGG and HMDB
          applied; it also can be configured for automatic background mapping when deployed in
          conjunction with web-based query to multiple relational databases                                      Conclusions
         Use, author and apply one or multiple thesauri for consolidation of classes and relationships          Using Sentient™ tools as data integration framework, we were able to merge multi-OMICs experimental
         “Drag-to-knowledgebase” function with automatic classification of semantic data in RDF, N3,            observations with publicly available resources in a variety of formats towards a common ontology
          OBO and OWL (local or remote file system)                                                              semantic knowledgebase and apply it to discovery and qualification of toxicity biomarkers. Specifically,
         Permit direct URI input to web-based reference resources                                               we were able to demonstrate the following:
         Utilize entity browser with sorting (by subject, predicate and object) and paging for large datasets          Coherent merging of experimental and public data across standards, taxonomies and ontologies
          to navigate through instance properties                                                                        into an integrated, semantically organized data and relationship network
         Provide means to reduce network complexity: apply criteria for connection depth, numeric                      Ability to focus on visualization and analysis of a sub-network of specific scientific interest (e.g.
          scaling and weighting and other conditions (such as at least, at most, exactly n connections)                  biomarker qualification across tissues, different treatments, different genetic responder profiles,
         Employ a built-in query tool which supports graphical, textual and SPARQL queries to specify                   etc.) in a complex, large dataset
          conditions, then re-plot the resulting sub-networks to reveal hidden or unknown functional                    Qualification of potential biomarker panels for toxicity across tissues and toxicants and gaining
          biological relationships and their entities in context.                                                        insights in complex biological functions involving multiple pathways

Results                                                                                                          Acknowledgements
This poster demonstrates the applicability of a semantic data integration framework to meaningfully              This work was made possible by insights and many discussions with members of IO Informatics’ industry
merge experimental and public data for biomarker discovery and validation. Different ontologies across           and academic centers of excellence members of the Working Group on “Semantic Applications for
diverse standards have been merged into a common ontology knowledgebase. The presented network                   Translational Research”. The authors would like to acknowledge Ted Slater (Pfizer), Jonas Almeida (MD
viewer, the Sentient Knowledge Explorer™, can comfortably handle one million assertions in memory                Anderson Cancer Center), Pat Hurban (CLDA), Alan Higgins (Viamet Pharmaceuticals) and Bruce Mc.
while running on a Pentium 4 with 2GB of RAM, and is able to directly query the knowledgebase.                   Manus and Mark Wilkinson (both from James Hogg iCapture Centre) for their various contributions.

Mais conteúdo relacionado

Último

How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxUmeshTimilsina1
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxPooja Bhuva
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 

Último (20)

How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 

Destaque

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 

Destaque (20)

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

Data integration framework for discovery and validation

  • 1. Data integration framework for discovery and validation: Smart merging of experimental and public data across ontologies and taxonomies Erich Gombocz1), Robert Stanley1), Chuck Rockey1), Toshiro Nishimura1) 1 IO Informatics, 2550 Ninth Street, Suite 114, Berkeley, CA 94710, U.S.A. [Correspondence: egombocz@io-informatics.com] Summary While a variety of data integration frameworks have been proposed and been used for quite some time, most of them commonly lack the ability to generate coherent, meaningful datasets across different disciplines. This report introduces the use of semantic technologies to apply an integrated systems biology approach using data relationships and merged ontologies to better make sense of findings from genomics, proteomics, metabolomics and clinical endpoints. This approach remedies many obstacles on the path towards a functional integration. Such an undertaking requires the ability to merge datasets independently of their existing taxonomies, ontologies or semantic standards and to consolidate vocabularies for data classes, terms and their relationships. “Smart” merging also needs to account for hierarchical adjustments into super- or subclasses of the new, merged ontology tree whenever needed. The latter specifically applies when integrating in-house experimental data with public domain reference sources with very specific taxonomies and hierarchies. In this work, we present examples of toxicity biomarker studies across different sample types and –OMICs data. To meaningfully integrate such data requires a semantic representation of the resulting knowledgebase. During data import and relationship mapping, one or multiple thesauri are applied in the background to auto-generate a coherent network from all sources. Through use of a combination of experimental observations and publicly available reference information, the merged network is able to provide insights about complex relationships and interactions by reduction Fig.1: Diagrammatic overview: Definitions, requirements, example and use cases case for merging of ontologies to query-driven sub-networks and their subsequent analysis. In this example, we were able to qualify potential biomarkers within their multi-response activity across tissues and toxicants. The strength and applicability of such a methodology for rational drug discovery and application in personalized medicine as well as its impact towards a better understanding of complex biological processes are discussed. Challenges  General data coherence: different source data are typically categorized in different taxonomies; this results in normalization issues, term inconsistencies and property disparities across large, complex experimental data sets  Pharmacodynamic correlations are not necessarily functionally linked within the biochemical network; despite resulting from the same drug perturbation, the observed -OMICs expression changes may represent very different biological processes  In many cases, data relationships are not a priori contained in the data sets  Different incompatible semantic standards make merging towards a common ontology extremely difficult; this also impacts applicability of query and reasoning tools  Relationship consolidation and class hierarchy validation or adjustment may be required to make inferencing and reasoning meaningful  Lack of intuitive, science-driven tools makes researchers shy away from network data analysis approaches in general.  Scalability and performance issues confine most network approaches to relatively small datasets Fig.2: Mapping and managing nomenclature via Sentient Thesaurus Manager™(left); a common knowledgebase from genes, proteins, metabolites, tissue analytics and clinical disease codes in Sentient Knowledge Explorer™ Methodology  Use import mapping to consolidate relationships; separate numeric from non-numeric content during class generation to account for scaling via numerical properties,  Apply controlled vocabularies when importing or merging, using one or multiple thesauri and provide authoring capabilities for groups and representative terms for both, classes and relationships  Allow for proper handling of non well-formed RDF  Provide a facility to import from or directly connect to via URIs  Exercise a multi-functional query tool supporting graphical and SPARQL queries to reduce network complexity  Facilitate output of the resulting knowledgebase with or without its graphical representation to RDF, N3 or triple store backend Approach  For output from queries to relational databases (in tsv, csv, xml, fqml) or from devices or Fig.3: From web query on experimental observations (left) to a multi-source, multi-standard merged ontology and databases using PSI-MI, an interactive graphical/textual import mapper for relationships is network analysis using public reference resources: Drillout to NCBI, UniProt, IntAct, BioGrid, KEGG and HMDB applied; it also can be configured for automatic background mapping when deployed in conjunction with web-based query to multiple relational databases Conclusions  Use, author and apply one or multiple thesauri for consolidation of classes and relationships Using Sentient™ tools as data integration framework, we were able to merge multi-OMICs experimental  “Drag-to-knowledgebase” function with automatic classification of semantic data in RDF, N3, observations with publicly available resources in a variety of formats towards a common ontology OBO and OWL (local or remote file system) semantic knowledgebase and apply it to discovery and qualification of toxicity biomarkers. Specifically,  Permit direct URI input to web-based reference resources we were able to demonstrate the following:  Utilize entity browser with sorting (by subject, predicate and object) and paging for large datasets  Coherent merging of experimental and public data across standards, taxonomies and ontologies to navigate through instance properties into an integrated, semantically organized data and relationship network  Provide means to reduce network complexity: apply criteria for connection depth, numeric  Ability to focus on visualization and analysis of a sub-network of specific scientific interest (e.g. scaling and weighting and other conditions (such as at least, at most, exactly n connections) biomarker qualification across tissues, different treatments, different genetic responder profiles,  Employ a built-in query tool which supports graphical, textual and SPARQL queries to specify etc.) in a complex, large dataset conditions, then re-plot the resulting sub-networks to reveal hidden or unknown functional  Qualification of potential biomarker panels for toxicity across tissues and toxicants and gaining biological relationships and their entities in context. insights in complex biological functions involving multiple pathways Results Acknowledgements This poster demonstrates the applicability of a semantic data integration framework to meaningfully This work was made possible by insights and many discussions with members of IO Informatics’ industry merge experimental and public data for biomarker discovery and validation. Different ontologies across and academic centers of excellence members of the Working Group on “Semantic Applications for diverse standards have been merged into a common ontology knowledgebase. The presented network Translational Research”. The authors would like to acknowledge Ted Slater (Pfizer), Jonas Almeida (MD viewer, the Sentient Knowledge Explorer™, can comfortably handle one million assertions in memory Anderson Cancer Center), Pat Hurban (CLDA), Alan Higgins (Viamet Pharmaceuticals) and Bruce Mc. while running on a Pentium 4 with 2GB of RAM, and is able to directly query the knowledgebase. Manus and Mark Wilkinson (both from James Hogg iCapture Centre) for their various contributions.