Semantic (Web) Technologies for Translational Research in Life Sciences
1. Semantic (Web) Technologies for Translational Research in Life Sciences Ohio State University, June 16, 2011 Amit P. Sheth Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) amit.sheth@wright.edu Thanks to Kno.e.sis team (Satya, Priti, Rama, and Ajith); Collaborators at CTEGD UGA (Dr. Tarleton, Brent Weatherly), NLM (Olivier Bodenreider), CCRC, UGA (Will York), NCBO/Stanford, CITAR/WSU
3. Web of people - social networks, user-created casual content Web of resources - data, service, data, mashups Web of databases - dynamically generated pages - web query interfaces Web of pages - text, manually created links - extensive navigation Evolution of Web & Semantic Computing Tech assimilated in life Web of Sensors, Devices/IoT - 40 billion sensors, 5 billion mobile connections 2007 Situations, Events Web 3.0 Semantic Technology Used Objects Web 2.0 Patterns Keywords 1997 Web 1.0
4. Outline Semantic Web – very brief intro Scenarios to demonstrate the applications and benefit of semantic web technologies: Health Care, Biomedical Research, Translational
5. Biomedical Informatics... Biomedical Informatics PubMed ClinicalTrials.gov ...needs a connection Hypothesis Validation Experiment design Predictions Personalized medicine Semantic Web research aims at providing this connection! Etiology Pathogenesis Clinical findings Diagnosis Prognosis Treatment Genome Transcriptome Proteome Metabolome Physiome ...ome More advanced capabilities for search, integration, analysis, linking to new insights and discoveries! GenBank UniProt Medical Informatics Bioinformatics
6. Decision Making, Insights, Innovations Human Performance Data and Facts Knowledge and Understanding Health & Performance Cognitive Science, Psychology Neuroscience Anatomy, Physiology Cellular biology Molecular Biology ACATATGGGTACTATTTACTATTCATGGGTACTATTTATGGCATATGGCGTACTATTCTAATCCTATATCCGTCTAATCTATTTACTATTATCTATTACTATACCTTTTGGGGAAAAAAATTCTATACCGTCTAATCCTATAAATCAAGCCG Biochemistry
7. Semantic Web standards @ W3C Semantic Web is built in a layered manner Not everybody needs all the layers … Queries: SPARQL, Rules: RIF Semantic Web Rich ontologies: OWL Simple data models & taxonomies: RDF Schema Uniform metamodel: RDF + URI Encoding structure: XML Encoding characters: Unicode
8. Linked Data: Semantic Web “diluted” Achieve for data what Web did to documents Relationship with the original Semantic Web vision: no AI, no agents, no autonomy Interoperability is still very important: interoperability of formats, interoperability of semantics Enables interchange of large data sets (thus very useful in, say, collaborative research) Semantic Web vision is largely predicated on the availability of data Linked Data is a movement that gets us there Thanks – Ora Lassila
9. Opportunity: exploiting clinical and biomedical data text Health Information Services Elsevier iConsult Scientific Literature PubMed 300 Documents Published Online each day User-contributed Content (Informal) GeneRifs WikiGene NCBI Public Datasets Genome, Protein DBs new sequences daily Laboratory Data Lab tests, RTPCR, Mass spec Clinical Data Personal health history Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support.
10. Major Community Efforts W3C Semantic Web Health Care & Life Sciences Interest Group: http://www.w3.org/2001/sw/hcls/ Clinical Observations Interoperability: EMR + Clinical Trials: http://esw.w3.org/HCLS/ClinicalObservationsInteroperability National Center for Biomedical Ontology: http://bioportal.bioontology.org/
11. Major SW Projects OpenPHACTS: A knowledge management project of the Innovative Medicines Initiative (IMI), a unique partnership between the European Community and the European Federation of Pharmaceutical Industries and Associations (EFPIA). http://www.openphacts.org/ LarKC: develop the Large Knowledge Collider, a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. http://www.larkc.eu/ NCBO: contribute to collaborative science and translational research. http://bioportal.bioontology.org/
12. Semantic Web Enablers and Techniques Ontology: Agreement with Common Vocabulary & Domain Knowledge; Schema + Knowledge base Semantic Annotation (metadata Extraction): Manual, Semi-automatic (automatic with human verification), Automatic Semantic Computation: semantics enabled search, integration, complex queries, analysis (paths, subgraph), pattern finding, mining, inferencing, reasoning, hypothesis validation, discovery, visualization
14. N-glycan_beta_GlcNAc_9 N-glycan_alpha_man_4 GNT-V attaches GlcNAc at position 6 N-acetyl-glucosaminyl_transferase_V UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-$beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-$R2 UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021 N-Glycosylation metabolic pathway GNT-I attaches GlcNAc at position 2
15. Maturing capabilities and ongoing research Ontology Creation Semantic Annotation & Text mining: Entity recognition, Relationship extraction Semantic Integration & Provenance: Integrating all types of data used in biomedical research: text, experimental data, curated/structured/public and multimedia Semantic search, browsing, analysis Clinical and Scientific Workflows with semantic web services Semantic Exploration of scientific literature, Undiscovered public knowledge
16. Project 1: ASEMR Why: Improve Quality of Care and Decision Making without loss of Efficiency in active Cardiology practice. What: Use of semantic Web technologies for clinical decision support Where: Athens Heart Center & its partners and labs Status: In use continuously since 01/2006
18. Active Semantic EMR Annotate ICD9s Annotate Doctors Lexical Annotation Insurance Formulary Level 3 Drug Interaction Drug Allergy Demo at: http://knoesis.org/library/demos/
19. Project 2: Glycomics Why: To help in the treatment of certain kinds of cancer and Parkinson's Disease. What: Semantic Annotation of Experiment Data Where: Complex Carbohydrate Research Center, UGA Status: Research prototype in use Workflow with Semantic Annotation of Experimental Data already in use
20. N-Glycosylation Process (NGP) Cell Culture extract Glycoprotein Fraction proteolysis Glycopeptides Fraction 1 Separation technique I n Glycopeptides Fraction PNGase n Peptide Fraction Separation technique II n*m Peptide Fraction Mass spectrometry ms data ms/ms data Data reduction Data reduction ms peaklist ms/ms peaklist binning Peptide identification Glycopeptide identification and quantification Peptide list N-dimensional array Data correlation Signal integration
21. Agent Agent Agent Agent Biological Sample Analysis by MS/MS Raw Data to Standard Format Data Pre-process DB Search (Mascot/Sequest) Results Post-process (ProValt) O I O I O I O I O Storage Standard Format Data Raw Data Filtered Data Search Results Final Output Biological Information Scientific workflow for proteome analysis Semantic Annotation Applications
22. Semantic Annotation of Experimental Data: Mass Spectrometry (MS) Data, ms/ms peaklist data
parent ion m/z: 830.9570; parent ion abundance: 194.9604; parent ion charge: 2
fragment ion m/z    fragment ion abundance
580.2985            0.3592
688.3214            0.2526
779.4759            38.4939
784.3607            21.7736
1543.7476           1.3822
1544.7595           2.9977
1562.8113           37.4790
1660.7776           476.5043
23. Semantic Annotation of Experimental Data: Semantically Annotated MS Data (element and attribute values link to Ontological Concepts)
<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="784.3607" abundance="21.7736"/>
  <fragment_ion m-z="1543.7476" abundance="1.3822"/>
  <fragment_ion m-z="1544.7595" abundance="2.9977"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
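As a rough illustration of the annotation step, the sketch below wraps a raw ms/ms peak list in XML of the shape shown on the slide. The element and attribute names come from the slide; the function name `annotate_peak_list` and the whitespace-separated input layout are assumptions made for this example, not the project's actual tooling.

```python
import xml.etree.ElementTree as ET


def annotate_peak_list(raw_text, instrument, mode="ms-ms"):
    """Wrap a whitespace-separated peak list in descriptive XML.

    Assumed input layout (hypothetical): the first row holds the parent ion
    (m/z, abundance, charge); every later row is a fragment ion (m/z, abundance).
    """
    rows = [line.split() for line in raw_text.strip().splitlines() if line.strip()]
    parent, fragments = rows[0], rows[1:]

    root = ET.Element("ms-ms_peak_list")
    ET.SubElement(root, "parameter", {"instrument": instrument, "mode": mode})
    ET.SubElement(
        root,
        "parent_ion",
        {"m-z": parent[0], "abundance": parent[1], "z": parent[2]},
    )
    for mz, abundance in fragments:
        ET.SubElement(root, "fragment_ion", {"m-z": mz, "abundance": abundance})
    return root


# First three rows of the peak list from the slide.
RAW = """830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
"""

root = annotate_peak_list(
    RAW, "micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"
)
print(ET.tostring(root, encoding="unicode"))
```

Once the numbers carry named elements and attributes, each name can be tied to an ontology concept, which is what lets downstream tools interpret the values without knowing the column order.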
24. Project 3: Why: To associate genotype and phenotype information for knowledge discovery What: Integrated data sources to run complex queries Enriching data with ontologies for integration, querying, and automation Ontologies beyond vocabularies: the power of relationships Where: NCRR (NIH) Status: Completed
25. Use data to test hypothesis Gene name GO Interactions gene Sequence PubMed OMIM Link between glycosyltransferase activity and congenital muscular dystrophy? Glycosyltransferase Congenital muscular dystrophy Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
26. In a Web pages world… (GeneID: 9215) has_associated_disease Congenital muscular dystrophy, type 1D has_molecular_function Acetylglucosaminyl-transferase activity Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
27. With the semantically enhanced data glycosyltransferase GO:0016757 isa GO:0008194 GO:0016758 acetylglucosaminyl-transferase GO:0008375 has_molecular_function acetylglucosaminyl-transferase GO:0008375 EG:9215 LARGE Muscular dystrophy, congenital, type 1D MIM:608840 has_associated_phenotype SELECT DISTINCT ?t ?g ?d { ?t is_a GO:0016757 . ?g has_molecular_function ?t . ?g has_associated_phenotype ?b2 . ?b2 has_textual_description ?d . FILTER regex(?d, "muscular dystrophy", "i") . FILTER regex(?d, "congenital", "i") } From medinfo paper. Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
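The logic of the query on this slide can be mimicked in a few lines of plain Python over a toy triple set. This is only a sketch: the identifiers are taken from the slide, but the exact `is_a` edges are a guessed reading of the flattened diagram, the function names are hypothetical, and following `is_a` transitively is an assumption (the slide's pattern names a single `is_a` step).

```python
import re

# Toy triples based on the slide; the is_a edges are an assumed reading
# of the diagram, not a copy of the Gene Ontology.
TRIPLES = {
    ("GO:0008194", "is_a", "GO:0016757"),
    ("GO:0016758", "is_a", "GO:0016757"),
    ("GO:0008375", "is_a", "GO:0008194"),
    ("GO:0008375", "is_a", "GO:0016758"),
    ("EG:9215", "has_molecular_function", "GO:0008375"),
    ("EG:9215", "has_associated_phenotype", "MIM:608840"),
    ("MIM:608840", "has_textual_description",
     "Muscular dystrophy, congenital, type 1D"),
}


def descendants(term):
    """All terms that reach `term` through one or more is_a edges."""
    found, frontier = set(), {term}
    while frontier:
        frontier = {
            s for (s, p, o) in TRIPLES if p == "is_a" and o in frontier
        } - found
        found |= frontier
    return found


def objects(subject, predicate):
    """Objects of every triple with the given subject and predicate."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def genes_linking(go_root, *keywords):
    """Return (function term, gene, description) rows matching all keywords."""
    results = set()
    for t in descendants(go_root):
        genes = {
            s for (s, p, o) in TRIPLES
            if p == "has_molecular_function" and o == t
        }
        for g in genes:
            for phenotype in objects(g, "has_associated_phenotype"):
                for d in objects(phenotype, "has_textual_description"):
                    # Case-insensitive match, like the "i" flag in the query.
                    if all(re.search(k, d, re.IGNORECASE) for k in keywords):
                        results.add((t, g, d))
    return results


matches = genes_linking("GO:0016757", "muscular dystrophy", "congenital")
print(matches)
```

The point of the example is that the link from glycosyltransferase activity to the disease is found by following typed relationships, not by matching the two phrases on a single page.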
28. Project 4: Nicotine Dependence Why: For understanding the genetic basis of nicotine dependence. What: Integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. How: Semantic Web technologies (especially RDF, OWL, and SPARQL) support information integration and make it easy to create semantic mashups (semantically integrated resources). Where: NLM (NIH) Status: Completed research
43. Project 5: T. cruzi SPSE Why: For Integrative Parasite Research to help expedite knowledge discovery What: Semantics and Services Enabled Problem Solving Environment (PSE) for Trypanosoma cruzi Where: Center for Tropical and Emerging Global Diseases (CTEGD), UGA Who: Kno.e.sis, UGA, NCBO (Stanford) Status: Research prototype – in regular lab use
46. Provenance in Parasite Research Gene Name Sequence Extraction Gene Knockout and Strain Creation* Related Queries from Biologists: List all groups in the lab that used a Target Region Plasmid. Which researcher created a new strain of the parasite (with ID = 66)? An experiment was not successful: has this experiment been conducted earlier? What were the results? 3' & 5' Region Drug Resistant Plasmid Gene Name Plasmid Construction Knockout Construct Plasmid T. cruzi sample ? Transfection Transfected Sample Drug Selection Cloned Sample Selected Sample Cell Cloning Cloned Sample *T. cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
49. SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example:
50. Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.
52. Focused KB Work Flow (Use case: HPCO) HPC keywords Doozer: Base Hierarchy from Wikipedia Focused Pattern based extraction SenseLab Neuroscience Ontologies Initial KB Creation Meta Knowledgebase PubMed Abstracts Knoesis: Parsing based NLP Triples Enrich Knowledge Base NLM: Rule based BKR Triples Final Knowledge Base
53. Triple Extraction Approaches Open Extraction No fixed number of predetermined entities and predicates At Knoesis – NLP (parsing and dependency trees) Supervised Extraction Predetermined set of entities and predicates At Knoesis – Pattern based extraction to connect entities in the base hierarchy using statistical techniques At NLM – NLP and rule based approaches
54. Mapping Triples to Base Hierarchy Entities in both subject and object must contain at least one concept from the hierarchy to be mapped to the KB Preliminary synonyms based on anchor labels and page redirects in Wikipedia, e.g. Prolactostatin redirects to Dopamine Predicates (verbs) and entities are subjected to stemming using WordNet
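The mapping rule on this slide can be sketched as follows. Everything here is illustrative: the tiny hierarchy, the redirect table (seeded with the slide's Prolactostatin example), and the function names are assumptions, and the WordNet stemming step is left out to keep the example self-contained.

```python
# Hypothetical base hierarchy and Wikipedia-style redirect table.
HIERARCHY = {"dopamine", "prolactin"}
REDIRECTS = {"prolactostatin": "dopamine"}


def find_concept(phrase):
    """Return a hierarchy concept contained in `phrase`, or None.

    The whole phrase is tried first, then each word, with redirects
    applied as preliminary synonyms.
    """
    text = phrase.strip().lower()
    text = REDIRECTS.get(text, text)
    if text in HIERARCHY:
        return text
    for word in text.split():
        concept = REDIRECTS.get(word, word)
        if concept in HIERARCHY:
            return concept
    return None


def map_triple(triple):
    """Map an extracted triple onto the hierarchy.

    Both subject and object must contain a known concept;
    otherwise the triple is dropped (None is returned).
    """
    subject, predicate, obj = triple
    subject_concept = find_concept(subject)
    object_concept = find_concept(obj)
    if subject_concept is None or object_concept is None:
        return None
    return (subject_concept, predicate, object_concept)


mapped = map_triple(("Prolactostatin", "inhibits", "prolactin secretion"))
dropped = map_triple(("aspirin", "reduces", "fever"))
print(mapped, dropped)
```

Requiring a concept on both sides is a deliberately strict filter: it discards many extracted triples but keeps the knowledge base anchored to the base hierarchy.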
58. New Knowledge/hypothesis Example Three triples from different abstracts: VIP Peptide – increases – Catecholamine Biosynthesis; Catecholamines – induce – β-adrenergic receptor activity; β-adrenergic receptors – are involved – fear conditioning New implicit knowledge: VIP Peptide – affects – fear conditioning Caveat: Each triple above was observed in a different organism (cows, mice, humans), but it is still an interesting hypothesis. Scooner’s contextual browsing makes this clear to the user.
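The chaining behind this example can be sketched as a path search over triples once their entities are mapped to shared concepts. The three triples are from the slide; the hand-written `CONCEPT` table and the function name `find_chain` are assumptions for illustration, and this is not a description of how Scooner itself is implemented.

```python
from collections import deque

# The three triples from the slide, each from a different abstract.
TRIPLES = [
    ("VIP Peptide", "increases", "Catecholamine Biosynthesis"),
    ("Catecholamines", "induce", "β-adrenergic receptor activity"),
    ("β-adrenergic receptors", "are involved", "fear conditioning"),
]

# Hypothetical hand-written mapping from surface forms to shared concepts.
CONCEPT = {
    "VIP Peptide": "vip peptide",
    "Catecholamine Biosynthesis": "catecholamine",
    "Catecholamines": "catecholamine",
    "β-adrenergic receptor activity": "beta-adrenergic receptor",
    "β-adrenergic receptors": "beta-adrenergic receptor",
    "fear conditioning": "fear conditioning",
}


def find_chain(start, goal):
    """Breadth-first search for a chain of triples linking two concepts."""
    edges = {}
    for subject, predicate, obj in TRIPLES:
        edges.setdefault(CONCEPT[subject], []).append((predicate, CONCEPT[obj]))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, predicate, nxt)]))
    return None


chain = find_chain("vip peptide", "fear conditioning")
for step in chain:
    print(step)
```

A chain found this way is only a candidate hypothesis; as the caveat on the slide notes, the individual links may come from different organisms, so the source context of each step still has to be shown to the user.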
59. Project 7: Drug Abuse Why: To study social trends in pharmaceutical opioid abuse What: Describe drug users’ knowledge, attitudes, and behaviors related to illicit use of OxyContin® Describe temporal patterns of non-medical use of OxyContin® tablets as discussed on Web-based forums Where: CITAR (Center for Interventions, Treatment and Addictions Research) at Wright State Univ. Status: In progress (recently funded by NIDA)
61. Project 8: NMR Why: Streamline the NMR data processing tasks. Processing NMR experimental data is complex and time consuming. What: Providing biologists with tools to effectively process and manage Nuclear Magnetic Resonance (NMR) experimental data. How: Use Domain Specific Languages (DSL) to create scientist-friendly abstractions for complex statistical workflows. Use semantics based techniques to store and manage data. Where: Air Force Research Lab Status: In progress
63. A complex NMR spectrum, marked with chemical compound identifiers by human observers.
65. Use a DSL to provide abstractions for the operators (named SCALE)
67. Future Interoperability Challenge: 360-degree health Insurance, Financial Aspects Clinical Care Follow up, Lifestyle Genetic Tests… Profiles Clinical Trials Social Media
68. For each component in 360-degree health care, we have data, processes, knowledge and experience. Interoperability solutions need to encompass all these! Possibly the largest growth in data will be in sensors (e.g., Body Area Networks, Biosensors) and social content. Extensive use of mobile phones. Credit: ece.virginia.edu
69. Summary Semantic Web is an “interoperability technology” Semantic Web provides the needed interoperability, and can accommodate all necessary “points of view” Linked Data as a way of sharing data is highly promising Many examples of viable usage of Semantic Web technologies Words of warning about deployment Significant research challenges remain as Health presents the most complex domain
70. Representative References
A. Sheth, S. Agrawal, J. Lathem, N. Oldham, H. Wingate, P. Yadav, and K. Gallagher, "Active Semantic Electronic Medical Record", Intl Semantic Web Conference, 2006.
Satya Sahoo, Olivier Bodenreider, Kelly Zeng, and Amit Sheth, "An Experiment in Integrating Large Biomedical Knowledge Resources with RDF: Application to Associating Genotype and Phenotype Information", WWW2007 HCLS Workshop, May 2007.
Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit Sheth, "From 'Glycosyltransferase' to 'Congenital Muscular Dystrophy': Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology", Amsterdam: IOS, August 2007, PMID: 17911917, pp. 1260-4.
Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner, and Amit P. Sheth, "An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence", Journal of Biomedical Informatics, 2008.
Cartic Ramakrishnan, Krzysztof J. Kochut, and Amit Sheth, "A Framework for Schema-Driven Relationship Discovery from Unstructured Text", Intl Semantic Web Conference, 2006, pp. 583-596.
Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies", 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006.
Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth, and Krishnaprasad Thirunarayan, "Provenance Context Entity (PaCE): Scalable provenance tracking for scientific RDF data", SSDBM, Heidelberg, Germany, 2010.
Papers: http://knoesis.org/library Demos at: http://knoesis.wright.edu/library/demos/
Editor's Notes
Cognitive model, cognitive behavioral model
In parasite research, new strains of a parasite are created by knocking out specific genes. So, given a cloned sample, we may need to know the gene(s) that were knocked out. Both these scenarios are real-world examples of the importance of provenance. There are many research issues in provenance management. This presentation addresses two of them: (1) provenance modeling, specifically provenance interoperability, consistent modeling, and reduction of terminological heterogeneity; and (2) provenance query.