Mining text and data on chemicals
Lars Juhl Jensen
three parts
text mining
data integration
medical records
Part 1
text mining
exponential growth
some things are constant
~45 seconds per paper
information retrieval
find the relevant papers
still too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
identify the concepts
small molecules
proteins
diseases
comprehensive lexicon
synonyms
orthographic variation
“black list”
unfortunate names
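The lexicon-based approach outlined on the slides above (comprehensive lexicon, synonyms, orthographic variation, black list) might be sketched roughly as follows. This is a minimal illustration, not the tagger behind Reflect: the lexicon entries, the black list, and the names `RAW_LEXICON`, `BLACK_LIST`, `normalize` and `tag` are all assumptions made up for the example.

```python
import re

# Hypothetical toy lexicon: name or synonym -> (entity type, canonical name).
RAW_LEXICON = {
    "aspirin": ("small molecule", "aspirin"),
    "acetylsalicylic acid": ("small molecule", "aspirin"),  # synonym
    "CYC1": ("protein", "CYC1"),
    "HAP1": ("protein", "HAP1"),
    "WAS": ("protein", "WAS"),  # an "unfortunate name": collides with a common word
}

# Hypothetical black list of normalized names that should never be tagged.
BLACK_LIST = {"was"}


def normalize(text):
    """Ignore case, hyphens and spaces so that CYC1, Cyc-1 and cyc 1 all match."""
    return re.sub(r"[\s\-]+", "", text.lower())


LEXICON = {normalize(name): value for name, value in RAW_LEXICON.items()}


def tag(text, max_words=3):
    """Return (matched text, entity type, canonical name), preferring longer matches."""
    tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            key = normalize(candidate)
            if key in LEXICON and key not in BLACK_LIST:
                hits.append((candidate, *LEXICON[key]))
                i += n
                break
        else:
            # No lexicon entry starts at this token; move on.
            i += 1
    return hits


print(tag("Acetylsalicylic acid was tested on CYC-1 and HAP1"))
```

Normalizing both the lexicon and the text is one simple way to absorb orthographic variation; the black list is checked last so that a blocked name is skipped even though it is in the lexicon.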
Reflect
augmented browsing
browser add-on
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010
Firefox
Internet Explorer
Google Chrome
Safari
Utopia Documents
web services
collaboration
SciVerse
information extraction
formalize the facts
co-mentioning
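As a rough illustration of co-mentioning, the sketch below counts how many documents mention both members of each entity pair. It assumes entity recognition has already been run and uses plain counts; the input data and the function name `count_co_mentions` are hypothetical, and a real system would likely weight or normalize these counts.

```python
from collections import Counter
from itertools import combinations


def count_co_mentions(documents):
    """Count, for each unordered entity pair, the documents that mention both."""
    pair_counts = Counter()
    for entities in documents:
        # Deduplicate within a document and sort so each pair has one key.
        for a, b in combinations(sorted(set(entities)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical tagger output: one list of recognized entities per abstract.
docs = [
    ["HAP1", "CYC1", "CYC7"],
    ["HAP1", "CYC1"],
    ["CYC7", "aspirin"],
]
counts = count_co_mentions(docs)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs that are co-mentioned often, such as HAP1 and CYC1 here, become candidate associations; the count says nothing about the type of relation.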
NLP
Natural Language Processing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction

[nxexpr The expression of
       [nxgene the cytochrome genes
           [nxpg CYC1 and CYC7]]]
   is controlled by
   [nxpg HAP1]
Part 2
data integration
STITCH
Kuhn et al., Nucleic Acids Research, 2012
~300,000 small molecules
~2.6 million proteins
1100+ genomes
experimental data
physical binding
chemical–protein
protein–protein
curated knowledge
drug targets
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
text mining
co-mentioning
NLP
Natural Language Processing
many data types
many databases
different formats
different identifiers
variable quality
not comparable
spread over many genomes
quality scores
von Mering et al., Nucleic Acids Research, 2005
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
probabilistic scores
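One way to picture calibration against a gold standard: rank the pairs by raw score, cut the ranking into bins, and record what fraction of each bin is confirmed by the gold standard. The sketch below assumes exactly that simple binning; the function name `calibrate`, the bin size and the toy data are assumptions, and the cited work may fit a smooth curve rather than use raw bins.

```python
def calibrate(scored_pairs, gold_positive, bin_size):
    """Return (mean raw score, fraction in gold standard) for each bin.

    scored_pairs: list of (pair, raw_score), limited to pairs the gold
    standard can judge. gold_positive: set of pairs known to be true.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    bins = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        hits = sum(1 for pair, _ in chunk if pair in gold_positive)
        mean_score = sum(score for _, score in chunk) / len(chunk)
        bins.append((mean_score, hits / len(chunk)))
    return bins


# Hypothetical raw scores and gold standard.
scored = [("a", 9.0), ("b", 8.0), ("c", 3.0), ("d", 1.0)]
gold = {"a", "b", "c"}
print(calibrate(scored, gold, bin_size=2))
```

The resulting table maps a raw score from any one evidence type onto an estimated probability of being correct, which is what makes scores from different sources comparable.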
orthology transfer
combine the evidence
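If each evidence channel yields a probability that an interaction is real, a common way to combine them is to assume the channels are independent and compute the probability that at least one is right. The sketch below shows only that basic noisy-OR step; the function name `combine` is an assumption, and the published scoring scheme may include further corrections not shown here.

```python
def combine(probabilities):
    """Combine independent evidence: 1 minus the product of (1 - p_i)."""
    all_wrong = 1.0
    for p in probabilities:
        all_wrong *= 1.0 - p
    return 1.0 - all_wrong


# Hypothetical scores for one chemical-protein pair:
# experimental data, curated knowledge, text mining.
print(combine([0.5, 0.5]))
print(combine([0.9, 0.5]))
```

Under this assumption the combined score is never lower than the strongest single channel, and several weak channels can add up to a confident call.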
Part 3
patient records
a hard problem
in Danish
by busy doctors
about psychiatric patients
no lexicon
acronyms
typos
delusions
domain specific system
patient record excerpt (annotation labels: Negation, F20, F200, Family)
medication
adverse drug events
diagnoses
pharmacovigilance
patient stratification
Roque et al., PLoS Computational Biology, 2011
disease comorbidity
Roque et al., PLoS Computational Biology, 2011
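A simple way to look for disease comorbidity is to compare how often two diagnoses occur in the same patient with how often that would be expected if they were independent. The sketch below computes that observed-to-expected ratio; it is not necessarily the measure used in the cited study, and the patients, the choice of codes and the function name `comorbidity_scores` are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def comorbidity_scores(patients):
    """Return the observed/expected co-occurrence ratio for each diagnosis pair."""
    n = len(patients)
    single = Counter()
    pair = Counter()
    for codes in patients:
        unique = sorted(set(codes))
        single.update(unique)
        pair.update(combinations(unique, 2))
    # Expected count under independence is single[a] * single[b] / n.
    return {
        (a, b): observed * n / (single[a] * single[b])
        for (a, b), observed in pair.items()
    }


# Hypothetical patients, each a list of ICD-10 codes.
patients = [["F20", "E11"], ["F20", "E11"], ["F20"], ["I10"]]
scores = comorbidity_scores(patients)
print(scores)
```

A ratio above 1 means the pair co-occurs more often than chance would suggest; with counts as small as in this toy input the ratio is of course not statistically meaningful.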
DNA sequencing
genotype
phenotype
Acknowledgments

Reflect                    STITCH                 EPJ-mining
Sune Frankild              Michael Kuhn           Francisco S Roque
Heiko Horn                 Damian Szklarczyk      Peter B Jensen
Evangelos Pafilis          Andrea Franceschini    Robert Eriksson
Juan-Carlos Silla-Castro   Milan Simonovic        Henriette Schmock
Michael Kuhn               Alexander Roth         Marlene Dalgaard
Reinhardt Schneider        Pablo Minguez          Massimo Andreatta
Sean O’Donoghue            Tobias Doerks          Thomas Hansen
                           Manuel Stark           Karen Søeby
                           Christian von Mering   Søren Bredkjær
                           Peer Bork              Anders Juul
                                                  Thomas Werge
                                                  Søren Brunak
larsjuhljensen
