Turning literature into databases
>10 km
Lars Juhl Jensen
corpora
22M abstracts
1.9M freely available articles
1.9M Elsevier documents
entity recognition
identify the concepts
comprehensive lexicon
small molecules
proteins
cellular components
tissues
organisms
phenotypes
diseases
orthographic variation
singular vs. plural
flexible matching
spaces and hyphens
“black list”
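The matching ideas on the slides above (a lexicon, tolerance for plurals, interchangeable spaces and hyphens, and a black list) can be sketched roughly as follows. This is only an illustration under stated assumptions: the lexicon entries, the black-listed name, and the function names `name_pattern` and `tag` are hypothetical and not taken from the talk, and a regular expression per name stands in for whatever matching engine is actually used.

```python
import re

# Hypothetical mini-lexicon (name -> entity type); entries are illustrative only.
LEXICON = {
    "interleukin 2": "protein",
    "breast cancer": "disease",
    "liver": "tissue",
    "was": "protein",  # a gene symbol that collides with a common English word
}

# Hypothetical black list: names too ambiguous to tag reliably.
BLACK_LIST = {"was"}


def name_pattern(name):
    """Build a regex that tolerates space/hyphen variation and a plural 's'."""
    parts = [re.escape(part) for part in re.split(r"[\s-]+", name)]
    # Between name parts allow a space, a hyphen, or nothing at all.
    return r"\b" + r"[\s-]?".join(parts) + r"s?\b"


def tag(text):
    """Return sorted (start, end, lexicon name, entity type) tuples for matches."""
    hits = []
    for name, entity_type in LEXICON.items():
        if name in BLACK_LIST:
            continue  # skip names known to produce false positives
        for match in re.finditer(name_pattern(name), text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), name, entity_type))
    return sorted(hits)


if __name__ == "__main__":
    sentence = "Interleukin-2 was elevated in breast cancers but not in the liver."
    for hit in tag(sentence):
        print(hit)
```

Compiling one pattern per name keeps the sketch short; a real lexicon with millions of names would need a faster lookup structure.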
information extraction
count co-mentioning
within documents
within paragraphs
within sentences
new scoring scheme
STRING v9.1
~2x better sensitivity
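Counting co-mentions within documents, paragraphs, and sentences can be sketched as a weighted count, shown below. This is not the STRING v9.1 scoring scheme, which the slides do not spell out: the weights, the input layout (a document as a list of paragraphs, each a list of sentences, each a set of already-tagged entity identifiers), and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: closer co-mentions count for more.
W_DOCUMENT = 1.0
W_PARAGRAPH = 2.0
W_SENTENCE = 4.0


def pairs(entities):
    """All unordered entity pairs, as alphabetically sorted tuples."""
    return set(combinations(sorted(entities), 2))


def co_mention_scores(documents):
    """Sum weighted co-mention counts for each entity pair over all documents."""
    scores = defaultdict(float)
    for paragraphs in documents:
        doc_entities = set()
        paragraph_pairs = set()
        sentence_pairs = set()
        for sentences in paragraphs:
            paragraph_entities = set()
            for sentence_entities in sentences:
                sentence_pairs |= pairs(sentence_entities)
                paragraph_entities |= sentence_entities
            paragraph_pairs |= pairs(paragraph_entities)
            doc_entities |= paragraph_entities
        # Every pair in the document gets the document weight, plus bonuses
        # if the two entities also share a paragraph or a sentence.
        for pair in pairs(doc_entities):
            scores[pair] += (
                W_DOCUMENT
                + W_PARAGRAPH * (pair in paragraph_pairs)
                + W_SENTENCE * (pair in sentence_pairs)
            )
    return dict(scores)


if __name__ == "__main__":
    # One document: paragraph 1 has two sentences, paragraph 2 has one.
    corpus = [[[{"A", "B"}, {"C"}], [{"D"}]]]
    for pair, score in sorted(co_mention_scores(corpus).items()):
        print(pair, score)
```

A raw weighted count like this favors frequently mentioned entities, so a practical score would also normalize by how often each entity occurs on its own.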
web-centric databases
suite of web interfaces
common backend database
diseases.jensenlab.org
search for a protein
ranked table of diseases
search for a disease
STRING network
evidence viewer
compartments.jensenlab.org
text mining
curated knowledge
sequence-based predictions
visualization
tissues.jensenlab.org
related projects
importance of full text
NIH grant abstracts
electronic patient records
patient stratification
Roque et al., PLoS Computational Biology, 2011
pharmacovigilance
Eriksson et al., in preparation, 2012
Thank you!
Sune Frankild
Janos Binder
Kalliopi Tsafou
Peter Bjødstrup Jensen
Robert Eriksson