SlideShare a Scribd company logo
1 of 32
Download to read offline
“From (Text) Mining to Models: Applying Large-Scale
Text Mining on Patents and Electronic Patient Records“
Martin Hofmann-Apitius
Head of the Department of Bioinformatics
Fraunhofer Institute for Algorithms and Scientific
Computing (SCAI)
page 2
Fraunhofer: Applied Research for Industrial Applications
Fraunhofer stands for:
• sustainable (applied) research
• focus on contract research and innovation
• bridging between excellent academic research and industrial
application
• clear mission towards improving and fostering innovation
• research done with the idea in mind to generate added value in a
commercial sense
page 3
SCAI Department of Bioinformatics: R&D in a nutshell
Fraunhofer SCAI Department of Bioinformatics R&D activities:
1. Information extraction in the life sciences:
I. Text Mining - Recognition of named entities & relationships in text
II. Image Mining - Reconstruction of chemical information from chemical
structure depictions
2. Disease modelling (focus on neurodegenerative diseases)
3. eScience, Grid-/Cloud- Computing and HPC (Cluster)
Seite 4
Productive Use of Large Compute Infrastructures
High Throughput Extraction of Scientific
Information from Full Text Sources:
The UIMA-HPC Project
Efficient Information Extraction
Workflows in many-core environments
18. April 2012 Folie 6
Scientific Challenge:
The knowledge in Chemistry, Biology and Pharmaceutical
Sciences growths with impressive speed. As a result, the number
of publications in these areas is reaching unparalleled dimensions.
However, knowledge is being communicated in non-standardised
ways.
Relevant knowledge sources are not well standardized, let
alone is the knowledge structured. This limits the ability to
query knowledge sources.
Problem-solving approach:
Development of technology that – based on HPC – allows for high
throughput extraction of structured information from unstructured
knowledge sources
Structured Knowledge
Vision
18. April 2012 Folie 7
Use case scenario:
automatic patent structuring
Document
collection
Splitting in
pages
Splitting in
objects
Analysis of the
objects
Text
processing
Structure
(chemoCR)
Table
analysis
Image
annotation
Structured
storage
18. April 2012 Folie 8
Previous study: The grand patent challenge 2009
• Is it computationally feasible?
• How do we use a protocol DB?
• What is the best job size?
• What‘s in there?
SCAI
• Storage Element: Input (200Gb) and Output
• License Server
• ProtocolDB
UNICORE
D-Grid
JUGGLE: Computing Element
18. April 2012 Folie 9
Technical Issues and Pitfalls
• User accession rights (files, scheduler, installed tools and
libs, ...)
• Firewall (ports: MySQL, denial of service attack, time outs,
...)
• Missing files (NFS down, package lost, not installed, ...)
• Too many requests on license server
• Too many connections in database
• Ressources (reservation, priorities, ...)
18. April 2012 Folie 10
UIMA AS in the context of HPC
Support of many-core architecture
• several instances of a service
• eff. usage of shared memory (JVM)
• asynchronous execution
Support of clusters
• several remote services (eg SOAP)
• communication via JMX and http
Control via pre-configured parameters
• CAS pool size
• casMultiplier poolSize
• … Manual
tuning
18. April 2012 Folie 11
Problem-Solving Approach
http://uima.apache.org/
http://www.unicore.eu/
HPC
Multi CoreThird parties
18. April 2012 Folie 12
System Architecture
UNICORE
layer
Interface
layer
UIMA
layer
18. April 2012 Folie 13
First Prototype: PDF Annotation of Patents
18. April 2012 Folie 14
Chemical
Index
Annotations +
Linkouts
Overview
Chemical
Popups
page 15
Performance of Fraunhofer SCAI in international
Benchmarking Competitions
BiTeM10
PAx
SCAI10C
IENTP
SCAI10C
ITENT
SCAI10C
ITNP
SCAI10C
ITTOK
SCAI10N
RMENT
SCAI10N
RMNP
SCAI10N
RMTOK
York10C
aPA01
map 0.2657 0.4121 0.2336 0.2065 0.0947 0.0665 0.0551 0.0172 0.0136
ndcg 0.4975 0.5834 0.4119 0.3764 0.1888 0.2547 0.2224 0.0868 0.0885
p30 0.3485 0.4554 0.2794 0.2485 0.1126 0.1169 0.1 0.0366 0.0309
r100 0.4724 0.5491 0.3596 0.3265 0.1511 0.1974 0.1743 0.0629 0.0583
mrr 0.6121 0.7153 0.5324 0.4769 0.1956 0.3456 0.3088 0.1133 0.1022
bpref 0.6592 0.7075 0.5468 0.511 0.2804 0.4171 0.3702 0.1536 0.1681
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
TRECCHEM 2009 better not talk about it ...
TRECCHEM 2010 winner of the Prior Art Search Task
TRECCHEM 2011 winner of the Technology Survey Task
I2B2 challenge 2010 rank 3 (out of 22) in the concept id task
TRECMED 2011 rank 4 (amongst 24 participants)
Seite 16
Direct Usage of Unstructured Information
Sources for Disease Modelling
From Medline Mining
to
Modelling Neurodegenerative Diseases
Seite 17
Why Modelling of Neurodegeneration?
In 2009 the Federal Government of Germany decided to start a new research centre that
focuses on translational research on neurodegenerative diseases. In fact,
neurodegenerative diseases (Alzheimer, Parkinson, Multiple Sclerosis; Epilepsy;
„rare“ NDDs)
The total costs of Alzheimer is estimated to exceed 20 trillion US$ in the US in the years
between 2020 - 2050. (source: Alzheimer.org). Current costs / year in the US
(according to Alzheimer.org): 183 billion US$
The incidence rate of Alzheimer and other dementias is almost 50% in the population
older than 85 years. Next generation will regularly have a life span of >100 years.
-29 %
-13 % -11 %
-20 %
+66 %
Changes in selected causes of death in USA , 2000-20101
1 www.alz.org
Diseases specific mortality rate
Seite 19
The Starting Conditions
What we have:
- An ontology capturing relevant knowledge on Alzheimer´s Disease (ADO)
- An ontology representing and integrating brain regions and cell types
(BRCO)
- A method for the automated identification of hypotheses in text based on
regular expressions
- An excellent machinery for biomedical text mining (ProMiner) with top
performing gene and protein name recognition
Seite 20
Alzheimer’s ontology:
Captures more than 700
classes/concepts
BFO already implemented
Alzheimer´s Disease Ontology (ADO)
Seite 21
Brain Region and Cell-type Ontology (BRCO)
Current state: more than 3000 concepts; more than 5000 synonyms
Seite 22
Expression of Speculative Statements in Scientific Text
Seite 23
Hypotheses finder∩ AD ontology ∩Human genes and
proteins
Seite 24
Performance of Hypotheses finder
S.No Data type Source Precision Recall F score
1 200 abstracts related
to Alzheimer’s
PubMed 0.84 0.86 0.85
2 2 full text articles
related to Alzheimer’s
Journal of Medical
Hypotheses
0.85 0.88 0.86
3 143 abstracts related
to Alzheimer’s
Alzswan/PubMed 0.90 0.97 0.93
4 100 abstracts related
to Epilepsy
PubMed 0.96 0.91 0.94
5 100 abstracts related
to Parkinson’s
PubMed 0.90 0.93 0.92
Seite 25
Performance of Hypotheses finder
Seite 26
Knowledge retrieval
and extraction
Genes and proteins associated
to hypotheses
Hypothetical sentences on
disease mechanism
Molecular representation of disease
hypotheses
Alzheimer ontology Hypotheses finder
The Knowledge – Discovery Strategy
Seite 27
Analysis of hypotheses patterns across
disease stages
Seite 28
Current Work at SCAI: NDD Pharmacome
NDD Pharmacome Network:
Total nodes: 2,153
Total edges: 11,630
Drugs
• Targets
Seite 29
Using the Pharmacome for VS
1.Galantamine
2.Memantine
3.Rivastigmine
4.Tacrine
5.Donepezil
Drugs having structural similarity
and targets having binding site
similarities are connected by
edges.
Seite 30
What is Competition doing? Competitive
intelligence and strategic partnering
Phase I
Phase II
Phase III
Phase IV
Drugs in various phases
Seite 31
Brain Drug Portfolios in the Pharmacome
Seite 32
Summary
Fraunhofer SCAI Department of Bioinformatics stands for:
• Advanced technologies in text- and data mining, disease modelling in the
area of neurodegeneration and high performance computing
• We are using our competence in HPC to enable large-scale information
extraction from full text documents (focus on patents)
• Internal usage of technologies (information extraction; distributed and high
performance computing) for biomedical application: modelling of
neurodegenerative diseases
• Information integration and aggregation in models: the brain pharmacome

More Related Content

What's hot

ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the FutureCarole Goble
 
2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinar2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinarPistoia Alliance
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadatadgarijo
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational WorkflowsCarole Goble
 
Accelerating materials design through natural language processing
Accelerating materials design through natural language processingAccelerating materials design through natural language processing
Accelerating materials design through natural language processingAnubhav Jain
 
Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Jean Brenda
 
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?Dr. Haxel Consult
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceCarole Goble
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceCarole Goble
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)Carole Goble
 
MAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATION
MAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATIONMAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATION
MAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATIONTigerGraph
 
Research Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMCarole Goble
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseRinke Hoekstra
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOMCarole Goble
 
TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'
TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'
TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'TERN Australia
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.FAIRDOM
 
(2018.9) 分子のグラフ表現と機械学習
(2018.9) 分子のグラフ表現と機械学習(2018.9) 分子のグラフ表現と機械学習
(2018.9) 分子のグラフ表現と機械学習Ichigaku Takigawa
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chainPaul Groth
 

What's hot (20)

ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the Future
 
2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinar2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinar
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
DCC Keynote 2007
DCC Keynote 2007DCC Keynote 2007
DCC Keynote 2007
 
Accelerating materials design through natural language processing
Accelerating materials design through natural language processingAccelerating materials design through natural language processing
Accelerating materials design through natural language processing
 
Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval
 
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
MAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATION
MAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATIONMAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATION
MAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATION
 
Research Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOM
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
 
TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'
TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'
TERN ESA Workshop 2012, 'Smarter Workflows for Ecologists'
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.
 
(2018.9) 分子のグラフ表現と機械学習
(2018.9) 分子のグラフ表現と機械学習(2018.9) 分子のグラフ表現と機械学習
(2018.9) 分子のグラフ表現と機械学習
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chain
 

Viewers also liked

II-SDV 2012 Towards Unified Access Systems for Data Exploration
II-SDV 2012 Towards Unified Access Systems for Data ExplorationII-SDV 2012 Towards Unified Access Systems for Data Exploration
II-SDV 2012 Towards Unified Access Systems for Data ExplorationDr. Haxel Consult
 
II-SDV 2012 Patent Prior-Art Searching with Latent Semantic Analysis
II-SDV 2012 Patent Prior-Art Searching with Latent Semantic AnalysisII-SDV 2012 Patent Prior-Art Searching with Latent Semantic Analysis
II-SDV 2012 Patent Prior-Art Searching with Latent Semantic AnalysisDr. Haxel Consult
 
II-SDV 2012 Actionable Intelligence for the Whole Enterprise
II-SDV 2012 Actionable Intelligence for the Whole EnterpriseII-SDV 2012 Actionable Intelligence for the Whole Enterprise
II-SDV 2012 Actionable Intelligence for the Whole EnterpriseDr. Haxel Consult
 
I-SDV 2012 What Information Tools can you Imagine for the Future? - Conferenc...
I-SDV 2012 What Information Tools can you Imagine for the Future? - Conferenc...I-SDV 2012 What Information Tools can you Imagine for the Future? - Conferenc...
I-SDV 2012 What Information Tools can you Imagine for the Future? - Conferenc...Dr. Haxel Consult
 
Query Enhancement for Patent Prior-Art-Search Based on Keyterm Dependency Rel...
Query Enhancement for Patent Prior-Art-Search Based on Keyterm Dependency Rel...Query Enhancement for Patent Prior-Art-Search Based on Keyterm Dependency Rel...
Query Enhancement for Patent Prior-Art-Search Based on Keyterm Dependency Rel...Ly Nguyen
 
II-SDV 2012 Merging Information from Structured and Unstructured Information ...
II-SDV 2012 Merging Information from Structured and Unstructured Information ...II-SDV 2012 Merging Information from Structured and Unstructured Information ...
II-SDV 2012 Merging Information from Structured and Unstructured Information ...Dr. Haxel Consult
 

Viewers also liked (7)

II-SDV 2012 Towards Unified Access Systems for Data Exploration
II-SDV 2012 Towards Unified Access Systems for Data ExplorationII-SDV 2012 Towards Unified Access Systems for Data Exploration
II-SDV 2012 Towards Unified Access Systems for Data Exploration
 
II-SDV 2012 Patent Prior-Art Searching with Latent Semantic Analysis
II-SDV 2012 Patent Prior-Art Searching with Latent Semantic AnalysisII-SDV 2012 Patent Prior-Art Searching with Latent Semantic Analysis
II-SDV 2012 Patent Prior-Art Searching with Latent Semantic Analysis
 
II-SDV 2012 Actionable Intelligence for the Whole Enterprise
II-SDV 2012 Actionable Intelligence for the Whole EnterpriseII-SDV 2012 Actionable Intelligence for the Whole Enterprise
II-SDV 2012 Actionable Intelligence for the Whole Enterprise
 
I-SDV 2012 What Information Tools can you Imagine for the Future? - Conferenc...
I-SDV 2012 What Information Tools can you Imagine for the Future? - Conferenc...I-SDV 2012 What Information Tools can you Imagine for the Future? - Conferenc...
I-SDV 2012 What Information Tools can you Imagine for the Future? - Conferenc...
 
Query Enhancement for Patent Prior-Art-Search Based on Keyterm Dependency Rel...
Query Enhancement for Patent Prior-Art-Search Based on Keyterm Dependency Rel...Query Enhancement for Patent Prior-Art-Search Based on Keyterm Dependency Rel...
Query Enhancement for Patent Prior-Art-Search Based on Keyterm Dependency Rel...
 
II-SDV 2012 Merging Information from Structured and Unstructured Information ...
II-SDV 2012 Merging Information from Structured and Unstructured Information ...II-SDV 2012 Merging Information from Structured and Unstructured Information ...
II-SDV 2012 Merging Information from Structured and Unstructured Information ...
 
International Patent Classification IPC for Performing Patent Infringement Se...
International Patent Classification IPC for Performing Patent Infringement Se...International Patent Classification IPC for Performing Patent Infringement Se...
International Patent Classification IPC for Performing Patent Infringement Se...
 

Similar to From Text Mining to Models: Applying Large-Scale Text Mining on Patents and Electronic Patient Records

Deep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeakin University
 
Uses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsUses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsPragya Pai
 
Bioinformatica 29-09-2011-t1-bioinformatics
Bioinformatica 29-09-2011-t1-bioinformaticsBioinformatica 29-09-2011-t1-bioinformatics
Bioinformatica 29-09-2011-t1-bioinformaticsProf. Wim Van Criekinge
 
Aaas Data Intensive Science And Grid
Aaas Data Intensive Science And GridAaas Data Intensive Science And Grid
Aaas Data Intensive Science And GridIan Foster
 
Cao report 2007-2012
Cao report 2007-2012Cao report 2007-2012
Cao report 2007-2012Elif Ceylan
 
IT Cluster Skolkovo Presentation at FRUCT.org conference
IT Cluster Skolkovo Presentation at FRUCT.org conferenceIT Cluster Skolkovo Presentation at FRUCT.org conference
IT Cluster Skolkovo Presentation at FRUCT.org conferenceAlbert Yefimov
 
download
downloaddownload
downloadbutest
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Bill Liu
 
Big Data and AI for Covid-19
Big Data and AI for Covid-19Big Data and AI for Covid-19
Big Data and AI for Covid-19Andrew Zhang
 
Bioinformatics
BioinformaticsBioinformatics
BioinformaticsJTADrexel
 
Cyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingCyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingJeremy Yang
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Amit Sheth
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science James Hendler
 
[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...
[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...
[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...DataScienceConferenc1
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsmikaelhuss
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Sequencing Genomics: The New Big Data Driver
Sequencing Genomics:The New Big Data DriverSequencing Genomics:The New Big Data Driver
Sequencing Genomics: The New Big Data DriverLarry Smarr
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Robert Grossman
 

Similar to From Text Mining to Models: Applying Large-Scale Text Mining on Patents and Electronic Patient Records (20)

Deep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining I
 
Uses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsUses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in Bioinformatics
 
Bioinformatica 29-09-2011-t1-bioinformatics
Bioinformatica 29-09-2011-t1-bioinformaticsBioinformatica 29-09-2011-t1-bioinformatics
Bioinformatica 29-09-2011-t1-bioinformatics
 
Aaas Data Intensive Science And Grid
Aaas Data Intensive Science And GridAaas Data Intensive Science And Grid
Aaas Data Intensive Science And Grid
 
Cao report 2007-2012
Cao report 2007-2012Cao report 2007-2012
Cao report 2007-2012
 
IT Cluster Skolkovo Presentation at FRUCT.org conference
IT Cluster Skolkovo Presentation at FRUCT.org conferenceIT Cluster Skolkovo Presentation at FRUCT.org conference
IT Cluster Skolkovo Presentation at FRUCT.org conference
 
Collins seattle-2014-final
Collins seattle-2014-finalCollins seattle-2014-final
Collins seattle-2014-final
 
download
downloaddownload
download
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19
 
Big Data and AI for Covid-19
Big Data and AI for Covid-19Big Data and AI for Covid-19
Big Data and AI for Covid-19
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Cyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingCyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in Biocomputing
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
 
[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...
[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...
[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Sequencing Genomics: The New Big Data Driver
Sequencing Genomics:The New Big Data DriverSequencing Genomics:The New Big Data Driver
Sequencing Genomics: The New Big Data Driver
 
Role of computers
Role of computersRole of computers
Role of computers
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 

More from Dr. Haxel Consult

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementDr. Haxel Consult
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...Dr. Haxel Consult
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...Dr. Haxel Consult
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...Dr. Haxel Consult
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterDr. Haxel Consult
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCDr. Haxel Consult
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
 

More from Dr. Haxel Consult (20)

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Recently uploaded

定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Excelmac1
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMartaLoveguard
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Lucknow
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 

Recently uploaded (20)

定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 

From Text Mining to Models: Applying Large-Scale Text Mining on Patents and Electronic Patient Records

  • 1. “From (Text) Mining to Models: Applying Large-Scale Text Mining on Patents and Electronic Patient Records“ Martin Hofmann-Apitius Head of the Department of Bioinformatics Fraunhofer Institute for Algorithms and Scientific Computing (SCAI)
  • 2. page 2 Fraunhofer: Applied Research for Industrial Applications Fraunhofer stands for: • sustainable (applied) research • focus on contract research and innovation • bridging between excellent academic research and industrial application • clear mission towards improving and fostering innovation • research done with the idea in mind to generate added value in a commercial sense
  • 3. page 3 SCAI Department of Bioinformatics: R&D in a nutshell Fraunhofer SCAI Department of Bioinformatics R&D activities: 1. Information extraction in the life sciences: I. Text Mining - Recognition of named entities & relationships in text II. Image Mining - Reconstruction of chemical information from chemical structure depictions 2. Disease modelling (focus on neurodegenerative diseases) 3. eScience, Grid-/Cloud- Computing and HPC (Cluster)
  • 4. Seite 4 Productive Use of Large Compute Infrastructures High Throughput Extraction of Scientific Information from Full Text Sources: The UIMA-HPC Project
  • 5. Efficient Information Extraction Workflows in many-core environments
  • 6. 18. April 2012 Folie 6 Scientific Challenge: The knowledge in Chemistry, Biology and Pharmaceutical Sciences growths with impressive speed. As a result, the number of publications in these areas is reaching unparalleled dimensions. However, knowledge is being communicated in non-standardised ways. Relevant knowledge sources are not well standardized, let alone is the knowledge structured. This limits the ability to query knowledge sources. Problem-solving approach: Development of technology that – based on HPC – allows for high throughput extraction of structured information from unstructured knowledge sources Structured Knowledge Vision
  • 7. 18. April 2012 Folie 7 Use case scenario: automatic patent structuring Document collection Splitting in pages Splitting in objects Analysis of the objects Text processing Structure (chemoCR) Table analysis Image annotation Structured storage
  • 8. 18. April 2012 Folie 8 Previous study: The grand patent challenge 2009 • Is it computationally feasible? • How do we use a protocol DB? • What is the best job size? • What‘s in there? SCAI • Storage Element: Input (200Gb) and Output • License Server • ProtocolDB UNICORE D-Grid JUGGLE: Computing Element
  • 9. 18. April 2012 Folie 9 Technical Issues and Pitfalls • User accession rights (files, scheduler, installed tools and libs, ...) • Firewall (ports: MySQL, denial of service attack, time outs, ...) • Missing files (NFS down, package lost, not installed, ...) • Too many requests on license server • Too many connections in database • Ressources (reservation, priorities, ...)
  • 10. 18. April 2012 Folie 10 UIMA AS in the context of HPC Support of many-core architecture • several instances of a service • eff. usage of shared memory (JVM) • asynchronous execution Support of clusters • several remote services (eg SOAP) • communication via JMX and http Control via pre-configured parameters • CAS pool size • casMultiplier poolSize • … Manual tuning
  • 11. 18. April 2012 Folie 11 Problem-Solving Approach http://uima.apache.org/ http://www.unicore.eu/ HPC Multi CoreThird parties
  • 12. 18. April 2012 Folie 12 System Architecture UNICORE layer Interface layer UIMA layer
  • 13. 18. April 2012 Folie 13 First Prototype: PDF Annotation of Patents
  • 14. 18. April 2012 Folie 14 Chemical Index Annotations + Linkouts Overview Chemical Popups
  • 15. page 15 Performance of Fraunhofer SCAI in international Benchmarking Competitions BiTeM10 PAx SCAI10C IENTP SCAI10C ITENT SCAI10C ITNP SCAI10C ITTOK SCAI10N RMENT SCAI10N RMNP SCAI10N RMTOK York10C aPA01 map 0.2657 0.4121 0.2336 0.2065 0.0947 0.0665 0.0551 0.0172 0.0136 ndcg 0.4975 0.5834 0.4119 0.3764 0.1888 0.2547 0.2224 0.0868 0.0885 p30 0.3485 0.4554 0.2794 0.2485 0.1126 0.1169 0.1 0.0366 0.0309 r100 0.4724 0.5491 0.3596 0.3265 0.1511 0.1974 0.1743 0.0629 0.0583 mrr 0.6121 0.7153 0.5324 0.4769 0.1956 0.3456 0.3088 0.1133 0.1022 bpref 0.6592 0.7075 0.5468 0.511 0.2804 0.4171 0.3702 0.1536 0.1681 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 TRECCHEM 2009 better not talk about it ... TRECCHEM 2010 winner of the Prior Art Search Task TRECCHEM 2011 winner of the Technology Survey Task I2B2 challenge 2010 rank 3 (out of 22) in the concept id task TRECMED 2011 rank 4 (amongst 24 participants)
  • 16. Seite 16 Direct Usage of Unstructured Information Sources for Disease Modelling From Medline Mining to Modelling Neurodegenerative Diseases
  • 17. Seite 17 Why Modelling of Neurodegeneration? In 2009 the Federal Government of Germany decided to start a new research centre that focuses on translational research on neurodegenerative diseases. In fact, neurodegenerative diseases (Alzheimer, Parkinson, Multiple Sclerosis; Epilepsy; „rare“ NDDs) The total costs of Alzheimer is estimated to exceed 20 trillion US$ in the US in the years between 2020 - 2050. (source: Alzheimer.org). Current costs / year in the US (according to Alzheimer.org): 183 billion US$ The incidence rate of Alzheimer and other dementias is almost 50% in the population older than 85 years. Next generation will regularly have a life span of >100 years.
  • 18. -29 % -13 % -11 % -20 % +66 % Changes in selected causes of death in USA , 2000-20101 1 www.alz.org Diseases specific mortality rate
  • 19. Seite 19 The Starting Conditions What we have: - An ontology capturing relevant knowledge on Alzheimer´s Disease (ADO) - An ontology representing and integrating brain regions and cell types (BRCO) - A method for the automated identification of hypotheses in text based on regular expressions - An excellent machinery for biomedical text mining (ProMiner) with top performing gene and protein name recognition
  • 20. Seite 20 Alzheimer’s ontology: Captures more than 700 classes/concepts BFO already implemented Alzheimer´s Disease Ontology (ADO)
  • 21. Seite 21 Brain Region and Cell-type Ontology (BRCO) Current state: more than 3000 concepts; more than 5000 synonyms
  • 22. Seite 22 Expression of Speculative Statements in Scientific Text
  • 23. Seite 23 Hypotheses finder∩ AD ontology ∩Human genes and proteins
  • 24. Seite 24 Performance of Hypotheses finder S.No Data type Source Precision Recall F score 1 200 abstracts related to Alzheimer’s PubMed 0.84 0.86 0.85 2 2 full text articles related to Alzheimer’s Journal of Medical Hypotheses 0.85 0.88 0.86 3 143 abstracts related to Alzheimer’s Alzswan/PubMed 0.90 0.97 0.93 4 100 abstracts related to Epilepsy PubMed 0.96 0.91 0.94 5 100 abstracts related to Parkinson’s PubMed 0.90 0.93 0.92
  • 25. Seite 25 Performance of Hypotheses finder
  • 26. Seite 26 Knowledge retrieval and extraction Genes and proteins associated to hypotheses Hypothetical sentences on disease mechanism Molecular representation of disease hypotheses Alzheimer ontology Hypotheses finder The Knowledge – Discovery Strategy
  • 27. Seite 27 Analysis of hypotheses patterns across disease stages
  • 28. Seite 28 Current Work at SCAI: NDD Pharmacome NDD Pharmacome Network: Total nodes: 2,153 Total edges: 11,630 Drugs • Targets
  • 29. Seite 29 Using the Pharmacome for VS 1.Galantamine 2.Memantine 3.Rivastigmine 4.Tacrine 5.Donepezil Drugs having structural similarity and targets having binding site similarities are connected by edges.
  • 30. Seite 30 What is Competition doing? Competitive intelligence and strategic partnering Phase I Phase II Phase III Phase IV Drugs in various phases
  • 31. Seite 31 Brain Drug Portfolios in the Pharmacome
  • 32. Seite 32 Summary Fraunhofer SCAI Department of Bioinformatics stands for: • Advanced technologies in text- and data mining, disease modelling in the area of neurodegeneration and high performance computing • We are using our competence in HPC to enable large-scale information extraction from full text documents (focus on patents) • Internal usage of technologies (information extraction; distributed and high performance computing) for biomedical application: modelling of neurodegenerative diseases • Information integration and aggregation in models: the brain pharmacome