Integrated Data
Platform at Bayer
Turning bits into insights
Wolfgang Thielemann
Agenda
What platform did we build?
What does it look like?
Why did we build it?
Architecture and data enrichment
Challenges
Plans for the future
2 /// AI-SDV 2022 // Integrated Data Platform at Bayer
1. What platform did we build?
Our platform semantically integrates Terabytes
of external scientific textual data to support
insight generation along the R&D value chain
Big data platform
This platform is…
• A semantically integrated and harmonized big data hub containing major external, text-rich, and life-science-related data sources
• Enriched with FAIR meta-data generated by extracting key information (e.g., molecular targets, medical conditions, active ingredients, technologies) using NLP
• An analysis-ready platform for end-users (GUI access) and data scientists (API access)
The users
• Scientific end users
• Data scientists
• Developers of digital products
The users
End-user GUIs: more power & precision for scientific search
• Users: project leaders, R&D scientists, tech scouts & co.
• Tasks: find relevant information, alerts, analysis, filter & review
Expert APIs: provide structured data for insight generation
• Users: data scientists, computational scientists, information professionals, bioinformaticians
• Goals: generate insights, find new targets & treatments, support pipeline decisions, build predictive models
2. What does it look like?
Example: Liver cancer
Google-like search interface
Example: Liver cancer
Interactive analysis and filtering
Example: Liver cancer
Result overview
Example: Liver cancer
Record view
3. Why did we build it?
Big Data Platform
Six reasons why building it made, and still makes, sense:
• Richness of data sources
• Flexibility
• Costs
• Scalability
• FAIR meta-data
• Full transparency and control
Big Data Platform: Why did we build it?
1. Breadth and richness of data sources
Comparison: scientific sources in our platform vs. platforms limited to publicly available data
2. Maximum flexibility to analyze the data and to integrate it into our
Bayer data ecosystem
Existing platforms often come with limited/pre-defined analysis options and
limited integrability
3. Full scalability
Our platform is built on a scalable cloud infrastructure for big data analysis and allows you to analyze millions of records in one go.
4. Costs
This platform allowed us to save money and reduce complexity by replacing various proprietary legacy platforms.
5. One terminology across the entire content, with the option to adjust it to our needs
Individual sources and platforms typically have their own standards and terminologies.
One terminology for the entire platform
6. Comprehensiveness and quality of meta-data
Because we build on 20 years of thesauri and NLP algorithms optimized for Bayer’s needs, our terminologies reflect the real-life usage of scientific language much better than established terminologies do.
MeSH:
6. Comprehensiveness and quality of meta-data
Proprietary disease thesaurus:
4. Architecture and data enrichment
Platform architecture
DATA
• Conference abstracts, literature abstracts, literature full texts
• Patents, patent chemistry
• Clinical trials, pipeline information
• Market reports, company websites, industry news
• Research grants, tech transfer offers
PROCESS
• Automated data acquisition (Kafka technology)
• Data engineering: normalization, deduplication, classification, etc. (Kafka Streams)
• Semantic enrichment: targets, organisms, sequences, drugs, active ingredients, companies/organizations, analytics, etc.
• Index, search, and API services (Elastic)
DELIVER
• End-user products: cross-search GUI, advanced literature GUI, advanced patent GUI
• APIs & data science
• System/application integrations: other proprietary platforms and workflows use this platform as a source
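The architecture slide describes records flowing from automated acquisition through data engineering and semantic enrichment into a search index. The following is a minimal, in-memory Python sketch of that layering only, not Bayer's implementation: plain generator functions stand in for Kafka and Kafka Streams, a dictionary stands in for the Elastic index, and all function names, record fields, and the tiny thesaurus are hypothetical.

```python
"""In-memory sketch of acquisition -> engineering -> enrichment -> index.
Assumption: generators stand in for Kafka / Kafka Streams and a dict stands
in for the Elastic index. All names and fields are hypothetical."""
import unicodedata
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def acquire(raw_sources: Dict[str, List[Record]]) -> Iterator[Record]:
    """Acquisition: emit every raw record tagged with its source name."""
    for source, records in raw_sources.items():
        for rec in records:
            yield {**rec, "source": source}


def normalize(records: Iterable[Record]) -> Iterator[Record]:
    """Engineering: apply Unicode NFKC normalization and collapse whitespace in titles."""
    for rec in records:
        title = unicodedata.normalize("NFKC", str(rec.get("title", "")))
        yield {**rec, "title": " ".join(title.split())}


def drop_duplicates(records: Iterable[Record]) -> Iterator[Record]:
    """Engineering: keep only the first record for each case-folded title."""
    seen = set()
    for rec in records:
        key = rec["title"].casefold()
        if key not in seen:
            seen.add(key)
            yield rec


def enrich(records: Iterable[Record], thesaurus: Dict[str, str]) -> Iterator[Record]:
    """Enrichment: naive substring lookup mapping synonyms to concept labels."""
    for rec in records:
        text = rec["title"].casefold()
        concepts = sorted({label for syn, label in thesaurus.items() if syn in text})
        yield {**rec, "concepts": concepts}


def build_index(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Index: inverted map from concept label to the records that mention it."""
    index: Dict[str, List[Record]] = {}
    for rec in records:
        for concept in rec["concepts"]:
            index.setdefault(concept, []).append(rec)
    return index


# Usage: two sources, one near-duplicate title, two synonyms of one concept.
raw = {
    "literature": [{"title": "Neoplasm of the lung in smokers"}],
    "patents": [
        {"title": "Neoplasm  of the lung in smokers"},  # duplicate once whitespace is collapsed
        {"title": "Pulmonary carcinoma treatment"},
    ],
}
thesaurus = {"neoplasm of the lung": "Lung cancer", "pulmonary carcinoma": "Lung cancer"}
index = build_index(enrich(drop_duplicates(normalize(acquire(raw))), thesaurus))
print([rec["source"] for rec in index["Lung cancer"]])  # ['literature', 'patents']
```

Because each stage consumes and yields an iterator, stages can be chained and tested independently, loosely mirroring how stream-processing stages are composed; persistence, retries, and horizontal scaling are deliberately left out.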
Big Data Platform
Semantic data integration at large
Resolve all flavours of heterogeneity to make textual data FAIR:
• Structural heterogeneity: the same facts expressed in different schemata; missing or additional attributes
• Technical heterogeneity: data formats (JSON vs. XML), communication protocols (REST vs. ODBC), query languages (SQL vs. SPARQL)
• Data model heterogeneity: relational vs. semi-structured, tuples vs. graphs, …
• Syntactic heterogeneity: different presentations of the same fact (Unicode or ASCII, EUR or €, …)
• Semantic heterogeneity: the same concept is named differently (e.g., lung cancer also appears as "pulmonary carcinoma", "neoplasm of the lung", …), and different concepts are named the same (e.g., GSK, which can denote the company GlaxoSmithKline or the enzyme glycogen synthase kinase)
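Semantic heterogeneity has two faces on this slide: several names for one concept, and one name for several concepts. A minimal sketch, assuming a hypothetical thesaurus with invented concept IDs and hand-picked context cue words, of how a terminology lookup might map synonyms to one concept and only resolve a homonym such as "GSK" when the surrounding text favours one reading:

```python
"""Hypothetical mini-thesaurus covering synonyms (many names, one concept)
and homonyms (one name, many concepts). IDs, synonyms and cues are invented."""
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class Concept:
    label: str
    synonyms: Tuple[str, ...]
    cues: FrozenSet[str] = frozenset()  # context words that favour this reading


CONCEPTS: Dict[str, Concept] = {
    "DIS:0001": Concept("Lung cancer", ("lung cancer", "pulmonary carcinoma", "neoplasm of the lung")),
    "ORG:0001": Concept("GlaxoSmithKline", ("gsk", "glaxosmithkline"),
                        frozenset({"company", "pharma", "acquisition", "licence"})),
    "GENE:0001": Concept("Glycogen synthase kinase 3", ("gsk", "gsk3", "gsk-3"),
                         frozenset({"kinase", "inhibitor", "phosphorylation", "signalling"})),
}


def _key(text: str) -> str:
    """Case-fold and collapse whitespace so surface forms compare equal."""
    return " ".join(text.casefold().split())


# Surface form -> every concept ID that uses it (more than one = homonym).
LOOKUP: Dict[str, Set[str]] = {}
for cid, concept in CONCEPTS.items():
    for syn in concept.synonyms:
        LOOKUP.setdefault(_key(syn), set()).add(cid)


def resolve(term: str, context: str = "") -> Optional[str]:
    """Map a surface form to one concept ID, or None if unknown or still ambiguous."""
    candidates = LOOKUP.get(_key(term), set())
    if len(candidates) <= 1:
        return next(iter(candidates), None)
    words = set(_key(context).split())
    scored = sorted(((len(words & CONCEPTS[c].cues), c) for c in candidates), reverse=True)
    (best_score, best), (runner_up, _) = scored[0], scored[1]
    return best if best_score > runner_up else None  # a tie stays unresolved


print(resolve("Pulmonary carcinoma"))                                 # DIS:0001
print(resolve("GSK", "a selective kinase inhibitor"))                 # GENE:0001
print(resolve("GSK", "the pharma company announced an acquisition"))  # ORG:0001
print(resolve("GSK"))                                                 # None
```

Counting cue-word overlaps is only a stand-in for real disambiguation; the deck itself names ML-based NLP as the technique for ambiguous acronyms.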
5. Challenges
Heterogeneous formats
Challenges: Data ingestion
Heterogeneous update schedules: hourly, daily, weekly, monthly
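With sources updating hourly, daily, weekly, or monthly, the ingestion layer has to decide which feeds are due. A minimal sketch, assuming a hypothetical mapping of sources to cadences (the slide does not say which source follows which schedule) and approximating a month as 30 days:

```python
"""Sketch: pick the sources due for ingestion under mixed update cadences.
The source names and their cadence assignment are hypothetical."""
from datetime import datetime, timedelta
from typing import Dict, List, Optional

CADENCES: Dict[str, timedelta] = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),  # approximation: calendar months differ in length
}

# Hypothetical assignment of sources to cadences.
SOURCES: Dict[str, str] = {
    "industry_news": "hourly",
    "literature_abstracts": "daily",
    "patents": "weekly",
    "market_reports": "monthly",
}


def due_sources(last_run: Dict[str, datetime], now: datetime) -> List[str]:
    """Return sources never ingested, or whose cadence has elapsed since the last run."""
    due = []
    for source, cadence in SOURCES.items():
        last: Optional[datetime] = last_run.get(source)
        if last is None or now - last >= CADENCES[cadence]:
            due.append(source)
    return sorted(due)


now = datetime(2022, 10, 10, 12, 0)
last_run = {
    "industry_news": now - timedelta(minutes=90),
    "literature_abstracts": now - timedelta(hours=3),
    "patents": now - timedelta(days=8),
}
print(due_sources(last_run, now))  # ['industry_news', 'market_reports', 'patents']
```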
Challenges: Data ingestion
Changes in record structure
Changes in volume over time
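Changes in record structure and volume can be caught by simple checks before records enter the pipeline. A hedged sketch: `schema_drift` compares the fields present in a batch against an expected field set, and `volume_changed` flags a batch whose size deviates from the mean of recent batches. The field names and the 50% tolerance are illustrative assumptions, not values from the deck.

```python
"""Sketch of two ingestion checks: schema drift and volume change.
Field names and the default tolerance are hypothetical."""
from statistics import mean
from typing import Dict, Iterable, List, Set


def schema_drift(expected: Set[str], batch: Iterable[dict]) -> Dict[str, Set[str]]:
    """Report expected fields absent from every record, and fields not expected."""
    seen: Set[str] = set()
    for record in batch:
        seen.update(record)  # iterating a dict yields its keys
    return {"missing": expected - seen, "new": seen - expected}


def volume_changed(history: List[int], today: int, tolerance: float = 0.5) -> bool:
    """True if today's count deviates from the mean of `history` by more than `tolerance`."""
    if not history:
        return False  # no baseline yet
    baseline = mean(history)
    if baseline == 0:
        return today != 0
    return abs(today - baseline) / baseline > tolerance


expected = {"id", "title", "abstract", "date"}
batch = [
    {"id": 1, "title": "A", "pub_date": "2022-10-10"},
    {"id": 2, "title": "B", "abstract": "...", "pub_date": "2022-10-11"},
]
print(schema_drift(expected, batch))            # {'missing': {'date'}, 'new': {'pub_date'}}
print(volume_changed([1000, 1100, 900], 400))   # True
print(volume_changed([1000, 1100, 900], 1200))  # False
```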
Challenges: Data ingestion
De-duplication
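One common approach to de-duplication is to derive a matching key per record and merge records that share it. A minimal sketch under stated assumptions: records carry hypothetical `doi`, `title`, `year`, and `source` fields; a DOI, compared case-insensitively, takes precedence, and otherwise the key is an accent- and punctuation-insensitive title plus the year.

```python
"""Sketch of key-based de-duplication. Field names and the keying rule
are hypothetical, not the platform's actual logic."""
import re
import unicodedata
from typing import Dict, List


def normalize_title(title: str) -> str:
    """Strip accents, case-fold, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def dedup_key(record: dict) -> str:
    doi = record.get("doi")
    if doi:
        return "doi:" + doi.strip().lower()  # DOI names are case-insensitive
    return f"title:{normalize_title(record.get('title', ''))}|{record.get('year', '')}"


def deduplicate(records: List[dict]) -> List[dict]:
    """Keep the first record per key and collect every source it was seen in."""
    merged: Dict[str, dict] = {}
    for record in records:
        key = dedup_key(record)
        if key in merged:
            merged[key]["sources"].add(record["source"])
        else:
            merged[key] = {**record, "sources": {record["source"]}}
    return list(merged.values())


records = [
    {"source": "A", "doi": "10.1000/XYZ123", "title": "Liver cancer screening", "year": 2021},
    {"source": "B", "doi": "10.1000/xyz123 ", "title": "Liver Cancer Screening", "year": 2021},
    {"source": "C", "title": "Café-based study: liver cancer!", "year": 2020},
    {"source": "D", "title": "Cafe based study liver cancer", "year": 2020},
]
result = deduplicate(records)
print([sorted(r["sources"]) for r in result])  # [['A', 'B'], ['C', 'D']]
```

This exact-key approach will not merge a record that has a DOI with a copy that lacks one, nor titles that differ by more than case, accents, punctuation, or spacing; such cases usually call for additional cross-key linking or fuzzy matching.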
Challenges: Semantic enrichment
Lack of a universally accepted identifier for an entity class:
• Human gene: NCBI Gene ID
• Chemical compound: INN name, IUPAC name, CAS number, PubChem CID, canonical SMILES
• Disease: MeSH ID, UMLS ID, SNOMED ID, NCIT ID, Orphanet ID, Mondo ID, ICD-10 ID, MedDRA ID, DO ID, …
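Because no single identifier is universally accepted, one way to cope is a cross-reference table that ties each internal concept to its codes in several external vocabularies, so that any code can be translated via the internal concept. A sketch under stated assumptions: the internal IDs and every code value below are invented placeholders, not real MeSH, UMLS, ICD-10, or Orphanet identifiers.

```python
"""Sketch of a cross-reference table: one internal concept, many external codes.
All internal IDs and code values are invented placeholders."""
from typing import Dict, Optional, Tuple

XREFS: Dict[str, Dict[str, str]] = {
    "INT-DIS-1": {"MeSH": "MESH-EX-1", "UMLS": "UMLS-EX-1", "ICD-10": "ICD10-EX-1"},
    "INT-DIS-2": {"MeSH": "MESH-EX-2", "Orphanet": "ORPHA-EX-2"},
}


def build_reverse(xrefs: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Index (vocabulary, code) -> internal ID, rejecting a code claimed by two concepts."""
    reverse: Dict[Tuple[str, str], str] = {}
    for internal_id, codes in xrefs.items():
        for vocabulary, code in codes.items():
            key = (vocabulary, code)
            if reverse.get(key, internal_id) != internal_id:
                raise ValueError(f"{vocabulary}:{code} maps to {reverse[key]} and {internal_id}")
            reverse[key] = internal_id
    return reverse


REVERSE = build_reverse(XREFS)


def translate(vocabulary: str, code: str, target: str) -> Optional[str]:
    """Translate a code between vocabularies via the internal concept, if mapped."""
    internal_id = REVERSE.get((vocabulary, code))
    if internal_id is None:
        return None
    return XREFS[internal_id].get(target)


print(translate("MeSH", "MESH-EX-1", "ICD-10"))  # ICD10-EX-1
print(translate("MeSH", "MESH-EX-2", "ICD-10"))  # None (no ICD-10 code recorded)
```

Real mappings between vocabularies are often not one-to-one (one code can be broader or narrower than another), so a production table would also need to record the type of each mapping.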
Challenges: Semantic enrichment
Identifying different entities requires different technologies:
➢ Terminology-based NLP (e.g., disease names)
➢ ML-based NLP (e.g., for ambiguous acronyms such as cell lines and gene acronyms)
➢ Rule/pattern-based extraction (e.g., IUPAC chemical names, gene mutations), for example:
“A lamp-snp assay detecting c580y mutation in pfkelch13 gene from clinically dried blood spot samples”
➢ Image/graph processing (e.g., image2mol), yielding SMILES strings such as:
C1=CC=C(C(=C1)CC(=O)[O-])NC2=C(C=CC=C2Cl)Cl.[Na+]
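Rule/pattern-based extraction suits entities with a regular surface form. As an illustration, a regular expression for single amino-acid substitutions written in short form (one-letter reference residue, position, one-letter variant residue, as in "c580y" in the example title) can also normalize each hit to HGVS-style protein notation. This is a hypothetical sketch, not the platform's extractor; the short form is ambiguous in general, and cell-line or product codes with the same shape can produce false positives.

```python
"""Hypothetical rule-based extractor for short-form protein point mutations."""
import re
from typing import List, Tuple

# One-letter -> three-letter codes for the 20 standard amino acids.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
_AA = "".join(AA3)  # "ARNDCEQGHILKMFPSTWYV"

# Whole word: reference residue, position, variant residue (case-insensitive).
POINT_MUTATION = re.compile(rf"\b([{_AA}])(\d+)([{_AA}])\b", re.IGNORECASE)


def extract_point_mutations(text: str) -> List[Tuple[str, str]]:
    """Return (surface form, HGVS-style protein notation) for each match."""
    hits = []
    for match in POINT_MUTATION.finditer(text):
        ref, pos, alt = match.group(1).upper(), match.group(2), match.group(3).upper()
        hits.append((match.group(0), f"p.{AA3[ref]}{pos}{AA3[alt]}"))
    return hits


title = ("A lamp-snp assay detecting c580y mutation in pfkelch13 gene "
         "from clinically dried blood spot samples")
print(extract_point_mutations(title))  # [('c580y', 'p.Cys580Tyr')]
```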
6. Status quo and plans for the future
Are we now living in a fairytale where everything is perfect?
There is still a lot to do…
➢ Terminology is constantly evolving (new companies, new technologies, etc.)
➢ Development of scalable algorithms for complex entities
➢ Finding the most relevant information in the ocean of data
➢ Advanced visualization and analytics
➢ Further standardization
➢ …
What can you do to help us in our endeavour?
Vendors / publishers / database producers:
• Data quality
• FAIRification
• Using generally available standards & IDs
• Consistency
• Collecting scattered data
• Harmonization
Big Data Platform
Plans for the future
• Sources (e.g., drug labels, guidelines)
• Usability
• Thesauri
• Automation (e.g., alerting)
• Chemistry
• Analysis features
Thank you!
Special thanks to my colleagues on the team

AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insights, Wolfgang Thielemann (Bayer, Germany)

  • 1. Integrated Data Platform at Bayer Turning bits into insights Wolfgang Thielemann
  • 2. Agenda What platform did we built? What does it look like? Why did we build it? Architecture and data enrichment Challenges Plans for the future 2 /// AI-SDV 2022 // Integrated Data Platform at Bayer
  • 3. /// AI-SDV 2022 // Integrated Data Platform at Bayer 3 What Platform did we built? 1
  • 4. /// AI-SDV 2022 // Integrated Data Platform at Bayer 4 Our platform semantically integrates Terabytes of external scientific textual data to support insight generation along the R&D value chain
  • 5. /// AI-SDV 2022 // Integrated Data Platform at Bayer 5 Big data platform This platform is… • A semantically integrated and harmonized big data hub containing major external, text- rich, and life-science related data sources • Enriched with FAIR meta-data generated by extracting the key information (e.g., molecular targets, medical conditions, active ingredients, technologies etc.) using NLP • An analysis-ready platform for end-users (GUI access) and data scientists (API access)
  • 6. /// AI-SDV 2022 // Integrated Data Platform at Bayer 6 Scientific end users Data scientists Developers of digital products The users
  • 7. /// AI-SDV 2022 // Integrated Data Platform at Bayer 7 The users End-user GUIs more power & precision for scientific search Project leaders R&D scientists Tech scouts & Co Find relevant information Alerts Analysis Filter & Review Expert APIs Provide structured data for insight generation Data scientists Computational scientists Information professionals Bioinformaticians Generate insights Find new targets & treatments Support pipeline decisions Build predictive models
  • 8. /// AI-SDV 2022 // Integrated Data Platform at Bayer 8 What does it look like? 2
  • 9. /// AI-SDV 2022 // Integrated Data Platform at Bayer 9 Example: Liver cancer Google-like search interface
  • 10. /// AI-SDV 2022 // Integrated Data Platform at Bayer 10 Example: Liver cancer Interactive analysis and filtering
  • 11. /// AI-SDV 2022 // Integrated Data Platform at Bayer 11 Example: Liver cancer Result overview
  • 12. /// AI-SDV 2022 // Integrated Data Platform at Bayer 12 Example: Liver cancer Record view
  • 13. /// AI-SDV 2022 // Integrated Data Platform at Bayer 13 Why did we build it? 3
  • 14. /// AI-SDV 2022 // Integrated Data Platform at Bayer 14 Big Data Platform 6 Reasons why building it made and makes sense Richness of data sources Flexibility Costs Scalability FAIR meta-data Full transparency and control
  • 15. /// AI-SDV 2022 // Integrated Data Platform at Bayer 15 Scientific sources in our platform Platforms limited to publicly available data 1. Bandwidth and richness of data sources Big Data Platform Why did we build it?
  • 16. Why did we build it? 2. Maximum flexibility to analyze the data and to integrate it into our Bayer data ecosystem. Existing platforms often come with limited or predefined analysis options and limited integrability.
  • 17. Why did we build it? 3. Full scalability. Our platform is built on a scalable cloud infrastructure for big data analysis and allows you to analyze millions of records in one go.
  • 18. Why did we build it? 4. Costs. This platform allowed us to save money and reduce complexity by replacing various proprietary legacy platforms.
  • 19. Why did we build it? 5. One terminology across the entire content, with the option to adjust it to our needs. Individual sources and platforms typically have their own standards and terminologies; our platform applies one terminology to all of them.
  • 20. Why did we build it? 6. Comprehensiveness and quality of meta-data. Since we built on 20 years of thesauri and NLP algorithms optimized for Bayer’s needs, our terminologies cover real-life scientific usage much better than established terminologies. Figure: MeSH example
  • 21. Why did we build it? 6. Comprehensiveness and quality of meta-data (continued). Figure: proprietary disease thesaurus example
  • 22. Section 4: Architecture & data enrichment
  • 23. Platform architecture. DATA: conference abstracts, literature abstracts, literature full texts, patents, patent chemistry, clinical trials, pipeline information, market reports, company websites, industry news, research grants, tech transfer offers. PROCESS: automated data acquisition (Kafka technology); data engineering: normalization, deduplication, classification, etc. (Kafka Streams); semantic enrichment: targets, organisms, sequences, drugs, active ingredients, companies/organizations, analytics, etc.; index, search, and API services (Elastic). DELIVER: end-user products (cross-search GUI, advanced literature GUI, advanced patent GUI); APIs & data science; system/application integrations (other proprietary platforms and workflows use this platform as a source).
  • 24. Semantic data integration at large: resolve all flavours of heterogeneity to make textual data FAIR. Structural heterogeneity: the same facts expressed in different schemata; missing or additional attributes. Technical heterogeneity: data formats (JSON vs. XML), communication protocols (REST vs. ODBC), query languages (SQL vs. SPARQL). Data model heterogeneity: relational vs. semi-structured, tuples vs. graphs, … Syntactic heterogeneity: different presentations of the same fact (Unicode or ASCII, EUR or €, …). Semantic heterogeneity: the same concept is named differently (lung cancer ➢ pulmonary carcinoma ➢ neoplasm of the lung ➢ …), and different concepts are named the same (e.g., GSK, which can denote the company GlaxoSmithKline or the enzyme glycogen synthase kinase).
  • 25. Section 5: Challenges
  • 26. Challenges: Data ingestion. Heterogeneous formats; heterogeneous update schedules (hourly, daily, weekly, monthly)
  • 27. Challenges: Data ingestion. Changes in record structure; changes in volume over time
  • 28. Challenges: Data ingestion. De-duplication
  • 29. Challenges: Semantic enrichment. Lack of a universally accepted identifier per entity class. Human gene: NCBI Gene ID. Chemical compound: INN, IUPAC name, CAS number, PubChem CID, canonical SMILES. Disease: MeSH ID, UMLS ID, SNOMED ID, NCIt ID, Orphanet ID, Mondo ID, ICD-10 ID, MedDRA ID, DO ID, …
  • 30. Challenges: Semantic enrichment. Identifying different entities requires different technologies: ➢ Terminology-based NLP (e.g., disease names) ➢ ML-based NLP (e.g., ambiguous acronyms for cell lines, genes, etc.) ➢ Rule/pattern-based extraction (e.g., IUPAC chemical names, gene mutations such as c580y in “A lamp-snp assay detecting c580y mutation in pfkelch13 gene from clinically dried blood spot samples”) ➢ Image/graph processing (e.g., image2mol; example structure: C1=CC=C(C(=C1)CC(=O)[O-])NC2=C(C=CC=C2Cl)Cl.[Na+])
  • 31. Section 6: Status quo & plans for the future
  • 32. Are we now living in a fairytale where everything is perfect?
  • 33. Are we now living in a fairytale where everything is perfect? There is still a lot to do… ➢ Terminology is constantly evolving (new companies, new technologies, etc.) ➢ Development of scalable algorithms for complex entities ➢ Finding the most relevant information in the ocean of data ➢ Advanced visualization and analytics ➢ Further standardization ➢ …
  • 34. What can you do to help us in our endeavour? Vendors, publishers, database producers: • Data quality • FAIRification • Using generally available standards & IDs • Consistency • Collecting scattered data • Harmonization
  • 35. Big Data Platform: plans for the future. Sources (e.g., drug labels, guidelines); usability; thesauri; automation (e.g., alerting); chemistry; analysis features
  • 36. Thank you! Special thanks to my colleagues on the team
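The architecture slide (23) describes a flow from automated acquisition through data engineering and semantic enrichment to indexing. As a minimal sketch, assuming records are plain Python dicts and each stage is a generator function, that flow can be modelled in memory as below. Kafka, Kafka Streams, and Elastic are not modelled at all; the function names, the field names (`id`, `title`), and the substring-based term matching are hypothetical simplifications chosen for illustration.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical record shape: {"id": ..., "title": ...}; not the platform's schema.
Record = Dict[str, object]


def normalize(records: Iterable[Record]) -> Iterator[Record]:
    """Data engineering: collapse runs of whitespace in the title."""
    for rec in records:
        title = " ".join(str(rec.get("title", "")).split())
        yield {**rec, "title": title}


def deduplicate(records: Iterable[Record]) -> Iterator[Record]:
    """Data engineering: keep only the first record per case-folded title."""
    seen = set()
    for rec in records:
        key = str(rec["title"]).casefold()
        if key not in seen:
            seen.add(key)
            yield rec


def enrich(records: Iterable[Record], terms: Dict[str, str]) -> Iterator[Record]:
    """Semantic enrichment: naive substring lookup of lower-case surface forms."""
    for rec in records:
        text = str(rec["title"]).casefold()
        concepts = sorted({concept for surface, concept in terms.items() if surface in text})
        yield {**rec, "concepts": concepts}


def build_index(records: Iterable[Record]) -> Dict[str, List[str]]:
    """Indexing: inverted index from concept to record ids (search-engine stand-in)."""
    index: Dict[str, List[str]] = {}
    for rec in records:
        for concept in rec["concepts"]:
            index.setdefault(concept, []).append(str(rec["id"]))
    return index


def run_pipeline(raw: Iterable[Record], terms: Dict[str, str]) -> Dict[str, List[str]]:
    return build_index(enrich(deduplicate(normalize(raw)), terms))


raw = [
    {"id": "a1", "title": "Sorafenib in  hepatocellular carcinoma"},
    {"id": "a2", "title": "sorafenib in hepatocellular carcinoma"},  # duplicate of a1
    {"id": "p1", "title": "Kinase inhibitor for liver cancer"},
]
terms = {"hepatocellular carcinoma": "Liver cancer", "liver cancer": "Liver cancer", "sorafenib": "Sorafenib"}
print(run_pipeline(raw, terms))
# {'Liver cancer': ['a1', 'p1'], 'Sorafenib': ['a1']}
```

Chaining generators keeps each stage independent and streaming; in a Kafka-based deployment, stages would typically exchange records through topics rather than direct function calls.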
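The semantic data integration slide (24) separates syntactic heterogeneity (the same fact presented differently) from semantic heterogeneity (the same concept under different names). A minimal sketch of resolving both for single terms, assuming a plain synonym dictionary: Unicode NFKC normalization, case folding, and whitespace collapsing handle the syntactic side, and a thesaurus lookup maps synonyms to one preferred label. The `THESAURUS` entries come from the slide's lung-cancer example; the dictionary itself and the function names are hypothetical stand-ins for Bayer's proprietary thesauri.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical thesaurus: normalized synonym -> preferred concept label.
# Entries are taken from the slide's example; a real thesaurus is far larger.
THESAURUS: Dict[str, str] = {
    "lung cancer": "Lung cancer",
    "pulmonary carcinoma": "Lung cancer",
    "neoplasm of the lung": "Lung cancer",
}


def normalize_surface(text: str) -> str:
    """Syntactic normalization: NFKC Unicode form, case folding, single spaces."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.casefold().split())


def to_concept(term: str) -> Optional[str]:
    """Semantic normalization: map a synonym to its preferred label, if known."""
    return THESAURUS.get(normalize_surface(term))


print(to_concept("Pulmonary  Carcinoma"))      # Lung cancer
print(to_concept("neoplasm\u00a0of the LUNG"))  # Lung cancer (no-break space)
print(to_concept("GSK"))                        # None
```

A dictionary lookup cannot resolve the opposite case on that slide, where one name such as GSK denotes different concepts; that needs context, for example from the ML-based NLP listed on the semantic-enrichment slide (30).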
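The data-ingestion slide (27) lists changes in record structure and in volume over time. One simple way to surface both, sketched here under the assumption that each delivery arrives as a batch of flat dicts, is to compare the field names and record counts of consecutive batches. The function names and the two-fold volume threshold are hypothetical choices, not values from the presentation.

```python
from typing import Any, Dict, List, Set

Batch = List[Dict[str, Any]]


def field_names(batch: Batch) -> Set[str]:
    """Union of top-level field names across all records in a batch."""
    names: Set[str] = set()
    for record in batch:
        names.update(record.keys())
    return names


def compare_batches(previous: Batch, current: Batch, max_ratio: float = 2.0) -> Dict[str, Any]:
    """Report schema drift and flag volume changes beyond a (hypothetical) threshold."""
    before, after = field_names(previous), field_names(current)
    ratio = len(current) / len(previous) if previous else float("inf")
    return {
        "added_fields": sorted(after - before),
        "removed_fields": sorted(before - after),
        "volume_ratio": ratio,
        "volume_alert": ratio > max_ratio or ratio < 1 / max_ratio,
    }


previous = [{"id": 1, "title": "x"}, {"id": 2, "title": "y"}]
current = [{"id": n, "title": "z", "doi": "10.1000/example"} for n in range(5)]
print(compare_batches(previous, current))
# {'added_fields': ['doi'], 'removed_fields': [], 'volume_ratio': 2.5, 'volume_alert': True}
```

Only top-level field names are compared; nested structures or type changes inside a field would need a deeper schema comparison.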
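The de-duplication slide (28) points to a recurring ingestion problem: the same publication often arrives from several sources. A minimal sketch, assuming records carry a `source` field and optionally `doi`, `title`, and `year`: use the DOI as the match key when present (DOI names are case-insensitive, so it is lower-cased), otherwise fall back to an alphanumeric-only title plus year, and merge the source lists of records sharing a key. The key rules and field names are hypothetical; the presentation does not say how Bayer de-duplicates.

```python
import re
from typing import Any, Dict, List


def dedup_key(record: Dict[str, str]) -> str:
    """Match key: normalized DOI if present, else alphanumeric title plus year."""
    doi = record.get("doi")
    if doi:
        return "doi:" + doi.strip().lower()
    title = re.sub(r"[^a-z0-9]+", "", record.get("title", "").lower())
    return f"title:{title}:{record.get('year', '')}"


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Keep the first record per key and collect every source that delivered it."""
    merged: Dict[str, Dict[str, Any]] = {}
    for rec in records:
        key = dedup_key(rec)
        if key in merged:
            merged[key]["sources"].append(rec["source"])
        else:
            merged[key] = {**rec, "sources": [rec["source"]]}
    return list(merged.values())


records = [
    {"source": "A", "doi": "10.1000/XYZ123", "title": "Liver cancer study", "year": "2021"},
    {"source": "B", "doi": "10.1000/xyz123", "title": "Liver Cancer Study.", "year": "2021"},
    {"source": "C", "title": "Liver cancer: a review", "year": "2020"},
    {"source": "D", "title": "LIVER CANCER - A REVIEW", "year": "2020"},
]
for rec in deduplicate(records):
    print(rec["sources"])  # ['A', 'B'] then ['C', 'D']
```

A record that has a DOI in one source but not in another would not be merged by this sketch; such cases typically need fuzzier matching across several fields.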
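The identifier slide (29) shows that the same disease may be referenced by MeSH, UMLS, SNOMED, ICD-10, Mondo, and other IDs. A common remedy, sketched here as an assumption rather than as Bayer's design, is a cross-reference table that maps each (namespace, external ID) pair to one internal concept ID. All identifier values below, including `DIS:0001`, are placeholders, not real codes.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cross-reference table: (namespace, external id) -> internal concept id.
# Every identifier here is a placeholder, not a real MeSH, ICD-10, or Mondo code.
XREFS: Dict[Tuple[str, str], str] = {
    ("MESH", "placeholder-mesh-1"): "DIS:0001",
    ("ICD10", "placeholder-icd10-1"): "DIS:0001",
    ("MONDO", "placeholder-mondo-1"): "DIS:0001",
    ("MESH", "placeholder-mesh-2"): "DIS:0002",
}


def resolve(namespace: str, external_id: str) -> Optional[str]:
    """Map an external identifier to the internal concept id, if it is known."""
    return XREFS.get((namespace.strip().upper(), external_id.strip()))


def same_entity(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """True when two external identifiers resolve to the same internal concept."""
    concept_a, concept_b = resolve(*a), resolve(*b)
    return concept_a is not None and concept_a == concept_b


print(same_entity(("mesh", "placeholder-mesh-1"), ("ICD10", "placeholder-icd10-1")))  # True
print(same_entity(("MESH", "placeholder-mesh-1"), ("MESH", "placeholder-mesh-2")))    # False
```

Using the internal ID as the pivot means that supporting another vocabulary only adds rows to the table, without touching records that already carry the internal ID.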
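The semantic-enrichment slide (30) lists gene mutations among the entities suited to rule/pattern-based extraction, using c580y in the pfkelch13 example sentence. A minimal sketch of such a rule, assuming mutations appear in short one-letter protein notation (reference residue, position, variant residue), is a case-insensitive regular expression over the 20 standard amino-acid letters. The pattern and function name are hypothetical, and they do not cover HGVS forms such as p.Cys580Tyr.

```python
import re
from typing import List, Tuple

# The 20 standard one-letter amino-acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Whole word: reference residue, position (no leading zero), variant residue, e.g. C580Y.
MUTATION_RE = re.compile(
    rf"\b([{AMINO_ACIDS}])([1-9][0-9]*)([{AMINO_ACIDS}])\b", re.IGNORECASE
)


def extract_point_mutations(text: str) -> List[Tuple[str, int, str]]:
    """Return (reference, position, variant) triples, upper-cased."""
    return [(ref.upper(), int(pos), alt.upper()) for ref, pos, alt in MUTATION_RE.findall(text)]


sentence = ("A lamp-snp assay detecting c580y mutation in pfkelch13 gene "
            "from clinically dried blood spot samples")
print(extract_point_mutations(sentence))  # [('C', 580, 'Y')]
```

A bare pattern like this also fires on look-alikes such as T2D (type 2 diabetes), so in practice such rules usually need context checks, for example from the ML-based NLP the same slide lists.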