Towards a framework for automating the Data Scientist – application to life science and bio data
Radouane Oudrhiri, Chief Data Scientist
Monday 27th February 2017
radouane.oudrhiri@eaglegenomics.com
27 February 2018
Cognitive & AI Data Infrastructure Meetup
Table of contents
• Eagle Genomics - introduction
• BioPharma industry – data-driven innovation
• Challenges and bottleneck
• The manual data process
• Principles and concepts
• Data linkage & associated models
• Value of data and information
• The (Machine) Learning approach and mechanism
• Functional Architecture
• The Data layer
• Summary
About Eagle Genomics
Based in Cambridge, UK since 2008, on the Wellcome Genome Campus
Smart data management for Life Sciences - software & services
• Human & animal health
• Personal care and cosmeceuticals
• Food and nutraceuticals
Delivering the innovation platform for the genomics era:
e[automateddatascientist]
• to increase the success rate of innovation
• to enable data-driven decisions
• to enable customers to become insight-driven
The Eagle Genomics journey:
from services, to solutions, to platform
Partnerships & Collaborations
The biopharmaceutical industry is evolving in pockets
Driven by precision medicine and high throughput technologies
Data-driven innovation is a must
• Must be designed, aligned with strategy and continuously adapted
• Requires a deep cultural change to liberate the business opportunity
Data-intensive systems and processes are the business!
• This goes way beyond digitisation
• Data is the currency
• The technical challenges of data-intensive systems are stretching classical system engineering approaches
Urgent need for a comprehensive strategy to manage data assets!
“Data is eating the world” (after “Software Is Eating the World”) *
(*) ANDREESSEN M., “Software Is Eating the World”, The Wall Street Journal Essay, August 20, 2011.
http://online.wsj.com/article/SB10001424053111903480904576512250915629460.html
Current Process is Entangled, Human Intensive and Inefficient
The fundamental requirements for data-driven innovation
The bottleneck to data-driven innovation and data governance
[Figure: AI, Machine Learning and Modelling are the focus of the industry for AI + ML applications: necessary but not sufficient. Eagle Genomics focus: across the entire bridge, accelerating data-driven innovation. Other labels: effort, value, distributed, increased.]
Manual biocuration and data tagging are complex, unstructured and time-consuming
Example: microbiome + clinical data
How biocurators, scientists and subject matter experts work
Solving the crucial data linkage problem:
Questions, Knowledge, and Experimental Process Models
• Process Modelling: how was the data collected?
• Knowledge Modelling: what does the data represent?
• Questions Modelling and mapping: why was the data generated? Why will the data be generated?
[Figure labels: Data sources; Semantic enrichment; Map to process graph; Map to entities of interest; Valuation hierarchies; eaglediscover; eaglecurate]
Value of data and information - the missing link
Data Information
Value
Insight
$$E[V_{C p_1 p_2 \ldots p_n} \mid e] = E[v \mid C\, p_1 p_2 \ldots p_n\, e] - E[V_{S_1 S_2 \ldots S_n} \mid e]$$
Howard, R.A. (1966). Information value theory. IEEE Transactions on Systems Science and
Cybernetics (SSC-2), 22-26.
• Semantic mapping processes, from one level of abstraction to another, are surrounded by ambiguity and uncertainty
• In information theory, information is defined as “reduction of uncertainty”
• The Value of Information is the price that one would pay a “Clairvoyant” for additional information to reduce risks and uncertainties at each stage of a study, so as to increase profit
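The clairvoyant definition above corresponds to what decision analysis usually calls the expected value of perfect information. The following is a minimal sketch, assuming a hypothetical two-action study decision; the action names, states, payoffs and prior probabilities are invented for illustration and do not come from the talk.

```python
def expected_payoff(action, prior, payoff):
    """Expected payoff of one action under the prior over states."""
    return sum(prior[state] * payoff[action][state] for state in prior)


def value_without_information(prior, payoff):
    """Commit to the single best action before the state is known."""
    return max(expected_payoff(action, prior, payoff) for action in payoff)


def value_with_clairvoyance(prior, payoff):
    """Learn the true state first, then pick the best action for that state."""
    return sum(
        prior[state] * max(payoff[action][state] for action in payoff)
        for state in prior
    )


def value_of_clairvoyance(prior, payoff):
    """Most one should pay the clairvoyant: the gain from knowing the state."""
    return value_with_clairvoyance(prior, payoff) - value_without_information(prior, payoff)


# Hypothetical numbers, for illustration only.
PRIOR = {"effect_real": 0.3, "no_effect": 0.7}
PAYOFF = {
    "run_study": {"effect_real": 100.0, "no_effect": -40.0},
    "skip_study": {"effect_real": 0.0, "no_effect": 0.0},
}

# A variant in which "run_study" is the best action in every state, so
# knowing the state never changes the decision.
PAYOFF_DOMINANT = {
    "run_study": {"effect_real": 100.0, "no_effect": 10.0},
    "skip_study": {"effect_real": 0.0, "no_effect": 0.0},
}
```

With the first payoff table the best uninformed choice is worth 2 and the informed choice is worth 30, so the clairvoyant is worth about 28. With the dominant variant the value is zero, which matches the point that information only has value when it can change a decision.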
Value of data and information
Two factors determine the value of information:
1. whether the information is new to you;
2. whether the information causes you to change your decisions
Consequences:
The value of information is a subjective but quantitative utility that is
realised at decision time
The value of data/information is defined by its use and/or intended use
Data valuation is a conversational process among multiple stakeholders
Stakeholders: Scientists, Bioinformaticians, Data scientists, Marketing, Business leaders
• Measure concordance/discordance among stakeholders
• Use the metrics as a means to reduce ambiguity and reach consensus
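The slides do not name a concordance metric. As one possible choice, the sketch below uses Kendall's coefficient of concordance (W), which is 1 when all stakeholders rank the datasets identically and 0 when their rankings cancel out. The function names and the example rankings are assumptions for illustration, and ties are not handled.

```python
def ranks_from_scores(scores):
    """Convert raw scores to ranks 1..n, highest score first (assumes no ties)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def kendalls_w(rankings):
    """Kendall's W for m complete rankings of the same n items (no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    if n < 2:
        raise ValueError("need at least two items to rank")
    # Sum of the ranks each item received across all stakeholders.
    rank_sums = [sum(ranking[i] for ranking in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

A low W on a given valuation dimension would flag where the stakeholder conversation needs to continue before a consensus value is recorded.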
The value of data/information is multidimensional
Data Valuation is a prioritisation process
Seeking the Pareto effect
Data quality versus data value
              Low quality             High quality
High value    Missed opportunities    Ideal situation
Low value     ???                     Over-engineered
Data value and data quality are correlated and follow a Pareto distribution
Most organizations curate what is easy rather than what is necessary
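One way to act on the Pareto effect is to curate first the smallest set of datasets that carries most of the estimated value. The sketch below is illustrative only: the 80% threshold, the function name and the dataset names are assumptions, and the value estimates would come from whatever valuation model is in use.

```python
def pareto_shortlist(values, share=0.8):
    """Return the highest-value items that together reach `share` of total value.

    `values` maps a dataset name to its estimated (non-negative) value.
    """
    total = sum(values.values())
    shortlist = []
    running = 0.0
    # Walk the datasets from most to least valuable.
    for name, value in sorted(values.items(), key=lambda item: item[1], reverse=True):
        if running >= share * total:
            break
        shortlist.append(name)
        running += value
    return shortlist


# Hypothetical value estimates, for illustration only.
EXAMPLE_VALUES = {"a": 50, "b": 30, "c": 10, "d": 5, "e": 5}
```

Here `pareto_shortlist(EXAMPLE_VALUES)` keeps only the two datasets that account for 80% of the total estimated value.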
7 laws of data asset management
1. Information is (infinitely) shareable
2. The value of information increases with use
3. Information is perishable
4. The value of information increases with quality
5. The value of information increases when combined with other information
6. More is not necessarily better (smart, not necessarily big)
7. Information is not depletable: information is self-generating; the more you use it, the more you have
Moody D., Walsh P., (1999), Measuring The Value Of Information: An Asset Valuation Approach. Seventh European Conference on Information Systems (ECIS’99)
Building valuation models
1. Questions definition & context
2. Definition of a multi-dimensional metadata value model
3. Mapping questions to the metadata value model (intended use)
4. Automated mapping of value to datasets (atomic level)
5. Data value exploitation
[Diagram labels: Model calibration (shown three times); pairwise & hierarchical; scholar citations; probabilistic & statistical; multi-scales system; …]
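The "pairwise & hierarchical" label suggests that dimension weights could be elicited by pairwise comparison, in the style of the analytic hierarchy process. That reading is an assumption; the slides do not specify the method. The sketch below derives weights from a pairwise comparison matrix using the row geometric mean approximation, then scores a dataset as a weighted sum over hypothetical value dimensions.

```python
import math


def weights_from_pairwise(matrix):
    """Approximate priority weights from a pairwise comparison matrix.

    matrix[i][j] states how many times more important dimension i is than
    dimension j. Uses row geometric means, normalised to sum to 1.
    """
    geometric_means = [math.prod(row) ** (1 / len(row)) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


def value_score(weights, ratings):
    """Weighted sum of a dataset's ratings, one rating per value dimension."""
    if len(weights) != len(ratings):
        raise ValueError("one rating is required per dimension")
    return sum(w * r for w, r in zip(weights, ratings))


# Hypothetical dimensions: relevance, quality, citations.
# Relevance is judged 2x as important as quality and 4x as important as citations.
EXAMPLE_MATRIX = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
```

For this consistent matrix the weights come out as roughly 4/7, 2/7 and 1/7. Recomputing the weights as stakeholders revise their comparisons is one simple form the calibration loop could take.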
The Learning Journey…
• Meta-data, ontologies and templates based
  Limitations: not flexible
  Learnings: process structures and patterns
• Rule-based
  Limitations: does not scale up, requires continual change
  Learnings: heuristics and constraints
• AI, Machine Learning and Deep Learning
  Limitations: requires large amounts of data and variation of experiments
• Experts in the loop and Adversarial Learning
  Provides more flexibility and scalability
Value-driven automated data curation and tagging process
Inputs: Primary Experimental data-sets × Questions & Goals
1 - Represent data as an experimental process
2 - Represent questions as experimental processes
3 - Cross-map
4 - Enrich
1 - Representation of data as experimental process models
Primary Experimental data-sets
Meta-data model; Meta model
Experimental Data Process Pattern
Tagging and Categorisation principles
Experimental Data represented as a typed Process Graph.
Identification of missing process components from experimental process patterns and models
Autocuration Engine: Semantical enrichment and Context Mesh Entailment
Process-oriented representation of Experimental data-sets
[Legend: present, absent; process element, asset]
• Graph theory and algorithms
• Topologically: highly constrained
• Learning for information representation
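The idea of a typed process graph checked against an experimental process pattern can be sketched as follows. This is a toy illustration under stated assumptions, not the Autocuration Engine: the class, the step types and the sequencing pattern are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessGraph:
    """A dataset described as typed process steps linked by directed edges."""
    node_types: dict = field(default_factory=dict)  # node id -> step type
    edges: set = field(default_factory=set)         # (source id, target id)

    def add_step(self, node_id, step_type):
        self.node_types[node_id] = step_type

    def link(self, source, target):
        self.edges.add((source, target))

    def typed_edges(self):
        """Edges expressed as (source type, target type) pairs."""
        return {(self.node_types[s], self.node_types[t]) for s, t in self.edges}


def missing_components(graph, pattern_types, pattern_edges):
    """Step types and typed links required by the pattern but absent from the data."""
    missing_types = set(pattern_types) - set(graph.node_types.values())
    missing_edges = set(pattern_edges) - graph.typed_edges()
    return missing_types, missing_edges


# Hypothetical pattern for a sequencing experiment.
PATTERN_TYPES = ["sample_collection", "dna_extraction", "sequencing", "analysis"]
PATTERN_EDGES = [
    ("sample_collection", "dna_extraction"),
    ("dna_extraction", "sequencing"),
    ("sequencing", "analysis"),
]

# A dataset whose metadata never mentions the extraction step.
dataset = ProcessGraph()
dataset.add_step("s1", "sample_collection")
dataset.add_step("q1", "sequencing")
dataset.add_step("a1", "analysis")
dataset.link("s1", "q1")
dataset.link("q1", "a1")
```

Calling `missing_components(dataset, PATTERN_TYPES, PATTERN_EDGES)` reports the absent `dna_extraction` step and the two links that depend on it, which is the kind of gap a curator would then need to fill or confirm.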
2 - Mapping Questions to Experimental Process Representations
Autocuration Engine: Semantical enrichment and Context Mesh Entailment
Inputs: Questions & Goals; Process-oriented representation of Experimental data-sets
a) Mapping of the questions to the process-oriented graph
b) Mapping of questions to data (experimental process)
c) Identification of gaps
3 - Cross mapping and identification of enrichment data sources based on value
Autocuration Engine: Semantical enrichment and Context Mesh Entailment
Internal & External sources:
• Publications and references
• Meta-data: ad hoc, nomenclatures, ontologies
Use e[discover] to identify and select sources of data enrichment based on added value
[Diagram label: questions]
4 - Semantical enrichment as inductive data weaving and context mesh entailment
Autocuration Engine: Semantical enrichment and Context Mesh Entailment
Internal & External sources:
• Publications and references
• Meta-data: ad hoc, nomenclatures, ontologies
Requirements for data modelling and management
• Graph data structures and models are a natural fit → Network Science
• Support for multi-layered graphs (“Lasagne graph”)
• Rich semantic expressiveness
• Flexible and dynamic (meta) modelling
• Multidimensional (n-ary) relationships, not just binary
• Support for integrity constraints
• Language for both graph traversal (navigation) and computation (optimisation)
• Verifiability, or at least ease of verification
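To make the n-ary relationship and integrity constraint requirements concrete, the sketch below stores each relationship as a set of named roles rather than a binary edge, and rejects relationships whose roles do not match a declared schema. The class names, the `measurement` relation and its roles are hypothetical and are not taken from the platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    kind: str
    roles: tuple  # sorted (role name, entity id) pairs


class RelationStore:
    """Stores n-ary relations and enforces one constraint: the exact role set."""

    def __init__(self, schema):
        self._schema = schema                 # relation kind -> required role names
        self._by_entity = defaultdict(list)   # entity id -> relations it plays a role in

    def add(self, kind, **roles):
        required = self._schema.get(kind)
        if required is None:
            raise ValueError(f"unknown relation kind: {kind}")
        if set(roles) != set(required):
            raise ValueError(f"{kind} requires exactly the roles {sorted(required)}")
        relation = Relation(kind, tuple(sorted(roles.items())))
        # Index the relation once per distinct participating entity.
        for entity in {entity for _, entity in relation.roles}:
            self._by_entity[entity].append(relation)
        return relation

    def relations_of(self, entity):
        return list(self._by_entity.get(entity, []))


# Hypothetical schema: a measurement links a sample, an assay and an analyte.
store = RelationStore({"measurement": {"sample", "assay", "analyte"}})
store.add("measurement", sample="S1", assay="16S_rRNA", analyte="Bacteroides")
```

A binary-edge graph would need an intermediate node to express the same three-way fact; treating the relation itself as a first-class record keeps the constraint check in one place.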
A brief history of databases and models – key concepts
• Hierarchical data model, 1960s: C. Bachman, IDS; IBM, 1968
• Relational model (E. F. Codd, 1970s): algebraic; normalisation; functional dependencies
• Entity-Relationship (P. Chen & H. Tardieu, 1976-77): n-ary & reflexive relationships; cardinalities
• RDBMS and SQL: data independence (logical, physical); IBM DB2, 1980s
• OO (O. Dahl, 1967): class and subclass
• ER+ (AFCET group, early 90s): integrity constraints; generalisation; specialisation; genericity
• OODBMS: O2, early 90s
• UML, 1994
• Functional Data Model: data mining, OLAP; functional dependencies
• Semi-structured, XML, complex data types; XML and XQuery, early 2000s
• Giant datastores (Google, Yahoo, Amazon), late 2000s
• Graph databases
Eagle Genomics platform functional architecture
[Architecture diagram, component labels]
• Data access layer; Adaptors Factory; adaptors
• Extract; Load; Quality control & cleansing; Staging
• e[catalog]: Data Catalog & Model Management
• Graph builder (transform); Model builder; Enrich; Weave/Entail
• Questions
• Valuation Models: Relevance (sources + entities), Risk quantification, Data Quality
• e[discover]; Data Sources Discovery; Data finding & selection; Data storage
• e[curate]
• AI and ML functionalities
• Conversational Learning Interface
• Change management and audit trail
• Workflow Manager – e[hive]
• Security Management
Semantics Engine key modules
[Diagram labels]
• Data access layer; Adaptors Factory
• Questions
• Valuation Models: Relevance (sources + entities), Risk quantification, Data Quality
• Model builder
• e[discover]
• AI and ML functionalities
• Metadata; Ontologies; Reference data
• e[catalog]: Data Catalog & Model Management
Data Access and Management Layer
• Provides a unification layer for data access and management
• Allows for distributed and federated databases
• Structured data is added, with its schema, through the Ingest API, and structured queries are handled through the Search API
• Data is stored in data stores
• High-level queries are mapped, using schemas and ontologies, to database-level queries, which translate to direct database requests
• Schema mapping enables data integration at this level
• The Materialization Engine persists the results of the most-used queries, as well as expected queries, to the database, thus catering for efficiency and scalability
• Ensures efficient and scalable access to structured data
• Scalability is designed in, as opposed to tested in
[Diagram layers]
• APIs: Ingest API, Bulk Data API, Metadata API, Search API
• Caching; Materialization Engine
• Stores: RDBMS, GraphDB / Ontology Store, FileStore
• Graql; Grakn DB
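The behaviour described for the Materialization Engine, persisting the results of the most-used queries, can be sketched with a simple usage counter. This is an illustrative assumption about how such an engine might work, not the platform's implementation: the class name, the threshold and the `run_query` callable are hypothetical, and invalidation when the underlying data changes is deliberately left out.

```python
from collections import Counter


class MaterializingStore:
    """Persists a query's result once the query has been run often enough."""

    def __init__(self, run_query, threshold=3):
        self._run_query = run_query   # callable: query text -> result (the slow path)
        self._threshold = threshold
        self._hits = Counter()
        self._materialized = {}

    def query(self, text):
        # Fast path: the result was already materialized.
        if text in self._materialized:
            return self._materialized[text]
        self._hits[text] += 1
        result = self._run_query(text)
        # Materialize once the query has proven popular.
        if self._hits[text] >= self._threshold:
            self._materialized[text] = result
        return result

    def is_materialized(self, text):
        return text in self._materialized
```

Expected queries could be handled the same way by pre-populating the materialized results before any user asks for them.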
The Future: Automating the Data Scientist, providing
data-science-as-a-service
Raj
Bioinformatician
R&D Lead
Jennifer
Biologist
Tony
Director,
Scientific Innovation
[Diagram labels: Conduct experiment; Ask questions; Determine investigation / studies; View / refine; Generate source report; Analyze all data; Generate study report; Think of new questions / studies]
• Reduce inertia
• Increase speed to insight
• Reduce time & cost
• Leverage stranded data assets
• Data science at the fingertips of the biologist
[Diagram labels: Clarify goals; Analyze data; Refine / ask new questions; Raw data; Load into e[curate]; Value with e[discover]; Generate map; Generate report data]
External data sources: Ensembl, PubMed, UniProt, ClinicalTrials.gov etc.
www.eaglegenomics.com
Thank you
Q&A
radouane.oudrhiri@eaglegenomics.com
Cognitive & AI Data Infrastructure Meetup
Screenshot from Eagle Platform Demo:
Data valuation and prioritisation
Screenshot from Eagle Platform Demo:
Semantic enrichment and contextual data tagging
Screenshot from Eagle Platform Demo:
Question Recommendations and Formulation, based on Previous Analyses/Studies
Screenshot from Eagle Platform Demo:
Key Entities, Relationships, and Associated Evidence
Unilever: Eagle’s Platform accelerating project
timelines across global teams
“Unilever’s digital data program now processes genetic sequences
twenty times faster - without incurring higher compute costs. In
addition, its robust architecture supports ten times as many
scientists, all working simultaneously.”
Pete Keeley, eScience Technical Lead
- R&D IT at Unilever
• Past: Stand alone (Weeks / Months)
• Present: Server/Cluster (Days / Weeks)
• Emerging: Secure Scalable Cloud (Hours)
100K reads → 5 billion reads
Model Builder and Metamodel management
• Generic meta models are more evolvable but lack semantic expressiveness and are hard to implement (performance) → dynamic ontologies
• Specific models are semantically richer but tend to be rigid → static ontologies
Model builder solving the dilemma of performance versus genericity
• Analogous to the blending of a compiler and an interpreter
• Virtual machine for models
[Diagram labels: Connectors & Wrappers Factory; Model builder; Generic meta model & ontologies; Semantically-rich specific models]
Ontologies and Ontologies Management
Ontologies are necessary components, but not sufficient for semantic resolution and management
• Multitude of ontologies
• Conflict between ontologies (incoherence)
• Noisy ontologies (inconsistency)
From static to dynamic ontologies
• Managing ontologies as assets
• Valuation models applied to ontologies based on questions and context
• Use of Machine Learning to dynamically construct ontologies
More Related Content

What's hot

Protecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UKProtecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UK
Ulf Mattsson
 
Harnessing the Power of Big Data at Freddie Mac
Harnessing the Power of Big Data at Freddie MacHarnessing the Power of Big Data at Freddie Mac
Harnessing the Power of Big Data at Freddie Mac
DataWorks Summit
 

What's hot (20)

Big Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationBig Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data Democratization
 
Protecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UKProtecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UK
 
Enterprise data architecture of complex distributed applications & services
Enterprise data architecture of complex distributed applications & servicesEnterprise data architecture of complex distributed applications & services
Enterprise data architecture of complex distributed applications & services
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
 
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
 
Apouc 2014-business-analytics-and-big-data
Apouc 2014-business-analytics-and-big-dataApouc 2014-business-analytics-and-big-data
Apouc 2014-business-analytics-and-big-data
 
From Data Lakes to the Data Fabric: Our Vision for Digital Strategy
From Data Lakes to the Data Fabric: Our Vision for Digital StrategyFrom Data Lakes to the Data Fabric: Our Vision for Digital Strategy
From Data Lakes to the Data Fabric: Our Vision for Digital Strategy
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
 
Study: #Big Data in #Austria
Study: #Big Data in #AustriaStudy: #Big Data in #Austria
Study: #Big Data in #Austria
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...
 
Big data analysis
Big data analysisBig data analysis
Big data analysis
 
Harnessing the Power of Big Data at Freddie Mac
Harnessing the Power of Big Data at Freddie MacHarnessing the Power of Big Data at Freddie Mac
Harnessing the Power of Big Data at Freddie Mac
 
San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
Business intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lakeBusiness intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lake
 
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
 

Similar to Automating Data Science over a Human Genomics Knowledge Base

Data Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical UniversityData Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical University
butest
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Geoffrey Fox
 

Similar to Automating Data Science over a Human Genomics Knowledge Base (20)

Data Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical UniversityData Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical University
 
Real-time applications of Data Science.pptx
Real-time applications  of Data Science.pptxReal-time applications  of Data Science.pptx
Real-time applications of Data Science.pptx
 
The Analytics and Data Science Landscape
The Analytics and Data Science LandscapeThe Analytics and Data Science Landscape
The Analytics and Data Science Landscape
 
CTO Perspectives: What's Next for Data Management and Healthcare?
CTO Perspectives: What's Next for Data Management and Healthcare?CTO Perspectives: What's Next for Data Management and Healthcare?
CTO Perspectives: What's Next for Data Management and Healthcare?
 
NCCT.pptx
NCCT.pptxNCCT.pptx
NCCT.pptx
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop Slides
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
Considerations and challenges in building an end to-end microbiome workflow
Considerations and challenges in building an end to-end microbiome workflowConsiderations and challenges in building an end to-end microbiome workflow
Considerations and challenges in building an end to-end microbiome workflow
 
Data_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptxData_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptx
 
Introduction to Big Data and its Potential for Dementia Research
Introduction to Big Data and its Potential for Dementia ResearchIntroduction to Big Data and its Potential for Dementia Research
Introduction to Big Data and its Potential for Dementia Research
 
Data_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptxData_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptx
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
An Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningAn Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data mining
 
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
 
Data_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pdfData_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pdf
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycle
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
DOWLD SLIDES.pptx
DOWLD SLIDES.pptxDOWLD SLIDES.pptx
DOWLD SLIDES.pptx
 
Ijdbms
IjdbmsIjdbms
Ijdbms
 

More from Vaticle

Building Biomedical Knowledge Graphs for In-Silico Drug Discovery
Building Biomedical Knowledge Graphs for In-Silico Drug DiscoveryBuilding Biomedical Knowledge Graphs for In-Silico Drug Discovery
Building Biomedical Knowledge Graphs for In-Silico Drug Discovery
Vaticle
 
A Data Modelling Framework to Unify Cyber Security Knowledge
A Data Modelling Framework to Unify Cyber Security KnowledgeA Data Modelling Framework to Unify Cyber Security Knowledge
A Data Modelling Framework to Unify Cyber Security Knowledge
Vaticle
 
Unifying Space Mission Knowledge with NLP & Knowledge Graph
Unifying Space Mission Knowledge with NLP & Knowledge GraphUnifying Space Mission Knowledge with NLP & Knowledge Graph
Unifying Space Mission Knowledge with NLP & Knowledge Graph
Vaticle
 
Knowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdfKnowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdf
Vaticle
 
TypeDB Academy | Modelling Principles
TypeDB Academy | Modelling PrinciplesTypeDB Academy | Modelling Principles
TypeDB Academy | Modelling Principles
Vaticle
 
Intro to TypeDB and TypeQL | A strongly-typed database
Intro to TypeDB and TypeQL | A strongly-typed databaseIntro to TypeDB and TypeQL | A strongly-typed database
Intro to TypeDB and TypeQL | A strongly-typed database
Vaticle
 
Graph Databases vs TypeDB | What you can't do with graphs
Graph Databases vs TypeDB | What you can't do with graphsGraph Databases vs TypeDB | What you can't do with graphs
Graph Databases vs TypeDB | What you can't do with graphs
Vaticle
 

More from Vaticle (20)

Building Biomedical Knowledge Graphs for In-Silico Drug Discovery
Building Biomedical Knowledge Graphs for In-Silico Drug DiscoveryBuilding Biomedical Knowledge Graphs for In-Silico Drug Discovery
Building Biomedical Knowledge Graphs for In-Silico Drug Discovery
 
Loading Huge Amounts of Data
Loading Huge Amounts of DataLoading Huge Amounts of Data
Loading Huge Amounts of Data
 
Natural Language Interface to Knowledge Graph
Natural Language Interface to Knowledge GraphNatural Language Interface to Knowledge Graph
Natural Language Interface to Knowledge Graph
 
A Data Modelling Framework to Unify Cyber Security Knowledge
A Data Modelling Framework to Unify Cyber Security KnowledgeA Data Modelling Framework to Unify Cyber Security Knowledge
A Data Modelling Framework to Unify Cyber Security Knowledge
 
Unifying Space Mission Knowledge with NLP & Knowledge Graph
Unifying Space Mission Knowledge with NLP & Knowledge GraphUnifying Space Mission Knowledge with NLP & Knowledge Graph
Unifying Space Mission Knowledge with NLP & Knowledge Graph
 
The Next Big Thing in AI - Causality
The Next Big Thing in AI - CausalityThe Next Big Thing in AI - Causality
The Next Big Thing in AI - Causality
 
Building a Cyber Threat Intelligence Knowledge Graph
Building a Cyber Threat Intelligence Knowledge GraphBuilding a Cyber Threat Intelligence Knowledge Graph
Building a Cyber Threat Intelligence Knowledge Graph
 
Knowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdfKnowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdf
 
Building a Distributed Database with Raft.pdf
Building a Distributed Database with Raft.pdfBuilding a Distributed Database with Raft.pdf
Building a Distributed Database with Raft.pdf
 
Enabling the Computational Future of Biology.pdf
Enabling the Computational Future of Biology.pdfEnabling the Computational Future of Biology.pdf
Enabling the Computational Future of Biology.pdf
 
TypeDB Academy | Inference with Rules
TypeDB Academy | Inference with RulesTypeDB Academy | Inference with Rules
TypeDB Academy | Inference with Rules
 
TypeDB Academy | Modelling Principles
TypeDB Academy | Modelling PrinciplesTypeDB Academy | Modelling Principles
TypeDB Academy | Modelling Principles
 
Beyond SQL - Comparing SQL to TypeQL
Beyond SQL - Comparing SQL to TypeQLBeyond SQL - Comparing SQL to TypeQL
Beyond SQL - Comparing SQL to TypeQL
 
TypeDB Academy- Getting Started with Schema Design
TypeDB Academy- Getting Started with Schema DesignTypeDB Academy- Getting Started with Schema Design
TypeDB Academy- Getting Started with Schema Design
 
Comparing Semantic Web Technologies to TypeDB
Comparing Semantic Web Technologies to TypeDBComparing Semantic Web Technologies to TypeDB
Comparing Semantic Web Technologies to TypeDB
 
Reasoner, Meet Actors | TypeDB's Native Reasoning Engine
Reasoner, Meet Actors | TypeDB's Native Reasoning EngineReasoner, Meet Actors | TypeDB's Native Reasoning Engine
Reasoner, Meet Actors | TypeDB's Native Reasoning Engine
 
Intro to TypeDB and TypeQL | A strongly-typed database
Intro to TypeDB and TypeQL | A strongly-typed databaseIntro to TypeDB and TypeQL | A strongly-typed database
Intro to TypeDB and TypeQL | A strongly-typed database
 
Graph Databases vs TypeDB | What you can't do with graphs
Graph Databases vs TypeDB | What you can't do with graphsGraph Databases vs TypeDB | What you can't do with graphs
Graph Databases vs TypeDB | What you can't do with graphs
 
Pandora Paper Leaks With TypeDB
 Pandora Paper Leaks With TypeDB Pandora Paper Leaks With TypeDB
Pandora Paper Leaks With TypeDB
 
Strongly Typed Data for Machine Learning
Strongly Typed Data for Machine LearningStrongly Typed Data for Machine Learning
Strongly Typed Data for Machine Learning
 

Automating Data Science over a Human Genomics Knowledge Base

  • 1. Towards a framework for automating the Data Scientist – application to life science and bio data Radouane Oudrhiri, Chief Data Scientist Monday 27th February 2017 radouane.oudrhiri@eaglegenomics.com 27 February 2018 Cognitive & AI Data Infrastructure Meetup
  • 2. Table of contents • Eagle Genomics - introduction • BioPharma industry – data-driven innovation • Challenges and bottleneck • The manual data process • Principles and concepts • Data linkage & associated models • Value of data and information • The (Machine) Learning approach and mechanism • Functional Architecture • The Data layer • Summary
  • 3. About Eagle Genomics Based in Cambridge, UK since 2008, on the Wellcome Genome Campus Smart data management for Life Sciences - software & services • Human & animal health • Personal care and cosmeceuticals • Food and nutraceuticals Delivering the innovation platform for the genomics era: e[automateddatascientist] • to increase the success rate of innovation • to enable data-driven decisions • to enable customers to become insight-driven
  • 4. The Eagle Genomics journey: from services, to solutions, to platform
  • 7. BioPharmaceutical is evolving in pockets Driven by precision medicine and high throughput technologies Data-driven innovation is a must • Must be designed, aligned with strategy and continuously adapted • Requires a deep cultural change to liberate the business opportunity Data-intensive systems and processes are the business! • this goes way beyond digitisation • data is the currency • The technical challenges of data-intensive systems are stretching classical system engineering approaches Urgent need for comprehensive strategy to manage data assets! “Software Data is eating the world” * (*) ANDREESSEN M., “Software Is Eating the World”, The Wall Street Journal Essay, August 20, 2011. http://online.wsj.com/article/SB10001424053111903480904576512250915629460.html
  • 8. Current Process is Entangled, Human Intensive and Inefficient
  • 9. The fundamental requirements for data-driven innovation
  • 10. The bottleneck to data-driven innovation and data governance. AI, Machine Learning, Modelling: the focus of the industry for AI + ML applications, necessary but not sufficient. [Chart axes: effort, value]
  • 11. Accelerating data driven innovation distributed increased Eagle Genomics focus - across the entire bridge
  • 12. Manual biocuration and data tagging is complex, unstructured and time consuming example of microbiome + clinical data
  • 14. How biocurators, scientists and subject matter experts work
  • 15. Solving the crucial data linkage problem: Questions, Knowledge, and Experimental Process Models. Process Modelling: how was the data collected? Knowledge Modelling: what does the data represent? Questions Modelling and mapping: why was the data generated? Why will the data be generated? Semantic enrichment; Data sources; Map to process graph; Map to entities of interest; Valuation hierarchies; e[discover]; e[curate]
  • 16. Value of data and information - the missing link. Data, Information, Value, Insight. $E[V_{C p_1 p_2 \ldots p_n} \mid e] = E[V \mid C_{p_1 p_2 \ldots p_n}, e] - E[V_{S_1 S_2 \ldots S_n} \mid e]$ Howard, R.A. (1966). Information value theory. IEEE Transactions on Systems Science and Cybernetics (SSC-2), 22-26. • Semantic mapping processes, from one level of abstraction to another, are surrounded by ambiguity and uncertainty • In information theory, information is defined as “reduction of uncertainty” • The Value of Information is the price that one would pay a “Clairvoyant” for additional information to reduce risks and uncertainties at each stage of a study, so as to increase profit
  • 17. Value of data and information Two factors determine the value of information: 1. whether the information is new to you; 2. whether the information causes you to change your decisions Consequences: The value of information is a subjective but quantitative utility that is realised at decision time The value of data/information is defined by its use and/or intended use
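The two factors on slide 17 can be illustrated with the standard decision-analysis calculation of the value of clairvoyance: the expected payoff when the state is known before acting, minus the expected payoff of the best single action chosen without that knowledge. The sketch below is an illustration only; the actions, probabilities and payoffs are hypothetical and not taken from the talk.

```python
def expected_value(probs, payoffs):
    """Expected payoff of one action over all states."""
    return sum(p * v for p, v in zip(probs, payoffs))


def value_of_clairvoyance(probs, payoff_table):
    """payoff_table maps each action to its payoff in every state."""
    # Without information: commit to the single action with the best expectation.
    without_info = max(expected_value(probs, row) for row in payoff_table.values())
    # With a clairvoyant: pick the best action separately in each state.
    with_info = sum(
        p * max(row[i] for row in payoff_table.values())
        for i, p in enumerate(probs)
    )
    return with_info - without_info


# Hypothetical states: the compound works (p = 0.3) or fails (p = 0.7).
probs = [0.3, 0.7]

# Knowing the state would change the decision, so the information has value.
changes_decision = {"run_study": [100, -40], "skip_study": [0, 0]}

# "run_study" is best in every state, so knowing the state changes nothing.
never_changes = {"run_study": [100, 10], "skip_study": [0, 0]}

if __name__ == "__main__":
    print(value_of_clairvoyance(probs, changes_decision))
    print(value_of_clairvoyance(probs, never_changes))
```

In the first table the clairvoyant is worth roughly 28 payoff units; in the second the value is zero, because no answer would alter the choice. This matches the slide's point that information is only valuable when it can change a decision.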
  • 18. Data valuation is a conversational process among multiple stakeholders: scientists, bioinformaticians, data scientists, marketing, business leaders. Measure concordance/discordance among stakeholders. Use the metrics as a means to reduce ambiguity and reach consensus
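The slide does not say which concordance metric is used. As one possible choice, Kendall's coefficient of concordance W measures agreement among several stakeholders who each rank the same datasets: W is 1 for identical rankings and 0 when the rankings cancel out. The sketch below assumes complete rankings without ties; the stakeholder names and rankings are hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's W for m raters who each rank the same n items (no ties).

    rankings: list of m lists; rankings[j][i] is the rank (1..n) that
    rater j gives to item i.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Total rank received by each item across all raters.
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    # Spread of the rank sums around their mean.
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of four datasets (1 = most valuable).
scientist = [1, 2, 3, 4]
bioinformatician = [1, 2, 3, 4]
business_leader = [4, 3, 2, 1]

if __name__ == "__main__":
    print(kendalls_w([scientist, bioinformatician]))  # full agreement
    print(kendalls_w([scientist, business_leader]))   # opposite rankings
```

A low W would flag where the conversation among stakeholders is still needed before a valuation can be agreed.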
  • 19. The value of data/information is multidimensional Data Valuation is a prioritisation process Seeking the Pareto effect
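Prioritisation by value can be sketched as picking the smallest set of datasets that accounts for a target share of total value. This is a minimal illustration of the Pareto idea, assuming each dataset already has a single aggregated value score; the dataset names, scores and the 80% threshold are hypothetical.

```python
def pareto_subset(values, share=0.8):
    """Return the highest-value datasets that together reach `share` of total value."""
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0
    for name, value in ranked:
        if cumulative >= share * total:
            break
        selected.append(name)
        cumulative += value
    return selected


# Hypothetical value scores for six datasets.
dataset_values = {
    "ds_a": 50, "ds_b": 35, "ds_c": 8,
    "ds_d": 4, "ds_e": 2, "ds_f": 1,
}

if __name__ == "__main__":
    print(pareto_subset(dataset_values))
```

With these scores, two of the six datasets carry more than 80% of the total value, so curation effort would go to them first.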
  • 20. Data quality versus data value. [Quadrant chart of value against quality. High value, low quality: missed opportunities. High value, high quality: ideal situation. Low value, low quality: ???. Low value, high quality: over-engineered.] Data value and data quality are correlated and follow a Pareto distribution. Most organizations curate what is easy rather than what is necessary
  • 21. 7 laws of data asset management: (1) Information is (infinitely) shareable. (2) The value of information increases with use. (3) Information is perishable. (4) The value of information increases with quality. (5) The value of information increases when combined with other information. (6) More is not necessarily better: smart, not necessarily big. (7) Information is not depletable: information is self-generating; the more you use it, the more you have. Moody D., Walsh P., (1999), Measuring The Value Of Information: An Asset Valuation Approach. Seventh European Conference on Information Systems (ECIS’99)
  • 22. Building valuation models: (1) Questions definition & context. (2) Definition of a multi-dimensional metadata value model. (3) Mapping questions to the metadata value model (intended use). (4) Automated mapping of value to datasets (atomic level). (5) Data value exploitation. Model calibration feeds back at each stage. Valuation inputs and techniques: pairwise & hierarchical; scholar citations; probabilistic & statistical; multi-scales system; …
  • 24. The Learning Journey… (1) Metadata, ontologies and templates based. Limitations: not flexible. Learnings: process structures and patterns. (2) Rule-based. Limitations: does not scale up, requires continual change. Learnings: heuristics and constraints. (3) AI, Machine Learning and Deep Learning. Limitations: requires large amount of data and variation of experiments. (4) Experts in the loop and Adversarial Learning: provides more flexibility and scalability
  • 25. Value-driven automated data curation and tagging process. Inputs: primary experimental data-sets; questions & goals. Steps: 1 - Represent data as an experimental process; 2 - Represent questions as experimental processes; 3 - Cross-map; 4 - Enrich
  • 26. 1 - Representation of data as experimental process models Primary Experimental data-sets Meta-data model Meta model Experimental Data Process Pattern Tagging and Categorisation principles Experimental Data represented as a typed Process Graph. Identification of missing process components from experimental process patterns and models Autocuration Engine: Semantical enrichment and Context Mesh Entailment present absent Process-oriented representation of Experimental data-sets Process element asset Graph theory and algorithms Topologically: highly constrained Learning for information representation
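The idea of identifying missing process components can be sketched by comparing a dataset's observed process graph against an expected process pattern, both expressed as typed steps and directed edges. This is a simplified illustration, not the Autocuration Engine itself; the step names, step types and the microbiome pattern are hypothetical.

```python
def missing_components(pattern, observed):
    """Compare an observed typed process graph against an expected pattern.

    Both arguments are dicts with:
      "steps": mapping of step name -> step type
      "edges": set of (from_step, to_step) pairs
    Returns the pattern steps and edges that the observed graph lacks.
    """
    missing_steps = {
        name: kind
        for name, kind in pattern["steps"].items()
        if observed["steps"].get(name) != kind  # absent, or present with another type
    }
    missing_edges = pattern["edges"] - observed["edges"]
    return missing_steps, missing_edges


# Hypothetical pattern for a sequencing-based microbiome experiment.
pattern = {
    "steps": {
        "sample_collection": "Sampling",
        "dna_extraction": "WetLab",
        "sequencing": "Assay",
        "taxonomic_profiling": "Analysis",
    },
    "edges": {
        ("sample_collection", "dna_extraction"),
        ("dna_extraction", "sequencing"),
        ("sequencing", "taxonomic_profiling"),
    },
}

# Hypothetical dataset whose metadata never mentions the extraction step.
observed = {
    "steps": {
        "sample_collection": "Sampling",
        "sequencing": "Assay",
        "taxonomic_profiling": "Analysis",
    },
    "edges": {("sequencing", "taxonomic_profiling")},
}

if __name__ == "__main__":
    steps, edges = missing_components(pattern, observed)
    print("missing steps:", steps)
    print("missing edges:", sorted(edges))
```

The missing step and edges are the "absent" elements a curator, or an automated enrichment step, would then need to fill in.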
  • 27. 2 - Mapping Questions to Experimental Process Representations Autocuration Engine: Semantical enrichment and Context Mesh Entailment Questions & Goals Process-oriented representation of Experimental data-sets a) Mapping of the questions to Process-oriented graph b) Mapping of question to data (Experimental process) c) Identification of gaps
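The three sub-steps (a) to (c) can be sketched by treating a question as a set of required entity types and each dataset as the set of entity types its process graph provides. The sketch is an assumption about how such a mapping could work, with hypothetical entity types and dataset names.

```python
def map_question(required_entities, datasets):
    """Map a question's required entity types to datasets and report gaps.

    required_entities: iterable of entity type names the question needs.
    datasets: mapping of dataset name -> set of entity types it provides.
    Returns (coverage, gaps): coverage lists the datasets providing each
    required entity; gaps lists required entities that no dataset provides.
    """
    coverage = {
        entity: sorted(name for name, provided in datasets.items() if entity in provided)
        for entity in required_entities
    }
    gaps = sorted(entity for entity, names in coverage.items() if not names)
    return coverage, gaps


# Hypothetical question: "Which gut taxa are associated with treatment response?"
question_entities = ["Taxon", "Treatment", "ClinicalOutcome"]

# Hypothetical datasets and the entity types found in their process graphs.
available = {
    "microbiome_run_01": {"Sample", "Taxon"},
    "trial_arm_table": {"Sample", "Treatment"},
}

if __name__ == "__main__":
    coverage, gaps = map_question(question_entities, available)
    print(coverage)
    print("gaps:", gaps)
```

Here the gap is the clinical outcome, which would steer the search for enrichment sources in the next step.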
  • 28. 3 - Cross mapping and Identification of enrichment data sources based on Value Autocuration Engine: Semantical enrichment and Context Mesh Entailment Publications and references Meta-data - Ad hoc - Nomenclatures - Ontologies Internal & External sources Use e[discover] to identify and select sources of data enrichment based on added value questions
  • 29. 4 - Semantical enrichment as: Inductive data weaving and context mesh entailment Autocuration Engine: Semantical enrichment and Context Mesh Entailment Publications and references Meta-data - Ad hoc - Nomenclatures - Ontologies Internal & External sources
  • 31. Requirements for data modelling and management • Graph data structures and models are a natural fit: Network Science • Support for multi-layered graphs (“Lasagne graph”) • Rich semantic expressiveness • Flexible and dynamic (meta) modelling • Multidimensional (n-ary) relationships, not just binary • Support for integrity constraints • A language for both graph traversal (navigation) and computation (optimisation) • Verifiability, or at least ease of verification
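Two of these requirements, n-ary relationships and integrity constraints, can be sketched together: a relation links any number of role players, and a schema states which roles a relation type has and which entity type may play each. This is a plain-Python illustration, not the Graql/Grakn data model; the schema, role names and entities are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    type: str


# Hypothetical schema: relation type -> {role name: required entity type}.
SCHEMA = {
    "measurement": {"sample": "Sample", "assay": "Assay", "taxon": "Taxon"},
}


def make_relation(rel_type, **role_players):
    """Build an n-ary relation after checking it against the schema."""
    roles = SCHEMA[rel_type]
    if set(role_players) != set(roles):
        raise ValueError(f"{rel_type} needs exactly the roles {sorted(roles)}")
    for role, entity in role_players.items():
        if entity.type != roles[role]:
            raise TypeError(f"role {role!r} must be played by a {roles[role]}")
    return {"type": rel_type, "roles": dict(role_players)}


if __name__ == "__main__":
    relation = make_relation(
        "measurement",
        sample=Entity("S1", "Sample"),
        assay=Entity("A7", "Assay"),
        taxon=Entity("T42", "Taxon"),
    )
    print(relation["type"], sorted(relation["roles"]))
```

A ternary relation like this one cannot be stored as a single binary edge without introducing an artificial intermediate node, which is the motivation for native n-ary support.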
  • 32. A brief history of databases and models – key concepts. Hierarchical data model, 1960’s (C. Bachman; IBM 1968; IDS). Relational Model (T. Codd, 1970’s): algebraic; normalisation; functional dependencies. Entity-Relationship (P. Chen & H. Tardieu, 1976-77): n-ary & reflexive relationships; cardinalities. RDBMS and SQL (1980’s): data independence (logical, physical); IBM DB2. OO (O. Dahl, 1967): class and subclass. OODBMS (O2, early 90’s). ER+ (AFCET group, early 90’s): integrity constraints; generalisation - specialisation; genericity. UML (1994). Functional Data Model: data mining, OLAP; functional dependencies. Semi-structured, XML, complex data types; XML and XQuery (early 2000’s). Giant datastores: Google, Yahoo, Amazon (late 2000’s). Graph databases
  • 33. Eagle Genomics platform functional architecture. Data access layer; Adaptors Factory; Extract; Load; Quality control & cleansing; Staging; e[catalog] Data Catalog & Model Management; Graph builder (transform); Questions; Valuation Models: Relevance (sources + entities), Risk quantification, Data Quality; Model builder; Enrich; Weave/Entail; e[discover] Data Sources Discovery; Data storage; Data finding & selection; adaptors; Change management and audit trail; Workflow Manager – e[hive]; Security Management; AI and ML functionalities; e[curate] Conversational Learning Interface
  • 34. Semantics Engine key modules. Data access layer; Adaptors Factory; Questions; Valuation Models: Relevance (sources + entities), Risk quantification, Data Quality; Model builder; e[discover]; AI and ML functionalities; Metadata; Ontologies; Reference data; e[catalog] Data Catalog & Model Management
  • 35. Data Access and Management Layer • Provides a unification layer for data access and management • Allows for distributed and federated databases • Structured data is added, with its schema, through the Ingest API, and structured queries are handled through the Search API • Data is stored in data stores • High-level queries are mapped, using schemas and ontologies, to database-level queries, which translate to direct database requests • Schema mapping enables data integration at this level • The Materialization Engine persists the results of the most used queries, as well as expected queries, to the database, thus catering for efficiency and scalability • Ensures efficient and scalable access to structured data • Scalability is designed in rather than tested in. Components: Ingest API; Bulk Data API; Metadata API; Search API; Caching; RDBMS; GraphDB / Ontology Store; FileStore; Materialization Engine; Graql; Grakn DB
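The Materialization Engine bullet can be sketched as a wrapper that counts how often each query is issued and persists the result once a query becomes frequent. This is a minimal sketch under stated assumptions, not the platform's implementation: the class, the `run_query` callable and the threshold are hypothetical, and invalidation when the underlying data changes is left out.

```python
from collections import Counter


class MaterializationEngine:
    """Persist results of frequently used queries so later calls skip the backend."""

    def __init__(self, run_query, threshold=3):
        self.run_query = run_query      # callable that executes a query on the store
        self.threshold = threshold      # number of uses before a result is persisted
        self.hits = Counter()
        self.materialized = {}

    def query(self, q):
        if q in self.materialized:
            return self.materialized[q]  # served from the persisted result
        self.hits[q] += 1
        result = self.run_query(q)
        if self.hits[q] >= self.threshold:
            self.materialized[q] = result
        return result


if __name__ == "__main__":
    backend_calls = []

    def slow_backend(q):
        backend_calls.append(q)
        return f"rows for {q}"

    engine = MaterializationEngine(slow_backend, threshold=2)
    for _ in range(5):
        engine.query("taxa by body site")
    print(len(backend_calls))  # backend is hit only until the result is materialised
```

Materialising "expected" queries ahead of time, as the slide mentions, would amount to pre-populating `materialized` before any user asks.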
  • 37. The Future: Automating the Data Scientist, providing data-science-as-a-service Raj Bioinformatician R&D Lead Jennifer Biologist Tony Director, Scientific Innovation Conduct experiment Ask questions Determine investigation / studies View / refine Generate source report Analyze all data Generate study report Think of new questions / studies • Reduce inertia • Increase speed to insight • Reduce time & cost • Leverage stranded data assets • Data science at the fingertips of the biologist Clarify goals Analyze data Refine / ask new questions Load into e[curate] Raw data Value with e[discover] Load into e[curate] Load into e[curate] Generate map Generate report data External data sources Ensembl, Pubmed, UniProt, ClinicalTrials.gov etc.
  • 39. Screenshot from Eagle Platform Demo: Data valuation and prioritisation
  • 40. Screenshot from Eagle Platform Demo: Semantic enrichment and contextual data tagging
  • 41. Screenshot from Eagle Platform Demo: Question Recommendations and Formulation, based on Previous Analyses/Studies
  • 42. Screenshot from Eagle Platform Demo: Key Entities, Relationships, and Associated Evidence
  • 43. Unilever: Eagle’s Platform accelerating project timelines across global teams “Unilever’s digital data program now processes genetic sequences twenty times faster - without incurring higher compute costs. In addition, its robust architecture supports ten times as many scientists, all working simultaneously.” Pete Keeley, eScience Technical Lead - R&D IT at Unilever. Past: stand alone (weeks / months). Present: server/cluster (days / weeks). Emerging: secure scalable cloud (hours). Scale: from 100K reads to 5 billion reads
  • 44. Model Builder and Metamodel management • Generic meta models are more evolvable but lack semantic expressiveness and are hard to implement (performance) → dynamic ontologies • Specific models are semantically richer but tend to be rigid → static ontologies. The Model builder resolves the dilemma of performance versus genericity • Analogous to blending a compiler and an interpreter • A virtual machine for models. Components: Connectors & Wrappers Factory; Model builder; Generic meta model & ontologies; Semantically-rich specific models
  • 45. Ontologies and Ontologies Management. Ontologies are necessary components, but not sufficient for semantic resolution and management • Multitude of ontologies • Conflict between ontologies (incoherence) • Noisy ontologies (inconsistency) From static to dynamic ontologies • Managing ontologies as assets • Valuation models applied to ontologies based on questions and context • Use of Machine Learning to construct ontologies dynamically
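The compiler/interpreter analogy can be sketched by keeping the generic meta model as plain data and generating specific, typed classes from it at run time. The sketch uses Python's standard `dataclasses.make_dataclass` and is an illustration of the idea only; the metamodel contents and the `build_models` helper are hypothetical.

```python
from dataclasses import make_dataclass, fields

# Hypothetical generic meta model: entity name -> {attribute name: attribute type}.
# It is plain data, so it can evolve without code changes.
METAMODEL = {
    "Sample": {"sample_id": str, "body_site": str},
    "Assay": {"assay_id": str, "platform": str},
}


def build_models(metamodel):
    """Generate one specific dataclass per entity described in the meta model."""
    return {
        name: make_dataclass(name, list(attributes.items()))
        for name, attributes in metamodel.items()
    }


if __name__ == "__main__":
    models = build_models(METAMODEL)
    Sample = models["Sample"]
    sample = Sample(sample_id="S1", body_site="gut")
    print(sample)
    print([f.name for f in fields(Sample)])
```

The generic layer stays easy to change, while code that consumes the generated classes gets named attributes and constructor checks for missing attributes.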