GSK: How Knowledge Graphs Improve Clinical Reporting Workflows
gsk.com
How will knowledge graphs improve clinical reporting workflows?
Presenters: Alexey Kuznetsov (alexey.k.kuznetsov@gsk.com) & Shannon Haughton (shannon.l.haughton@gsk.com)
16 Nov 2022
21 March 2023 2
Our Problem Statement
Tremendous Resource, Multiple Handoffs, Numerous Transformations
Clinical data flow: Trial design → Collect data → Review observed datasets → Analyse datasets → Review results
Data sources: EDC, Lab data, Randomisation, Others...; Protocol; Metadata
Single study (examples): SDTM* 71 datasets → ADaM 42 datasets → TFLs <= 250 outputs
Submission (5 – 10 studies, 99 modules): SDTM 350 – 710 datasets to integrate → ADaM 210 – 420 datasets to integrate → TFLs <= 250 integrated outputs
Re-Mappings (Several Standards); Re-transformations (Several Standards)
*From 77 legacy datasets
Imagine a world where anything is possible…
• True automation of standard analyses
• Ad-hoc requests delivered on demand
• (Blinded) analysis results reviewed in real time
• Manual effort of results validation virtually eliminated
• Data visualisations available in-stream
• GDPR and patient consent
• Risk-based monitoring is proactive
• Google-like Q&A system for our trial data
• Clinical application of preclinical AI algorithms
From Imagination to Reality
Clinical Knowledge Graph
Let’s move away from isolated data domain silos…
Exposure domain: Subject = Bob; Trial = Trial1; Study Day = 1; Dosage = 40mg
Medical History domain: Subject = Bob; Trial = Trial1; Event Date = 2000; Term = Hypertension
Adverse Events domain: Subject = Bob; Trial = Trial1; Study Day = 1; Term = Headache
Demographics domain: Subject = Bob; Trial = Trial1; Sex = M; Age = 75
…to ONE contextualised Clinical Knowledge Graph
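To illustrate the idea, here is a minimal, library-free sketch (plain Python, not GSK's implementation) that turns the four example domain records into one graph: each record becomes a node, and the shared Subject and Trial values become nodes that link the records together. The node-id scheme (`Subject:Bob`, `Exposure#0`, ...) is a hypothetical choice for this sketch.

```python
from collections import defaultdict

# Example records from the slide, one per data domain.
records = [
    {"domain": "Exposure", "Subject": "Bob", "Trial": "Trial1", "Study Day": 1, "Dosage": "40mg"},
    {"domain": "Medical History", "Subject": "Bob", "Trial": "Trial1", "Event Date": 2000, "Term": "Hypertension"},
    {"domain": "Adverse Events", "Subject": "Bob", "Trial": "Trial1", "Study Day": 1, "Term": "Headache"},
    {"domain": "Demographics", "Subject": "Bob", "Trial": "Trial1", "Sex": "M", "Age": 75},
]

# Undirected adjacency list: node id -> set of neighbouring node ids.
graph: dict[str, set[str]] = defaultdict(set)
properties: dict[str, dict] = {}


def link(a: str, b: str) -> None:
    """Add an undirected edge between two node ids."""
    graph[a].add(b)
    graph[b].add(a)


for i, rec in enumerate(records):
    rec_id = f"{rec['domain']}#{i}"
    # Domain-specific attributes stay on the record node itself.
    properties[rec_id] = {k: v for k, v in rec.items() if k not in ("Subject", "Trial")}
    # Shared identifiers become nodes, so every domain joins the same subject and trial.
    subject_id = f"Subject:{rec['Subject']}"
    trial_id = f"Trial:{rec['Trial']}"
    link(rec_id, subject_id)
    link(subject_id, trial_id)

# Everything recorded about Bob is now one hop away from his subject node.
bob_records = sorted(n for n in graph["Subject:Bob"] if not n.startswith("Trial:"))
for node in bob_records:
    print(node, properties[node])
```

In a graph database such as Neo4j the same pattern would be stored as labelled nodes and relationships; the design point is that a single traversal from the subject node replaces a join across four domain tables.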
Our Idea
…the Google Translate for our clinical data, helping us translate our complex data landscape to answer important scientific questions
Clinical data flow: Trial design → Collect data → Review observed datasets → Analyse datasets → Review results
Data sources: EDC, Lab data, Randomisation, Others...; Protocol; Metadata
One Connected Data Model: GSK Design (1 Standard)
Select Required Standard → ETL Modules → Parallel Processing
Examples: SDTM (71), ADaM (42), TFLs (<= 250), ISS/ISE; 99 Modules
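A minimal sketch of the "select required standard, run ETL modules in parallel" step, assuming each output standard is produced by independent module functions that read only from the one connected data model. The dict-based model, the `REGISTRY` and the module and dataset names are hypothetical placeholders, not GSK's modules.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical stand-in for the one connected data model.
MODEL = {"trial": "Trial1", "subjects": ["Bob"]}

Module = Callable[[dict], str]


def make_module(standard: str, dataset: str) -> Module:
    """Build a placeholder ETL module that 'produces' one dataset of a standard."""
    def module(model: dict) -> str:
        return f"{standard}.{dataset} from {model['trial']}"
    return module


# Target standard -> independent ETL modules (dataset names are illustrative).
REGISTRY: dict[str, list[Module]] = {
    "SDTM": [make_module("SDTM", name) for name in ("DM", "AE", "EX", "MH")],
    "ADaM": [make_module("ADaM", name) for name in ("ADSL", "ADAE")],
}


def produce(standard: str, model: dict) -> list[str]:
    """Select the required standard and run its modules in parallel."""
    modules = REGISTRY[standard]
    with ThreadPoolExecutor() as pool:
        # map() returns results in submission order, even if modules finish out of order.
        return list(pool.map(lambda module: module(model), modules))


print(produce("SDTM", MODEL))
```

Because each module in this sketch reads only from the shared model and not from another module's output, the modules can run concurrently; a process pool could replace the thread pool for CPU-heavy transformations.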
Unique Value of a Knowledge Graph
• Greater control over data privacy
• Modern graph analytics & visualisation
• Decoupling the vertical data pipeline
• Accelerated decision making
Goal: Test feasibility, desirability & sustainability of the idea
Phased, agile & risk-based approach with predefined success criteria, 2021 H1 to 2022 H2
• EXPERIMENT 1: Can we ingest SDTM data into the CLD MVP?
• EXPERIMENT 2: Can we enrich the CLD MVP model with ADaM transformations?
• EXPERIMENT 3: Can we analyse, report and egress TFLs from the CLD MVP?
• MVP PILOT: Can we use the CLD MVP to perform QC for an ongoing trial?
Continuous learning and iteration
How do we store machine-readable derivation metadata as a graph?
Example derivation: aage = scrndt - brthdtc + 1
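One way to make the aage derivation machine-readable is to store it as nodes and edges: a derivation node linked to its input variables, its output variable and its expression. This is a hypothetical plain-Python sketch, not the presenters' data model; the triple layout and relationship names (`HAS_INPUT`, `DERIVES`, `HAS_EXPRESSION`) are assumptions. The slide shows only the formula, so the sketch also assumes both inputs are ISO 8601 dates and reports aage in days, which is what the formula literally computes.

```python
import ast
import operator
from datetime import date

# Hypothetical derivation metadata stored as (subject, relationship, object) triples.
TRIPLES = [
    ("derivation:aage", "HAS_INPUT", "var:scrndt"),
    ("derivation:aage", "HAS_INPUT", "var:brthdtc"),
    ("derivation:aage", "DERIVES", "var:aage"),
    ("derivation:aage", "HAS_EXPRESSION", "scrndt - brthdtc + 1"),
]

OPS = {ast.Add: operator.add, ast.Sub: operator.sub}


def related(node: str, rel: str) -> list[str]:
    """Objects reachable from `node` over edges of type `rel`."""
    return [o for s, r, o in TRIPLES if s == node and r == rel]


def evaluate(expr: str, env: dict[str, int]) -> int:
    """Evaluate a +/- expression over named integers without using eval()."""
    def walk(node: ast.AST) -> int:
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval").body)


def derive(derivation: str, row: dict[str, str]) -> dict[str, int]:
    """Run one derivation on a record whose inputs are ISO 8601 dates."""
    inputs = [v.removeprefix("var:") for v in related(derivation, "HAS_INPUT")]
    # Dates become day numbers, so the expression's result is a count of days.
    env = {name: date.fromisoformat(row[name]).toordinal() for name in inputs}
    output = related(derivation, "DERIVES")[0].removeprefix("var:")
    return {output: evaluate(related(derivation, "HAS_EXPRESSION")[0], env)}


row = {"scrndt": "2021-03-01", "brthdtc": "1946-03-01"}  # hypothetical record
print(derive("derivation:aage", row))
```

Keeping the expression as data (rather than code inside a program) is what lets the same derivation be inspected, versioned and re-run from the graph.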
• modular
• dependent
• orchestration
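Because derivations are modular and depend on one another, the dependency edges alone are enough to orchestrate them: a topological sort yields a valid run order, and derivations that become ready together can run in parallel. A sketch using Python's standard `graphlib` (3.9+); apart from aage, scrndt and brthdtc, the variables and dependencies (agegr1, trtdurd, trtsdt, trtedt) are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical derivation modules: derived variable -> variables it reads.
# Variables with no entry (scrndt, brthdtc, trtsdt, trtedt) are source data.
DEPENDS_ON = {
    "aage": {"scrndt", "brthdtc"},
    "agegr1": {"aage"},
    "trtdurd": {"trtsdt", "trtedt"},
}


def run_batches(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group derivations into batches; modules in one batch can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()
        # Only derived variables need a module run; source variables are just available.
        derived = sorted(v for v in ready if v in depends_on)
        if derived:
            batches.append(derived)
        sorter.done(*ready)
    return batches


print(run_batches(DEPENDS_ON))
```

Here aage and trtdurd share the first batch, and agegr1 waits until aage is done.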
How do we store machine-readable summary statistics as a graph?
Specification of statistics:
• the what
• the how
• qualifiers
SEX Mean Value
F 32.4
M 34.0
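A minimal sketch of a machine-readable statistic specification split into the three parts named on the slide: the what (the statistic, here a mean), the how (the analysis variable and grouping) and qualifiers (here, a rounding precision). The slide does not name the analysed variable, so `value`, the specification's field names and the toy records are hypothetical; running the specification produces a table shaped like the SEX / Mean Value example, not its numbers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical statistic specification, as it might be stored on graph nodes.
SPEC = {
    "what": {"statistic": "mean", "label": "Mean Value"},
    "how": {"variable": "value", "group_by": "SEX"},
    "qualifiers": {"decimals": 1},
}

STATISTICS = {"mean": mean}  # statistic name -> implementation

# Toy analysis records (not real trial data).
records = [
    {"SEX": "F", "value": 30.0},
    {"SEX": "F", "value": 33.0},
    {"SEX": "M", "value": 35.0},
    {"SEX": "M", "value": 32.0},
    {"SEX": "M", "value": 34.0},
]


def run_spec(spec: dict, rows: list[dict]) -> dict[str, float]:
    """Apply the statistic in `what` to the variable and grouping in `how`."""
    how = spec["how"]
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        groups[row[how["group_by"]]].append(row[how["variable"]])
    stat = STATISTICS[spec["what"]["statistic"]]
    ndigits = spec["qualifiers"]["decimals"]
    return {key: round(stat(vals), ndigits) for key, vals in sorted(groups.items())}


result = run_spec(SPEC, records)
print(SPEC["how"]["group_by"], SPEC["what"]["label"])
for key, val in result.items():
    print(key, val)
```

Separating the what from the how means the same mean specification could be reused with a different grouping by changing one field.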
Open source assets
Released:
• GSK-Biostatistics/neointerface: NeoInterface - Neo4j made easy for Python programmers! (github.com)
To be released:
• tab2neo
• neo4cdisc
read/write: csv, xls, xlsx, xpt, sas7bdat, rda
dm/ae/lb/...
dm/ae/../custom
Learnings that helped us accelerate our idea
• Pre-defined success criteria are critical for quick decision making
• Prioritise 1 idea, test it, refine it, test it, refine it…
• A focused innovation challenge can greatly help test disruptive ideas
• Understand pain points & test ideas to drive informed innovation
• Timeboxed, focused sprints are great for informing the path ahead
Special thanks to…
Jorine Putter
Michael Rimler
Samantha Warden
Kirsten Langendorf
Johannes Ulander
Dave Iberson-Hurst
Eleanor Sparling
Rachel Ren
James Sefton
William McDermott
Jonathan Deacon
Benjamin Grinsted
Julian West
It takes a village to raise an idea…