2. Data Driven Drugs: Predictive Models to Improve Product Quality in Pharmaceuticals
Sarah Aerni, PhD
Senior Data Scientist at Pivotal
saerni@gopivotal.com
Strata RX
September 26, 2013
© Copyright 2013 Pivotal. All rights reserved.
3. The Quantified Patient
Medical History
Genetics
Family History
Imaging
Clinical Narratives
Medications
Molecular Diagnostics
Lab tests
Environment
Sensors & Mobile
4. Data driven drugs: From discovery to delivery
Drug discovery + development
Clinical Trials
Distribution and surveillance
Manufacturing
Marketing
RICH DATA SOURCES
• Molecular data
– Cellular drug screens
– Animal models
• Clinical data including notes, images, markers (e.g. genomics, lab results)
• Sensor and assay data
• Internal and partner/purchased external data
• Contact center data
• Patient registries, public and federal data, clinical partnerships
5. Data integration
How Pivotal can enable industries to extract new value from data sources
6. Successful transformation into a data-driven enterprise requires a paradigm shift
• Bring available data sources to a central location
– Integrating a variety of data leads to new insights
• Analyze large volumes of variable data for richer models
– Building models without data movement reduces time to insight
• Share data, insights and ideas
– Leveraging diverse expertise leads to more relevant business insights
DATA IS THE NEW CENTER OF GRAVITY
Data > Application
7. Traditional Analytics Processes
If you think databases are only good for storing data
[Diagram labels: Time-to-Insights; sample; In-memory statistics tool; In-memory optimization tool; forecast; solution]
8. Pivotal One: Heritage
Application Fabric
Data Fabric
GemFire
Ingest & Query: very high-capacity & in-memory
Scale-out storage: HDFS/Object
vFabric
Languages & Frameworks
Services
Analytics
Automation: App Provisioning & Life-cycle
Service Registry
Cloud Abstraction (portability)
Cloud Fabric
9. Performance Through Parallelism
• Automatic parallelization
– Load and query like any database
– Tables are automatically distributed across nodes
– No need for manual partitioning or tuning
• Analytics optimized
– Analytics-oriented query optimization
• Extremely scalable MPP shared-nothing architecture
– All nodes can scan and process in parallel
– Linear scalability by adding nodes
[Diagram labels: Database; Interconnect; Compute; Storage; Loading]
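To make the shared-nothing idea concrete, here is a minimal Python sketch, not Pivotal's implementation, of how an MPP database can hash-distribute a table by a key, compute a partial aggregate on each segment using only that segment's rows, and combine the small partials on a coordinator. The segment count, the CRC32 hash, and the field names `batch_id` and `potency` are illustrative assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

NUM_SEGMENTS = 4  # hypothetical segment count for illustration

Row = Dict[str, object]


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment using a stable (non-randomized) hash."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows: List[Row], key_field: str) -> List[List[Row]]:
    """Hash-distribute rows across segments, as a distributed table would be stored."""
    segments: List[List[Row]] = [[] for _ in range(NUM_SEGMENTS)]
    for row in rows:
        segments[segment_for(str(row[key_field]))].append(row)
    return segments


def partial_sum_count(segment_rows: List[Row], value_field: str) -> Tuple[float, int]:
    """Local work on one segment: only that segment's rows are read."""
    return sum(float(r[value_field]) for r in segment_rows), len(segment_rows)


def parallel_mean(rows: List[Row], key_field: str, value_field: str) -> float:
    segments = distribute(rows, key_field)
    # Threads only simulate independent segments; a real MPP system runs
    # each segment as its own process on its own node.
    with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
        partials = list(pool.map(lambda seg: partial_sum_count(seg, value_field), segments))
    # The coordinator combines small partial aggregates, never the raw rows.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else float("nan")
```

For example, `parallel_mean(rows, "batch_id", "potency")` returns the same mean a single-node query would, while each segment touches only its own share of the rows. That is why adding segments can add scan capacity.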
10. Performance Through Parallelism
• Automatic parallelization
– Load and query like any database
– Tables are automatically distributed across nodes
– No need for manual partitioning or tuning
• Analytics optimized
– Analytics-oriented query optimization
• Extremely scalable MPP shared-nothing architecture
– All nodes can scan and process in parallel
– Linear scalability by adding nodes
[Diagram labels: Database; Interconnect; Compute; Storage; ETL; Loading; File Systems; External Sources: Loading, streaming, etc.]
11. Pivotal HD Architecture
[Diagram labels: Pivotal HD Enterprise; Resource Management & Workflow; Pig, Hive, Mahout; HBase; Map Reduce; Yarn; HDFS; Zookeeper; Sqoop; Flume; Hadoop Virtualization (HVE); Command Center: Deploy, Configure, Monitor, Manage; Data Loader; Apache]
12. Pivotal HD Architecture
HAWQ – Advanced Database Services
– ANSI SQL + Analytics
– Query Optimizer
– Dynamic Pipelining
– Catalog Services
– Xtension Framework
[Diagram labels: Pivotal HD Enterprise; Resource Management & Workflow; Pig, Hive, Mahout; HBase; Map Reduce; Yarn; HDFS; Zookeeper; Sqoop; Flume; Hadoop Virtualization (HVE); Command Center: Deploy, Configure, Monitor, Manage; Data Loader; Apache]
13. Leveraging healthcare data to drive predictive and precision care
Data: Clinical Narratives; Medications; Imaging; Genetics; Environment; Lab tests
Applications: Decision support; Precision care; Cohort identification
Unified data supporting unified risk evaluation, decision-making, etc.
• Acting on full patient and medical profile
15. Analytics with Pivotal
A single address for everything analytics
[Diagram labels: Time-to-Insights; Forecasting; Clustering; Regression; Optimization; Classification]
18. Data driven drugs: From discovery to delivery
Drug discovery + development
Clinical Trials
Distribution and surveillance
Manufacturing
Marketing
• Molecular data
– Cellular drug screens
– Animal models
• Clinical data including notes, images, markers (e.g. genomics, lab results)
• Sensor and assay data
• Internal and partner/purchased external data
• Contact center data
• Patient registries, public and federal data, clinical partnerships
20. Predicting potency in vaccine manufacturing
Customer
A major pharmaceutical company
Business Problem
Predict potency and antigen levels of live virus vaccines based on manufacturing sensor data and manual data collected throughout the process.
Challenges
• Customer's data model was not optimal for running analytical queries
• Manual data quality issues
• Data capture was performed with varying consistency due to the high cost of manual data collection
Solution
• Introduced a new data model to make data accessible and enable analytics
• Built automated outlier detection/correction methods to address manual data entry quality issues
• Devised imputation methods to deal with data completeness issues
• Built predictive models with high accuracy (e.g. test R² = 0.742 for potency)
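The deck does not say which imputation methods were used. As one simple, commonly used option, the sketch below fills each missing value with that field's median over the observed entries and adds a `<field>_missing` indicator, so a downstream model can still learn whether a measurement was skipped. The field names and the fallback value for an entirely missing field are assumptions.

```python
from statistics import median
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def impute_median(records: List[Record], fields: List[str]) -> List[Dict[str, float]]:
    """Replace None with the field median and add a 0/1 missingness indicator."""
    medians: Dict[str, float] = {}
    for f in fields:
        observed = [r[f] for r in records if r.get(f) is not None]
        # Fallback of 0.0 when a field is never observed is an arbitrary assumption.
        medians[f] = median(observed) if observed else 0.0
    imputed = []
    for r in records:
        row: Dict[str, float] = {}
        for f in fields:
            value = r.get(f)
            row[f] = medians[f] if value is None else value
            row[f + "_missing"] = 1.0 if value is None else 0.0
        imputed.append(row)
    return imputed
```

Keeping the indicator columns matters here, because the slide notes that data capture varied with the cost of manual collection, so *whether* a value was recorded may itself carry signal.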
21. Building predictive models to improve outcomes in manufacturing of vaccines
[Diagram: process steps Cell expansion, Virus propagation and Pooling into final product along a Time axis, with measurements such as Temp, Counts and Duration of step; Backward Looking Models and Future Looking Predictive Models; example data entry alert: "Warning! Entered value not in expected range"]
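The "Warning! Entered value not in expected range" alert can be sketched as a per-field range check at data entry time. In this illustration the expected range is assumed to be the 1st to 99th percentile band of historical entries for that field; the deck does not state how its ranges were set, and the field name `duration_h` is hypothetical.

```python
from statistics import quantiles
from typing import Dict, List, Optional, Tuple


def expected_ranges(history: Dict[str, List[float]],
                    lower_pct: int = 1, upper_pct: int = 99) -> Dict[str, Tuple[float, float]]:
    """Derive an expected (low, high) band per field from historical entries."""
    ranges = {}
    for field, values in history.items():
        cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
        ranges[field] = (cuts[lower_pct - 1], cuts[upper_pct - 1])
    return ranges


def check_entry(field: str, value: float,
                ranges: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Return a warning message for an out-of-range entry, else None."""
    low, high = ranges[field]
    if not low <= value <= high:
        return (f"Warning! Entered value not in expected range for {field}: "
                f"{value} (expected {low:.3g} to {high:.3g})")
    return None
```

Because the check runs when the value is entered, the operator gets the immediate feedback the next slide says is missing, rather than the error surfacing later during modeling.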
22. Enabling predictive models through rearchitecting
Challenges
• Accessibility
– Certain parts of the data have never been used in any predictive modeling because they are extremely hard to query
• Data Integrity
– Manual data entries are prone to errors, and there is no immediate feedback to check the validity of the values entered
• Data Completeness
– Manual data entry is time consuming, and there is no feedback on which data are most useful for improving efficiency and quality, hence no prioritization of which data should be collected
[Diagram labels: Cell expansion; Virus propagation; Pooling into final product]
23. Enabling predictive models through rearchitecting
Challenges
• Accessibility
– Certain parts of the data have never been used in any predictive modeling because they are extremely hard to query
→ Purpose-built data models for rapid data querying and exploration
• Data Integrity
– Manual data entries are prone to errors, and there is no immediate feedback to check the validity of the values entered
→ Automated data cleansing techniques
• Data Completeness
– Manual data entry is time consuming, and there is no feedback on which data are most useful for improving efficiency and quality, hence no prioritization of which data should be collected
→ Opportunities to eliminate collection of incomplete or non-predictive data
24. Identifying and correcting data integrity problems
Creating automated methods for detection and correction
• Data integrity problems cause challenges in modeling
• Sources of variation in entries of measurements
– Variable units of measurement
– Manual data entry errors
• Approach: Detect the optimal threshold to separate two distributions
[Figure: histogram of all data]
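One way to "detect the optimal threshold to separate two distributions" is sketched below. It sorts the values, tries every break point, and keeps the split that minimizes the total within-group variance of log10(value). This is Otsu's criterion applied on a log scale, chosen here because unit mix-ups (e.g. fraction vs. percent) differ by a constant factor. It is an illustrative choice, not necessarily the method used in the deck. The returned index plays the role of the `maxBreak` split visible in the slide's plot labels, and the sample values are made up.

```python
import math
from typing import List, Tuple


def best_split(values: List[float]) -> Tuple[int, float]:
    """Return (max_break, threshold): sorted values[:max_break] form the lower group.

    Minimizes the summed within-group squared deviation of log10(value).
    Requires strictly positive values.
    """
    vals = sorted(values)
    if len(vals) < 2:
        raise ValueError("need at least two values")
    logs = [math.log10(v) for v in vals]
    n = len(logs)
    # Prefix sums let every candidate split be scored in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for x in logs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(i: int, j: int) -> float:
        """Sum of squared deviations of logs[i:j] from their mean."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    best_k, best_cost = 1, float("inf")
    for k in range(1, n):
        cost = sse(0, k) + sse(k, n)
        if cost < best_cost:
            best_k, best_cost = k, cost
    # Geometric midpoint of the gap between the two groups.
    threshold = math.sqrt(vals[best_k - 1] * vals[best_k])
    return best_k, threshold
```

For made-up entries such as `[0.12, 0.15, 0.18, 0.21, 13.0, 15.0, 18.0, 22.0, 24.0]`, the split falls between 0.21 and 13.0, separating the two recording conventions.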
25. Identifying and correcting data integrity problems
Creating automated methods for detection and correction
• Data integrity problems cause challenges in modeling
• Sources of variation in entries of measurements
– Variable units of measurement
– Manual data entry errors
• Approach: Detect the optimal threshold to separate two distributions
[Figure: histogram of all data, split into a lower half (x-axis roughly 0.12 to 0.22) and an upper half (x-axis roughly 12 to 24)]
26. Identifying and correcting data integrity problems
Creating automated methods for detection and correction
[Figure: histogram of all data and the lower-half and upper-half histograms, labeled Foreground and Background]
28. Identifying and correcting data integrity problems
Creating automated methods for detection and correction
[Figure: lower-half and upper-half histograms, and the cleaned histogram of the combined values (c(loh, uph)) with multiplier = 100]
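The cleaned histogram uses a multiplier of 100, which suggests that the lower group was recorded on a scale 100 times smaller than the rest. The sketch below assumes a threshold has already been found. It infers the multiplier as the power of ten nearest to the ratio of the two groups' medians, then rescales the lower group. Snapping to a power of ten is an assumption suited to unit mix-ups, not a detail stated in the deck.

```python
import math
from statistics import median
from typing import List, Tuple


def infer_multiplier(values: List[float], threshold: float) -> float:
    """Power of ten closest to median(upper group) / median(lower group)."""
    lower = [v for v in values if v < threshold]
    upper = [v for v in values if v >= threshold]
    if not lower or not upper:
        return 1.0  # only one population present: nothing to rescale
    ratio = median(upper) / median(lower)
    return 10.0 ** round(math.log10(ratio))


def clean_values(values: List[float], threshold: float) -> Tuple[List[float], float]:
    """Rescale entries below the threshold onto the scale of the other entries."""
    m = infer_multiplier(values, threshold)
    return [v * m if v < threshold else v for v in values], m
```

With a lower group near 0.12 to 0.21 and an upper group near 13 to 24, the median ratio is about 109, which snaps to a multiplier of 100. Genuine outliers that do not fit either population would still need a separate check.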
29. Building models: First, start with the answer
How to build models that solve the right problem
Approach: Use historical data to build a model that predicts the potency of a final product from data collected during the manufacturing process
• Model form: how do we pick the right one?
– How do we deal with correlated features?
– Accuracy or interpretability?
• Available data
– Thousands of features: without expert guidance, how do we choose the right ones?
– What data do we want to use to predict? When is the right time for an intervention?
[Diagram labels: Cell expansion; Virus propagation; Pooling into final product]
30. Model generation and evaluation
Predicting vaccine potency using manufacturing data
• Feature engineering and transformation
– Enabled by rapid in-database processing
• Experimentation with model forms
– Partial least squares
– Random forest
– Regularized regression
• Interpretation of model results for insight generation
– Use cross-validation framework to assess variable importance
[Figure: scatter plot of Predicted Potency vs. True Potency on the test set (both axes roughly 12.0 to 13.5); Train R² = 0.823, Test R² = 0.742]
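The train/test R² pair above comes from evaluating a model on data it was not fitted on. The sketch below shows the mechanics with ridge regression, one form of regularized regression. The deck does not say which penalty or settings were used. The model is fitted in plain Python on each of k training folds, and R² is scored on the pooled out-of-fold predictions. The penalty `alpha`, the fold count, and the synthetic data in the usage note are assumptions, and the sketch does not reproduce the deck's numbers.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(x: Matrix, y: List[float], alpha: float) -> Tuple[float, List[float]]:
    """Ridge regression on centered data; the intercept is not penalized."""
    p = len(x[0])
    x_mean = [sum(row[j] for row in x) / len(x) for j in range(p)]
    y_mean = sum(y) / len(y)
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in x]
    yc = [v - y_mean for v in y]
    xtx = [[sum(r[i] * r[j] for r in xc) + (alpha if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(xc, yc)) for i in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(b * mu for b, mu in zip(beta, x_mean))
    return intercept, beta


def predict(model: Tuple[float, List[float]], x: Matrix) -> List[float]:
    intercept, beta = model
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in x]


def r2(y_true: List[float], y_pred: List[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(x: Matrix, y: List[float], alpha: float = 1.0,
                       k: int = 5, seed: int = 0) -> float:
    """R^2 of pooled out-of-fold predictions from k-fold cross-validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_ridge([x[i] for i in train], [y[i] for i in train], alpha)
        for i, p in zip(test, predict(model, [x[i] for i in test])):
            preds[i] = p
    return r2(y, preds)
```

Comparing `r2(y, predict(fit_ridge(x, y, alpha), x))` (in-sample) with `cross_validated_r2(x, y, alpha)` gives the kind of train vs. test gap shown on the slide. Refitting on each fold and recording which coefficients stay large is one way to gauge variable importance.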
31. Sample model insights
Interpreting the utility of a measure obtained during manufacturing based on model outcomes
• Features consistently absent from models may be uninformative for predicting potency
• Some features may reveal tunable parameters to alter potency; others may simply be markers
• Opportunities to provide real-time feedback on data entry errors and predicted potency outcomes
[Figure: two scatter plots of Log of Potency (y-axis roughly 12.0 to 13.0) against manufacturing measures: an assayed value, SP1 Total Viable Cells Harvested Per Sq. Cm (x-axis roughly 0.20 to 0.45), and the duration of a step, SP2 Total Trypsinization Exposure Time of per CCS (x-axis roughly 12 to >=16); reported correlations: 0.38 and -0.45]
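The correlations quoted on this slide are Pearson coefficients between a single manufacturing measure and log potency. The sketch below computes them and ranks features by their absolute value. It assumes potency is already on a log scale, as the axis label suggests. The feature names in the usage note are hypothetical. As the slide cautions, a high correlation alone cannot tell a tunable parameter from a mere marker.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def rank_features(features: Dict[str, List[float]],
                  log_potency: List[float]) -> List[Tuple[str, float]]:
    """Rank measures by |correlation| with (already log-scaled) potency."""
    scores = [(name, pearson(values, log_potency)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

A call such as `rank_features({"viable_cells_per_cm2": [...], "trypsin_exposure_min": [...]}, log_potency)` returns `(name, r)` pairs with the strongest relationships first. Measures that never rank near the top are candidates for the "eliminate non-predictive data collection" opportunity.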
33. Data driven drugs: From discovery to delivery
Drug discovery + development
Clinical Trials
Distribution and surveillance
Manufacturing
Marketing
34. Data driven drugs: From discovery to delivery
Drug discovery + development
Clinical Trials
Distribution and surveillance
Manufacturing
Marketing
• Data repurposing
– New value exists in leveraging historical data across drugs and stages
• Data discovery
– External and publicly available datasets can augment proprietary sources
• Data collection
– Obtaining new data from different sources drives additional value
35. Data driven drugs: From discovery to delivery
Drug discovery + development
Clinical Trials
Distribution and surveillance
Manufacturing
Marketing
• Data repurposing
– New value exists in leveraging historical data across drugs and stages
– Example: adverse events for new clinical indications
• Data discovery
– External and publicly available datasets can augment proprietary sources
– Example: Twitter data to forecast demand
• Data collection
– Obtaining new data from different sources drives additional value
– Example: mobile and sensor data to measure patient adherence and outcomes
36. Leveraging Data to Improve Demand Forecasts
[Diagram labels: Hospitals; Doctor's Offices; Supply Distr.; Surgery Centers; Pharmacies; Laboratories; Sales Data: analyze orders from customers; Patients; Self-Reporting; Publicly Available Resources; Monitoring Patient Populations]
37. Promising Advancements in Diabetes Studies
Use of telehealth to provide tight glucose control
[Diagram labels: Biochemical Measurements; EMR; Genomics; Lifestyle Intervention]
38. Launching a successful diabetes management program
Multiple potential points of failure; analytics is required at every step
Program steps: Increase Awareness; Patient Enrollment; Comparative Effectiveness; Remote Patient Monitoring; Design Interventions; Measure Impact on Population
Analytics used across the steps:
• Best channel per cohort
• Identify highest impact channels
• Identify influencers
• Campaign optimization
• Attribution models
• Stochastic entity resolution
• Churn prediction
• Best therapy for each cohort: medication, delivery method, monitoring method
• Resource allocation decisions
• Medication adherence
• Predict risk of negative outcome for next 3 months
• Measure engagement
• A/B testing to design best engagement platform
• Careful design of experiment to quantify the impact
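For the "A/B testing to design best engagement platform" step, a standard analysis compares the engagement rates of two variants with a two-sided, two-proportion z-test. The sketch below assumes engagement is a yes/no outcome per enrolled patient. The counts in the usage note are made up.

```python
import math
from typing import Tuple


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both variants have the same rate."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)  # rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, 100 of 1,000 patients engaging on platform A versus 150 of 1,000 on platform B gives z ≈ 3.4 and p < 0.001. The "careful design of experiment" point still applies: random assignment and a sample size fixed in advance are what make the p-value meaningful.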
39. Launching a successful diabetes management program
Interdisciplinary collaboration of data scientists is essential to success
Disciplines: Marketing; Healthcare; Web Analytics; Optimization; General ML
Program steps: Increase Awareness; Patient Enrollment; Comparative Effectiveness; Remote Patient Monitoring; Design Interventions; Measure Impact on Population
Analytics used across the steps:
• Best channel per cohort
• Identify highest impact channels
• Identify influencers
• Campaign optimization
• Attribution models
• Stochastic entity resolution
• Churn prediction
• Best therapy for each cohort: medication, delivery method, monitoring method
• Resource allocation decisions
• Medication adherence
• Predict risk of negative outcome for next 3 months
• Measure engagement
• A/B testing to design best engagement platform
• Careful design of experiment to quantify the impact
40. Pivotal Labs rapid application development
• Rheumatoid arthritis remote patient monitoring system
– Self-reporting
– Intuitive user interface
https://itunes.apple.com/us/app/myra/id563338979?mt=8