NLP in Healthcare to Predict Adverse Events with Amazon SageMaker (AIM346) - AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
NLP in Healthcare to Predict Adverse Events with Amazon SageMaker
Garin Kessler
Data Scientist
AWS Machine Learning Solutions Lab
AIM346
Mayank Thakkar
Life Sciences Specialist
AWS Solutions Architecture
Goal
Learn how to apply machine learning methods to predict adverse events from reported patient data
… and much more
Background
• Pharmacovigilance and patient safety programs
• Adverse events and FDA regulations
• FAERS
• Workable data
• Call center recording / summaries
• Emails / faxes
Adverse event detection – The challenge
• Disparate data types
• Unstructured data
• Understanding semantic dispositions
• Synonyms, spelling mistakes
• Sentiment detection
• Categorizing interactions
• Various data sources
• Meeting compliance objectives
• True positives, “sleeping doctor”
• Scale, enormous scale
• Cost efficiency
Machine learning to the rescue
• Improve accuracy and reliability
• Doesn’t replace humans – aids humans
• Offload repetitive work – humans can handle edge cases
• Decrease costs
• Repurpose human workforce for ‘value-adding’ endeavors
• Keep up with ongoing research
• Incorporate published articles at scale
Machine learning – The process
Fetch data → Clean & format data → Prepare & transform data → Train model → Evaluate model → Integrate with prod → Monitor / debug / refresh
Data wrangling
• Set up and manage Notebook environments
• Get data to notebooks securely
Experimentation
• Set up and manage clusters
• Scale/distribute ML algorithms
Deployment
• Set up and manage inference clusters
• Manage and auto scale inference APIs
• Testing, versioning, and monitoring
6-18 months
Amazon SageMaker
A managed service that provides one of the quickest and easiest ways for your data scientists and developers to get ML models from idea to production
Introducing Amazon SageMaker
• End-to-end machine learning platform
• Zero setup
• Flexible model training - bring your own deep learning script
• Pay by the second
• Or your custom algorithm Docker image
• One step deployment
• A/B testing
• Low latency, high throughput, high reliability
• Choice of several ML algorithms
• Train faster, in a single pass
Introducing Amazon SageMaker
Choice of several ML algorithms
• XGBoost, FM, and Linear for classification and regression
• K-means and PCA for clustering and dimensionality reduction
• LDA and NTM for topic modeling, seq2seq for translation
• Image classification with convolutional neural networks
Natural language processing methods
• Dataset preprocessing - feature generators
• Latent Dirichlet Allocation (LDA)
• Comprehend topic modeling
• BlazingText word embeddings
• Classification - algorithms utilized
• K-nearest neighbors
• Logistic regression
• XGBoost
• Amazon SageMaker BlazingText Classifier
• Deep convolutional neural network running on TensorFlow and Amazon SageMaker
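As a rough illustration of the first classifier in the list above, the sketch below runs k-nearest neighbors over topic-proportion vectors of the kind a topic-model feature generator such as LDA would emit. The vectors, the labels, and the function name `knn_predict` are invented for illustration. This is not the code used in the session.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 3-topic proportions per call summary (not real data).
train = [
    ([0.80, 0.10, 0.10], "adverse_event"),
    ([0.70, 0.20, 0.10], "adverse_event"),
    ([0.75, 0.05, 0.20], "adverse_event"),
    ([0.10, 0.80, 0.10], "no_event"),
    ([0.20, 0.70, 0.10], "no_event"),
    ([0.05, 0.15, 0.80], "no_event"),
]

prediction = knn_predict(train, [0.65, 0.25, 0.10], k=3)
print(prediction)
```

The same feature vectors could be fed to logistic regression or XGBoost instead. Only the classifier changes, which is how the session compares feature generators and classifiers independently.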
Preprocessing
• Data sources
• Call center summaries
• Stored in Amazon Simple Storage Service (Amazon S3)
• Preprocessing
• Lemmatization with Natural Language Toolkit (NLTK)
• BlazingText with Amazon SageMaker
Using BlazingText reduced the preprocessing time by 10x
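For readers preparing data for BlazingText's supervised (text classification) mode, the training file holds one example per line: the label prefixed with `__label__`, followed by space-separated tokens. The sketch below is a minimal, dependency-free version of that formatting step. The lemmatization with NLTK mentioned above is left out to keep the example self-contained, and the function name `to_blazingtext_line`, the sample sentences, and the label names are all hypothetical.

```python
import re

def to_blazingtext_line(text, label):
    """Format one example as '__label__<label> tok1 tok2 ...'."""
    # Lowercase, then keep runs of letters, digits, and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return f"__label__{label} " + " ".join(tokens)

# Hypothetical call-center summaries and labels (not real patient data).
examples = [
    ("Pt reported severe headache after second dose.", "adverse_event"),
    ("Caller asked about refill schedule.", "no_event"),
]

lines = [to_blazingtext_line(text, label) for text, label in examples]
print("\n".join(lines))
```

In a real pipeline the resulting lines would be written to a file and uploaded to Amazon S3 as the training channel. A lemmatizer would slot in between tokenization and the final join.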
Amazon SageMaker tooling
• TensorFlow and Keras
• “Bring your own model”
• Convolutional neural network
• Built-in algorithms
• Automatic model tuning
• Spinning out many jobs simultaneously
• Amazon CloudWatch and TensorBoard
• Monitoring instances and accuracy metrics
Architecture
[Architecture diagram: AWS Cloud, AWS Region, and a VPC with a private subnet in each of Availability zone 1 and Availability zone 2; each zone shows Training and Deployment, with an Auto Scaling group and an Endpoint; storage for raw data and model artifacts, and for production data]
Results by algorithm
| Feature generator | Classifier | Accuracy | AUC | False positive rate | False negative rate | Precision | Recall | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|---|---|
| LDA (Latent Dirichlet Allocation) | kNN | 0.775 | 0.767 | 0.182 | 0.288 | 0.729 | 0.712 | 0.712 | 0.818 |
| LDA (Latent Dirichlet Allocation) | Logistic regression | 0.728 | 0.787 | 0.277 | 0.257 | 0.485 | 0.743 | 0.743 | 0.723 |
| LDA (Latent Dirichlet Allocation) | XGBoost | 0.812 | 0.905 | 0.152 | 0.240 | 0.774 | 0.759 | 0.759 | 0.848 |
| Comprehend topic modeling | kNN | 0.759 | 0.718 | 0.254 | 0.189 | 0.516 | 0.811 | 0.811 | 0.742 |
| Comprehend topic modeling | Logistic regression | 0.516 | 0.892 | 0.395 | 0.602 | 0.433 | 0.398 | 0.398 | 0.605 |
| Comprehend topic modeling | XGBoost | 0.855 | 0.936 | 0.069 | 0.230 | 0.908 | 0.769 | 0.769 | 0.931 |
| Amazon SageMaker BlazingText | BlazingText Classifier | 0.979 | 0.997 | 0.023 | 0.020 | 0.980 | 0.985 | 0.985 | 0.970 |
| Amazon SageMaker Deep CNN | | 0.978 | 0.998 | 0.021 | 0.020 | 0.978 | 0.982 | 0.982 | 0.972 |
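Several columns in the results are tied together by definition: recall and sensitivity are the same quantity, the false positive rate is 1 minus specificity (for LDA with XGBoost, 0.152 + 0.848 = 1), and the false negative rate is 1 minus recall up to rounding. The sketch below derives these columns from confusion-matrix counts. The counts and the function name `binary_metrics` are hypothetical, chosen only to show the relationships. AUC is not included because it needs per-example scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive the table's count-based columns from confusion-matrix counts."""
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "false_positive_rate": fp / (fp + tn),   # equals 1 - specificity
        "false_negative_rate": fn / (fn + tp),   # equals 1 - recall
    }

# Hypothetical counts for illustration; not taken from the session.
m = binary_metrics(tp=76, fp=15, tn=85, fn=24)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```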
BlazingText embeddings overview
• Plot the top 5000 most common terms
• Terms overlap with semantically similar terms
• Models leverage these semantics for computation and performance
• Will look at terms in two sections of the word embedding space
BlazingText embeddings: Zoomed in – Part 1
• Model has learned important familial and patient relationships, including caregivers and reporters
• Robust to typos: Patient, Pateint, Pt
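One way to read "robust to typos" is that misspellings and abbreviations end up close to the correct term in the embedding space, so a nearest-neighbor lookup by cosine similarity returns them first. The sketch below shows that lookup on toy 3-dimensional vectors. The vectors are invented for illustration and are not the embeddings from the session, and `cosine` and `nearest` are hypothetical helper names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, vectors, n=2):
    """Return the n words whose vectors are most similar to `word`'s vector."""
    scored = [(w, cosine(vectors[word], v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:n]]

# Toy vectors invented for illustration (real embeddings have far more dimensions).
vectors = {
    "patient": [0.90, 0.10, 0.05],
    "pateint": [0.88, 0.12, 0.07],
    "pt":      [0.85, 0.15, 0.02],
    "rash":    [0.05, 0.20, 0.95],
    "nausea":  [0.10, 0.25, 0.90],
}

neighbors = nearest("patient", vectors)
print(neighbors)
```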
BlazingText embeddings: Zoomed in – Part 2
• Model has learned important side effects and adverse drug reactions
• Types of reactions are even clustered
Cost
What does it cost to run this model?
| Service | Resources used | Pricing dimension | Cost |
|---|---|---|---|
| Amazon S3 | 50 GB for one month | $0.023 per GB-month | $1.15 |
| Amazon EFS | Storage | | $3 |
| | | | $1.3 |
| | | $0.0714 per instance-minute | $8.55 |
| | | $0.021 per instance-minute | $0.084 |
| Total | | | $14.08 ($0.11 per 1000 predictions)* |
Amazon SageMaker on-demand ML instances let you pay for machine learning compute capacity by the second, with a one-minute minimum and no long-term commitments.
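The cost figures above can be cross-checked with a little arithmetic. The sketch below sums the five listed line items, recomputes the Amazon S3 charge, and backs out the instance-minutes and prediction volume that the quoted rates imply. Those implied quantities are inferred here, not stated in the session, and the slide text does not say which service each per-minute rate belongs to.

```python
# Line items as listed (USD).
line_items = [1.15, 3.00, 1.30, 8.55, 0.084]
total = sum(line_items)

# Amazon S3: 50 GB for one month at $0.023 per GB-month.
s3_cost = 50 * 0.023

# Instance-minutes implied by the two per-minute rates (inferred, not stated).
minutes_at_higher_rate = 8.55 / 0.0714
minutes_at_lower_rate = 0.084 / 0.021

# Prediction volume implied by $0.11 per 1000 predictions (inferred, not stated).
implied_predictions = 14.08 / 0.11 * 1000

print(f"total: ${total:.2f}")
print(f"S3: ${s3_cost:.2f}")
print(f"implied minutes: {minutes_at_higher_rate:.1f} and {minutes_at_lower_rate:.1f}")
print(f"implied predictions: {implied_predictions:,.0f}")
```

Under these assumptions the line items do add up to the quoted total, and the per-1000 figure corresponds to roughly 128,000 predictions.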
To learn more…
• Amazon SageMaker
• Blogs:
• Enhanced text classification and word vectors using Amazon SageMaker BlazingText
• https://tinyurl.com/sagemaker-blazingtext
• Bring your own pre-trained MXNet or TensorFlow models into Amazon SageMaker
• https://tinyurl.com/sagemaker-byom
Questions?
Thank you!