Using the Hadoop Ecosystem to
Drive Healthcare Innovation
Aly Sivji
April 25, 2017
About Me
• Aly Sivji
– Twitter: @CaiusSivjus
– Blog: http://alysivji.github.io
• Senior Analyst @ IBM Watson Health
– Value-Based Care: Planning Solutions
• Grad Student @ Northwestern University
– Medical Informatics
• Interests:
– Technology 🐍
– Data 📈
– Star Trek 🖖🖖
Overview
• Big Data drives most industries
Overview
• What about Healthcare?
– Machine Learning
• Fraud detection ($65+ billion lost every year)
– Wired Article
– dataiku - Detecting Medicare Fraud
• Preventing unnecessary procedures
– Data Mining
• Identifying medication prescribed together
– Recommender Systems
• Finding similar patients
Overview
Healthcare is Different.
People who work in healthcare
Additional Reading
• John Halamka (The Health Care Blog)
• Health Catalyst
Overview
• Data Analytics / Data Science
– Retrospective versus Predictive
• Machine Learning
– Types of Algorithms
• Healthcare Analytics
Overview
• Apache Hadoop Ecosystem
– Big Data framework
– Distributed computation on commodity hardware
– Demo!
Road to Electronic Health Records
• 1920s – Modern record keeping begins
• 1960s – Dr. Larry Weed introduces problem-oriented medical records
• 1972 – Regenstrief Institute develops first EMR system
• 1980s-90s – Siloed adoption by departments & admin
• 1996 – HIPAA establishes national standards for electronic health records
• 2004 – President Bush calls for computerized health records
2009: EHRs Go Mainstream
• HITECH Act signed into law by President Obama
– $25.9 billion to expand Health IT (HIT) adoption
• Meaningful Use (MU) program
– Incentive payments for using HIT to
• Improve quality, safety, efficiency of care
• Engage patients
• Increase care co-ordination
– Goal: MU compliance => better outcomes
EHR Adoption: Doubled Since 2008
Office-based Physician Electronic Health Record Adoption (2005-2015)
Source: Office of the National Coordinator for Health Information Technology. 'Office-based Physician Electronic Health Record
Adoption,' Health IT Quick-Stat #50. dashboard.healthit.gov/quickstats/pages/physician-ehr-adoption-trends.php. Dec 2016.
Health Data Today
• Electronic Health Records
• Genomic Data ($1000 genome)
• Medical Internet of Things (mIoT)
• Wearable devices
• Bottom Line: Data is growing
Big Data = 'Bigger Data' in Healthcare (article)
Data Analytics
• Businesses collect lots of data
– IBM: 90% of world’s data created in last 2 years
• How can we find hidden patterns in the data
and make information actionable?
Data Science!
Types of Analytics
• Retrospective Analytics
– Summarizing historical activity / performance
– Limited scope for making future plans
• Better than nothing
Types of Analytics
• Predictive Analytics
– Finding patterns (correlations) between historical
environment and results
– Apply to current environment to make predictions
Predictive Analytics
“Once you have enough data, you start to see
patterns. You can then build a model of how
these data work. Once you build a model, you
can predict.”
Michael Wu
Chief Scientist, Lithium Technologies
Predictive Analytics
Machine Learning (ML)
“Field of study that gives computers the ability
to learn without being explicitly programmed”
Arthur Samuel
Artificial Intelligence Pioneer
Machine Learning Algorithms
• A probabilistic framework to create models
used for predictions
• Predictive models are developed iteratively
• Models are refined until they converge
– i.e. output gets close to a specific value
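The iterative refinement described above can be sketched in a few lines. This is a minimal illustration rather than anything shown in the talk: it fits a single parameter by gradient descent and stops once the update falls below a tolerance. The helper name `fit_mean`, the learning rate, the tolerance, and the data are all assumptions made for the example.

```python
def fit_mean(values, learning_rate=0.1, tolerance=1e-8, max_iterations=10_000):
    """Estimate the value that minimizes mean squared error to `values`.

    Hypothetical helper: the minimizer is simply the mean, which makes it
    easy to check that the loop converges to the right answer.
    """
    estimate = 0.0
    for iteration in range(1, max_iterations + 1):
        # Gradient of mean((estimate - v)^2) with respect to estimate.
        gradient = 2 * sum(estimate - v for v in values) / len(values)
        step = learning_rate * gradient
        estimate -= step
        # Converged: the model has stopped changing in any meaningful way.
        if abs(step) < tolerance:
            return estimate, iteration
    return estimate, max_iterations


lengths_of_stay = [2.0, 3.0, 4.0, 7.0]  # made-up data, in days
estimate, iterations = fit_mean(lengths_of_stay)
```

Each pass moves the estimate a fraction of the way toward the mean of the data, so the updates shrink steadily until the stopping rule fires.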
Types of ML Algorithms
• Unsupervised Learning
– Group objects by similar characteristics
– Given inputs (X) only, find a group label for each observation
• Supervised Learning
– Given inputs (X) and output (Y)
– Find function f that maps X to Y
– Given new inputs (Xnew), predict value/label (Ynew)
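The unsupervised case above (grouping objects by similar characteristics, with no Y provided) can be sketched with a bare-bones k-means loop. This is an illustrative sketch under simplifying assumptions: one-dimensional data, exactly two groups, and a deterministic starting point. The function name `two_means` and the ages are made up for the example.

```python
def two_means(values, max_iterations=100):
    """Split 1-D values into two groups with a basic k-means loop (k = 2)."""
    centers = [min(values), max(values)]  # simple deterministic starting point
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each value joins the nearest center.
        new_labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each center to the mean of its group.
        for k in (0, 1):
            members = [v for v, label in zip(values, labels) if label == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers


ages = [21, 23, 25, 64, 67, 70]  # made-up patient ages
labels, centers = two_means(ages)
```

No labels were supplied; the algorithm invents them from the structure of the inputs alone, which is the defining difference from supervised learning.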
Types of Supervised Learning
• Regression
– Try to predict a value (continuous variable)
• Classification
– Try to predict a label (discrete variable)
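The two flavors of supervised learning can be contrasted with a small sketch. It assumes a single input variable, uses ordinary least squares for the regression and a simple cut-off rule for the classification, and runs on made-up numbers. The helper names `fit_line` and `fit_threshold` are hypothetical and not part of the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def fit_threshold(xs, labels):
    """Pick the cut-off on x that misclassifies the fewest training points."""
    def errors(cut):
        return sum((x >= cut) != bool(label) for x, label in zip(xs, labels))
    return min(sorted(xs), key=errors)


# Regression: predict a continuous value (made-up cost by days of stay).
days = [1, 2, 3, 4]
cost = [1500.0, 2500.0, 3500.0, 4500.0]
slope, intercept = fit_line(days, cost)
predicted_cost = slope * 5 + intercept  # prediction for a new input, X_new = 5

# Classification: predict a discrete label (made-up readmission flag).
ages = [30, 40, 50, 60, 70, 80]
readmitted = [0, 0, 0, 1, 1, 1]
cut = fit_threshold(ages, readmitted)
predicted_label = int(65 >= cut)  # prediction for a new input, X_new = 65
```

In both cases the function f is learned from known (X, Y) pairs and then applied to an unseen input; only the type of the output differs.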
Analytics in Healthcare
“Advanced analytics can be used to improve
medical outcomes, increase financial
performance, deepen relationships with
customers and patients, and drive new medical
innovations”
Jason Burke
Author of Health Analytics
Healthcare Challenges
• US Healthcare spending = $3.4 trillion / year
Healthcare Challenges
• US system wastes $750 billion annually
Source: Washington Post (Sept 2012). Retrieved from https://www.washingtonpost.com/news/wonk/wp/2012/09/07/we-spend-
750-billion-on-unnecessary-health-care-two-charts-explain-why/
Healthcare Challenges
• Low quality
– To Err Is Human report:
• 44,000 to 98,000 deaths per year due to preventable medical errors
– Rates poorly when compared to other countries
• Last in 2014 Commonwealth Fund survey on:
– Quality of care
– Access to doctors
– Equity
Solution: Big Data!
• Use data analytics and machine learning to
improve outcomes & lower costs
Types of Healthcare Analytics
Good News
• Most of the analytical and software
capabilities needed to drive systemic changes
in healthcare are already available as:
– Commercial software
– Open Source solutions 🎉
• Hadoop ecosystem
Big Data
• Characteristics (4 V’s of Big Data)
– Volume
• Scale of data
– Variety
• Diversity of data (many sources)
– Velocity
• Speed of data
– Veracity
• Certainty of data
• 5th V: Value?
Types of Data
• Structured
– Highly organized information that fits neatly into a
relational database (columns and rows)
• Unstructured
– Has internal structure, but does not fit into a
traditional database (or spreadsheet)
– Most data is unstructured (>80%)
– Can use Extract-Transform-Load (ETL) Processing to
turn unstructured data into structured data
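As a rough sketch of the transform step in ETL, the snippet below pulls a few fields out of free-text notes and emits structured rows. The note wording, the regular expression, and the field names are all assumptions made for illustration; real clinical text is far messier and usually needs dedicated natural language processing tools.

```python
import re

# Hypothetical pattern for notes such as "58-year-old male ... BP 150/95".
NOTE_PATTERN = re.compile(
    r"(?P<age>\d+)-year-old (?P<sex>male|female)"
    r".*?BP (?P<systolic>\d+)/(?P<diastolic>\d+)",
    re.IGNORECASE,
)


def extract_row(note):
    """Transform one free-text note into a structured row, or None if no match."""
    match = NOTE_PATTERN.search(note)
    if match is None:
        return None
    return {
        "age": int(match["age"]),
        "sex": match["sex"].lower(),
        "systolic": int(match["systolic"]),
        "diastolic": int(match["diastolic"]),
    }


notes = [
    "58-year-old male presents with chest pain. BP 150/95 on arrival.",
    "Follow-up visit, no vitals recorded.",
]
# Load step (simplified): keep only the notes that produced a row.
rows = [row for row in map(extract_row, notes) if row is not None]
```

Each resulting row has named, typed columns and would fit directly into a relational table, while the note that did not match is set aside rather than guessed at.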
Apache Hadoop
• Set of open source software technology components that
form a scalable system we can use to analyze Big Data
• Main features:
– Distributed storage and processing
• Data is too big for a single computer
– Runs on commodity hardware
– Fault tolerant
• Hardware failures are common and handled automatically
– Runs in Java Virtual Machine (JVM) environment
Sample Hadoop Stack
Source: Soong, K. (Feb 2016). Big Data Specialization. Retrieved from http://ksoong.org/big-data
Core Hadoop Components
• Yet Another Resource Negotiator (YARN)
– “Operating System” for Hadoop
– Controls how resources are allocated to different
applications and execution engines across cluster
Core Hadoop Components
• Hadoop Distributed File System (HDFS)
– Highly scalable storage system
[Diagram: a single data file]
Core Hadoop Components
• Hadoop Distributed File System (HDFS)
– Too big to fit on single machine => Partition
[Diagram: data file partitioned into blocks A, B, C, D]
Core Hadoop Components
• Hadoop Distributed File System (HDFS)
– Split across multiple machines
– Data is protected against hardware failure
[Diagram: copies of blocks A, B, C, D spread across Server 1, Server 2, Server 3, Server 4]
Core Hadoop Components
• Hadoop Distributed File System (HDFS)
– Server goes down, we can still reconstruct data
[Diagram: same block layout across Server 1 to Server 4; one server is down 🔥 and the data can still be reconstructed from the remaining servers]
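The replication idea behind these slides can be simulated in a few lines. This is a simplified sketch, not how HDFS is implemented: blocks are placed round-robin, whereas real HDFS placement also takes rack topology into account. The replication factor of 3 matches the HDFS default; the function and server names are hypothetical.

```python
from itertools import cycle


def place_blocks(blocks, servers, replication=3):
    """Assign each block to `replication` distinct servers, round-robin."""
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {server: set() for server in servers}
    ring = cycle(servers)
    for block in blocks:
        # Consecutive servers in the ring are distinct, so copies never collide.
        for _ in range(replication):
            placement[next(ring)].add(block)
    return placement


def recoverable(placement, failed, blocks):
    """True if every block still has a copy on a server that is up."""
    surviving = set()
    for server, held in placement.items():
        if server not in failed:
            surviving |= held
    return surviving >= set(blocks)


blocks = ["A", "B", "C", "D"]
servers = ["server1", "server2", "server3", "server4"]
placement = place_blocks(blocks, servers)
one_down = recoverable(placement, {"server3"}, blocks)
three_down = recoverable(placement, {"server1", "server2", "server3"}, blocks)
```

With three copies of each block, losing any one server (or any two) leaves at least one copy of every block, so the file can still be reassembled; losing three servers in this toy layout does not.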
Core Hadoop Components
• Execution Engine
– Used when running analytic applications
– Distributed data allows us to perform parallel
computations
– MapReduce execution engine comes bundled with the
Hadoop core distribution
– Can plug in different components
• Tez, Storm, Spark, etc.
MapReduce Overview
Source: Eckroth, J. (n.d.). MapReduce. Retrieved from http://cinf401.artifice.cc/notes/mapreduce.html
MapReduce Example
Source: Zhang, X. (Jul 2013). A Simple Example to Demonstrate how does the
MapReduce work. Retrieved from http://xiaochongzhang.me/blog/?p=338
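The three MapReduce phases can be mimicked on a single machine to show the data flow. The sketch below uses word count, the customary introductory example, and assumes it is a fair stand-in for the figure cited above. It runs in plain Python rather than on Hadoop, and the function names and input lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

On a cluster, each input split would be mapped on the machine that stores it, and the shuffle would move pairs across the network so that every key is reduced in one place.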
MapReduce Limitations
• Lots of reads/writes
– I/O becomes bottleneck when performing analysis
• Machine Learning algorithms are iterative
– Many read/write cycles before convergence
– Slow runtime
• There must be a better way!
Apache Tez
• Optimizes workflow to limit number of writes
• Less I/O => faster execution
Apache Storm
• Execution engine for real-time streaming
applications
• Data is analyzed as it is generated BEFORE it is
stored
Apache Spark
• In-memory computational engine
• Read in data once, subsequent calculations
are done in-memory
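A toy model makes the difference in I/O visible. The sketch below does not use Spark; it only counts how often a stand-in for slow storage is read when an iterative job re-reads its input on every pass versus loading it once and reusing the in-memory copy. The class and function names are hypothetical.

```python
class CountingStore:
    """Stand-in for slow storage that counts how often it is read."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._data)


def iterate_rereading(store, iterations):
    """Each pass goes back to storage, as a chain of separate jobs would."""
    total = 0.0
    for _ in range(iterations):
        total = sum(store.load())
    return total


def iterate_cached(store, iterations):
    """Load once and keep the data in memory for every later pass."""
    cached = store.load()
    total = 0.0
    for _ in range(iterations):
        total = sum(cached)
    return total


disk_store = CountingStore([1.0, 2.0, 3.0])
iterate_rereading(disk_store, iterations=10)   # disk_store.reads is now 10

memory_store = CountingStore([1.0, 2.0, 3.0])
iterate_cached(memory_store, iterations=10)    # memory_store.reads is now 1
```

Both versions compute the same result; the cached version simply touches storage once, which is why iterative machine learning workloads benefit from an in-memory engine.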
Logistic Regression Runtime
Other Apache Projects
• Apache Hive
– SQL interface to data stored in HDFS
– Analysts with SQL experience can use Hadoop
Other Apache Projects
• Databases
– Apache HBase
– Apache Cassandra
Other Apache Projects
• Apache Kafka
– Messaging system for streaming data
Optimal Hadoop Workflow
• Depends on what you are trying to do
• Data Lake (HDFS)
– Storage repository that holds data in raw format
– Read into Spark to perform analysis
• Use Data Science and Machine Learning algorithms
• Demo will walk through this workflow
Dataset
• Texas Department of State Health Services
– Released State Inpatient / Outpatient data (link)
• Inpatient (IP) - 1999 to 2010
• Outpatient (OP) – Q4 2009 to 2010
– Data is de-identified and made available for free
– Tab-delimited text files (for each quarter)
• IP data – 450MB base table, 500MB charges
• OP data – 750MB base table, 700MB charges
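Tab-delimited files like these can be read with Python's standard `csv` module before moving to a cluster. The sketch below parses a small in-memory sample; the column names are hypothetical placeholders and are not taken from the actual Texas data dictionary.

```python
import csv
import io


def read_tab_delimited(handle):
    """Yield one dict per row, keyed by the header line."""
    yield from csv.DictReader(handle, delimiter="\t")


# In-memory stand-in for one quarterly file; columns are hypothetical.
sample = io.StringIO(
    "RECORD_ID\tLENGTH_OF_STAY\tTOTAL_CHARGES\n"
    "1\t3\t12000.50\n"
    "2\t5\t8000.00\n"
)
rows = list(read_tab_delimited(sample))

# Values arrive as strings, so convert before doing arithmetic.
average_stay = sum(int(r["LENGTH_OF_STAY"]) for r in rows) / len(rows)
```

For a real file, the `io.StringIO` object would be replaced by `open(path, newline="")`, and the generator lets rows be processed one at a time instead of loading several hundred megabytes at once.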
Spark Background
• Java, Scala, Python, and R APIs (docs)
• Built around the concept of Resilient
Distributed Datasets (RDDs)
– Can perform MapReduce on RDD
OR
– Use the Spark DataFrame abstraction
*Recommended*
Spark DataFrame
• Distributed collection of rows and named
columns
– Think relational database or spreadsheet
– Akin to pandas DataFrame or R data.frame
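from pyspark.sql import SparkSession

# Assumption: `df` is loaded from the people.json sample bundled with Spark
spark = SparkSession.builder.appName("example").getOrCreate()
df = spark.read.json("examples/src/main/resources/people.json")

# Displays the content of the DataFrame
df.show()
#
# +----+-------+
# | age|   name|
# +----+-------+
# |null|Michael|
# |  30|   Andy|
# |  19| Justin|
# +----+-------+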
Questions?
• Slides and code available at
https://github.com/alysivji/talks

 
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅gragmanisha42
 

Último (20)

VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
 
Dehradun Call Girls Service 08854095900 Real Russian Girls Looking Models
Dehradun Call Girls Service 08854095900 Real Russian Girls Looking ModelsDehradun Call Girls Service 08854095900 Real Russian Girls Looking Models
Dehradun Call Girls Service 08854095900 Real Russian Girls Looking Models
 
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
 
VIP Call Girl Sector 10 Noida Call Me: 9711199171
VIP Call Girl Sector 10 Noida Call Me: 9711199171VIP Call Girl Sector 10 Noida Call Me: 9711199171
VIP Call Girl Sector 10 Noida Call Me: 9711199171
 
💚😋Chandigarh Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Chandigarh Escort Service Call Girls, ₹5000 To 25K With AC💚😋💚😋Chandigarh Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Chandigarh Escort Service Call Girls, ₹5000 To 25K With AC💚😋
 
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
 
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In RaipurCall Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
 
Call Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In Faridabad
Call Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In FaridabadCall Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In Faridabad
Call Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In Faridabad
 
Mangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
 
❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...
❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...
❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...
 
👯‍♀️@ Bangalore call girl 👯‍♀️@ Jaspreet Russian Call Girls Service in Bangal...
👯‍♀️@ Bangalore call girl 👯‍♀️@ Jaspreet Russian Call Girls Service in Bangal...👯‍♀️@ Bangalore call girl 👯‍♀️@ Jaspreet Russian Call Girls Service in Bangal...
👯‍♀️@ Bangalore call girl 👯‍♀️@ Jaspreet Russian Call Girls Service in Bangal...
 
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
 
Udaipur Call Girls 📲 9999965857 Call Girl in Udaipur
Udaipur Call Girls 📲 9999965857 Call Girl in UdaipurUdaipur Call Girls 📲 9999965857 Call Girl in Udaipur
Udaipur Call Girls 📲 9999965857 Call Girl in Udaipur
 
Call Girl Raipur 📲 9999965857 whatsapp live cam sex service available
Call Girl Raipur 📲 9999965857 whatsapp live cam sex service availableCall Girl Raipur 📲 9999965857 whatsapp live cam sex service available
Call Girl Raipur 📲 9999965857 whatsapp live cam sex service available
 
Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510
 
VIP Call Girl Sector 32 Noida Just Book Me 9711199171
VIP Call Girl Sector 32 Noida Just Book Me 9711199171VIP Call Girl Sector 32 Noida Just Book Me 9711199171
VIP Call Girl Sector 32 Noida Just Book Me 9711199171
 
❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF ...
❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF  ...❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF  ...
❤️♀️@ Jaipur Call Girls ❤️♀️@ Jaispreet Call Girl Services in Jaipur QRYPCF ...
 
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetChandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
 
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
 

Using The Hadoop Ecosystem to Drive Healthcare Innovation

  • 1. Using the Hadoop Ecosystem to Drive Healthcare Innovation Aly Sivji April 25, 2017
  • 2. About Me • Aly Sivji – Twitter: @CaiusSivjus – Blog: http://alysivji.github.io • Senior Analyst @ IBM Watson Health – Value-Based Care: Planning Solutions • Grad Student @ Northwestern University – Medical Informatics • Interests: – Technology 🐍 – Data 📈 – Star Trek 🖖🖖
  • 3. Overview • Big Data drives most industries
  • 4. Overview • What about Healthcare? – Machine Learning • Fraud detection ($65+ billion lost every year) – Wired Article – dataiku - Detecting Medicare Fraud • Preventing unnecessary procedures – Data Mining • Identifying medication prescribed together – Recommender Systems • Finding similar patients
  • 5. Overview Healthcare is Different. People who work in healthcare Additional Reading • John Halamka (The Health Care Blog) • Health Catalyst
  • 6. Overview • Data Analytics / Data Science – Retrospective versus Predictive • Machine Learning – Types of Algorithms • Healthcare Analytics
  • 7. Overview • Apache Hadoop Ecosystem – Big Data framework – Distributed computation on commodity hardware – Demo!
  • 8. Road to Electronic Health Records 1920s – Modern record keeping begins 1960s – Dr. Larry Weed introduces problem-oriented medical records 1972 – Regenstrief Institute develops first EMR System 1980s-90s – Siloed adoption by departments & admin 1996 – HIPAA establishes national standards for electronic health records 2004 – President Bush calls for Computerized Health Records
  • 9. 2009: EHRs Go Mainstream • HITECH Act passed by President Obama – $25.9 billion to expand Health IT (HIT) adoption • Meaningful Use (MU) program – Incentive payments for using HIT to • Improve quality, safety, efficiency of care • Engage patients • Increase care co-ordination – Goal: MU compliance => better outcomes
  • 10. EHR Adoption: Doubled Since 2008 Office-based Physician Electronic Health Record Adoption (2005-2015) Source: Office of the National Coordinator for Health Information Technology. 'Office-based Physician Electronic Health Record Adoption,' Health IT Quick-Stat #50. dashboard.healthit.gov/quickstats/pages/physician-ehr-adoption-trends.php. Dec 2016.
  • 11. Health Data Today • Electronic Health Records • Genomic Data ($1000 genome) • Medical Internet of Things (mIoT) • Wearable devices • Bottom Line: Data is growing Big Data = 'Bigger Data' in Healthcare (article)
  • 12. Data Analytics • Businesses collect lots of data – IBM: 90% of world’s data created in last 2 years • How can we find hidden patterns in the data and make information actionable? Data Science!
  • 13. Types of Analytics • Retrospective Analytics – Summarizing historical activity / performance – Limited scope for making future plans • Better than nothing
  • 14. Types of Analytics • Predictive Analytics – Finding patterns (correlations) between historical environment and results – Apply to current environment to make predictions
  • 15. Predictive Analytics “Once you have enough data, you start to see patterns. You can then build a model of how these data work. Once you build a model, you can predict.” Michael Wu Chief Scientist, Lithium Technologies
  • 17. Machine Learning (ML) “Field of study that gives computers the ability to learn without being explicitly programmed” Arthur Samuel Artificial Intelligence Pioneer
  • 18. Machine Learning Algorithms • A probabilistic framework to create models used for predictions • Predictive models are developed iteratively • Models are refined until they converge – i.e. output gets close to a specific value
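The idea of refining a model until it converges can be sketched with a one-parameter model fitted by gradient descent. This is only an illustration: the toy data, learning rate, tolerance, and the function name `fit_slope` are assumptions, not taken from the talk.

```python
def fit_slope(xs, ys, learning_rate=0.01, tolerance=1e-9, max_iterations=10_000):
    """Fit y ~ w * x by gradient descent; stop once w stops changing."""
    w = 0.0
    for iteration in range(1, max_iterations + 1):
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        new_w = w - learning_rate * gradient
        # Converged: the update is smaller than the tolerance.
        if abs(new_w - w) < tolerance:
            return new_w, iteration
        w = new_w
    return w, max_iterations


# Hypothetical toy data generated from y = 3x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
slope, iterations = fit_slope(xs, ys)
print(f"slope={slope:.4f} after {iterations} iterations")
```

Each pass reads the full dataset once, which is why iterative algorithms are sensitive to how expensive a read is.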
  • 19. Types of ML Algorithms • Unsupervised Learning – Group objects by similar characteristics – Given inputs (X), find label for each observation • Supervised Learning – Given inputs (X) and output (Y) – Find function f that maps X to Y – Given new inputs (Xnew), predict value/label (Ynew)
  • 20. Types of Supervised Learning • Regression – Try to predict a value (continuous variable) • Classification – Try to predict a label (discrete variable)
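A minimal sketch of the two kinds of supervised learning, using only the standard library. The ages, lengths of stay, and readmission labels are made-up values for illustration, and 1-nearest-neighbour is just one simple classifier chosen here, not a method named in the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x


def nearest_neighbor_label(train, new_point):
    """1-nearest-neighbour classification on a single numeric feature."""
    closest = min(train, key=lambda pair: abs(pair[0] - new_point))
    return closest[1]


# Regression: predict a continuous value (hypothetical length of stay in days).
ages = [30, 40, 50, 60]
length_of_stay = [2.0, 3.0, 4.0, 5.0]
slope, intercept = fit_line(ages, length_of_stay)
predicted_stay = slope * 70 + intercept

# Classification: predict a discrete label (hypothetical readmission flag).
labelled = [(30, "no"), (40, "no"), (60, "yes"), (70, "yes")]
predicted_label = nearest_neighbor_label(labelled, 65)
```

In both cases the model learns from known (X, Y) pairs and is then applied to a new input; only the type of the output differs.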
  • 21. Analytics in Healthcare “Advanced analytics can be used to improve medical outcomes, increase financial performance, deepen relationships with customers and patients, and drive new medical innovations” Jason Burke Author of Health Analytics
  • 22. Healthcare Challenges • US Healthcare spending = $3.4 trillion / year
  • 23. Healthcare Challenges • US system wastes $750 billion annually Source: Washington Post (Sept 2012). Retrieved from https://www.washingtonpost.com/news/wonk/wp/2012/09/07/we-spend-750-billion-on-unnecessary-health-care-two-charts-explain-why/
  • 24. Healthcare Challenges • Low quality – To Err is Human Report: • 44,000 - 98,000 deaths to preventable medical errors – Rates poorly when compared to other countries • Last in 2014 Commonwealth Fund survey on: – Quality of care – Access to doctors – Equity
  • 25. Solution: Big Data! • Use data analytics and machine learning to improve outcomes & lower costs
  • 26. Types of Healthcare Analytics
  • 27. Good News • Most of the analytical and software capabilities needed to drive systemic changes in healthcare are already available as: – Commercial software – Open Source solutions 🎉 • Hadoop ecosystem
  • 28. Big Data • Characteristics (4 V’s of Big Data) – Volume • Scale of data – Variety • Diversity of data (many sources) – Velocity • Speed of data – Veracity • Certainty of data • 5th V: Value?
  • 29. Types of Data • Structured – Highly organized information that fits neatly into a relational database (columns and rows) • Unstructured – Has internal structure, but does not fit into a traditional database (or spreadsheet) – Most data is unstructured (>80%) – Can use Extract-Transform-Load (ETL) Processing to turn unstructured data into structured data
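As a rough sketch of the Extract-Transform-Load idea, the snippet below turns one free-text note into a structured row. The note text, the regular expressions, and the column names are all hypothetical; real clinical text is far messier and usually needs dedicated NLP tooling.

```python
import re

# Hypothetical free-text note; the wording and format are assumptions.
NOTE = "Patient: 58 y/o male. BP 142/91 mmHg. Dx: type 2 diabetes. Rx: metformin 500 mg."


def extract(note):
    """Extract: pull raw fragments out of the text with regular expressions."""
    return {
        "age": re.search(r"(\d+)\s*y/o", note),
        "bp": re.search(r"BP\s*(\d+)/(\d+)", note),
        "dx": re.search(r"Dx:\s*([^.]+)\.", note),
    }


def transform(matches):
    """Transform: convert the fragments into typed, named columns."""
    return {
        "age_years": int(matches["age"].group(1)),
        "systolic_mmhg": int(matches["bp"].group(1)),
        "diastolic_mmhg": int(matches["bp"].group(2)),
        "diagnosis": matches["dx"].group(1).strip(),
    }


def load(row, table):
    """Load: append the structured row to a table (a list stands in for a database)."""
    table.append(row)
    return table


table = load(transform(extract(NOTE)), [])
```

Once the note is a row with named columns, it can be queried like any other structured data.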
  • 30. Apache Hadoop • Set of open source software technology components that form a scalable system we can use to analyze Big Data • Main features: – Distributed storage and processing • Data is too big for a single computer – Runs on commodity hardware – Fault tolerant • Hardware failures are common and are handled automatically – Runs in Java Virtual Machine (JVM) environment
  • 31. Sample Hadoop Stack Source: Soong, K. (Feb 2016). Big Data Specialization. Retrieved from http://ksoong.org/big-data
  • 32. Core Hadoop Components • Yet Another Resource Negotiator (YARN) – “Operating System” for Hadoop – Controls how resources are allocated to different applications and execution engines across cluster
  • 33. Core Hadoop Components • Hadoop Distributed File System (HDFS) – Highly scalable storage system Data File
  • 34. Core Hadoop Components • Hadoop Distributed File System (HDFS) – Too big to fit on single machine => Partition A B C D
  • 35. Core Hadoop Components • Hadoop Distributed File System (HDFS) – Split across multiple machines – Data is protected against hardware failure A B C A D A C D B C D Server 1 Server 2 Server 3 Server 4
  • 36. Core Hadoop Components • Hadoop Distributed File System (HDFS) – Server goes down, we can still reconstruct data A B C A D A C D B C D Server 1 Server 2 Server 3 Server 4 🔥
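The replication idea on the HDFS slides can be sketched as a toy model: each block is copied to several servers, so the file can still be rebuilt when one server fails. The round-robin placement and the function names below are simplifying assumptions; real HDFS placement is rack-aware and managed by the NameNode. The replication factor of 3 matches the HDFS default.

```python
def place_blocks(blocks, servers, replication=3):
    """Assign each block to `replication` distinct servers, round-robin."""
    placement = {server: set() for server in servers}
    for index, block in enumerate(blocks):
        for offset in range(replication):
            server = servers[(index + offset) % len(servers)]
            placement[server].add(block)
    return placement


def recoverable_blocks(placement, failed):
    """Return the blocks still readable after the `failed` servers go down."""
    alive = [blocks for server, blocks in placement.items() if server not in failed]
    return set().union(*alive)


blocks = ["A", "B", "C", "D"]
servers = ["server1", "server2", "server3", "server4"]
placement = place_blocks(blocks, servers)

# Losing one server still leaves copies of every block on the others.
survivors = recoverable_blocks(placement, failed={"server2"})
```

With three copies of each block spread over four servers, any single failure (and in this toy layout any two) leaves the full file recoverable.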
  • 37. Core Hadoop Components • Execution Engine – Used when running analytic applications – Distributed data allows us to perform parallel computations – MapReduce execution engine comes bundled with the Hadoop core distribution – Can plug-in different components • Tez, Storm, Spark, etc
  • 38. MapReduce Overview Source: Eckroth, J. (n.d.). MapReduce. Retrieved from http://cinf401.artifice.cc/notes/mapreduce.html HDFS HDFS
  • 39. MapReduce Example Source: Zhang, X. (Jul 2013). A Simple Example to Demonstrate how does the MapReduce work. Retrieved from http://xiaochongzhang.me/blog/?p=338
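The map, shuffle, and reduce steps can be sketched in plain Python with the classic word-count example. This is a single-process illustration with made-up input, not the Hadoop API; on a cluster each split would be mapped on a different node and the intermediate pairs written to disk.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Reduce: aggregate the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input split across two "nodes".
splits = [["deer bear river", "car car river"], ["deer car bear"]]
mapped = [pair for split in splits for line in split for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(mapped))
```

Because each line is mapped independently, the map step parallelizes naturally; the shuffle is where data has to move between machines.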
  • 40. MapReduce Limitations • Lots of reads/writes – I/O becomes bottleneck when performing analysis • Machine Learning algorithms are iterative – Many read and write cycles before convergence – Slow runtime • There must be a better way!
  • 41. Apache Tez • Optimizes workflow to limit number of writes • Less I/O => faster execution
  • 42. Apache Storm • Execution engine for real-time streaming applications • Data is analyzed as it is generated BEFORE it is stored
  • 43. Apache Spark • In-memory computational engine • Read in data once, subsequent calculations are done in-memory Logistic Regression Runtime
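A toy model of why reading once and computing in memory helps iterative workloads. `CountingStore` is a hypothetical stand-in for disk or HDFS that only counts reads; it is not the Spark API and it ignores the intermediate writes that MapReduce also performs.

```python
class CountingStore:
    """Stands in for disk/HDFS and counts how often the data is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(self._values)


def iterate_rereading(store, iterations):
    """MapReduce-style: every iteration goes back to storage."""
    estimate = 0.0
    for _ in range(iterations):
        data = store.read()
        estimate = (estimate + sum(data) / len(data)) / 2
    return estimate


def iterate_in_memory(store, iterations):
    """Spark-style: read once, then reuse the in-memory copy."""
    data = store.read()
    estimate = 0.0
    for _ in range(iterations):
        estimate = (estimate + sum(data) / len(data)) / 2
    return estimate


disk_store = CountingStore([1.0, 2.0, 3.0])
memory_store = CountingStore([1.0, 2.0, 3.0])
result_disk = iterate_rereading(disk_store, iterations=10)
result_memory = iterate_in_memory(memory_store, iterations=10)
```

Both versions compute the same result, but the first touches storage once per iteration and the second only once in total.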
  • 44. Other Apache Projects • Apache Hive – SQL interface to data stored in HDFS – Analysts with SQL experience can use Hadoop
  • 45. Other Apache Projects • Databases – Apache HBase – Apache Cassandra
  • 46. Other Apache Projects • Apache Kafka – Messaging system for streaming data
  • 47. Optimal Hadoop Workflow • Depends on what you are trying to do • Data Lake (HDFS) – Storage repository that holds data in raw format – Read into Spark to perform analysis • Use Data Science and Machine Learning algorithms • Demo will walk through this workflow
  • 48.
  • 49. Dataset • Texas Department of State Health Services – Released State Inpatient / Outpatient data (link) • Inpatient (IP) – 1999 to 2010 • Outpatient (OP) – Q4 2009 to 2010 – Data is de-identified and made available for free – Tab-delimited text files (for each quarter) • IP data – 450MB base table, 500MB charges • OP data – 750MB base table, 700MB charges
  • 50. Spark Background • Java, Scala, Python, and R APIs (docs) • Built around the concept of Resilient Distributed Datasets (RDDs) – Can perform MapReduce on RDD OR – Use the Spark DataFrame abstraction *Recommended*
  • 51. Spark DataFrame • Distributed collection of rows and named columns – Think relational database or spreadsheet – Akin to pandas DataFrame or R data.frame
        # Displays the content of the DataFrame
        df.show()
        # +----+-------+
        # | age|   name|
        # +----+-------+
        # |null|Michael|
        # |  30|   Andy|
        # |  19| Justin|
        # +----+-------+
  • 52. Questions? • Slides and code available at https://github.com/alysivji/talks

Editor's Notes

  1. Before we get to what we’re talking about. I’ll talk about me.
  2. Data has been making a huge difference in other industries. Chase uses machine learning algorithms to flag purchases that could be fraudulent. Last time this happened, I booked my flight using my American Airlines card and booked my hotel and conference on my United card. Chase didn’t know about the flight, so it asked for my confirmation. This saves them money because they do not have to pay for fraudulent purchases. Amazon uses data mining to find products purchased together and makes suggestions to increase revenue. Spark was created in Scala and most people who learn Scala do so in order to use Spark in its native language. Amazon doesn’t know this, but it can use data to figure this out. Netflix’s recommendation system finds users who are similar to you and uses their ratings to make predictions for media for you to watch.
  3. Medical fraud detection could be more robust, and similar algorithms can find unnecessary procedures (purchases that do not match my profile). Data mining can suggest a medication that is always prescribed together with another when an order is missing it. A recommendation system can find similar patients: group them by the treatment prescribed, rate their outcomes, and use that information to suggest the optimal course of action. Why is this not widespread in healthcare?
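A minimal sketch of the "prescribed together" idea: count how often each pair of medications appears in the same order. The orders below are invented for illustration and do not come from the talk or its dataset; a real analysis would also need support and confidence thresholds.

```python
from collections import Counter
from itertools import combinations

# Hypothetical medication orders, one set per patient encounter.
orders = [
    {"metformin", "lisinopril", "atorvastatin"},
    {"metformin", "atorvastatin"},
    {"lisinopril", "atorvastatin"},
    {"metformin", "atorvastatin"},
]

pair_counts = Counter()
for order in orders:
    # Sort so each pair is always counted under the same key.
    for pair in combinations(sorted(order), 2):
        pair_counts[pair] += 1

most_common_pair, occurrences = pair_counts.most_common(1)[0]
```

A pair that co-occurs in most orders is a candidate for a "did you forget this?" prompt when one of the two is ordered alone.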
  4. People who work in healthcare know, healthcare is different. We won’t really go into too many details why, but you can find out more at the links provided. I will spend some time discussing how healthcare has changed and made it easier to facilitate a data revolution
  5. What do we mean by data revolution? Data is ubiquitous... We’ll explore data science in some depth to understand the basic principles of the field and get a grasp on how we can make our information actionable Bee is Buzzword Bee! I’ll try to include him every time I use a buzzword
  6. Next we’ll talk about how we can use the Hadoop ecosystem to analyze healthcare data
  7. Is paved with good intentions ;) 1920s [1] Healthcare professionals realized that documenting patient care benefited both providers and patients. Patient records established the details, complications and outcomes of patient care. Once healthcare providers realized that they were better able to treat patients with complete and accurate medical history, documentation became wildly popular. Health records were soon recognized as being critical to the safety and quality of the patient experience. 1960s [2] Charting as we currently know it. First, a patient database is collected, then that information is used to start the diagnosis process. The database is very thorough and contains: family history, prior encounter information, lab results, current health status. 1972 [1, 2] There were quite a few electronic record system pilots (through universities and large healthcare facilities); this is the first major system that was developed. It did not attract many physicians. 1980s-90s [1, 2] Computers made their way into hospitals, like they did in every other professional environment, but systems did not speak to each other. 1996 HIPAA was passed and national standards for electronic health records were established. 2004 [1, 3] In his 2004 State of the Union, President George W. Bush called for computerized health records. The Office of the National Coordinator for Health Information Technology was established. It coordinates nationwide efforts to implement health IT and the electronic exchange of health information. References [1] http://www.rasmussen.edu/degrees/health-sciences/blog/health-information-management-history/ [2] http://www.nethealth.com/a-history-of-electronic-medical-records-infographic/ [3] https://en.wikipedia.org/wiki/Office_of_the_National_Coordinator_for_Health_Information_Technology
  8. Meaningful Use provided incentive payments to healthcare providers who could demonstrate they used health information technology in a ‘meaningful way’ to improve quality, engage patients, increase care coordination. Goal is that MU compliance will result in: Better clinical outcomes Improved population health outcomes Increased transparency and efficiency Empowered individuals https://en.wikipedia.org/wiki/Health_Information_Technology_for_Economic_and_Clinical_Health_Act https://www.healthit.gov/providers-professionals/meaningful-use-definition-objectives
  9. Did it work? Well… it did increase EHR adoption
  10. * EHR systems have a wealth of data and are collecting more each day * Genomic sequencing costs less than $1,000; I’ve heard about a race to $100 as well * Medical sensors are collecting information at a dizzying pace. One big application is patient sensors in post-acute care environments where patients are hooked up to machines collecting real-time data * People are more concerned about their health than ever before and the consumer wearable industry is growing.
  11. But we’re getting ahead of ourselves. I need to introduce the topic of data analytics References [1] https://datascience.berkeley.edu/about/what-is-data-science/
  12. References [1] http://blog.datagravity.com/the-transition-to-predictive-analytics/
  13. References [1] http://blog.datagravity.com/the-transition-to-predictive-analytics/
  14. References [1] http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279
  15. References [1] https://marketoonist.com/2016/12/predictive-analytics.html
  16. This leads nicely into the topic of Machine Learning References http://www.ibmbigdatahub.com/blog/how-does-machine-learning-work?cm_mmc=OSocial_Twitter-_-IBM+Analytics_Inbound+Marketing-_-WW_WW-_-B+Yelland+3-20-2017&cm_mmca1=000000VQ&cm_mmca2=10000779&
  17. References [1] http://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/ [2] http://www.ibmbigdatahub.com/blog/how-does-machine-learning-work
  18. Why is this relevant to us in healthcare?
  19. Analytics is suited to the specific challenges in healthcare References [1] http://www.pbs.org/newshour/rundown/new-peak-us-health-care-spending-10345-per-person/ [2] http://www.pgpf.org/chart-archive/0006_health-care-oecd
  20. References [1] https://en.wikipedia.org/wiki/To_Err_is_Human_(report) [2] http://time.com/2888403/u-s-health-care-ranked-worst-in-the-developed-world/
  21. Healthcare analytics is broad, as we can see from this diagram. There are lots of areas where a little bit of deliberate data science and machine learning can make a difference
  22. Worth noting that most of the analytical capabilities needed to drive systemic changes in healthcare are already available in commercial software
  23. So let’s start talking about Big Data. What is big data? In healthcare, there is a lot of data… each genome is around 200GB of raw data. Lots of different information… clinical, notes, lab information, demographic result data, patient generated data Velocity data... Real time sensors monitoring patients Veracity... How sure are we that the data we get is correct? References [1] http://www.ibmbigdatahub.com/infographic/extracting-business-value-4-vs-big-data
  24. References [1] https://www.trifacta.com/blog/structured-unstructured-data/ [2] http://sherpasoftware.com/blog/structured-and-unstructured-data-what-is-it/
  25. How can we deal with all this data? Hadoop Ecosystem!
  26. References [1] http://www.littlebeelibrary.com/pdfs/Apache_Hadoop.pdf
  27. Execution engine is used to perform calculations on the underlying data
  28. The MapReduce engine runs the map step on all nodes in the cluster to produce a set of intermediate output files. It then sorts these intermediate files and runs a reduce step that takes the sorted intermediate files and aggregates the data to get a final result. This process is scalable but relatively slow because of the need to write lots of intermediate files to disk and then read them again.
  29. The key takeaway from this presentation: Use Spark to do all calculations