SAS M2007 Presentation
- 1. Data Preparation for Data Mining in Health Care using SAS
SAS M2007 Data Mining Conference
October 12, 2007, Las Vegas
S. Greg Potts, MBA
©2007 Arkansas Foundation for Medical Care, Inc. SAS M2007 Data Mining Conference | October 12, 2007 | Las Vegas
- 2. Introduction
Data mining practitioners are well aware that most of the total effort required to complete a data mining project is not spent on the “trendier” aspects of the project, such as problem definition or algorithm/technique selection, application, and interpretation of the results.
Unfortunately, most of the time (up to 80% is often cited) is spent “in the trenches”: acquiring the data (most business data today is stored in transactional data warehouses where data elements essential for mining are dispersed across multiple tables), getting to know the data (conducting exploratory data analysis), and preparing the data mining table (summarizing data to the “unit of analysis” and creating derived variables to be used as targets and inputs in the modeling analysis) in the form required by the data mining algorithm (most current algorithms require data in the form of a one-row-per-subject data table).
This presentation presents two case studies in using SAS to extract and prepare data for data mining. The first case explores how to extract and prepare transactional (Medicaid claims) data for directed data mining, where the goal is to explain or predict the value(s) of a particular target variable. The second case explores data extraction and preparation for undirected data mining (cluster analysis) using hospital-level data supplied to Medicare Quality Improvement Organizations (QIOs).
Contents
• AFMC and QI in Medicare and Medicaid (slide 3)
• AFMC as QIO and EQRO (slide 4)
• Data Preparation for Directed Data Mining (slide 6)
• Data Preparation for Undirected Data Mining (slide 16)
• References (slide 28)
- 3. AFMC and QI in Medicare and Medicaid
• Medicare provides health insurance for people age 65 and over, those with permanent kidney failure, and certain people with disabilities (more than 400,000 individuals in Arkansas).
• Medicaid is a jointly funded, Federal-State medical assistance program for certain low-income and needy people (more than 600,000 individuals in Arkansas).
- 4. AFMC as QIO and EQRO
• AFMC is the Centers for Medicare & Medicaid Services (CMS) designated Quality Improvement Organization (QIO) for the state of Arkansas.
  Assist providers (hospitals, physicians, nursing homes, etc.) with measuring and reporting quality measures and redesigning care processes; provide statistical support and assistance in interpreting data results.
• External Quality Review Organization (EQRO-like) & Review Agent for the Arkansas Medicaid Program.
  Prior Authorization Reviews, Retrospective Reviews, HEDIS measures, Patient Satisfaction Surveys, and Data Mining.
• Multidisciplinary team of clinicians, statisticians, and consultants.
• Goal: To ensure that everyone receives the right care, at the right place, at the right time – every time.
- 5. Today’s Health Care Environment
Many challenges exist in today’s dynamic health care environment.
[Diagram: Quality Improvement at the center, surrounded by the following pressures]
• Rising costs ($)
• Employer/consumer demands for accountability & transparency
• Explosion in clinical knowledge
• Wide variations in quality
• Provider/payer industry fragmentation
• Data silos
- 6. Data Preparation for Directed Data Mining
- 7. Directed Data Mining in Medicaid
• AFMC Medicaid Data Mining Projects are often conducted with an eye toward identifying high cost drivers for utilization review and/or cost containment, or to identify care coordination inefficiencies, which can be opportunities for quality improvement.
• Projects are client- and/or literature-based.
• In directed data mining, the goal is to explain or predict the value(s) of a target variable.
• Recipient/Member is often the “Unit of Analysis”. Target is (usually) total costs ($) per member, while inputs are (usually) binary and represent diagnosis, procedure, drug, and provider type code classes.
• Techniques used include decision trees and (infrequently) regression.
- 8. Medicaid Data Source
• Arkansas Medicaid Decision Support System (DSS) Data Warehouse contains over 6 years of historical medical claims data among 500+ data tables (some containing millions of rows of data) at granular levels of detail, along with eligibility and demographic data.
[Diagram: example tables and columns]
Claims Analysis columns: Clm Num, Dtl Num, Recip ID, Amt Paid, NDC Code
Drug Detail columns: NDC Code, Drug Nam, Drug Class, …
Eligibility columns: Recip ID, Plan Code, Elig Curr, Elig Beg, Elig End, …
Enrollment columns: Recip ID, County, DOB, Gender, Race, …
- 9. Data Acquisition: Medicaid Data Queries
• Access Oracle Data Warehouse through BusinessObjects to learn table relationships and build Oracle SQL queries when necessary.
• AFMC licenses SAS/ACCESS Interface to ODBC.
• Established a Windows ODBC connection with Oracle Data Warehouse. Data is acquired via SQL queries.
• Copy and paste Oracle SQL code into SAS program and execute.
• Advantages:
  1) Pulls data back into a SAS data set, and
  2) Bypasses BusinessObjects’ query size limitations.
- 10. Medicaid Transactional Claims Data Example
Recipient ID | Clm Num | DTL Num | Prov Type | Dx | Proc | Amt_Paid | NDC Code | Drug Class | …
001 | 050600407 | 1 | Phys | 250.00 | 99212 | $25.00 | | | …
001 | 050600407 | 2 | Phys | 640.01 | 81000 | $43.38 | | | …
002 | 050600408 | 1 | Pharm | | | $106.45 | 00406035705 | 280808 | …
002 | 050600409 | 1 | Phys | 250.10 | 99212 | $45.13 | | | …
003 | 050600410 | 1 | Phys | 427.0 | A0426 | $240.46 | | | …
003 | 050600411 | 2 | Phys | 427.0 | A0390 | $225.72 | | | …
• Multiple rows of claim detail data make up one claim per recipient/member.
• The challenge is to summarize data to the recipient level and to create target and input variables to be used in modeling analysis.
- 11. SAS Procedures for Transactional Data Preparation – Summing Costs
• Use PROC SQL to summarize paid amounts (AMT_PAID) to recipient level and to break claim costs out by medical vs. pharmacy.
• Total Costs (target) are calculated from the variables created in these tables using a DATA step later in the data prep program.
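The slide describes this roll-up in terms of PROC SQL and a later DATA step. As a language-neutral illustration of the same logic, here is a minimal Python sketch (not the SAS code from the presentation). The rows are patterned on the sample claims shown on the previous slide; the field names and the rule that provider type "Pharm" marks a pharmacy claim are assumptions made for the example.

```python
from collections import defaultdict

# Claim detail rows patterned on the sample claims table:
# (recipient_id, provider_type, amt_paid). Field names are illustrative.
claim_details = [
    ("001", "Phys", 25.00),
    ("001", "Phys", 43.38),
    ("002", "Pharm", 106.45),
    ("002", "Phys", 45.13),
    ("003", "Phys", 240.46),
    ("003", "Phys", 225.72),
]


def summarize_costs(rows):
    """Roll claim detail rows up to one record per recipient."""
    totals = defaultdict(lambda: {"medical": 0.0, "pharmacy": 0.0})
    for recip_id, prov_type, amt_paid in rows:
        # Assumption: provider type "Pharm" marks a pharmacy claim;
        # everything else is counted as medical.
        bucket = "pharmacy" if prov_type == "Pharm" else "medical"
        totals[recip_id][bucket] += amt_paid
    # Total cost (the modeling target) is derived in a separate pass,
    # mirroring the later DATA step mentioned on the slide.
    return {
        recip_id: {**costs, "total": round(costs["medical"] + costs["pharmacy"], 2)}
        for recip_id, costs in totals.items()
    }


recipient_costs = summarize_costs(claim_details)
```

Each recipient ends up with exactly one record holding medical, pharmacy, and total paid amounts, which is the one-row-per-subject shape the modeling step needs.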
- 12. SAS Procedures for Transactional Data Preparation – Creating Binary Inputs
• Use PROC SQL to create a table of recipient IDs with non-missing Dx codes.
• Use FIRST. BY-processing to create an enumeration variable.
• Sort the data set.
• Convert sorted data from rows to columns using PROC TRANSPOSE.
• Use an array to create binary input variables (0,1) for code classes.
• Repeat the process for Procedure Class, Drug Class, and Provider Types.
As shown, a number of SAS procedures and language statements are needed to transform transactional data into a one-row-per-unit-of-analysis format suitable for directed data mining techniques.
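The end result of these steps is one row per recipient with a 0/1 flag per code class. The following Python sketch illustrates that outcome only; it does not reproduce the PROC SQL, PROC TRANSPOSE, and array steps themselves. The Dx rows are hypothetical, and only the two diagnosis classes named in the sample modeling data set (001-139 and 140-239) are defined. Unlike the first step on the slide, the sketch also keeps recipients whose only Dx code is missing, with all flags set to 0.

```python
# Hypothetical (recipient_id, dx_code) rows; None marks a missing Dx code.
dx_rows = [
    ("001", "034.0"),
    ("001", "174.9"),
    ("001", "034.0"),  # repeated code: still yields a single flag
    ("002", "008.8"),
    ("002", None),
    ("003", "153.9"),
]

# Code classes taken from the sample modeling data set (ICD-9 ranges
# 001-139 and 140-239); a real project would list every class.
DX_GROUPS = {
    "DX_GROUP_1": (1, 139),
    "DX_GROUP_2": (140, 239),
}


def dx_group(dx_code):
    """Return the code-class name for a numeric ICD-9 Dx code, or None."""
    category = int(dx_code.split(".")[0])
    for name, (low, high) in DX_GROUPS.items():
        if low <= category <= high:
            return name
    return None


def build_binary_inputs(rows):
    """Return one dict per recipient with a 0/1 flag per code class."""
    table = {}
    for recip_id, dx_code in rows:
        # Every recipient gets a row, initialised to all zeros.
        flags = table.setdefault(recip_id, dict.fromkeys(DX_GROUPS, 0))
        if dx_code is None:
            continue  # skip missing Dx codes
        group = dx_group(dx_code)
        if group is not None:
            flags[group] = 1
    return table


binary_inputs = build_binary_inputs(dx_rows)
```

The same pattern would be repeated for procedure, drug, and provider type classes, each contributing its own set of flag columns.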
- 13. Sample Directed Data Mining Modeling Data Set
Total Cost per Recipient = f (Clinical Diagnoses, Procedures, Prescription Drugs, Providers Seen, Age, Gender)
Target variable: SFY 2004 Total Costs. Input variables: all remaining columns.
Recipient ID | SFY 2004 Total Costs | Dx Group1 (001-139 Infectious & Parasitic Diseases) | Dx Group2 (140-239 Neoplasms) | … | Procedure Code Group1 (00100-01999 Anesthesiology) | … | Thera Class1 (04:00.00 Antihistamines) | …
001 | $15,232.84 | 1 | 1 | … | 1 | … | 1 | …
002 | $2,006.72 | 1 | 0 | … | 1 | … | 0 | …
003 | $8,354.89 | 0 | 1 | … | 1 | … | 1 | …
. | . | . | . | … | . | … | . | …
Mining data set contains both interval (i.e., Total Costs) and binary variables (i.e., DX_GROUP_1).
- 14. Why use Decision Trees?
• Excellent tool for data mining profiling.
• Allow you to easily see patterns in data with respect to a target variable (i.e., interval – total cost $ per recipient).
• Decision Tree algorithm “reads” the data and determines the best variable on which to “split” the data.
• Splits continue as long as they are statistically significant.
- 15. Sample Decision Tree Results
SFY 2004 Continuously-Enrolled Medicaid Recipients with a Diabetes Dx (250.0X-250.9X)
(Partial Tree Results)
This group of recipients with a Diabetes Dx accrued average total costs more than three times those of other recipients with the same diagnosis. Why?
Child nodes from the decision tree may require more drill-down analysis or model tuning in the form of variable reduction and reapplication.
- 16. Data Preparation for Undirected Data Mining
- 17. Undirected Data Mining in Medicare
• AFMC Medicare Data Mining Projects are often conducted with an eye toward identifying opportunities for quality improvement and safeguarding Medicare program funds.
• In undirected data mining, the goal is to uncover the hidden structure in data without respect to a target variable.
• Cluster analysis (PROC FASTCLUS) is a technique often used.
- 18. Hospital Payment Monitoring Program (HPMP)
• The Centers for Medicare and Medicaid Services (CMS) developed the Hospital Payment Monitoring Program (HPMP) primarily to calculate and monitor the Medicare fee-for-service paid claims error rate for inpatient acute care hospital services.
• Under contracts with CMS, several companies – including QIOs like AFMC – are responsible for operating the HPMP.
HPMP is a QIO Statement of Work (SOW) priority.
- 19. HPMP, QIOs, and Reporting
• Each quarter, QIOs receive state hospital-level data containing the calculated paid claims rates and other summary measures for 14 different target areas identified by CMS as prone to payment errors. This quarterly report is known as the First-Look Analysis Tool for Hospital Outlier Monitoring (FATHOM).
  Example target area paid claims rate:
  Target 12: Three Day Transfer to SNF
  Numerator: count of discharges to a SNF with a three-day length of stay
  Denominator: count of all discharges to a SNF or swing bed
• From the FATHOM data, QIOs can generate and distribute the Program for Evaluating Payment Patterns Electronic Report (PEPPER). The PEPPERs are hospital-specific reports that allow Inpatient Prospective Payment System (IPPS) hospitals to compare their own billing practices in the 14 target areas with other IPPS hospitals within the state.
• Each QIO uses these data tools to work with hospitals in their state to reduce improper admissions and Diagnosis-Related Group (DRG) payment errors.
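The Target 12 definition above is a simple ratio of two discharge counts. A minimal Python sketch of that arithmetic follows; the counts are hypothetical, and expressing the rate as a percentage is an assumption, since the slide does not state the scale FATHOM uses.

```python
def paid_claims_rate(numerator_count, denominator_count):
    """Return the target-area rate as a percentage, or None if undefined."""
    if denominator_count == 0:
        return None  # no qualifying discharges, so no rate
    return 100.0 * numerator_count / denominator_count


# Hypothetical counts for one hospital in one quarter.
three_day_snf_transfers = 12  # discharges to a SNF with a three-day stay
all_snf_discharges = 150      # all discharges to a SNF or swing bed
rate = paid_claims_rate(three_day_snf_transfers, all_snf_discharges)
```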
- 20. Resource Challenge and Answer
• With 50+ IPPS Arkansas hospitals and limited staff, AFMC desired to find a way to identify hospitals that have been “extreme outliers” (95th percentile or above on two measures) over time, thus indicating a need for close monitoring and possible notification.
• Answer: Cluster Analysis to segment hospitals into 3 “like” groups based on 5 of the 11 hospital/target-level calculated measures in the data (shown at right on the slide), then graphing the results of two of the 5 measures to determine “extreme outliers”.
- 21. Data Acquisition: Medicare Hospital-Level Data
• The FATHOM hospital-level data is provided to each QIO from another CMS/HPMP contractor in a Microsoft Access database (*.mdb).
• PROC IMPORT is used to import the data table from the *.mdb file.
- 22. SAS Procedures for Data Preparation at UA-level – Standardization
• Per SAS/STAT 9.1.3 Online Documentation regarding PROC FASTCLUS, pp. 1382-1383:
  “Variables with larger variances exert a larger influence in calculating the clusters...Therefore it is necessary to standardize the variables before performing the cluster analysis.”
• PROC STANDARD is used to standardize all variables used in the cluster analysis to mean=0, std=1 prior to invoking PROC FASTCLUS.
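To make the mean=0, std=1 transformation concrete, here is a minimal Python sketch of the arithmetic (an illustration, not the PROC STANDARD step from the presentation). The variable names and values are hypothetical, and the sample standard deviation (n - 1 divisor) is assumed.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1."""
    center = mean(values)
    spread = stdev(values)  # n - 1 divisor
    if spread == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - center) / spread for v in values]


# Hypothetical hospital-level measures on very different scales.
measures = {
    "outlier_value": [1.2, -0.5, 3.8, 0.1, 7.4],
    "target_discharges": [120.0, 45.0, 310.0, 80.0, 560.0],
}
standardized = {name: standardize(vals) for name, vals in measures.items()}
```

After this step both variables have the same spread, so neither dominates the distance calculations in the cluster analysis simply because of its units.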
- 23. Cluster Analysis
• Run PROC FASTCLUS against the output data set containing standardized variables.
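PROC FASTCLUS performs a k-means-style clustering. The sketch below shows the basic idea in Python: assign each observation to its nearest center, recompute the centers, and repeat until assignments stop changing. It is a plain textbook loop with caller-supplied starting seeds, not a reproduction of FASTCLUS's own seed selection or options, and the data points are hypothetical.

```python
import math


def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for each point."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]


def k_means(points, seeds, max_iter=100):
    """Alternate assignment and center updates until assignments settle."""
    centers = [tuple(s) for s in seeds]
    labels = assign(points, centers)
    for _ in range(max_iter):
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Hypothetical standardized measures for seven hospitals (two variables).
hospitals = [(-1.0, -0.9), (-0.8, -1.1), (-0.9, -1.0),
             (0.1, 0.0), (0.0, 0.2),
             (1.9, 2.1), (2.1, 1.8)]
labels, centers = k_means(hospitals, seeds=[hospitals[0], hospitals[3], hospitals[5]])
```

Three seeds are used here to mirror the three "like" groups described earlier in the deck; the result depends on the starting seeds, which is one reason real procedures handle seed selection carefully.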
- 24. Plot to Produce Visual Representation
• Run PROC GPLOT to plot two of the five variables used in cluster analysis to determine outliers (note: highlighted syntax produces red vertical and horizontal references to the 95th percentile value for each variable being plotted):
- 25. Plot Results
• (Standardized) variables used in cluster analysis being plotted are:
  Outlier Value: A single number from -10 to 10, describing how unusual a hospital is compared to all IPPS hospitals in the state.
  Outlier Times Count: Outlier value weighted by number of discharges. This measure captures both the unusualness of the hospital’s target outlier value and the volume of target discharges.
• Outlier facilities are those facilities that consistently fall in the upper-rightmost quadrant (>95th percentile on BOTH measures).
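The upper-right quadrant rule can be stated as a small filter: compute the 95th percentile of each measure, then keep hospitals that reach the cut-off on both. The Python sketch below is an illustration under stated assumptions: the data are hypothetical, the nearest-rank percentile definition is an arbitrary choice (SAS offers several definitions), and "at or above" is used for the comparison, following the "95th percentile or above" wording earlier in the deck.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def extreme_outliers(hospitals, pct=95):
    """Return IDs at or above the percentile cut-off on both measures."""
    value_cut = percentile([h["outlier_value"] for h in hospitals], pct)
    count_cut = percentile([h["outlier_times_count"] for h in hospitals], pct)
    return [h["id"] for h in hospitals
            if h["outlier_value"] >= value_cut
            and h["outlier_times_count"] >= count_cut]


# Hypothetical data: 20 hospitals; H20 is extreme on both measures,
# H19 has a high outlier value but a low weighted value.
hospitals = [{"id": f"H{i:02d}", "outlier_value": i * 0.1,
              "outlier_times_count": i * 1.0} for i in range(1, 19)]
hospitals.append({"id": "H19", "outlier_value": 9.0, "outlier_times_count": 0.5})
hospitals.append({"id": "H20", "outlier_value": 9.5, "outlier_times_count": 40.0})
flagged = extreme_outliers(hospitals)
```

Requiring both conditions keeps a hospital with an unusual rate but few discharges (H19 in this made-up data) off the list, which matches the intent of weighting the outlier value by volume.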
- 26. Sample Letter to Hospitals Regarding Outlier Status
- 27. Conclusions
• Most time spent in a Data Mining project is spent acquiring and preparing data, not on algorithm/technique selection and application.
• SAS has a vast arsenal of tools to help you acquire, prepare/transform, and mine your data.
- 28. References
• Data Preparation for Data Mining Using SAS, Mamdouh Refaat (Morgan Kaufmann/Elsevier)
  http://books.elsevier.com/us/mk/us/subindex.asp?isbn=9780123735775&country=United+States&community=mk&ref=&mscssid=KHFKVKDF6HNU8HJND9V1RB81QR712R2E
  or http://www.amazon.com/PreparationMiningKaufmannManagementSystems/dp/0123735777/ref=pd_bbs_sr_1/10255832127763329?ie=UTF8&s=books&qid=1179937157&sr=11
• Base SAS Procedures and syntax (PROC IMPORT, PROC SQL, PROC TRANSPOSE)
  http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_913/base_proc_8977_new.pdf
• SAS Language Reference (Arrays)
  http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_913/base_lrconcept_9196.pdf
• Decision Trees in Enterprise Miner
  http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_91/em_gs_7281.pdf
• SAS/STAT Procedures (PROC FASTCLUS)
  http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_91/stat_ug_7313.pdf
  or http://www2.sas.com/proceedings/sugi24/Stats/p27024.pdf
• Hospital Payment Monitoring Program (HPMP)
  http://oig.hhs.gov/oas/reports/region3/30500007.pdf
- 29. Acknowledgements
• M2007 Co-Chairs:
  • Jerry Oglesby, SAS Institute
  • Goutam Chakraborty, Oklahoma State University
• Rona Bellinger, AFMC Manager of Web & Graphic Services
• Karen Gabel and Tori Gammill, AFMC HPMP Team Members
• AFMC’s Data Mining Team
- 30. Questions???
Contact:
S. Greg Potts, MBA
Data Mining Team Leader
Arkansas Foundation for Medical Care
Office of Projects and Analysis
401 West Capitol, Suite 410
Little Rock, AR 72201
(501) 212-8734 Phone
(501) 375-1201 Fax
Email: spotts@afmc.org