As part of the 2018 HPCC Systems Community Day event:
Archway Health shares its experience using HPCC Systems alongside SAS to support a bundled payments program in the healthcare industry.
Luke Pezet is a solution and software architect with over 10 years of experience in pioneering web analytic tools and complex data management projects. His expertise includes designing and implementing big data solutions that process millions of data inputs daily to monitor, assess, and improve performance. Mr. Pezet is a successful technology entrepreneur who was an early employee of IgoUgo.com, which was sold to Travelocity, and co-founder of Tripfilms, one of the largest databases of travel videos on the web. He has also served as interim CTO for The Achievement Network (ANET), a non-profit education company that helps schools use real-time assessment data to improve student performance. At ANET, he implemented web tools that helped staff scale their operations, as well as end-user web sites where teachers and principals access reports and analysis. Within a few years, this platform helped ANET grow from 13 schools in the Boston area to over 480 schools and 145,000 students across 10 states. ANET has been recognized as a pioneer in education innovation and was named “New Schools Ventures Organization of the Year” in 2011. Mr. Pezet has also worked on data management projects with USA Today, Rand McNally, Microsoft, Samsung, and many others. Mr. Pezet holds a master’s degree in computer science from Rennes University in France.
2. “Change is the only constant in life” (Heraclitus)
HPCC Systems vs SAS: The Final Countdown 2
3. Me, Me and Me...at Archway
• Solution Architect with over 15 years of experience
• Has worked at Archway Health Advisors for ~5 years
• Archway helps care providers manage bundled payment programs.
• Needed to process medical claims 5 years ago and chose HPCC Systems over SAS, Hadoop*, etc.
• New employees brought other technologies, including SAS
4. Introduction
HPCC Systems
• Open-source, data-intensive computing platform developed by LexisNexis Risk Solutions.
• Development started before 2000.
• Consists of a scalable data refinery called Thor and a scalable rapid data delivery engine called ROXIE.
SAS (“Statistical Analysis System”)
• Proprietary software suite developed by SAS Institute that provides advanced analytics.
• Development started in 1966.
5. Use Case
• Based on Chapter 1, “Simple and Multiple Regression”, of the Regression with SAS web book from the UCLA Institute for Digital Research and Education.
• Focuses on data analysis and on how to use the software for regression analysis, not on the statistical basis of multiple regression or on which criteria are best for choosing models.
• Data was created by randomly sampling 400 elementary schools from the California Department of Education's API 2000 dataset.
• Contains a measure of school academic performance as well as other attributes such as class size, enrollment and poverty.
6. Helper
SASsy ECL bundle
ecl-bundle install https://github.com/lpezet/SASsy.git
Usage:
IMPORT SASsy;
// OR
IMPORT SASsy.PROC;
7. Loading data
SAS
DATA scores;
  INFILE datalines dsd;   /* comma-delimited; quoted values may contain commas */
  INPUT Name : $9. Score1-Score3 Team ~ $25. Div $;
  DATALINES;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
RUN;
ECL
// Record layout for the three sample rows
layout := RECORD
  STRING   Name;
  UNSIGNED Score1;
  UNSIGNED Score2;
  UNSIGNED Score3;
  STRING   Team;
  STRING   Div;
END;
scores := DATASET( [ { 'Smith',   12, 22, 46, 'Green Hornets, Atlanta', 'AAA' },
                     { 'Mitchel', 23, 19, 25, 'High Volts, Portland',   'AAA' },
                     { 'Jones',    9, 17, 54, 'Vulcans, Las Vegas',     'AA'  } ],
                   layout );
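To make the DSD behaviour concrete, here is a minimal Python sketch (not part of the original talk) that reads the same three lines with the standard csv module: a quoted field such as "Green Hornets, Atlanta" stays a single value even though it contains a comma. The function name load_scores is hypothetical. One difference from SAS: the ~ modifier keeps the quotation marks in Team, while csv strips them.

```python
import csv
import io

# The same three in-stream records used in the SAS and ECL examples
DATALINES = '''Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA'''


def load_scores(text):
    """Parse comma-delimited lines; quoted fields may contain commas (similar to SAS DSD)."""
    records = []
    for name, s1, s2, s3, team, div in csv.reader(io.StringIO(text)):
        records.append({
            "Name": name,
            "Score1": int(s1),  # int("09") parses as 9
            "Score2": int(s2),
            "Score3": int(s3),
            "Team": team,       # csv removes the surrounding quotes
            "Div": div,
        })
    return records


scores = load_scores(DATALINES)
for row in scores:
    print(row["Name"], row["Team"], row["Div"])
```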
8. Looking at the data (SAS)
PROC PRINT data="elemapi" (obs=5);
run;
9. Looking at the data (ECL)
IMPORT SASsy.PROC;
PROC.PRINT( ElemAPIDS, 5 );
// Plain ECL equivalent: CHOOSEN( ElemAPIDS, 5 );
10. Looking at the data (SAS)
PROC CONTENTS data="elemapi";
run;
11. Looking at the data (ECL)
IMPORT SASsy.PROC;
PROC.CONTENTS( ElemAPIDS );
12. Looking at the data (SAS)
PROC MEANS data="elemapi";
var api00 acs_k3 meals full;
run;
13. Looking at the data (ECL)
IMPORT SASsy.PROC;
PROC.MEANS( oMeans, ElemAPIDS, 'api00,acs_k3,meals,full' );
OUTPUT( oMeans, NAMED('MEANS'));
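With only a VAR statement, PROC MEANS reports N, Mean, Std Dev, Minimum and Maximum for each variable. The Python sketch below computes those five statistics, assuming rows are dictionaries and missing values are None. The function name proc_means and the toy rows are hypothetical (they are not the elemapi data), and the slide does not show whether SASsy's PROC.MEANS computes exactly the same set.

```python
import statistics


def proc_means(rows, variables):
    """N, Mean, Std Dev, Minimum and Maximum per variable, ignoring missing (None) values."""
    summary = {}
    for var in variables:
        values = [row[var] for row in rows if row.get(var) is not None]
        summary[var] = {
            "N": len(values),
            "Mean": statistics.mean(values),
            # Sample standard deviation (n - 1 divisor), SAS's default VARDEF=DF;
            # statistics.stdev needs at least two values
            "Std Dev": statistics.stdev(values),
            "Minimum": min(values),
            "Maximum": max(values),
        }
    return summary


# Hypothetical toy rows, not the real elemapi data
rows = [
    {"api00": 600, "meals": 50},
    {"api00": 700, "meals": None},
    {"api00": 800, "meals": 70},
]
stats = proc_means(rows, ["api00", "meals"])
for var, s in stats.items():
    print(var, s)
```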
14. Looking at the data (ECL)
IMPORT DataPatterns;
DataPatterns.Profile( ElemAPIDS,
    features := 'fill_rate,best_ecl_types,cardinality,lengths,min_max,mean,std_dev,quartiles,correlations' );
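Two of the requested features, fill_rate and cardinality, are simple enough to sketch. The Python below assumes that None and empty strings count as empty; DataPatterns documents its own rules, which may treat other values differently. The profile function and the toy rows are hypothetical.

```python
def profile(rows, fields):
    """Fill rate (percent of rows with a value) and cardinality (distinct values) per field."""
    total = len(rows)
    report = {}
    for field in fields:
        # Assumption: None and "" count as empty; DataPatterns' own rules may differ
        filled = [row.get(field) for row in rows if row.get(field) not in (None, "")]
        report[field] = {
            "fill_rate": 100.0 * len(filled) / total if total else 0.0,
            "cardinality": len(set(filled)),
        }
    return report


# Hypothetical toy rows
rows = [
    {"dnum": 1, "meals": ""},
    {"dnum": 1, "meals": 40},
    {"dnum": 2, "meals": 40},
    {"dnum": 3, "meals": None},
]
print(profile(rows, ["dnum", "meals"]))
```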
15. Looking at the data (SAS)
PROC UNIVARIATE data="elemapi";
var acs_k3;
run;
16. Looking at the data (ECL)
IMPORT SASsy.PROC;
PROC.UNIVARIATE( ElemAPIDS, 'acs_k3' );
(Output panels: Basics, Missing Values, Extreme - Lowest, Extreme - Highest)
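The Extreme Observations and Missing Values panels follow a simple recipe: sort the non-missing values, keep the first and last few together with their observation numbers, and count what was skipped. PROC UNIVARIATE shows five of each by default. A minimal Python sketch of that recipe; the function name and the data are hypothetical.

```python
def extreme_observations(values, k=5):
    """k lowest and k highest non-missing values with 1-based observation numbers, plus a missing count."""
    # Pairs of (value, observation number); ties are broken by observation number
    present = sorted((v, obs) for obs, v in enumerate(values, start=1) if v is not None)
    return {
        "lowest": present[:k],    # ascending, smallest first
        "highest": present[-k:],  # ascending, largest last
        "missing": len(values) - len(present),
    }


# Hypothetical acs_k3-like values; None marks a missing value
acs_k3 = [19, None, -21, 23, 18, None, 20, 17]
result = extreme_observations(acs_k3, k=2)
print(result)
```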
17. Looking at the data (SAS)
PROC FREQ data="elemapi";
tables acs_k3;
run;
18. Looking at the data (ECL)
IMPORT SASsy.PROC;
PROC.FREQ( ACSK3Freq, ElemAPIDS, 'acs_k3' );
OUTPUT( ACSK3Freq, NAMED('Frequency') );
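A one-way PROC FREQ table lists each value with its Frequency, Percent, Cumulative Frequency and Cumulative Percent, and by default leaves missing values out of the percentages. A Python sketch of that computation, with a hypothetical function name and toy data:

```python
from collections import Counter


def one_way_freq(values):
    """Rows of (value, frequency, percent, cumulative frequency, cumulative percent)."""
    present = [v for v in values if v is not None]  # missing values are excluded
    counts = Counter(present)
    total = len(present)
    table, cumulative = [], 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq
        table.append((value, freq, 100.0 * freq / total, cumulative, 100.0 * cumulative / total))
    return table


for row in one_way_freq([16, 17, 17, None, 20]):
    print(row)
```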
19. Looking at the data (SAS)
PROC UNIVARIATE data="elemapi";
var acs_k3;
histogram / cfill=gray;
run;
20. Looking at the data (ECL)
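IMPORT Visualizer;
// One row per distinct acs_k3 value: the value as a text label plus its count
PlotData := TABLE( SORT( ElemAPIDS, acs_k3 ),
                   { STRING label := (STRING)acs_k3; UNSIGNED cnt := COUNT(GROUP); },
                   acs_k3 );
OUTPUT( PlotData, NAMED('PlotData') );
// Draw the 'PlotData' result as a column chart
Visualizer.MultiD.Column( 'myChart', , 'PlotData' );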
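The ECL query produces one bar per distinct acs_k3 value, whereas a histogram like the one PROC UNIVARIATE draws groups values into equal-width intervals. If binned output is wanted, the grouping step could look like the Python sketch below. The bin width, the starting point and the function name are hypothetical choices, not SAS's automatic binning rules.

```python
def equal_width_bins(values, width, start):
    """Count values in half-open intervals [start + i*width, start + (i+1)*width)."""
    counts = {}
    for v in values:
        lower = start + ((v - start) // width) * width  # lower edge of v's interval
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


WIDTH = 3
bins = equal_width_bins([14, 15, 16, 17, 19, 20, 23], width=WIDTH, start=14)
for lower, n in bins.items():
    # Text bar chart: one '*' per value in the interval
    print(f"[{lower}, {lower + WIDTH}) {'*' * n}")
```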
21. MACROs
SAS
%MACRO MISSINGCHECK(VAR, TYPE);
PROC SQL;
CREATE TABLE &VAR._&TYPE. AS
SELECT DISTINCT CLM_TYPE_1, COUNT(SYSKEY) AS &VAR._MISSING
FROM OUTPUT.&TYPE.
WHERE &VAR. IS MISSING
GROUP BY CLM_TYPE_1
ORDER BY CLM_TYPE_1;
QUIT;
%MEND MISSINGCHECK;
%MISSINGCHECK(MEMBER_ID, &EPI.GENERAL);
%MISSINGCHECK(CLAIM_ID, &EPI.GENERAL);
%MISSINGCHECK(MS_DRG, &EPI.GENERAL);
%MISSINGCHECK(ADM_DGNS, &EPI.GENERAL);
ECL
MissingCheck( pDS, pField, pMissingValue, pByField ) := FUNCTIONMACRO
  #UNIQUENAME(tabled)
  %tabled% := TABLE( pDS( pField = pMissingValue ), { pByField; COUNT(GROUP); }, pByField );
  #UNIQUENAME(sorted)
  %sorted% := SORT( %tabled%, pByField );
  RETURN %sorted%;
ENDMACRO;
MissingCheck( ElemAPIDS, meals, '', dnum );
MissingCheck( ElemAPIDS, acs_k3, '', dnum );
MissingCheck( ElemAPIDS, api00, '', dnum );
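Both macros follow the same pattern: keep the rows where a field holds its missing-value marker, count them per group, and sort by the group key. The Python sketch below mirrors the ECL parameters (pDS, pField, pMissingValue, pByField). The function name missing_check and the toy rows standing in for ElemAPIDS are hypothetical.

```python
from collections import Counter


def missing_check(rows, field, missing_value, by_field):
    """Count rows whose field equals missing_value, grouped and sorted by by_field."""
    counts = Counter(row[by_field] for row in rows if row[field] == missing_value)
    return sorted(counts.items())


# Hypothetical toy rows standing in for ElemAPIDS
rows = [
    {"dnum": 1, "meals": "", "acs_k3": "19"},
    {"dnum": 2, "meals": "", "acs_k3": ""},
    {"dnum": 1, "meals": "", "acs_k3": "20"},
    {"dnum": 2, "meals": "40", "acs_k3": "18"},
]
# One call per field, like the repeated macro calls
for field in ["meals", "acs_k3"]:
    print(field, missing_check(rows, field, "", "dnum"))
```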
22. Multiple Regression (SAS)
PROC REG data="c:\sasreg\elemapi";
model api00 = acs_k3 meals full;
run;
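PROC REG fits api00 as a linear function of acs_k3, meals and full by ordinary least squares. To illustrate what the coefficient estimates are, the Python sketch below solves the normal equations (X'X)b = X'y with Gauss-Jordan elimination. It is only a sketch, not a description of how PROC REG or the ECL ML library compute their estimates. It reports coefficients only (no standard errors, t tests or R-square), and the function name and toy data are hypothetical.

```python
def ols(x_rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + [float(v) for v in row] for row in x_rows]  # prepend intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y] of the normal equations
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X'X is singular (collinear predictors)")
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
print(ols(x_rows, y))  # close to [1.0, 2.0, 3.0]
```

With the elemapi data, x_rows would hold (acs_k3, meals, full) for each school and y the matching api00 values.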
24. More
ECL Machine Learning Library
• Statistics (e.g. Means, Std Deviation, Modes, Medians, NTiles, etc.)
• Regression
• Clustering (e.g. K-Means)
• Classification (e.g. Logistic Regression, Decision Trees, Perceptron, etc.)
• Unstructured Data (Tokenize, Transform, CoLocation)
• Association (e.g. AprioriN)
• Matrix Manipulation
25. Today
HPCC Systems is used to process data at scale and on a more frequent basis:
• Process Medical Claims using Thor and deliver results using Roxie
• Run ETL/ELT processes to load, clean, prepare data
• Run more advanced processing to generate outputs (Bundle Engine)
• Clusters of 8+ nodes
SAS is used for research, exploratory data analysis and modeling:
• Uses HPCC outputs as input
• Single instance
• Limited CPU/RAM
26. Tomorrow
HPCC Systems
• Still run ETL/ELT processes to load, clean, prepare data
• Run processes that need to happen more frequently
• Port more advanced data analysis and modeling features to ECL
• Make clusters easier to create so that experimentation becomes effortless
SAS
• 1 server
• R&D for now
• Validate/compare results with HPCC Systems