The integration of SAS and Tableau can have significant business benefits. SAS and Tableau are ‘best of breed’ in their own areas: SAS in the area of Analytics and ‘Analytical Data Preparation’; Tableau in the area of data discovery, visualization and intuitive, interactive dashboarding. Consequently, it makes sense to find ways to combine these technologies to deliver an Integrated Information Framework which leverages the strengths of both solutions.
1. SAS/Tableau Integration
10 Steps for a Seamless SAS/Tableau
Experience
Patrick Spedding
Strategic Advisor, Business Intelligence & Analytics
See-Change Solutions Ltd
patrick@see-change.com.au
US: (949) 528-6665
Australia: (02) 8005-6148
au.linkedin.com/in/spedding
Copyright @ 2012 See-Change Solutions
@spedding
http://www. see-change.com.au
2. 10 Steps for a Seamless SAS/Tableau Experience
For the many organizations who have both SAS and Tableau, it makes sense to find ways to
integrate these technologies to provide an Integrated Information Framework which
leverages the strengths of both solutions.
This presentation covers 10 techniques for integrating SAS and Tableau, using SAS as a data
source and data preparation environment. SAS routines developed to feed the Tableau
solution will also be demonstrated.
Taking advantage of SAS and Tableau capabilities in this manner can provide a way to ‘rapid
prototype’ business reporting requirements, without the costs and delays typically seen
when attempting to model emerging business requirements in the Data Warehouse via
traditional ETL methods. In addition, this approach suggests a way to protect existing
investments in analytical reporting as developed by your SAS team, by providing a platform
to publish those reports for easy consumption, plus easy re-formatting and ‘slice & dice’ of
these reports in the Tableau environment.
Techniques covered will include commonly requested topics such as data and currency
formatting, relative date calculations, longitudinal data analysis, integrating SAS Web Stored
Processes and considerations for the use of SAS ACCESS and SAS ODBC/ OLE-DB.
Copyright @ 2012 See-Change Solutions
3. Agenda
• SAS/Tableau Integration:
1.
2.
3.
4.
5.
Extract formats
Integrating SAS Web Stored Processes
Use of SAS ODBC/OLE-DB
Date Formats
Extract useful elements of date fields/relative date
calculations
6. Currency Formats
7. Rename Column Names/columns used for levels in
hierarchies
8. Add descriptors for coded fields
9. Use SAS formats for more complex formats
10.Merge Disparate Data Sources
• Q&A
Copyright @ 2012 See-Change Solutions
4. SAS Data as a Source for Tableau:
Approaches
•
•
•
•
•
•
SAS Dataset -> CSV -> Tableau
SAS Report -> CSV -> Tableau
SAS Stored Process -> Tableau
SAS Data -> Datasource (RDBMS Connection) -> Tableau
SAS Data -> Datasource (via ODBC) -> Tableau
SAS Dataset -> OLE-DB -> Excel -> Tableau
Note: CSV is typically around 10X smaller than SAS7BDAT format
Copyright @ 2012 See-Change Solutions
5. SAS Dataset -> CSV -> Tableau
This method uses a SAS dataset to feed the Tableau environment. For
example, complex business logic can be built into a SAS (E. Guide) process,
then value can be added in Tableau Desktop (e.g. drill-downs, relative time
calculations, ratios), before displaying via Tableau Server). This is a good
approach for both prototyping BI requirements as well as ‘Analytical Data
Preparation’.
proc export
data=WORK.COMPARATIVE_PERFORMANCE
outfile=
"corpdfsSASOutputComparative
_Performance.txt"
dbms=dlm replace ;
delimiter = '|' ;
run ;
Copyright @ 2012 See-Change Solutions
6. SAS Report -> CSV -> Tableau
This method takes the output of a SAS report (eg Enterprise Guide report)
and ‘pivots’ the data in such a way as to provide a data input into Tableau.
Copyright @ 2012 See-Change Solutions
7. SAS Stored Process -> Tableau
This method takes a SAS report and enables it as a SAS Web Stored Process,
which can then be linked and run within Tableau. Security can be integrated
via ‘Single Signon’ if required. (Note: SAS Integration Technologies required)
http://<SAS Server>:8080/SASStoredProcess/do?_program=<Report
Name>&_action=properties
Note: For SAS Web Stored Processes with Prompts, need to add
&_action=properties to the URL
Copyright @ 2012 See-Change Solutions
8. SAS Data -> RDBMS-> Tableau
This method uses the SAS ‘PROC SQL’ method to output SAS results directly to
a relational table, for example a table within the Data Warehouse. With the
SAS/ACCESS interface, you reference database objects directly in a DATA step
or SAS procedure using the SAS LIBNAME statement. PROC SQL can be used
to update, delete or insert data into a relational table, for example via Bulk
Load.
Copyright @ 2012 See-Change Solutions
9. SAS Data -> ODBC -> Tableau
This method can use an ODBC Connection to allow any SAS dataset to be a
source for Tableau. The SAS ODBC driver can be used to create an ODBC
connection, from which a data source connection can be defined within
Tableau to point to the SAS data set.
http://support.sas.com/demosdownloads/setupcat.jsp?cat=ODBC%20Drivers
Note: SAS profile required to access product downloads
Note: Date fields are not properly interpreted unless it is a Tableau extract
Copyright @ 2012 See-Change Solutions
(400Mb download)
10. SAS Data -> OLE-DB -> Excel -> Tableau
This method can use an OLE-DB Connection to allow any SAS dataset to be a
source for Tableau. The SAS OLE-DB provider can be used to create an OLEDB, from which a data source connection can be defined within Excel. Tableau
can then point to the Excel file to retrieve the SAS data.
Copyright @ 2012 See-Change Solutions
11. Dealing with SAS Dates
In this example, we have a number of dates in our SAS dataset:
t1.rsmwrkdt FORMAT=DDMMYYS8. LABEL="Resumed Work Date"
AS 'Resumed Work Date'n
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_POLICY1 AS
SELECT t1.trandate FORMAT=DDMMYYS8.,
t1.polexpdt FORMAT=DDMMYYS8.,
t1.commdate FORMAT=DDMMYYS8.
FROM WORK.QUERY_FOR_POLICY t1
QUIT;
Copyright @ 2012 See-Change Solutions
12. Dealing with SAS Dates - Notes
If you're going to work with a date as a string type it's better to use ISO-8601 format of
YYYY-MM-DD. This is locale insensitive so you don't need to worry about
DD/MM/YYYY vs. MM/DD/YYYY. Your formula would then read:
DATE(LEFT([Period],4)
+ “-“ + MID([Period],5,2)
+ “-“ + RIGHT([Period],2))
This is an improvement, but string logic is much slower than numeric logic, so it would
be even better to work with this as numbers. Convert the [Period] field to be a number
instead of a string, then use the following:
DATEADD(‘DAY’, [YYYYMMDD]%100-1,
DATEADD(‘MONTH’, INT(([YYYYMMDD]%10000)/100)-1,
DATEADD(‘YEAR’, INT([YYYYMMDD]/10000)-1900, #1900-01-01#)))
Note that the performance gains can be remarkable with large data sets. In a test
conducted over a 1 billion record sample, the first calculation took over 4 hours to
complete, while the second took about a minute.
Copyright @ 2012 See-Change Solutions
13. Dealing with SAS Dates - Examples
/* Day of Injury */
(CASE
WHEN(WEEKDAY(t1.injdate)) = 1 THEN
'Sun'
WHEN(WEEKDAY(t1.injdate)) = 2 THEN
'Mon'
WHEN(WEEKDAY(t1.injdate)) = 3 THEN
'Tue'
WHEN(WEEKDAY(t1.injdate)) = 4 THEN
'Wed'
WHEN(WEEKDAY(t1.injdate)) = 5 THEN
'Thu'
WHEN(WEEKDAY(t1.injdate)) = 6 THEN
'Fri'
WHEN(WEEKDAY(t1.injdate)) = 7 THEN
'Sat'
END) AS 'Day of Injury'n,
Copyright @ 2012 See-Change Solutions
14. Dealing with SAS Dates - Examples
/* Elapsed Weeks */
intnx('week',t1.csdwrkdt,t1.rsmwrkdt,'e')
LABEL="Elapsed Weeks" AS 'Elapsed Weeks'n,
/* Claim Age (Months) */
intck('month',t1.clamdate,t1.sysdate) LABEL="Claim
Age (Months)" AS 'Claim Age (Months)'n,
/* Reporting Delay (days) */
intck('day',t1.injdate,t1.clamdate) ) AS 'Reporting
Delay (days)'n,
SAS Lag functions can also be
very useful
(See “Longitudinal Data
Techniques”
http://www.ats.ucla.edu/stat/sas
/library/nesug00/ad1002.pdf)
Copyright @ 2012 See-Change Solutions
15. Currency Formats
t1.totpayc FORMAT=BEST12. LABEL="Claim Payments to
Date" AS 'Claim Payments to Date'n
Use BEST12. to avoid
issues when
importing/displaying SAS
currency data in Tableau.
Copyright @ 2012 See-Change Solutions
16. Rename Column Names
Rename cryptic SAS field names:
data claims ;
set mth.claims ;
keep insurer claim teed deis injdate injdsnat
injnatc injresc injdislc clmclosf clmclodt
workpc csdwrkdt rsmwrkdt hrswrkwk hrstincc;
run ;
t1.emplnam1 LABEL="Employer Name" AS
'Employer Name'n,
Label hierarchy levels appropriately:
/* Industry - Level 1*/
t1.indgroup LABEL="Industry - Level 1"
/* Industry - Level 2*/
t1.indsubgrp LABEL="Industry - Level 2"
Copyright @ 2012 See-Change Solutions
17. Add descriptors for coded SAS fields
/* Insurer Name */
(CASE
WHEN t1.insurer =
WHEN t1.insurer =
WHEN t1.insurer =
WHEN t1.insurer =
ELSE 'Not Known'
END) AS 'Insurer Name'n
1
2
3
4
THEN
THEN
THEN
THEN
'Insurer
'Insurer
'Insurer
'Insurer
1'
2'
3'
4'
/* Liability Status */
(CASE
WHEN t1.clmliab = 1 THEN 'Notification of work related
injury'
WHEN t1.clmliab = 2 THEN 'Liability accepted'
WHEN t1.clmliab = 5 THEN 'Liability not yet determined'
WHEN t1.clmliab = 6 THEN 'Administration error'
WHEN t1.clmliab = 7 THEN 'Liability denied'
WHEN t1.clmliab = 8 THEN 'Provisional liability accepted weekly and medical payments'
WHEN t1.clmliab = 9 THEN 'Reasonable excuse'
WHEN t1.clmliab = 10 THEN 'Provisional liability
discontinued'
WHEN t1.clmliab = 11 THEN 'Provisional liability accepted medical only, weekly payments not applicable'
WHEN t1.clmliab = 12 THEN 'No action after notification'
ELSE 'Not Known'
END) AS 'Liability Status'n,
Copyright @ 2012 See-Change Solutions
18. Use SAS formats for more complex formats
CASE WHEN t1.deis le '30jun2011'd
THEN put(t1.occncode,asc21dgn.)
ELSE put(t1.occncode,anzsco1n.)
END AS 'Occupation - Level 1'n,
/* Occupation - Level 2*/
CASE WHEN t1.deis le '30jun2011'd
THEN put(t1.occncode,asc22dgn.)
ELSE put(t1.occncode,anzsco2n.)
END AS 'Occupation - Level 2'n,
Often in SAS, a single field will be set up with several informats, relating
to different levels of a hierarchy.
Connecting to the SAS dataset via SAS ODBC would lose this
information, therefore it is advisable to apply each SAS informat to
create multiple fields in the SAS extract, prior to importing into Tableau.
Copyright @ 2012 See-Change Solutions
19. Merge Disparate Data Sources
This is particularly
useful when rows may
not all match across
sources
Also, this approach avoids having to try to join
all sources in real time in one or several outer
join SQL statements (as would be the approach
in traditional BI tools such as Cognos)
Copyright @ 2012 See-Change Solutions