Copyright © 2013, SAS Institute Inc. All rights reserved.
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
Chalk and Cheese?
Fit for each Other?
Paul Kent
VP Big Data
SAS
AGENDA
1. Two ways to push work to the cluster…
   1. Using SQL
   2. Using a SAS Compute Engine on the cluster
2. Data Implications
   1. Data in SAS Format, produce/consume with other tools
   2. Data in other Formats, produce/consume with SAS
3. HDFS versus the Enterprise DBMS
USING SQL
LIBNAME olly HADOOP
   SERVER=mycluster.mycompany.com
   USER="kent" PASS="sekrit";
PROC DATASETS LIB=OLLY;
RUN;
USING SQL
SAS Server
LIBNAME olly HADOOP
   SERVER=hadoop.company.com
   USER="paul" PASS="sekrit";
PROC XYZZY DATA=olly.table;
RUN;
Hadoop Access Method (SAS Server to Hadoop Cluster)
   Select * From olly
Hadoop Cluster, Controller to Workers
   Select * From olly
Each Worker
   Select * From olly_slice
Returned to the SAS Server: Potentially Big Data
USING SQL
SAS Server
LIBNAME olly HADOOP
   SERVER=hadoop.company.com
   USER="paul" PASS="sekrit";
PROC MEANS DATA=olly.table;
   BY GRP;
RUN;
Hadoop Access Method (SAS Server to Hadoop Cluster)
   Select sum(x), min(x) … From olly Group By GRP
Hadoop Cluster, Controller to Workers
   Select sum(x), min(x) … From olly Group By GRP
Each Worker
   Select sum(x), min(x) … From olly_slice Group By GRP
Returned to the SAS Server: Aggregate Data ONLY
USING SQL
Advantages
• Same SAS syntax (people skills)
• Convenient
• Gateway Drug
Disadvantages
• Not really taking advantage of cluster
• Potentially Large datasets still transferred to SAS Server
• Not Many Techniques Passthru
   Basic Summary Statistics – YES
   Higher Order Math – NO
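A minimal sketch, in Python rather than SAS, of why basic summary statistics pass through to SQL so easily: each worker can aggregate its own slice, and the controller only has to merge small per-group results. The sample rows, the slice layout, and the function names are hypothetical, and the row count is added here (the slides only show sum and min) so that a mean can be derived. This is an illustration of the idea, not a description of how the Hadoop access method is implemented.

```python
# Hypothetical rows of the "olly" table as (GRP, x), already split into
# one slice per worker. The data and the slicing are assumptions.
slices = [
    [("a", 1.0), ("b", 4.0), ("a", 3.0)],
    [("b", 2.0), ("a", 5.0)],
]

def aggregate_slice(rows):
    """Roughly what one worker computes:
    SELECT GRP, sum(x), min(x), count(*) FROM olly_slice GROUP BY GRP."""
    out = {}
    for grp, x in rows:
        s, m, n = out.get(grp, (0.0, float("inf"), 0))
        out[grp] = (s + x, min(m, x), n + 1)
    return out

def combine(partials):
    """Roughly what the controller does: merge per-slice results by group."""
    total = {}
    for part in partials:
        for grp, (s, m, n) in part.items():
            ts, tm, tn = total.get(grp, (0.0, float("inf"), 0))
            total[grp] = (ts + s, min(tm, m), tn + n)
    return total

result = combine(aggregate_slice(rows) for rows in slices)
for grp in sorted(result):
    s, m, n = result[grp]
    print(grp, "sum =", s, "min =", m, "n =", n, "mean =", s / n)
```

Only one small tuple per group leaves each worker, which is the "Aggregate Data ONLY" case. Statistics that cannot be expressed as a merge of per-slice aggregates do not fit this pattern, which is one way to read the "Higher Order Math – NO" line.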
HDFS
YARN, or better resource management
MapReduce, Storm, Spark, Impala, Tez, SAS
Many talks at #HadoopSummit on “Beyond MapReduce”
SAS ON YOUR CLUSTER
Controller
Client
SAS Server
libname joe sashdat "/hdfs/..";
proc hpreg data=joe.class;
   class sex;
   model age = sex height weight;
run;
Appliance
Controller (General), Workers (Captains)
tkgrid, Access Engine
TK process on each node, linked by MPI
HDFS blocks (BLKs) under each TK
SAS Server
libname joe sashdat "/hdfs/..";
proc hpreg data=joe.class;
   class sex;
   model age = sex height weight;
run;
Appliance
Controller (General), Workers (Captains)
tkgrid, Access Engine
TK process on each node, linked by MPI
MAP REDUCE JOB: a mapper (MAPr) under each TK
Single / Multi-threaded
• Not aware of distributed computing environment
• Computes locally / where called
• Fetches Data as required
• Memory still a constraint
proc logistic data=TD.mydata;
   class A B C;
   model y(event='1') = A B B*C;
run;
Massively Parallel (MPP)
• Uses distributed computing environment
• Computes in massively distributed mode
• Work is co-located with data
• In-Memory Analytics
• 40 nodes x 96 GB = almost 4 TB of memory
proc hplogistic data=TD.mydata;
   class A B C;
   model y(event='1') = A B B*C;
run;
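To make "work is co-located with data" concrete, here is a small Python sketch of a one-predictor least-squares fit in which each worker summarizes only its own partition and the controller adds the summaries together. The data, the partitioning, and the function names are hypothetical, and this is not a description of how HPREG or HPLOGISTIC are implemented; it only illustrates that small sufficient statistics, rather than rows, need to travel between nodes.

```python
from functools import reduce

# Hypothetical (x, y) observations, partitioned across three workers.
partitions = [
    [(1.0, 3.0), (2.0, 5.0)],
    [(3.0, 7.0)],
    [(4.0, 9.0), (5.0, 11.0)],
]

def local_stats(rows):
    """Computed on each worker, next to its data: n, sum x, sum y, sum x^2, sum xy."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, sxy)

def add_stats(a, b):
    """Computed on the controller: element-wise sum of two summaries."""
    return tuple(u + v for u, v in zip(a, b))

def fit(stats):
    """Closed-form simple linear regression from the pooled summaries."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

pooled = reduce(add_stats, (local_stats(p) for p in partitions))
intercept, slope = fit(pooled)
print("intercept =", intercept, "slope =", slope)
```

Each worker sends five numbers regardless of how many rows it holds, so adding nodes adds memory and compute without adding data movement.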
SAS® IN-MEMORY ANALYTICS
• Common set of HP procedures will be included in each of the individual SAS HP “Analytics” products
• New in June release
SAS® High-Performance Statistics: HPLOGISTIC, HPREG, HPLMIXED, HPNLMOD, HPSPLIT, HPGENSELECT
SAS® High-Performance Econometrics: HPCOUNTREG, HPSEVERITY, HPQLIM
SAS® High-Performance Optimization: HPLSO; select features in OPTMILP, OPTLP, OPTMODEL
SAS® High-Performance Data Mining: HPREDUCE, HPNEURAL, HPFOREST, HP4SCORE, HPDECIDE
SAS® High-Performance Text Mining: HPTMINE, HPTMSCORE
SAS® High-Performance Forecasting: HPFORECAST
Common Set: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY, HPIMPUTE, HPBIN, HPCORR
Scalability on a 12-Core Server
Acceleration by factor 106!
Configuration: Client, 24 cores
Workflow Step                 CPU Runtime    Ratio
Explore (100K)                00:01:07:17      4.2
Partition                     00:07:54:04     19.5
Impute                        00:01:19:84      7.7
Transform                     00:09:45:01     13.2
Logistic Regression (Step)    04:09:21:61    131.5
Total                         04:29:27:67    106.1
Configuration: HPA Appliance, 32 x 24 = 768 cores
Workflow Step                 CPU Runtime
Explore                       00:00:15:81
Partition                     00:00:21:52
Impute                        00:00:21:47
Transform                     00:00:44:28
Logistic Regression           00:01:37:99
Total                         00:02:21:07
32 X
Acceleration by factor 322!
Configuration: Client, 24 cores
Workflow Step                 CPU Runtime    Ratio
Explore                       00:01:07:17      4.2
Partition                     01:01:09:31    170.5
Impute                        00:02:45:81      7.7
Transform                     01:26:06:22    116.7
Neural Net                    18:21:28:54    478.9
Total                         20:52:37:05    313
Configuration: HPA Appliance, 32 x 24 = 768 cores
Workflow Step                 CPU Runtime
Explore                       00:00:15:81
Partition                     00:00:21:52
Impute                        00:00:21:47
Transform                     00:00:44:28
Neural Net                    00:02:17:40
Total                         00:04:00:48
32 X
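The Ratio column can be checked with a few lines of Python. The sketch below uses the runtimes from the neural-network comparison and assumes the timestamps are formatted as hours:minutes:seconds:hundredths, which the slide does not state explicitly. The helper name and dictionary layout are illustrative only.

```python
def to_seconds(stamp):
    """Convert 'hh:mm:ss:cc' to seconds.
    Assumption: the last field is hundredths of a second."""
    hh, mm, ss, cc = (int(part) for part in stamp.split(":"))
    return hh * 3600 + mm * 60 + ss + cc / 100

# Runtimes copied from the neural-network table.
client = {
    "Explore": "00:01:07:17",
    "Partition": "01:01:09:31",
    "Impute": "00:02:45:81",
    "Transform": "01:26:06:22",
    "Neural Net": "18:21:28:54",
}
appliance = {
    "Explore": "00:00:15:81",
    "Partition": "00:00:21:52",
    "Impute": "00:00:21:47",
    "Transform": "00:00:44:28",
    "Neural Net": "00:02:17:40",
}

client_total = sum(to_seconds(t) for t in client.values())
appliance_total = sum(to_seconds(t) for t in appliance.values())
ratio = client_total / appliance_total

# Per-step and overall speedups, rounded to one decimal place.
for step in client:
    print(step, round(to_seconds(client[step]) / to_seconds(appliance[step]), 1))
print("Total", round(ratio, 1))
```

Under that format assumption the step times add up to the stated totals for both configurations, and the overall ratio comes out at about 313, matching the Total row.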
DATA CHOICES
Hadoop Format: Sequence, Avro, Trevni, ORC, Parquet
SAS Format: SASHDAT
PROCESSING CHOICES
Data: Hadoop Format (Sequence, Avro, Trevni, ORC, Parquet) or SAS Format (SASHDAT)
Processing: Process with Hadoop Tools or Process with SAS
NorthEast and SouthWest Quadrants are the interoperability challenges!
PROCESSING CHOICES (quadrant diagram repeated, with check marks added)
TEACH HADOOP (PIG) ABOUT SAS
register pigudf.jar, sas.lasr.hadoop.jar, sas.lasr.jar;
/* Load the data from sashdat */
B = load '/user/kent/class.sashdat' using
com.sas.pigudf.sashdat.pig.SASHdatLoadFunc();
/* perform word-count */
Bgroup = group B by $0;
Bcount = foreach Bgroup generate group, COUNT(B);
dump Bcount;
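For readers who do not use Pig, the grouping step in the script above amounts to counting rows per distinct value of the first column. A rough Python equivalent follows; the sample rows are hypothetical stand-ins for the records the SASHDAT loader would return, with a column layout borrowed from the class.csv example in this deck.

```python
from collections import Counter

# Hypothetical rows standing in for the records loaded from class.sashdat
# (name, sex, age, height, weight). The values are made up.
rows = [
    ("Alfred", "M", 14, 69.0, 112.5),
    ("Alice", "F", 13, 56.5, 84.0),
    ("Alfred", "M", 15, 70.0, 120.0),
]

# Equivalent of: Bgroup = group B by $0;
#                Bcount = foreach Bgroup generate group, COUNT(B);
counts = Counter(row[0] for row in rows)

# Equivalent of: dump Bcount;
for key, n in sorted(counts.items()):
    print((key, n))
```

The point of the Pig version is that the same logic runs as a Hadoop job directly against the SAS-format file, without first exporting it.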
TEACH HADOOP (PIG) ABOUT SAS
register pigudf.jar, sas.lasr.hadoop.jar, sas.lasr.jar;
/* Load the data from a CSV in HDFS */
A = load '/user/kent/class.csv'
using PigStorage(',')
as (name:chararray, sex:chararray,
age:int, height:double, weight:double);
Store A into '/user/kent/class'
using com.sas.pigudf.sashdat.pig.SASHdatStoreFunc(
'bigcdh01.unx.sas.com',
'/user/kent/class_bigcdh01.xml');
TEACH HADOOP (MAP REDUCE) ABOUT SAS
Hot off the Presses… SERDEs for
Input Reader
Output Writer
…. Looking for interested parties to try this
PROCESSING CHOICES (quadrant diagram repeated, with further check marks added)
HOW ABOUT THE
OTHER WAY? TEACH HADOOP (MAP/REDUCE) ABOUT SAS
/* Create HDMD file */
proc hdmd name=gridlib.people
format=delimited
sep=tab
file_type=custom_sequence
input_format='com.sas.hadoop.ep.inputformat.sequence.PeopleCustomSequenceInputFormat'
data_file='people.seq';
COLUMN name varchar(20) ctype=char;
COLUMN sex varchar(1) ctype=char;
COLUMN age int ctype=int32;
column height double ctype=double;
column weight double ctype=double;
run;
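The HDMD step above only attaches column names and types to records that already exist in Hadoop. As a rough illustration of what that metadata lets a reader do, the Python sketch below applies the same five column definitions to one tab-delimited record. The sample record and the function name are hypothetical, and the sketch ignores the custom sequence-file input format named in the PROC HDMD call.

```python
# Column names and Python types mirroring the COLUMN statements above.
SCHEMA = [
    ("name", str),
    ("sex", str),
    ("age", int),
    ("height", float),
    ("weight", float),
]

def parse_record(line, sep="\t"):
    """Split one delimited record and cast each field per the schema."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}

# Hypothetical record; the real contents of people.seq are not shown.
record = parse_record("Alfred\tM\t14\t69.0\t112.5\n")
print(record)
```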
HIGH-PERFORMANCE ANALYTICS
• Alongside Hadoop (Symmetric)
SAS Server
libname joe sashdat "/hdfs/..";
proc hpreg data=joe.class;
   class sex;
   model age = sex height weight;
run;
Appliance
Controller (General), Workers (Captains)
tkgrid, Access Engine
TK process on each node, linked by MPI
MAP REDUCE JOB: a mapper (MAPr) under each TK
PROCESSING CHOICES (quadrant diagram repeated, with further check marks added)
REFERENCE
ARCHITECTURE
TERADATA
CLIENT
ORACLE
HADOOP
GREENPLUM
HADOOP VS EDW
Hadoop Excels at
10x Cost/TB advantage
Not yet structured datasets
>2000 columns, no problems
Incremental growth “practical”
Discovery and Experimentation
Variable Selection
Model Comparison
EDW Still wins
SQL applications
Pushing analytics into LOB apps
Operational
CRM
Optimization
MOST IMPORTANT! SAS ON YOUR CLUSTER
Controller
Client
SUPPORTED HADOOP DISTRIBUTIONS
Distribution      Supported?
Apache 2.0        yes
Cloudera CDH4     yes
Horton HDP 2.0    yes
Horton HDP 1.3    So close. Please See me…
Pivotal HD        In Progress
MapR              Work Remains
Intel 3.0         Optimistic…
THANK YOU
Paul.Kent @ sas.com
@hornpolish
paulmkent

 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Último (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

SAS on Your (Apache) Cluster, Serving your Data (Analysts)

  • 1. This slide is for video use only. Copyright © 2013, SAS Institute Inc. All rights reserved. SAS on Your (Apache) Cluster, Serving your Data (Analysts). Chalk and Cheese? Fit for each Other? Paul Kent, VP Big Data, SAS
  • 2. AGENDA
       1. Two ways to push work to the cluster
          1. Using SQL
          2. Using a SAS Compute Engine on the cluster
       2. Data Implications
          1. Data in SAS Format, produce/consume with other tools
          2. Data in other Formats, produce/consume with SAS
       3. HDFS versus the Enterprise DBMS
  • 3. AGENDA (same outline as slide 2)
  • 4. USING SQL
       LIBNAME olly HADOOP
         SERVER=mycluster.mycompany.com
         USER="kent" PASS="sekrit";
       PROC DATASETS LIB=OLLY;
       RUN;
  • 5. USING SQL
       SAS Server:
         LIBNAME olly HADOOP
           SERVER=hadoop.company.com
           USER="paul" PASS="sekrit";
         PROC XYZZY DATA=olly.table;
         RUN;
       Hadoop Access Method: Select * From olly
       Hadoop Cluster, Controller: Select * From olly
       Hadoop Cluster, Workers: Select * From olly_slice
       Returned to the SAS Server: Potentially Big Data
  • 6. USING SQL
       SAS Server:
         LIBNAME olly HADOOP
           SERVER=hadoop.company.com
           USER="paul" PASS="sekrit";
         PROC MEANS DATA=olly.table;
           BY GRP;
         RUN;
       Hadoop Access Method: Select sum(x), min(x) … From olly Group By GRP
       Hadoop Cluster, Controller: Select sum(x), min(x) … From olly Group By GRP
       Hadoop Cluster, Workers: Select sum(x), min(x) … From olly_slice Group By GRP
       Returned to the SAS Server: Aggregate Data ONLY
  • 7. USING SQL
       Advantages
         Same SAS syntax (people skills)
         Convenient "gateway drug"
       Disadvantages
         Not really taking advantage of the cluster
         Potentially large datasets still transferred to the SAS Server
         Not many techniques pass through
           Basic summary statistics: YES
           Higher-order math: NO
  • 8. AGENDA (same outline as slide 2)
  • 9. Diagram labels: HDFS; MapReduce, Storm, Spark, Impala, Tez, SAS; YARN, or better resource management. Many talks at #HadoopSummit on “Beyond MapReduce”.
  • 10. SAS ON YOUR CLUSTER (diagram labels: Controller, Client)
  • 11. SAS Server:
         libname joe sashdat "/hdfs/..";
         proc hpreg data=joe.class;
           class sex;
           model age = sex height weight;
         run;
       Diagram labels: Appliance, Controller, Workers, tkgrid, Access Engine, General, Captains, TK (x5), MPI, HDFS, BLKs (x5)
  • 12. (same code and diagram as slide 11)
  • 13. SAS Server:
         libname joe sashdat "/hdfs/..";
         proc hpreg data=joe.class;
           class sex;
           model age = sex height weight;
         run;
       Diagram labels: Appliance, Controller, Workers, tkgrid, Access Engine, General, Captains, TK (x5), MPI, MAP REDUCE JOB, MAPr (x4)
  • 14. Single / Multi-threaded
         Not aware of distributed computing environment
         Computes locally / where called
         Fetches data as required
         Memory still a constraint
         Example:
           proc logistic data=TD.mydata;
             class A B C;
             model y(event='1') = A B B*C;
           run;
       Massively Parallel (MPP)
         Uses distributed computing environment
         Computes in massively distributed mode
         Work is co-located with data
         In-Memory Analytics
         40 nodes x 96GB: almost 4TB of memory
         Example:
           proc hplogistic data=TD.mydata;
             class A B C;
             model y(event='1') = A B B*C;
           run;
  • 15. SAS® IN-MEMORY ANALYTICS
       • Common set of HP procedures will be included in each of the individual SAS HP "Analytics" products
       • New in June release
       SAS® High-Performance Statistics: HPLOGISTIC, HPREG, HPLMIXED, HPNLMOD, HPSPLIT, HPGENSELECT
       SAS® High-Performance Econometrics: HPCOUNTREG, HPSEVERITY, HPQLIM
       SAS® High-Performance Optimization: HPLSO; select features in OPTMILP, OPTLP, OPTMODEL
       SAS® High-Performance Data Mining: HPREDUCE, HPNEURAL, HPFOREST, HP4SCORE, HPDECIDE
       SAS® High-Performance Text Mining: HPTMINE, HPTMSCORE
       SAS® High-Performance Forecasting: HPFORECAST
       Common Set: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY, HPIMPUTE, HPBIN, HPCORR
  • 16. Scalability on a 12-Core Server
  • 17. Acceleration by factor 106!
       Configuration: Client, 24 cores
         Workflow Step                 CPU Runtime    Ratio
         Explore (100K)                00:01:07:17      4.2
         Partition                     00:07:54:04     19.5
         Impute                        00:01:19:84      7.7
         Transform                     00:09:45:01     13.2
         Logistic Regression (Step)    04:09:21:61    131.5
         Total                         04:29:27:67    106.1
       Configuration: HPA Appliance, 32 x 24 = 768 cores
         Workflow Step                 CPU Runtime
         Explore                       00:00:15:81
         Partition                     00:00:21:52
         Impute                        00:00:21:47
         Transform                     00:00:44:28
         Logistic Regression           00:01:37:99
         Total                         00:02:21:07
       32 X
  • 18. Acceleration by factor 322!
       Configuration: Client, 24 cores
         Workflow Step                 CPU Runtime    Ratio
         Explore                       00:01:07:17      4.2
         Partition                     01:01:09:31    170.5
         Impute                        00:02:45:81      7.7
         Transform                     01:26:06:22    116.7
         Neural Net                    18:21:28:54    478.9
         Total                         20:52:37:05    313
       Configuration: HPA Appliance, 32 x 24 = 768 cores
         Workflow Step                 CPU Runtime
         Explore                       00:00:15:81
         Partition                     00:00:21:52
         Impute                        00:00:21:47
         Transform                     00:00:44:28
         Neural Net                    00:02:17:40
         Total                         00:04:00:48
       32 X
  • 19. AGENDA (same outline as slide 2)
  • 20. DATA CHOICES
       Hadoop Format: Sequence, Avro, Trevni, ORC, Parquet
       SAS Format: SASHDAT
  • 21. PROCESSING CHOICES
       Data formats: Hadoop Format (Sequence, Avro, Trevni, ORC, Parquet); SAS Format (SASHDAT)
       Processing: Process with Hadoop Tools; Process with SAS
       NorthEast and SouthWest Quadrants are the interoperability challenges!
  • 22. PROCESSING CHOICES (same grid as slide 21, with checkmarks added: ✔✔ ✔ ✔✔ ✔)
  • 23. TEACH HADOOP (PIG) ABOUT SAS
       register pigudf.jar, sas.lasr.hadoop.jar, sas.lasr.jar;
       /* Load the data from sashdat */
       B = load '/user/kent/class.sashdat'
           using com.sas.pigudf.sashdat.pig.SASHdatLoadFunc();
       /* perform word-count */
       Bgroup = group B by $0;
       Bcount = foreach Bgroup generate group, COUNT(B);
       dump Bcount;
  • 24. TEACH HADOOP (PIG) ABOUT SAS
       register pigudf.jar, sas.lasr.hadoop.jar, sas.lasr.jar;
       /* Load the data from a CSV in HDFS */
       A = load '/user/kent/class.csv'
           using PigStorage(',')
           as (name:chararray, sex:chararray, age:int, height:double, weight:double);
       Store A into '/user/kent/class'
           using com.sas.pigudf.sashdat.pig.SASHdatStoreFunc(
               'bigcdh01.unx.sas.com', '/user/kent/class_bigcdh01.xml');
  • 25. TEACH HADOOP (MAP REDUCE) ABOUT SAS
       Hot off the presses…
       SERDEs for: Input Reader, Output Writer
       …. Looking for interested parties to try this
  • 26. PROCESSING CHOICES (same grid as slide 21, with checkmarks added: ✔✔ ✔ ✔✔ ✔ ✔✔ ✔)
  • 27. Company Confidential - For Internal Use Only. HOW ABOUT THE OTHER WAY? TEACH HADOOP (MAP/REDUCE) ABOUT SAS
       /* Create HDMD file */
       proc hdmd name=gridlib.people
            format=delimited
            sep=tab
            file_type=custom_sequence
            input_format='com.sas.hadoop.ep.inputformat.sequence.PeopleCustomSequenceInputFormat'
            data_file='people.seq';
         column name varchar(20) ctype=char;
         column sex varchar(1) ctype=char;
         column age int ctype=int32;
         column height double ctype=double;
         column weight double ctype=double;
       run;
  • 28. HIGH-PERFORMANCE ANALYTICS
       • Alongside Hadoop (Symmetric)
       SAS Server:
         libname joe sashdat "/hdfs/..";
         proc hpreg data=joe.class;
           class sex;
           model age = sex height weight;
         run;
       Diagram labels: Appliance, Controller, Workers, tkgrid, Access Engine, General, Captains, TK (x5), MPI, MAP REDUCE JOB, MAPr (x4)
  • 29. PROCESSING CHOICES (same grid as slide 21, with checkmarks added: ✔✔ ✔ ✔✔ ✔ ✔✔ ✔ ✔✔ ✔)
  • 30. AGENDA (same outline as slide 2)
  • 31. REFERENCE ARCHITECTURE (diagram labels: Teradata, Client, Oracle, Hadoop, Greenplum)
  • 32. HADOOP VS EDW
       Hadoop excels at
         10x cost/TB advantage
         Not yet structured datasets
         >2000 columns, no problems
         Incremental growth "practical"
         Discovery and experimentation
         Variable selection
         Model comparison
       EDW still wins
         SQL applications
         Pushing analytics into LOB apps
         Operational CRM Optimization
  • 33. MOST IMPORTANT! SAS ON YOUR CLUSTER (diagram labels: Controller, Client)
  • 34. SUPPORTED HADOOP DISTRIBUTIONS
       Distribution       Supported?
       Apache 2.0         yes
       Cloudera CDH4      yes
       Horton HDP 2.0     yes
       Horton HDP 1.3     So close. Please see me…
       Pivotal HD         In progress
       MapR               Work remains
       Intel 3.0          Optimistic…
  • 35. THANK YOU. Paul.Kent @ sas.com, @hornpolish, paulmkent
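
Slides 5 to 7 contrast pulling every row back to the SAS Server with pushing a GROUP BY summary down to the cluster so that only aggregate data returns. The following Python sketch illustrates that split-and-merge idea only; it is not how the SAS Hadoop access method or Hive is implemented. The function names `partial_aggregate` and `merge_partials`, and the sample rows, are hypothetical.

```python
def partial_aggregate(rows):
    """Worker side: per-group (sum, min, count) for one slice of the table."""
    acc = {}
    for grp, x in rows:
        if grp in acc:
            s, m, n = acc[grp]
            acc[grp] = (s + x, min(m, x), n + 1)
        else:
            acc[grp] = (x, x, 1)
    return acc


def merge_partials(partials):
    """Controller side: combine per-slice results into one row per group."""
    merged = {}
    for part in partials:
        for grp, (s, m, n) in part.items():
            if grp in merged:
                ms, mm, mn = merged[grp]
                merged[grp] = (ms + s, min(mm, m), mn + n)
            else:
                merged[grp] = (s, m, n)
    return merged


# Hypothetical table split into two slices of (GRP, x) rows.
slices = [
    [("a", 3.0), ("b", 5.0), ("a", 1.0)],
    [("b", 2.0), ("a", 4.0)],
]

summary = merge_partials(partial_aggregate(s) for s in slices)
rows_scanned = sum(len(s) for s in slices)   # rows that stay on the cluster
rows_returned = len(summary)                 # rows that travel to the client
print(summary, rows_scanned, rows_returned)
```

Sum, min, and count merge cleanly because each can be combined from partial results. That is one plausible reason basic summary statistics pass through as SQL while higher-order math does not.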

Editor's Notes

  1. Server scaled up by a factor of 12, appliance by a factor of 32. If the neural network (NN) were included in the comparison, the result would be about 19 hours versus 3 minutes.
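
The Ratio column on the benchmark slides can be checked from the printed runtimes. The sketch below assumes the runtime stamps read as hours:minutes:seconds:hundredths, which the slides do not state; the helper names `to_seconds` and `speedup` are hypothetical. It uses three rows copied from the neural-net slide (slide 18).

```python
def to_seconds(stamp):
    """Convert 'hh:mm:ss:cc' to seconds, assuming cc is hundredths of a second."""
    hours, minutes, seconds, hundredths = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds + hundredths / 100


def speedup(client_stamp, appliance_stamp):
    """Client runtime divided by appliance runtime, rounded to one decimal."""
    return round(to_seconds(client_stamp) / to_seconds(appliance_stamp), 1)


# (client, appliance) CPU runtimes copied from slide 18.
steps = {
    "Explore": ("00:01:07:17", "00:00:15:81"),
    "Partition": ("01:01:09:31", "00:00:21:52"),
    "Transform": ("01:26:06:22", "00:00:44:28"),
}

ratios = {name: speedup(client, appliance) for name, (client, appliance) in steps.items()}
print(ratios)
```

Under that reading of the stamps, these three rows give 4.2, 170.5, and 116.7, matching the ratios printed on the slide.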