BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers - Keyrus
2. © Copyright 2015 – Keyrus 2
DIVING INTO WEBLOG DATA WITH SAS ON HADOOP
Lisa Truyers, Data Scientist Consultant at Keyrus
March 24, 2016
3. Project summary
WHO HAS EVER TRIED TO OPEN A 1 GB FILE ON A COMPUTER?
4. AGENDA
What is Hadoop?
Project summary
Components of the Hadoop-SAS framework
Setup to load data
Benchmarks
Lessons learned
5. What is Hadoop?
PROS
• Open-source software framework
• Storage and large-scale data processing
• Easy and economical scaling
• Handles both structured and unstructured data
• Runs on low-cost commodity hardware
• Starts multiple copies of the same task for the same block of data
51% OF COMPANIES ARE CONSIDERING INTEGRATING HADOOP BY 2016
Source: Philip Russom, TDWI Best Practices Report: Integrating Hadoop into Business
6. What is Hadoop?
CONS
• Management and high-availability capabilities are just starting to emerge
• Data security is fragmented
• MapReduce is very batch-oriented
• No easy-to-use, full-featured tools for data integration, data cleansing, governance and metadata
• Shortage of skilled professionals
MANAGE THE DATA AND USE ANALYTICS TO QUICKLY IDENTIFY PREVIOUSLY UNKNOWN INSIGHTS: ACCESS THE DIFFERENT TOOLS OF SAS
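To make the "batch-oriented" point about MapReduce concrete, the sketch below walks through the map, shuffle and reduce phases in plain Python, counting hits per URL. It is an in-memory illustration only and does not use Hadoop's API. The log layout (timestamp followed by URL) and all function names are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit one (url, 1) pair per log line; the URL is assumed to be the 2nd field."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[1], 1


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key to get the number of hits per URL."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical sample data, not taken from the project.
sample_lines = [
    "2016-03-24T10:00:01 /home",
    "2016-03-24T10:00:05 /products",
    "2016-03-24T10:01:12 /home",
]

hits_per_url = reduce_phase(shuffle_phase(map_phase(sample_lines)))
print(hits_per_url)
```

Every phase has to finish over the whole input before the next one produces a result, which is why the pattern suits large batch jobs better than interactive queries.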
7. What is Hadoop?
WHAT ARE COMPANIES DOING WITH HADOOP?
The percentages mentioned here cover the whole world, not only Europe.
Use case                                                     Percentage
Data warehouse extensions                                    46%
Data exploration and discovery                               46%
Data staging for data warehousing and data integration       39%
Data lake                                                    39%
Queryable archive for non-traditional data                   36%
Computational platform and sandbox for advanced analytics    33%
8. What is Hadoop?
WHY IS HADOOP (NOT) IMPORTANT?
“Cost savings. Linear scalability. Evaluate ‘the hype’ practically. Complement BI.”
BI architect, telecom, Europe
“Reduces cost of data. New ability to query big data sets. Supply chain improvements. Predictive analytics.”
Vice president, food and beverage, Asia
“Our existing infrastructure cannot handle the tenfold increase in data volumes.”
Data strategy manager, hospitality, US
“It’s important to realize the potential of big data and to explore new business opportunities.”
Data specialist, consulting, Asia
10. Project summary
INTRODUCTION
1. Discover web traffic data
• Explore the web traffic data
• The sheer volume of data makes it impossible to analyse at the moment
• Prove the added value of a combined Hadoop-SAS environment
2. Lead generation
• More business-oriented: scoring a neural network model takes one hour on a daily basis
• Goal: reduce this time
12. Components of the Hadoop-SAS framework
HADOOP COMPONENTS
[Architecture diagram: SAS components layered on the Hadoop stack]
Hadoop: HDFS, YARN, MapReduce, HBase, Pig, Hive & HCatalog, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS
SAS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS® Enterprise Guide®
18. Components of the Hadoop-SAS framework
SAS COMPONENTS
[Architecture diagram: SAS components layered on the Hadoop stack]
SAS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® Enterprise Guide®
21. Setup to load data
FULL PROCESS
Process   Input files   Partitioned   Parsed
A         day-files     yes           no
B         hour-files    yes           no
C         day-files     yes           yes
D         hour-files    yes           yes
23. Setup to load data
PROCESS C
Delete Hive table → Transfer to Hadoop → Parse data → Merge → Loop
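The parse, merge and loop steps of a process like C can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's implementation: the slides do not name the weblog format, so the Common Log Format is assumed, and `parse_line`, `load_partitions` and the sample files are hypothetical. Deleting the Hive table and transferring files to Hadoop are not modeled.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: lines follow the Common Log Format; the real project format is unknown.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one raw log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def load_partitions(files):
    """Map file name -> raw lines to day partition -> parsed records."""
    partitions = defaultdict(list)
    for _name, lines in files.items():          # "Loop" over the input files
        for line in lines:
            record = parse_line(line)           # "Parse data"
            if record is not None:
                day = record["time"].date().isoformat()
                partitions[day].append(record)  # "Merge" into the day partition
    return dict(partitions)


# Hypothetical sample input: two day-files, one containing a malformed line.
sample_files = {
    "day1.log": ['10.0.0.1 - - [23/Mar/2016:23:59:58 +0100] "GET /home HTTP/1.1" 200 512'],
    "day2.log": [
        '10.0.0.2 - - [24/Mar/2016:00:00:03 +0100] "GET /products HTTP/1.1" 200 1024',
        "not a log line",
    ],
}

partitions = load_partitions(sample_files)
print({day: len(rows) for day, rows in partitions.items()})
```

Keying the merged records by day mirrors the partitioned layout of the day-file processes, so that a query on one day only has to touch one partition.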
24. AGENDA
Project summary
What is Hadoop?
Components of the Hadoop-SAS framework
SAS-tools used in this project
Setup to load data
Benchmarks
Lessons learned
25. Benchmarks
HADOOP COMPARED TO SERVER
Task                       Server                Hadoop
Query test on one day      35 seconds            35 seconds
Parsing data of one day    15 minutes            15 minutes
Parsing data of one week   4 hours 30 minutes    53 minutes
MORE TIME NEEDED FOR EXTRA BENCHMARKS
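The relative gain can be read directly from the reported timings. The short calculation below divides each server time by the matching Hadoop time; the dictionary keys and the `speedup` helper are names chosen for this example, while the durations are the ones from the slide.

```python
from datetime import timedelta

# (server time, Hadoop time) as reported in the benchmark slide.
benchmarks = {
    "query, one day": (timedelta(seconds=35), timedelta(seconds=35)),
    "parsing, one day": (timedelta(minutes=15), timedelta(minutes=15)),
    "parsing, one week": (timedelta(hours=4, minutes=30), timedelta(minutes=53)),
}


def speedup(server, hadoop):
    """Return how many times faster Hadoop was; dividing two timedeltas gives a float."""
    return server / hadoop


speedups = {task: speedup(server, hadoop) for task, (server, hadoop) in benchmarks.items()}

for task, factor in speedups.items():
    print(f"{task}: {factor:.2f}x")
```

For one day of data both environments perform the same, while parsing one week is roughly five times faster on Hadoop (270 minutes against 53), which matches the lesson that data must be large enough to see a difference.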
27. Lessons learned
SAS ON HADOOP
Teamwork is key
• Set up the Hadoop cluster with Hadoop experts
• Install SAS with experts from the company
In SAS, take your time to set the correct variable length
Choose the strength of the cluster rationally
Create benchmarks on both environments (server vs. Hadoop) early on, so that a good comparison can be made and the correct decision can be taken
Data must be large enough on Hadoop to see a difference
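SAS character variables have a fixed width, so a length that is too generous wastes space on every row and one that is too small truncates values. One way to choose lengths with evidence is to profile a sample of the data first. The sketch below is a hypothetical helper, not part of the project: it reports the longest observed value per text field, in UTF-8 bytes, and the field names and sample records are invented for illustration.

```python
def max_observed_lengths(records):
    """Return field name -> longest string value seen, measured in UTF-8 bytes."""
    longest = {}
    for record in records:
        for field, value in record.items():
            if isinstance(value, str):
                size = len(value.encode("utf-8"))
                longest[field] = max(longest.get(field, 0), size)
    return longest


# Hypothetical sample of parsed weblog records.
sample_records = [
    {"path": "/home", "agent": "Mozilla/5.0", "status": 200},
    {"path": "/products/42", "agent": "curl/7.47", "status": 404},
]

lengths = max_observed_lengths(sample_records)
print(lengths)
```

How much headroom to add on top of the observed maximum remains a judgment call that depends on how representative the sample is.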
28. THANK YOU FOR YOUR ATTENTION
To contact us
www.keyrus.com
contact@keyrus.com