BIG DATA ANALYTICS
BUSINESS INTELLIGENCE
INFORMATION MANAGEMENT
PERFORMANCE MANAGEMENT
© Copyright 2015 – Keyrus 2
DIVING INTO WEBLOG DATA WITH SAS ON HADOOP
Lisa Truyers, Data Scientist Consultant at Keyrus
March 24, 2016
Project summary
WHO HAS EVER TRIED TO OPEN A 1 GB FILE ON A COMPUTER?
AGENDA
What is Hadoop?
Project summary
Components of the Hadoop-SAS framework
Setup to load data
Benchmarks
Lessons learned
What is Hadoop?
PROS
• Open-source software framework
• Storage and large-scale data processing
• Easy and economical scaling
• Handles both structured and unstructured data
• Runs on low-cost commodity hardware
• Starts multiple copies of the same task for the same block of data (speculative execution)
51% OF COMPANIES ARE THINKING ABOUT INTEGRATING HADOOP BY 2016
Philip Russom, TDWI Best Practices Report: Integrating Hadoop into Business
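The last PROS bullet refers to what Hadoop calls speculative execution: when a task attempt runs noticeably slower than its peers, the framework may launch a duplicate attempt on the same input block and use whichever attempt finishes first. The Python sketch below illustrates only that idea on a single machine with threads; `process_block`, `run_speculatively`, the simulated delays and the summing workload are hypothetical and have nothing to do with Hadoop's actual scheduler code.

```python
import concurrent.futures as cf
import time


def process_block(block, delay_s):
    """Hypothetical task: sum the values of one data block after a simulated delay."""
    time.sleep(delay_s)  # stands in for a slow or a fast node
    return sum(block)


def run_speculatively(block, attempt_delays):
    """Start one attempt per delay on the same block and return the first result."""
    pool = cf.ThreadPoolExecutor(max_workers=len(attempt_delays))
    futures = [pool.submit(process_block, block, d) for d in attempt_delays]
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    # Do not wait for the slower attempts; they finish in the background
    # and their results are ignored.
    pool.shutdown(wait=False)
    return next(iter(done)).result()


if __name__ == "__main__":
    start = time.perf_counter()
    result = run_speculatively([1, 2, 3, 4], attempt_delays=[1.0, 0.05])
    print(result, f"{time.perf_counter() - start:.2f}s")
```

Unlike Hadoop, which kills the losing attempt once one attempt succeeds, Python threads cannot be stopped from outside, so the slower attempt here simply runs to completion and is discarded.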
CONS
• Management and high-availability capabilities are only just starting to emerge
• Data security is fragmented
• MapReduce is strongly batch-oriented
• No easy-to-use, full-featured tools for data integration, data cleansing, governance and metadata
• Lack of skilled professionals
MANAGE THE DATA AND USE ANALYTICS TO QUICKLY IDENTIFY PREVIOUSLY UNKNOWN INSIGHTS: ACCESS THE DIFFERENT TOOLS OF SAS
WHAT ARE COMPANIES DOING WITH HADOOP?
The percentages mentioned here cover the whole world, not only Europe.
Use case                                                      Percentage
Data warehouse extensions                                     46%
Data exploration and discovery                                46%
Data staging for data warehousing and data integration        39%
Data lake                                                     39%
Queryable archive for non-traditional data                    36%
Computational platform and sandbox for advanced analytics     33%
WHY IS HADOOP (NOT) IMPORTANT?
“Cost savings. Linear scalability. Evaluate ‘the hype’ practically. Complement BI.”
BI architect, telecom, Europe
“Reduces cost of data. New ability to query big data sets. Supply chain improvements. Predictive analytics.”
Vice president, food and beverage, Asia
“Our existing infrastructure cannot handle the tenfold increase in data volumes.”
Data strategy manager, hospitality, US
“It’s important to realize the potential of big data and to explore new business opportunities.”
Data specialist, consulting, Asia
INTRODUCTION
Project summary
1. Discover web traffic data
• The sheer volume of data currently makes it impossible to analyse
• Prove the added value of a combined Hadoop-SAS environment
2. Lead generation
• More business-oriented: scoring a neural network model currently takes one hour every day
• Goal: reduce this time
HADOOP COMPONENTS
Components of the Hadoop-SAS framework
HBase, Pig, Hive & HCatalog, MapReduce, HDFS, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS, YARN
SAS COMPONENTS
Components of the Hadoop-SAS framework
Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® Enterprise Guide®
FULL PROCESS
Setup to load data
Day-files
A: partitioned, non-parsed
C: partitioned, parsed
Hour-files
B: partitioned, non-parsed
D: partitioned, parsed
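Variants A to D differ in whether the raw weblog lines are parsed before loading and whether the files are partitioned by day or by hour. As a sketch, assuming Hive-style `key=value` partition directories (the `dt`/`hr` names, the `/data/weblogs` root and the `partition_path` helper are hypothetical, not taken from the project), the target location of a log file could be derived like this:

```python
from datetime import datetime
from posixpath import join


def partition_path(root, ts, granularity):
    """Return a Hive-style partition directory for a timestamp.

    granularity: "day" (variants A and C) or "hour" (variants B and D).
    The dt=/hr= naming is an illustrative assumption.
    """
    day_part = f"dt={ts:%Y-%m-%d}"
    if granularity == "day":
        return join(root, day_part)
    if granularity == "hour":
        return join(root, day_part, f"hr={ts:%H}")
    raise ValueError(f"unknown granularity: {granularity!r}")


if __name__ == "__main__":
    ts = datetime(2016, 3, 24, 13, 5)
    print(partition_path("/data/weblogs", ts, "day"))   # /data/weblogs/dt=2016-03-24
    print(partition_path("/data/weblogs", ts, "hour"))  # /data/weblogs/dt=2016-03-24/hr=13
```

Hourly partitioning yields 24 partitions per day instead of one, so each partition holds less data and there are more of them to manage.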
PROCESS C
Setup to load data
1. Delete the Hive table
2. Transfer the data to Hadoop
3. Parse the data
4. Merge
5. Loop (repeat steps 1 to 4)
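The Process C loop can be sketched in plain Python, with in-memory lists standing in for Hive tables and HDFS. The sketch assumes the weblogs use the Apache Combined Log Format, which the slides do not state, and it treats the deleted Hive table as a per-iteration staging table, which is an interpretation; `LOG_PATTERN`, `parse_line`, `process_c` and the field names are hypothetical.

```python
import re

# Assumed input format: Apache Combined Log Format (not stated in the slides).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one raw log line into a dict of fields, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def process_c(day_files):
    """Run delete -> transfer -> parse -> merge once per day-file; the for loop is step 5."""
    merged = []    # stands in for the final table that accumulates all days
    rejected = 0   # lines that did not match the assumed log format
    for day, raw_lines in sorted(day_files.items()):
        staging = []                                     # 1. delete (recreate) the staging table
        staging.extend(raw_lines)                        # 2. transfer the raw day-file
        parsed = [parse_line(line) for line in staging]  # 3. parse every line
        good = [dict(rec, day=day) for rec in parsed if rec is not None]
        rejected += len(parsed) - len(good)
        merged.extend(good)                              # 4. merge into the final table
    return merged, rejected


if __name__ == "__main__":
    sample = {
        "2016-03-24": [
            '10.0.0.1 - - [24/Mar/2016:13:05:00 +0100] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
            "not a log line",
        ],
    }
    rows, bad = process_c(sample)
    print(len(rows), bad)  # 1 1
```

Counting rejected lines per run is a cheap way to notice when the parsing step silently drops records.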
HADOOP COMPARED TO SERVER
Benchmarks
Test                          Server              Hadoop
Query test on one day         35 seconds          35 seconds
Parsing one day of data       15 minutes          15 minutes
Parsing one week of data      4 hours 30 min      53 minutes
MORE TIME IS NEEDED FOR ADDITIONAL BENCHMARKS
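Converting the one-week parsing times to minutes makes the gap explicit: 4 hours 30 minutes is 270 minutes, so Hadoop was roughly 5.1 times faster on a week of data, while the one-day tests show no difference. A small Python check of that arithmetic (the `to_minutes` helper is just for illustration):

```python
def to_minutes(hours=0, minutes=0):
    """Convert an hours/minutes duration to minutes."""
    return hours * 60 + minutes


server_week = to_minutes(hours=4, minutes=30)  # 270 minutes
hadoop_week = to_minutes(minutes=53)           # 53 minutes
speedup = server_week / hadoop_week

if __name__ == "__main__":
    print(f"{server_week} min vs {hadoop_week} min -> {speedup:.1f}x faster")
    # 270 min vs 53 min -> 5.1x faster
```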
Lessons learned
Teamwork is key
• Set up the Hadoop cluster with Hadoop experts
• Install SAS with experts from the company
SAS ON HADOOP
• In SAS, take the time to set the correct variable lengths
• Choose the cluster's capacity rationally
• Create benchmarks on both environments (server vs. Hadoop) early on, so that a proper comparison can be made and the right decision taken
• The data on Hadoop must be large enough for a difference to show
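One way to act on the variable-length lesson is to measure the longest value actually present in each parsed field before declaring SAS character lengths; otherwise Hive STRING columns can arrive in SAS with a very wide default length (32,767 bytes, the SAS maximum for character variables, unless overridden). The sketch below assumes the parsed records are available as Python dicts; `max_byte_lengths`, `sas_length_statement` and the 20% headroom are hypothetical choices, not the project's method.

```python
def max_byte_lengths(records):
    """Return the longest UTF-8 byte length seen for each field across records."""
    lengths = {}
    for record in records:
        for field, value in record.items():
            size = len(str(value).encode("utf-8"))  # SAS lengths are in bytes
            lengths[field] = max(lengths.get(field, 1), size)  # SAS minimum length is 1
    return lengths


def sas_length_statement(lengths, headroom_pct=20):
    """Build a SAS LENGTH statement with some headroom above the observed maximum.

    The headroom percentage is an arbitrary illustrative choice; the result is
    capped at 32767, the maximum length of a SAS character variable.
    """
    parts = []
    for field, size in sorted(lengths.items()):
        padded = size + (size * headroom_pct + 99) // 100  # integer ceiling of the headroom
        parts.append(f"{field} $ {min(32767, padded)}")
    return "length " + " ".join(parts) + ";"


if __name__ == "__main__":
    records = [
        {"host": "10.0.0.1", "path": "/index.html"},
        {"host": "192.168.100.200", "path": "/"},
    ]
    print(sas_length_statement(max_byte_lengths(records)))
    # length host $ 18 path $ 14;
```

Measuring on a representative sample is cheaper than on the full data, but values longer than anything in the sample would then be truncated, so the headroom is a trade-off.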
THANK YOU FOR YOUR ATTENTION
To contact us
www.keyrus.com
contact@keyrus.com

BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers - Keyrus

  • 1. BIG DATA ANALYTICS BUSINESS INTELLIGENCE INFORMATION MANAGEMENT PERFORMANCE MANAGEMENT
  • 2. © Copyright 2015 – Keyrus 2 DIVING INTO WEBLOG DATA WITH SAS ON HADOOP Lisa Truyers, Data Scientist Consultant at Keyrus March 24, 2016 Logo
  • 3. © Copyright 2015 – Keyrus 3 Project summary WHO HAS EVER TRIED TO OPEN A 1 GB FILE ON A COMPUTER?
  • 4. © Copyright 2015 – Keyrus 4 What is Hadoop? Project summary Components of the Hadoop-SAS framework Setup to load data Benchmarks Lessons learned AGENDA
  • 5. © Copyright 2015 – Keyrus 5 PROS  Open-source software framework  Storage and large-scale data processing  Easy and economic scaling  Both structured and unstructured data  Low-cost commodity hardware  Starts multiple copies of the same task for the same block of data What is Hadoop? 51% OF COMPANIES THINKS ABOUT INTEGRATING HADOOP IN THEIR COMPANY BY 2016 Philip Russom, TDWI Best Practices Report= Integrating Hadoop into Business
  • 6. © Copyright 2015 – Keyrus 6 CONS  Management and high-availability capabilities are just starting to emerge  Data security is fragmented  MapReduce is very batch-oriented  No easy-to-use, full-feature tools for data integration, data cleansing, governance and metadata  Lacking skilled professionals What is Hadoop? MANAGE THE DATA AND USE ANALYTICS TO QUICKLY IDENTIFY PREVIOUSLY UNKNOWN INSIGHTS: ACCESS THE DIFFERENT TOOLS OF SAS
  • 7. © Copyright 2015 – Keyrus 7 WHAT ARE COMPANIES DOING WITH HADOOP? The percentages mentioned here cover the whole world, not only Europe. What is Hadoop? What? Percentage Data warehouse extensions 46 % Data exploration and discovery 46 % Data staging for data warehousing and data integration 39 % Data lake 39 % Queryable archive for non-traditional data 36 % Computational platform and sandbox for advanced analytics 33 %
  • 8. © Copyright 2015 – Keyrus 8 WHY IS HADOOP (NOT) IMPORTANT? “Cost savings. Linear scalability. Evaluate ‘the hype’ practically. Complement BI.” BI architect, telecom, Europe “Reduces cost of data. New ability to query big data sets. Supply chain improvements. Predictive analytics.” Vice president, food and beverage, Asia “Our existing infrastructure cannot handle the tenfold increase in data volumes.” Data strategy manager, hospitality, US “It’s important to realize the potential of big data and to explore new business opportunities.” Data specialist, consulting, Asia What is Hadoop?
  • 9. © Copyright 2015 – Keyrus 9 What is Hadoop? Project summary Components of the Hadoop-SAS framework Setup to load data Benchmarks Lessons learned AGENDA
  • 10. © Copyright 2015 – Keyrus 10 INTRODUCTION Project summary 1. Discover web traffic data • Discover web traffic data • Sheer volume of data makes it impossible to analyse at the moment • Prove the added value of a combined Hadoop – SAS environment 2. Lead generation • More business oriented: scoring a neural network model takes one hour on daily basis • Reducing this time
  • 11. AGENDA: What is Hadoop? • Project summary • Components of the Hadoop-SAS framework • Setup to load data • Benchmarks • Lessons learned
  • 12. HADOOP COMPONENTS (components of the Hadoop-SAS framework). Hadoop: HDFS, YARN, MapReduce, Hive & HCatalog, Pig, HBase, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS. SAS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® Enterprise Guide®.
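WebHDFS, one of the Hadoop components listed above, exposes HDFS through a REST interface. As a minimal sketch that is not part of the original project, the Python below builds a WebHDFS URL and reads a LISTSTATUS response. The host name, the `/data/weblogs` directory and the port 50070 (the NameNode HTTP default in Hadoop 2.x) are assumptions. The JSON body is parsed from a string so the example runs without a cluster; against a real cluster you would fetch it with `urllib.request.urlopen`.

```python
from __future__ import annotations

import json
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_entries(liststatus_body: str) -> list[tuple[str, str, int]]:
    """Return (name, type, length) for each entry of a LISTSTATUS JSON response."""
    statuses = json.loads(liststatus_body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]


if __name__ == "__main__":
    # Hypothetical NameNode host and weblog directory.
    print(webhdfs_url("namenode.example.com", 50070, "/data/weblogs", "LISTSTATUS"))
    # Trimmed example body; a real response carries more fields per entry.
    body = ('{"FileStatuses": {"FileStatus": ['
            '{"pathSuffix": "day=2016-03-24", "type": "DIRECTORY", "length": 0}]}}')
    print(list_entries(body))
```

Keeping URL construction and response parsing as separate pure functions makes both testable without network access.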
  • 18. SAS COMPONENTS (components of the Hadoop-SAS framework). SAS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® Enterprise Guide®. Hadoop: HDFS, YARN, MapReduce, Hive & HCatalog, Pig, HBase, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS.
  • 20. AGENDA: What is Hadoop? • Project summary • Components of the Hadoop-SAS framework • Setup to load data • Benchmarks • Lessons learned
  • 21. FULL PROCESS (setup to load data). Four variants, by file granularity and parsing: A: day-files, partitioned, non-parsed; B: hour-files, partitioned, non-parsed; C: day-files, partitioned, parsed; D: hour-files, partitioned, parsed.
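The four variants differ in partition granularity (day or hour) and in whether the data is parsed. The deck does not show the actual HDFS layout, so the sketch below assumes Hive-style `key=value` partition directories with hypothetical base paths and hypothetical partition keys `day` and `hour`, and maps a weblog timestamp to its directory under each variant.

```python
from datetime import datetime

# Hypothetical HDFS base directories for the non-parsed and parsed copies.
BASE_DIRS = {False: "/data/weblogs/raw", True: "/data/weblogs/parsed"}

# The four variants from the slide: (partition granularity, parsed?).
VARIANTS = {"A": ("day", False), "B": ("hour", False),
            "C": ("day", True), "D": ("hour", True)}


def partition_dir(ts: datetime, granularity: str, parsed: bool) -> str:
    """Hive-style (key=value) partition directory for one log timestamp."""
    day = ts.strftime("%Y-%m-%d")
    if granularity == "day":
        suffix = f"day={day}"
    elif granularity == "hour":
        suffix = f"day={day}/hour={ts.hour:02d}"
    else:
        raise ValueError(f"unknown granularity: {granularity!r}")
    return f"{BASE_DIRS[parsed]}/{suffix}"


if __name__ == "__main__":
    ts = datetime(2016, 3, 24, 13, 5)
    for name, (granularity, parsed) in VARIANTS.items():
        print(name, partition_dir(ts, granularity, parsed))
```

Hour partitions produce 24 times as many, smaller directories per day. HDFS generally handles many small files less efficiently than fewer large ones, which is one trade-off that comparing the variants can expose.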
  • 22. Setup to load data
  • 23. PROCESS C (setup to load data): delete the Hive table → transfer to Hadoop → parse the data → merge, repeated in a loop.
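A minimal in-memory sketch of the Process C loop follows. It is not the project's SAS code: plain Python lists and dicts stand in for the Hive tables and HDFS directories, and the parse step assumes the weblogs use the Apache Common Log Format, which the deck does not specify.

```python
import re
from collections import defaultdict

# Assumption: lines follow the Apache Common Log Format (real format not stated).
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None  # malformed line: skipped here; a real job might log it
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


def process_c(day_files):
    """In-memory stand-in for Process C.

    day_files maps a day label to that day's raw log lines.
    """
    merged = defaultdict(list)  # stand-in for the final day-partitioned table
    for day, raw_lines in sorted(day_files.items()):  # loop over the day-files
        staging = []  # 1. delete (recreate) the staging Hive table
        staging.extend(raw_lines)  # 2. transfer the day-file to Hadoop
        parsed = [r for r in map(parse_line, staging) if r is not None]  # 3. parse
        merged[day].extend(parsed)  # 4. merge into the day's partition
    return dict(merged)


if __name__ == "__main__":
    line = '127.0.0.1 - - [24/Mar/2016:13:05:00 +0100] "GET /index.html HTTP/1.1" 200 512'
    print(process_c({"2016-03-24": [line, "not a log line"]}))
```

Recreating the staging table on every iteration keeps each day's load independent, so a failed day can be rerun without touching the merged data.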
  • 24. AGENDA: What is Hadoop? • Project summary • Components of the Hadoop-SAS framework • SAS tools used in this project • Setup to load data • Benchmarks • Lessons learned
  • 25. HADOOP COMPARED TO SERVER. Server: query test on one day, 35 seconds; parsing the data of one day, 15 minutes; parsing one week, 4 hours 30 minutes. Hadoop: query test on one day, 35 seconds; parsing the data of one day, 15 minutes; parsing one week, 53 minutes. MORE TIME NEEDED FOR EXTRA BENCHMARKS
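The gains can be quantified from the reported timings. This small calculation uses only the numbers on the benchmark slide and expresses each one as a server-to-Hadoop speed-up factor:

```python
def to_minutes(hours=0, minutes=0, seconds=0):
    """Convert a duration to minutes."""
    return hours * 60 + minutes + seconds / 60


# Timings as reported on the benchmark slide.
server = {"query_day": to_minutes(seconds=35),
          "parse_day": to_minutes(minutes=15),
          "parse_week": to_minutes(hours=4, minutes=30)}
hadoop = {"query_day": to_minutes(seconds=35),
          "parse_day": to_minutes(minutes=15),
          "parse_week": to_minutes(minutes=53)}

# Speed-up factor per task: server time divided by Hadoop time.
speedups = {task: server[task] / hadoop[task] for task in server}

if __name__ == "__main__":
    for task, factor in speedups.items():
        print(f"{task}: {factor:.1f}x")
```

Only the one-week parse benefits (270 versus 53 minutes, roughly 5.1x); the one-day tasks are unchanged, which matches the lesson that the data must be large enough for Hadoop to make a difference.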
  • 26. AGENDA: What is Hadoop? • Project summary • Components of the Hadoop-SAS framework • SAS tools used in this project • Setup to load data • Benchmarks • Lessons learned
  • 27. LESSONS LEARNED. Teamwork is key • Set up the Hadoop cluster together with Hadoop experts • Install SAS together with experts from the company. SAS ON HADOOP • In SAS, take your time to set the correct variable lengths • Choose the capacity of the cluster rationally • Create benchmarks on both environments (server vs. Hadoop) early on, so that a fair comparison can be made and the right decision taken • The data on Hadoop must be large enough to see a difference
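Variable length matters because SAS character variables are stored at their full declared length. SAS/ACCESS to Hadoop is documented as mapping an unbounded Hive STRING column to a very wide character variable (32767, the SAS maximum) unless configured otherwise; treat that default as an assumption to verify for your SAS release. The sketch below, with hypothetical column names and a hypothetical 20% headroom, derives lengths from observed data instead and compares the resulting row width:

```python
# Assumed default SAS length for an unbounded Hive STRING column
# (32767 is also the maximum SAS character length); verify for your release.
DEFAULT_STRING_LENGTH = 32767


def suggested_lengths(rows, columns, headroom_pct=20):
    """Longest observed value per column plus headroom, capped at the SAS maximum."""
    lengths = {}
    for col in columns:
        longest = max((len(row[col]) for row in rows), default=1)
        padded = (longest * (100 + headroom_pct) + 99) // 100  # integer ceiling
        lengths[col] = min(DEFAULT_STRING_LENGTH, max(1, padded))
    return lengths


def row_width(lengths):
    """Bytes per row if every character column is stored at full length (single-byte encoding)."""
    return sum(lengths.values())


if __name__ == "__main__":
    sample = [  # hypothetical parsed weblog rows
        {"host": "10.0.0.1", "path": "/index.html"},
        {"host": "192.168.100.200", "path": "/"},
    ]
    cols = ["host", "path"]
    tuned = suggested_lengths(sample, cols)
    default = {c: DEFAULT_STRING_LENGTH for c in cols}
    print(tuned, row_width(tuned), row_width(default))
```

In this toy case the tuned lengths need 32 bytes per row instead of 65534; lengths derived from a sample should still be checked against the full data before being fixed in SAS.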
  • 28. THANK YOU FOR YOUR ATTENTION To contact us www.keyrus.com contact@keyrus.com