Grab some coffee and enjoy 
the pre-show banter before 
the top of the hour!
Hand in Hand—Optimizing the Data Warehouse for Big Data 
The Briefing Room
Twitter Tag: #briefr 
Welcome 
Host: 
Eric Kavanagh 
eric.kavanagh@bloorgroup.com 
@eric_kavanagh
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics 
This Month: BIG DATA 
May: DATABASE 
June: ANALYTICS & MACHINE LEARNING 
2014 Editorial Calendar at 
www.insideanalysis.com/webcasts/the-briefing-room
What Is Hadoop and Where Is It Going?
Analyst: Claudia Imhoff 
Claudia Imhoff is 
President & Founder of 
Intelligent Solutions, Inc.
Pentaho 
• Pentaho offers a suite of open source business intelligence products called Pentaho Business Analytics
• Pentaho’s big data solution provides access to any data source, and includes data integration, discovery, analysis and visualization
• Pentaho’s solutions are available in community or enterprise editions
Guest: Chuck Yarbrough 
Chuck is the Director of Big Data Product 
Marketing at Pentaho, a leading big data 
analytics company that helps organizations 
engineer big data connections, blend data 
and report and visualize all of their data. 
Much of Chuck's focus at Pentaho is in 
educating organizations on how big data 
can help win, serve and retain customers, 
lower costs and grow revenue through the 
proper use of big data. A life-long 
participant in the data game, Chuck has 
held leadership roles at Deloitte 
Consulting, SAP Business Objects, Hyperion 
and National Semiconductor.
Data Warehouse Optimization Blueprint
Chuck Yarbrough
Director, Big Data Product Marketing
@cyarbrough
April 29, 2014
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
OUR VISION 
The New Reality: 
Powerful yet simplified analytics for all users 
Billing 
Social 
Media 
Location 
Customer 
Web 
Network 
Analytics 
ANY Analytics 
• Reports 
• Dashboards 
• Visualizations 
• Discovery 
• Predictive 
• Any role 
Existing & New Data 
Infrastructure & 
Processes 
ANY Environment 
• Data warehouses 
• Data marts 
• Stack vendors 
• Cloud 
• Embedded 
ANY Data 
• Relational 
• Operational 
• Big Data 
• Data sources not 
yet anticipated
Emerging big data use cases demand blending multiple data sources
Improve operational effectiveness
• Machines/sensors: predict failures, network attacks
• Financial risk management: reduce fraud, increase security
Reduce data warehouse cost
• Integrate new data sources without increased database cost
• Provide online access to ‘dark data’
Drive incremental revenue
• Predict customer behavior across all channels
• Understand and monetize customer behavior
• Begin to monetize data as a service
A Spectrum of Big Data Use Cases
What the Market is Deploying Today and Planning for Tomorrow
[Chart: use cases plotted by Use Case Complexity (Entry to Advanced) and Business Impact (Optimize to Transform)]
• Data Warehouse Optimization
• Streamlined Data Refinery
• Big Data Exploration
• Customer 360 Degree View
• Harnessing Machine & Sensor Data
• Next Generation Applications
• Internal Big Data as a Service
• On-Demand Big Data Blending
• Big Data Predictive Analytics
• Monetize My Data
Data Warehouse Optimization 
Remove the clutter and connect to Big Data 
Cut Downtime and Focus 
on Product Creation 
Remove Costly 
Legacy Systems 
Simplicity Empowers 
Business Users 
“Using Pentaho in our data warehouse, it 
now takes about 20 minutes to break down a 
metric and do specific analysis to identify 
performance issues. In the past, similar 
queries would take all night.” 
Greg Allen, Business Analyst, Kiva 
“Pentaho Data Integration not only simplifies 
the data delivery process but also enables us 
to gather the high-quality data. Ultimately 
Pentaho has enabled us to reach our goal of 
making the Swiss real estate market more 
transparent.” 
Prof. Dr. Peter Ilg, Managing Director, Swiss Real Estate Datapool
“We needed fully functional reporting and 
data integration tools but wanted to cut the 
cost burden experienced with Oracle. After 
looking at what was out there, Pentaho had 
the complete tool set, and after further 
testing, our users noticed no difference in 
the features they need.” 
Uwe Geercken, IT Manager, Swissport
Data Warehouse Optimization 
Shrink Data Costs & Boost Analytics Performance for Business Users 
Why Do It? 
• Save data capacity & 
management costs 
• Empower IT and business 
users to meet goals on time 
Key Considerations 
• Normally leverages Hadoop 
• Relevant across industries 
• May require new coding skillsets 
that are hard to find 
What is it? 
• Existing DW infrastructure can’t 
support data explosion, & adding DW 
capacity is costly 
• So offload low priority data to Big 
Data store to extend capacity
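A minimal sketch of the offload decision described above, assuming "low priority" is approximated by how recently and how often a partition is queried. The `TablePartition` fields, the thresholds, and the sample partitions are all hypothetical; the deck does not say how priority is measured.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TablePartition:
    """One partition of a warehouse table (all fields are illustrative)."""
    name: str
    size_gb: float
    last_queried: date
    queries_last_90_days: int


def split_for_offload(partitions, today, max_idle_days=180, min_queries=5):
    """Return (keep, offload).

    A partition is treated as low priority when it has been idle longer than
    max_idle_days or was queried fewer than min_queries times recently.
    Both thresholds are assumptions, not values from the deck.
    """
    keep, offload = [], []
    for p in partitions:
        idle_days = (today - p.last_queried).days
        is_cold = idle_days > max_idle_days or p.queries_last_90_days < min_queries
        (offload if is_cold else keep).append(p)
    return keep, offload


# Made-up sample partitions for illustration only.
partitions = [
    TablePartition("orders_2014_q1", 120.0, date(2014, 4, 20), 340),
    TablePartition("orders_2011", 900.0, date(2013, 6, 1), 0),
    TablePartition("weblogs_2012", 2500.0, date(2014, 3, 30), 2),
]
keep, offload = split_for_offload(partitions, today=date(2014, 4, 29))
freed_gb = sum(p.size_gb for p in offload)

print("stay in warehouse:", [p.name for p in keep])
print("offload candidates:", [p.name for p in offload], "freeing", freed_gb, "GB")
```

In practice the actual data movement would be done by an integration tool; this only illustrates choosing what to move.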
Data Warehouse Optimization
Shrink Data Costs & Boost Analytics Performance for Business Users
[Diagram components: CRM & ERP Systems, Other Data Sources, Data Warehouse, Hadoop Cluster, Analytic Data Mart, Relational Layer, connected by PDI (Pentaho Data Integration)]
Data Warehouse Optimization
Cost effective, fast processing
A Major Financial Institution
Business Challenge
• Gain competitive advantage through intraday balance reporting for commercial customers
• Use Hadoop and relational data stores to process huge volumes
Pentaho Benefits
• Graphical orchestration for Hadoop, HBase & DB2 data integration workloads
• 15x faster to develop, 10x faster to execute
Callouts: 15x faster to develop; 10x faster to execute; No coding; Integrate with existing; Easy to find resources
Optimize data infrastructure to connect hundreds of interdependent banking applications
A Major Financial Institution
[Diagram: Scalable Enterprise Data Hub]
• Hundreds of Enterprise Data Sources: Cash Processing Data; Customer & Account Master Data; Payments Data; Other Financial Apps
• Hub components: Hadoop Cluster; Historical Data Mart; Data Marts; PDI
• Internal User: Reporting & Data Mining
• Clients: Statements, Balance, Transaction Reporting & Analytics
Thank You 
JOIN THE CONVERSATION. YOU CAN FIND US ON: 
blog.pentaho.com 
@Pentaho 
Facebook.com/Pentaho 
Pentaho Business Analytics
Streamlined Data Refinery 
Drive a Sustainable Analytics Strategy with Big Data ETL at Scale 
Why Do It? 
• Give business users 
insight into all data 
• Scale ETL and data 
management cost 
savings 
• Next step after DW 
optimization 
Vertical Fit 
High Tech, Telecom, Media, 
Financial Services, etc 
Technology Fit 
Primarily Hadoop, 
but also NoSQL 
Benefits 
• Establish usable analytics on 
diverse sources at high volume 
(terabytes+) 
• Speed queries substantially with 
rapid ingestion & powerful 
processing 
• Reduce costs of ETL processing 
Challenges 
• Expansive integration project 
• May require new coding skillsets 
that are hard to find 
• May call for swapping from a data 
warehouse to a higher performing 
Analytic database, depending on 
requirements 
What is It? 
In the face of exploding volumes of transaction, customer, 
and other data, traditional ETL systems slow down, 
making analytics unworkable. One solution is to 
streamline most data through a scalable Big Data 
processing hub – that pushes refined data to a data 
warehouse or analytical database for low-latency self-service 
analytics across a diverse base of data.
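The refinery idea above can be sketched in a few lines, assuming the "refining" step is a simple clean-and-aggregate pass. The event fields (`customer_id`, `timestamp`, `amount`) and the sample rows are hypothetical, and a real deployment would run this kind of logic on a cluster rather than in a single process.

```python
from collections import defaultdict


def refine(raw_events):
    """Distill raw events into per-customer daily totals.

    Rows missing required fields or carrying a non-numeric amount are
    dropped instead of being pushed downstream.
    """
    totals = defaultdict(float)
    for event in raw_events:
        try:
            key = (event["customer_id"], event["timestamp"][:10])
            amount = float(event["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed row: skip it
        totals[key] += amount
    # The refined, much smaller result is what would be loaded into the
    # data warehouse or analytic database.
    return [
        {"customer_id": cust, "day": day, "total_amount": round(total, 2)}
        for (cust, day), total in sorted(totals.items())
    ]


# Made-up raw events for illustration only.
raw_events = [
    {"customer_id": "c1", "timestamp": "2014-04-29T09:15:00", "amount": "19.99"},
    {"customer_id": "c1", "timestamp": "2014-04-29T17:40:00", "amount": "5.01"},
    {"customer_id": "c2", "timestamp": "2014-04-29T11:00:00", "amount": "42"},
    {"customer_id": "c2", "timestamp": "2014-04-29T11:05:00", "amount": "n/a"},
    {"timestamp": "2014-04-29T12:00:00", "amount": "7"},
]
refined = refine(raw_events)
for row in refined:
    print(row)
```

Five raw rows reduce to two refined rows here; the point is that the analytic store receives distilled data, not the raw feed.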
Streamlined Data Refinery 
Drive a Sustainable Analytics Strategy with Big Data ETL at Scale 
Why 
• Offers a full platform for this use 
case, including broad data 
integration (incl. leading Hadoop 
distros and analytic DBs) and a 
powerful array of easy to use 
front-end analytics 
• Visual MapReduce mitigates
need for additional developers,
and makes Big Data accessible
to existing IT staff
• Pentaho MapReduce runs much
faster in the cluster vs. other
scripting tools
[Diagram components: Transactions (Batch & Real-time); Enrollments & Redemptions; Location, Email, Other Data; PDI; Hadoop Cluster; PDI; Analytic Database; Analyzer; Reports]
Pentaho Big Data Analytics Platform 
Simplified data preparation and analytics for all users 
Simplified 
Analytics 
Experience 
Blended 
Big Data 
Enterprise 
Big Data 
Integration 
Claudia Imhoff 
Copyright © 2014, Intelligent Solutions, Inc., All Rights Reserved 
President and Founder 
Intelligent Solutions, Inc. 
A thought leader, visionary, and practitioner, Claudia 
Imhoff, Ph.D., is an internationally recognized expert on 
analytics, business intelligence, and the architectures to 
support these initiatives. Dr. Imhoff has co-authored five 
books on these subjects and writes articles (totaling more 
than 150) for technical and business magazines. 
She is also the Founder of the Boulder BI Brain Trust, a 
consortium of internationally-recognized independent 
analysts and experts. You can follow them on Twitter at 
#BBBT or become a subscriber at www.bbbt.us. 
Email: claudia@bbbt.us 
Phone: 303-444-6650 
Twitter: Claudia_Imhoff
Topics 
§ An extended data warehouse architecture for a 
modern BI environment 
§ Questions 
Data Warehouse Technology 
Drivers 
§ Do more with less 
§ Data compression 
§ Schemas on read 
§ Open source components 
§ In-Memory capabilities 
§ Simpler environments 
§ Cloud deployments 
§ Easier data management 
§ Mobile and Self-service BI 
§ Built-in analytic functions 
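One driver in the list above, schema on read, can be illustrated with a short sketch: raw records are stored as-is, and types are applied only when the data is read. The `SCHEMA` mapping, the field names, and the sample records are hypothetical.

```python
import json

# Hypothetical schema: field name -> converter applied at read time.
SCHEMA = {"sensor_id": str, "temperature_c": float, "reading_count": int}


def read_with_schema(raw_lines, schema):
    """Parse stored JSON lines and apply types only when the data is read.

    Fields outside the schema are ignored; fields missing from a record
    come back as None.
    """
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        yield row


# Raw data was written without enforcing types or a fixed set of fields.
raw_lines = [
    '{"sensor_id": 7, "temperature_c": "21.5", "firmware": "1.2"}',
    '{"sensor_id": "8", "temperature_c": 19, "reading_count": "3"}',
]
rows = list(read_with_schema(raw_lines, SCHEMA))
for row in rows:
    print(row)
```

A different reader could apply a different schema to the same stored lines, which is the flexibility the term refers to.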
Extended Data Warehouse 
Architecture 
Analytic tools & applications 
RT analysis engine 
Traditional EDW 
environment 
Investigative computing 
platform 
Data 
refinery 
Data integration 
platform 
Operational real-time environment 
Other internal & external 
structured & multi-structured data 
Real-time streaming data 
Operational systems 
BI services 
Slide created by Colin White – BI Research, Inc.
Data Integration Use Case: Data Refinery
§ Ingests raw detailed data in batch and/or real-time into a managed data store
§ Distills the data into useful business information and distributes the results to downstream systems
§ May also directly analyze certain types of data
§ Employs low-cost hardware and software to enable large amounts of detailed data to be managed cost effectively
§ Requires (flexible) governance policies to manage data security, privacy, quality, archiving and destruction
[Diagram: Traditional EDW environment; Investigative computing platform; Data refinery; Data integration platform]
Traditional EDW Use Cases 
Analytic tools & applications 
Operational systems RT analysis engine 
BI services 
Most BI environments today 
§ New technologies can be 
incorporated into the EDW 
environment to improve 
performance, efficiency and 
reduce costs 
Use cases 
§ Production reporting 
§ Historical comparisons 
§ Customer analysis (next 
best offer, segmentation, 
life-time value scores, 
churn analysis, etc.) 
§ KPI calculations 
§ Profitability analysis 
§ Forecasting 
Traditional EDW 
environment 
Data 
refinery 
Data integration 
platform 
Operational real-time environment 
Investigative Computing Use 
Cases 
New technologies used here 
include: 
o Hadoop, in-memory computing, 
columnar storage, data 
compression, appliances, etc. 
Use cases 
o Data mining and predictive 
modeling for EDW and real-time 
environments 
o Cause and effect analysis 
o Data exploration (“Did this ever 
happen?” “How often?”) 
o Pattern analysis 
o General, unplanned investigations 
of data 
[Diagram: Analytic tools & applications; Investigative computing platform; Data refinery; Data integration platform; RT analysis engine; Operational systems; BI services; Operational real-time environment]
Operational RT Environment Use 
Cases 
Embedded or callable BI services: 
o Real-time fraud detection 
o Real-time loan risk 
assessment 
o Optimizing online promotions 
o Location-based offers 
o Contact center optimization 
o Supply chain optimization 
RT analysis engine 
Real-time analysis engine: 
§ Traffic flow optimization 
§ Web event analysis 
§ Natural resource 
exploration analysis 
§ Stock trading analysis 
§ Risk analysis 
§ Correlation of unrelated 
data streams (e.g., weather 
effects on product sales) 
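As a rough illustration of the last item, correlating two otherwise unrelated streams, the sketch below computes a Pearson correlation between daily temperatures and daily sales. The numbers are invented, and a real-time engine would compute this over sliding windows rather than fixed lists.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented sample data: daily high temperature and iced drink sales.
daily_high_c = [12, 15, 19, 23, 27, 30]
iced_drink_sales = [40, 55, 70, 95, 120, 140]

r = pearson(daily_high_c, iced_drink_sales)
print(f"correlation between temperature and sales: {r:.3f}")
```

A value near 1 suggests the two streams move together; it does not by itself show that weather causes the sales change.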
Operational real-time environment
Other internal & external 
structured & multi-structured data 
Real-time streaming data 
Operational systems 
BI services 
BUT – All Components Must Work 
Together! 
[Diagram components: New sources of data; Data refinery; Enterprise DW; Investigative computing platform; Operational systems; Analytic tools; RT analysis engine]
[Diagram flow labels: 3rd party data, location data, social data; existing customer data; analytic models; analyses; next best customer offer; feedback; call center dashboard or web event stream]
Slide created by Colin White – BI Research, Inc.
Topics 
§ Extending the data warehouse architecture for a 
modern analytics environment 
§ Questions 
Many Organizations Not Ready 
for Big Data? 
§ Many companies are struggling to get a traditional 
data warehouse in place and produce basic BI 
§ Business users not analytically savvy 
§ Minimal governance 
§ Chaotic architectures 
§ What do you say to these organizations? 
Existing Data Warehouse 
§ Do organizations have to rip and replace their 
existing DW to solve big data problems? 
§ When do I use a traditional DW versus the Hadoop 
environment? 
§ Does the data hub replace the data warehouse? 
Data Integration 
§ Where is ETL used and not used? 
§ How do enterprises control data blending and 
virtualization (do they need to)? 
§ Is data governance still important? 
§ How does it change in this new environment? 
New IT Skills 
§ To achieve DW optimization… 
§ Does IT have to rip and replace their employees? 
§ Should they rely on consultants? 
§ To what extent? 
§ What is needed to move from basic DW to a big 
data architecture? 
Evolving to Advanced 
Analytics 
§ Is it mandatory to hire data scientists? 
§ Is training on new technology enough? 
§ What else is needed to make the company more 
analytically-driven? 
THANK YOU for your ATTENTION!

 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 

What Is Hadoop and Where Is It Going?

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. Hand in Hand—Optimizing the Data Warehouse for Big Data The Briefing Room
  • 3. Twitter Tag: #briefr The Briefing Room Welcome Host: Eric Kavanagh eric.kavanagh@bloorgroup.com @eric_kavanagh
  • 4. Mission: Reveal the essential characteristics of enterprise software, good and bad; provide a forum for detailed analysis of today’s innovative technologies; give vendors a chance to explain their product to savvy analysts; allow audience members to pose serious questions... and get answers!
  • 5. Topics. This Month: BIG DATA; May: DATABASE; June: ANALYTICS & MACHINE LEARNING. 2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
  • 7. Analyst: Claudia Imhoff. Claudia Imhoff is President & Founder of Intelligent Solutions, Inc.
  • 8. Pentaho: Pentaho offers a suite of open source business intelligence products called Pentaho Business Analytics; Pentaho’s big data solution provides access to any data source, and includes data integration, discovery, analysis and visualization; Pentaho’s solutions are available in community or enterprise editions.
  • 9. Guest: Chuck Yarbrough. Chuck is the Director of Big Data Product Marketing at Pentaho, a leading big data analytics company that helps organizations engineer big data connections, blend data and report and visualize all of their data. Much of Chuck's focus at Pentaho is in educating organizations on how big data can help win, serve and retain customers, lower costs and grow revenue through the proper use of big data. A life-long participant in the data game, Chuck has held leadership roles at Deloitte Consulting, SAP Business Objects, Hyperion and National Semiconductor.
  • 10. Data Warehouse Optimization Blueprint. Chuck Yarbrough, Director, Big Data Product Marketing, @cyarbrough. April 29, 2014. © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 11. Our Vision. The New Reality: Powerful yet simplified analytics for all users. Diagram labels: Billing; Social Media; Location; Customer; Web; Network. ANY Analytics: reports, dashboards, visualizations, discovery, predictive, any role. ANY Environment (existing & new data infrastructure & processes): data warehouses, data marts, stack vendors, cloud, embedded. ANY Data: relational, operational, big data, data sources not yet anticipated.
  • 12. Emerging big data use cases demand blending multiple data sources. Improve operational effectiveness: machines/sensors (predict failures, network attacks); financial risk management (reduce fraud, increase security). Reduce data warehouse cost: integrate new data sources without increased database cost; provide online access to ‘dark data’. Drive incremental revenue: predict customer behavior across all channels; understand and monetize customer behavior; begin to monetize data as a service.
  • 13. A Spectrum of Big Data Use Cases: What the Market Is Deploying Today and Planning for Tomorrow. Chart axes: Use Case Complexity; Business Impact. Scale labels: Entry; Advanced; Optimize; Transform. Use cases shown: Data Warehouse Optimization; Streamlined Data Refinery; Big Data Exploration; Customer 360 Degree View; Harnessing Machine & Sensor Data; Next Generation Applications; Internal Big Data as a Service; On-Demand Big Data Blending; Big Data Predictive Analytics; Monetize My Data.
  • 15. Data Warehouse Optimization: Remove the clutter and connect to Big Data. Headings: Cut Downtime and Focus on Product Creation; Remove Costly Legacy Systems; Simplicity Empowers Business Users. Customer quotes: “Using Pentaho in our data warehouse, it now takes about 20 minutes to break down a metric and do specific analysis to identify performance issues. In the past, similar queries would take all night.” Greg Allen, Business Analyst, Kiva. “Pentaho Data Integration not only simplifies the data delivery process but also enables us to gather the high-quality data. Ultimately Pentaho has enabled us to reach our goal of making the Swiss real estate market more transparent.” Prof. Dr. Peter Ilg, Managing Director, Swiss Real Estate Datapool. “We needed fully functional reporting and data integration tools but wanted to cut the cost burden experienced with Oracle. After looking at what was out there, Pentaho had the complete tool set, and after further testing, our users noticed no difference in the features they need.” Uwe Geercken, IT Manager, Swissport.
  • 16. Data Warehouse Optimization: Shrink Data Costs & Boost Analytics Performance for Business Users. What is it? Existing DW infrastructure can’t support the data explosion, and adding DW capacity is costly, so offload low-priority data to a Big Data store to extend capacity. Why Do It? Save data capacity & management costs; empower IT and business users to meet goals on time. Key Considerations: normally leverages Hadoop; relevant across industries; may require new coding skillsets that are hard to find.
  • 17. Data Warehouse Optimization: Shrink Data Costs & Boost Analytics Performance for Business Users. Diagram labels: CRM & ERP Systems; PDI; Data Warehouse.
  • 18. Data Warehouse Optimization: Shrink Data Costs & Boost Analytics Performance for Business Users. Diagram labels: CRM & ERP Systems; PDI; Data Warehouse; PDI; Hadoop Cluster.
  • 20. Data Warehouse Optimization: Shrink Data Costs & Boost Analytics Performance for Business Users. Diagram labels: CRM & ERP Systems; PDI; Data Warehouse; Other Data Sources; PDI; PDI; Hadoop Cluster.
  • 21. Data Warehouse Optimization: Shrink Data Costs & Boost Analytics Performance for Business Users. Diagram labels: CRM & ERP Systems; PDI; Data Warehouse; Other Data Sources; PDI; PDI; Hadoop Cluster; PDI; Analytic Data Mart.
  • 22. Data Warehouse Optimization: Shrink Data Costs & Boost Analytics Performance for Business Users. Diagram labels: CRM & ERP Systems; PDI; Data Warehouse; Other Data Sources; PDI; PDI; Hadoop Cluster; PDI; Analytic Data Mart; Relational Layer.
  • 23. Data Warehouse Optimization: Cost effective, fast processing (A Major Financial Institution). Business Challenge: gain competitive advantage through intraday balance reporting for commercial customers; use Hadoop and relational data stores to process huge volumes. Pentaho Benefits: graphical orchestration for Hadoop, HBase & DB2 data integration workloads; 15x faster to develop, 10x faster to execute. Callouts: 15x faster to develop; 10x faster to execute; No coding; Integrate with existing; Easy to find resources.
  • 24. Optimize data infrastructure to connect hundreds of interdependent banking applications (A Major Financial Institution). Diagram labels: Hundreds of Enterprise Data Sources; Cash Processing Data; Customer & Account Master Data; Payments Data; Other Financial Apps; PDI; Scalable Enterprise Data Hub; Hadoop Cluster; Historical Data Mart; PDI; Data Marts; Internal User Reporting & Data mining; Clients Statements, Balance, Transaction Reporting & Analytics.
  • 25. Thank You. Join the conversation. You can find us on: blog.pentaho.com; @Pentaho; Facebook.com/Pentaho; Pentaho Business Analytics.
  • 26. Streamlined Data Refinery: Drive a Sustainable Analytics Strategy with Big Data ETL at Scale. What is It? In the face of exploding volumes of transaction, customer, and other data, traditional ETL systems slow down, making analytics unworkable. One solution is to streamline most data through a scalable Big Data processing hub that pushes refined data to a data warehouse or analytical database for low-latency self-service analytics across a diverse base of data. Why Do It? Give business users insight into all data; scale ETL and data management cost savings; next step after DW optimization. Vertical Fit: High Tech, Telecom, Media, Financial Services, etc. Technology Fit: primarily Hadoop, but also NoSQL. Benefits: establish usable analytics on diverse sources at high volume (terabytes+); speed queries substantially with rapid ingestion & powerful processing; reduce costs of ETL processing. Challenges: expansive integration project; may require new coding skillsets that are hard to find; may call for swapping from a data warehouse to a higher performing analytic database, depending on requirements.
  • 27. Streamlined Data Refinery: Drive a Sustainable Analytics Strategy with Big Data ETL at Scale. Why: offers a full platform for this use case, including broad data integration (incl. leading Hadoop distros and analytic DBs) and a powerful array of easy-to-use front-end analytics; visual MapReduce mitigates the need for additional developers and makes Big Data accessible to existing IT staff; Pentaho MapReduce runs much faster in the cluster vs. other scripting tools. Diagram labels: Transactions – Batch & Real-time; Enrollments & Redemptions; Location, Email, Other Data; PDI; Hadoop Cluster; PDI; Analytic Database; Analyzer; Reports.
  • 28. Pentaho Big Data Analytics Platform: Simplified data preparation and analytics for all users. Simplified Analytics Experience; Blended Big Data; Enterprise Big Data Integration.
  • 29. Claudia Imhoff, President and Founder, Intelligent Solutions, Inc. A thought leader, visionary, and practitioner, Claudia Imhoff, Ph.D., is an internationally recognized expert on analytics, business intelligence, and the architectures to support these initiatives. Dr. Imhoff has co-authored five books on these subjects and writes articles (totaling more than 150) for technical and business magazines. She is also the Founder of the Boulder BI Brain Trust, a consortium of internationally-recognized independent analysts and experts. You can follow them on Twitter at #BBBT or become a subscriber at www.bbbt.us. Email: claudia@bbbt.us Phone: 303-444-6650 Twitter: Claudia_Imhoff. Copyright © 2014, Intelligent Solutions, Inc., All Rights Reserved.
  • 30. Topics: An extended data warehouse architecture for a modern BI environment; Questions.
  • 31. Data Warehouse Technology Drivers: do more with less; data compression; schemas on read; open source components; in-memory capabilities; simpler environments; cloud deployments; easier data management; mobile and self-service BI; built-in analytic functions.
  • 32. Extended Data Warehouse Architecture. Diagram labels: Analytic tools & applications; RT analysis engine; Traditional EDW environment; Investigative computing platform; Data refinery; Data integration platform; Operational real-time environment; Other internal & external structured & multi-structured data; Real-time streaming data; Operational systems; BI services. Slide created by Colin White – BI Research, Inc.
  • 33. Data Integration Use Case: Data Refinery. Ingests raw detailed data in batch and/or real-time into a managed data store. Distills the data into useful business information and distributes the results to downstream systems. May also directly analyze certain types of data. Employs low-cost hardware and software to enable large amounts of detailed data to be managed cost effectively. Requires (flexible) governance policies to manage data security, privacy, quality, archiving and destruction. Diagram labels: Traditional EDW environment; Investigative computing platform; Data refinery; Data integration platform.
  • 34. Traditional EDW Use Cases. Most BI environments today: new technologies can be incorporated into the EDW environment to improve performance, efficiency and reduce costs. Use cases: production reporting; historical comparisons; customer analysis (next best offer, segmentation, life-time value scores, churn analysis, etc.); KPI calculations; profitability analysis; forecasting. Diagram labels: Analytic tools & applications; Operational systems; RT analysis engine; BI services; Traditional EDW environment; Data refinery; Data integration platform; Operational real-time environment.
  • 35. Investigative Computing Use Cases. New technologies used here include: Hadoop, in-memory computing, columnar storage, data compression, appliances, etc. Use cases: data mining and predictive modeling for EDW and real-time environments; cause and effect analysis; data exploration (“Did this ever happen?” “How often?”); pattern analysis; general, unplanned investigations of data. Diagram labels: Analytic tools & applications; Investigative computing platform; Data refinery; Data integration platform; RT analysis engine; Operational systems; BI services; Operational real-time environment.
  • 36. Operational RT Environment Use Cases. Embedded or callable BI services: real-time fraud detection; real-time loan risk assessment; optimizing online promotions; location-based offers; contact center optimization; supply chain optimization. Real-time analysis engine: traffic flow optimization; web event analysis; natural resource exploration analysis; stock trading analysis; risk analysis; correlation of unrelated data streams (e.g., weather effects on product sales). Diagram labels: RT analysis engine; Operational real-time environment; Other internal & external structured & multi-structured data; Real-time streaming data; Operational systems; BI services.
  • 37. BUT – All Components Must Work Together! Diagram labels: New sources of data; Enterprise DW; Investigative computing platform; Data refinery; Operational systems; Analytic tools; RT analysis engine; existing customer data; analytic models; analyses; next best customer offer; 3rd party data; location data; social data; feedback; call center dashboard or web event stream. Slide created by Colin White – BI Research, Inc.
  • 38. Topics: Extending the data warehouse architecture for a modern analytics environment; Questions.
  • 39. Many Organizations Not Ready for Big Data? Many companies are struggling to get a traditional data warehouse in place and produce basic BI: business users not analytically savvy; minimal governance; chaotic architectures. What do you say to these organizations?
  • 40. Existing Data Warehouse. Do organizations have to rip and replace their existing DW to solve big data problems? When do I use a traditional DW versus the Hadoop environment? Does the data hub replace the data warehouse?
  • 41. Data Integration. Where is ETL used and not used? How do enterprises control data blending and virtualization (do they need to)? Is data governance still important? How does it change in this new environment?
  • 42. New IT Skills. To achieve DW optimization: Does IT have to rip and replace their employees? Should they rely on consultants? To what extent? What is needed to move from basic DW to a big data architecture?
  • 43. Evolving to Advanced Analytics. Is it mandatory to hire data scientists? Is training on new technology enough? What else is needed to make the company more analytically-driven?
  • 44. Twitter Tag: #briefr The Briefing Room
  • 45. Upcoming Topics. This Month: BIG DATA; May: DATABASE; June: ANALYTICS & MACHINE LEARNING. www.insideanalysis.com/webcasts/the-briefing-room. 2014 Editorial Calendar at www.insideanalysis.com
  • 46. THANK YOU for your ATTENTION!
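
Slide 16 describes data warehouse optimization as offloading low-priority data to a Big Data store to extend warehouse capacity. The deck does not say how "low priority" is decided, so the following is only a minimal sketch under an assumed rule: records not accessed within a recent window are candidates for offload. The `Record` type, the `split_for_offload` function, and the 365-day window are all hypothetical illustrations, not part of Pentaho Data Integration or of the presentation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    """Hypothetical warehouse row; only the fields needed for the sketch."""
    record_id: int
    last_accessed: date


def split_for_offload(records, today, hot_window_days=365):
    """Partition records into (keep_in_warehouse, offload_to_big_data_store).

    Assumption: "low priority" means "not accessed within hot_window_days".
    A real project would pick its own rule (age, query frequency, SLA, ...).
    """
    cutoff = today - timedelta(days=hot_window_days)
    keep, offload = [], []
    for record in records:
        # Records accessed on or after the cutoff stay in the warehouse.
        if record.last_accessed >= cutoff:
            keep.append(record)
        else:
            offload.append(record)
    return keep, offload


if __name__ == "__main__":
    sample = [
        Record(1, date(2014, 4, 1)),
        Record(2, date(2012, 1, 15)),
        Record(3, date(2013, 4, 29)),
    ]
    keep, offload = split_for_offload(sample, today=date(2014, 4, 29))
    print("keep:", [r.record_id for r in keep])
    print("offload:", [r.record_id for r in offload])
```

In practice the offloaded partition would be written to the Hadoop cluster and the kept partition left in the warehouse; the sketch stops at the partitioning decision because the slides do not specify the movement mechanics.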