The data center has gone through several inflection points in the past decades: adoption of Linux, migration from physical infrastructure to virtualization and Cloud, and now large-scale data analytics with Big Data and Hadoop.
Please join us to learn about how Cloudera and Intel are jointly innovating through open source software to enable Hadoop to run best on IA (Intel Architecture) and to foster the evolution of a vibrant Big Data ecosystem.
Intel and Cloudera: Accelerating Enterprise Big Data Success
1. 1
Intel and Cloudera:
Accelerating Enterprise Big Data Success
Alan Saldich | VP of Marketing | Cloudera
Ron Kasabian | VP of Big Data Solutions | Intel
2. 2
Big Picture: Datacenter Inflection
UNIX to Linux, RISC to IA (chart: Linux/x86 units vs. UNIX/RISC units, 2000 to 2013)
Physical to Virtual, SW-only to HW-assisted (chart: virtualized vs. nonvirtualized, 2008 to 2013)
Cluster to Cloud, ASIC to IA/Fabric (chart: public vs. private, 2010 to 2013)
Big Data
“In 2000 Intel saw Linux coming & invested heavily in Red Hat; in 2005 we saw virtualization happening and invested in VMware; in 2008 we started investing heavily in hyper-scale computing. We think big data & Hadoop will dwarf all of them.”
Diane Bryant, SVP & GM, Data Center Group, Intel
3. 3
Intel, the Open-Source Software Company
~10k SW Developers, 35+ Sites
Commercial ecosystem
Academic research
Customer solutions
#2 Linux contributor
Open building blocks
Industry standards
Tools and resources
4. 4
History of Intel and Apache Hadoop*
Timeline, 2009 to 2014: Research, Benchmarking, Tuning, Optimization, Product
Open Cirrus*
HiBench
Release IDH 1.0 (2011)
Release IDH 2.0 (2012)
Release IDP 3.1 (2014)
Telco, Smart City, Web, Retail, Healthcare
* Other names and brands may be claimed as the property of others.
5. 5
Delivering Big Data Solutions
Consumer Behavior
Security & Risk Management
Operational Efficiency
Location Aware Ad Placement
Buyer Protection Program
Personalized Preventive Care
Claim Fraud Reduction
Traffic Optimization
Smart Energy Grid
6. 6
The Big Data Platform
Analytic Tools and Utilities
Data
Servers Storage Network
Services
Big Data Platform
Ecosystem of Verticals
Scalable Data & Analytics Platform
Composable Resource Pools
9. 9
Intel-Cloudera Strategic Alliance
Advance the data management industry by:
1. Combining the strengths of IDH and CDH
2. Driving adoption through standardization and open source innovation
3. Delivering a Moore’s Law multiplier => 10X improvement in price / performance above and beyond Moore’s Law
Industry-leading CDH is a superset of CDH and IDH features
11. 11
A Strong Track Record of Innovation
2008: CLOUDERA FOUNDED BY MIKE OLSON, AMR AWADALLAH & JEFF HAMMERBACHER
2009: HADOOP CREATOR DOUG CUTTING JOINS CLOUDERA
2009: CLOUDERA RELEASES CDH, THE FIRST COMMERCIAL APACHE HADOOP DISTRIBUTION
2010: CLOUDERA MANAGER: FIRST MANAGEMENT APPLICATION FOR HADOOP
2011: CLOUDERA REACHES 100 PRODUCTION CUSTOMERS
2011: CLOUDERA UNIVERSITY EXPANDS TO 140 COUNTRIES
2012: CLOUDERA ENTERPRISE 4, THE STANDARD FOR HADOOP IN THE ENTERPRISE
2012: CLOUDERA CONNECT REACHES 300 PARTNERS
2014: THE ENTERPRISE DATA HUB LAUNCHED
2013: CLOUDERA IMPALA, CLOUDERA NAVIGATOR, CLOUDERA SEARCH
2013: TOM REILLY JOINS AS CEO; OVER 800 PARTNERS IN CLOUDERA CONNECT
2014: SERIES F FUNDING WITH INTEL AS KEY PARTNER; OVER 900 PARTNERS IN CLOUDERA CONNECT
2014: CLOUDERA ENTERPRISE 5
CDH
Cloudera Manager
CLOUDERA ENTERPRISE 4
ASK BIGGER QUESTIONS
ENTERPRISE DATA HUB
CLOUDERA ENTERPRISE 5
12. 12
Expanding Big Data Requires A New Approach
1980s: Bring Data to Compute
Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only
Now: Bring Compute to Data
Information-centric businesses use all data: multi-structured, internal & external data of all types
(Diagram: relative size & complexity of Data and Compute)
13. 13
Hadoop Changes the Game:
Storage and Compute on One Platform
The Old Way
Expensive & Unattainable
The Hadoop Way
Affordable & Attainable
14. 14
The Old Way: Bringing Data to Compute
1 Missing Data
• Leaving data behind
• Risk and compliance
• High cost of storage
2 Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data
3 Cost of Analytics
• Existing systems strained
• No agility
• “BI backlog”
4 Complex Architecture
• Many special-purpose systems
• Moving data around
• No complete views
SERVERS MARTS EDWS DOCUMENTS STORAGE SEARCH ARCHIVE
ERP, CRM, RDBMS, MACHINES FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS EXTERNAL DATA SOURCES
15. 15
SERVERS MARTS EDWS DOCUMENTS STORAGE SEARCH ARCHIVE
ERP, CRM, RDBMS, MACHINES FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS EXTERNAL DATA SOURCES
The New Way: Bringing Compute to Data
1 Active Compliance Archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage
2 Persistent Staging
• One source of data for all analytics
• Persist state of transformed data
• Significantly faster & cheaper
3 Self-Service Exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests
4 Diverse Analytic Platform
• Bring applications to data
• Combine different workloads on common data (e.g. SQL + Search)
• True analytic agility
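The “schema on read” idea on this slide can be illustrated with a minimal sketch. It assumes raw records are kept as untouched JSON lines and that each reader supplies its own field-to-type mapping at query time. The sample records and the helper name `read_with_schema` are hypothetical, and plain Python stands in for a real Hadoop query engine.

```python
import json

# Raw records are stored exactly as they arrived (full fidelity), with no
# up-front modeling. These sample lines are made up for illustration.
RAW_EVENTS = [
    '{"user": "a", "amount": "12.50", "ts": "2014-03-01"}',
    '{"user": "b", "amount": "7", "region": "EU"}',
    'not json at all',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto `schema` (field -> converter) at read time."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays in storage; this reader just skips it
        if not isinstance(record, dict):
            continue
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            # Missing fields become None instead of failing the whole read.
            row[field] = convert(value) if value is not None else None
        yield row


# Two different "schemas" applied to the same stored data, with no reload
# or transformation step in between.
spend_view = list(read_with_schema(RAW_EVENTS, {"user": str, "amount": float}))
region_view = list(read_with_schema(RAW_EVENTS, {"user": str, "region": str}))
```

Because the schema lives in the reader rather than in the stored data, adding a new view here only means passing a different mapping; the raw lines are never rewritten.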
16. 16
From Hadoop to an Enterprise Data Hub
✔ Open Source, Scalable, Flexible, Cost-Effective
✔ Managed
✔ Open Architecture
✔ Secure and Governed
CLOUDERA’S ENTERPRISE DATA HUB
BATCH PROCESSING, ANALYTIC SQL, SEARCH ENGINE, MACHINE LEARNING, STREAM PROCESSING, 3RD PARTY APPS
WORKLOAD MANAGEMENT
STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE
Filesystem, Online NoSQL
DATA MANAGEMENT
SYSTEM MANAGEMENT
17. 17
Discover New Use Cases
ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Website optimization
HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
FINANCIAL SERVICES: Risk & portfolio analysis; New products
MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service
TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
RETAIL: Consumer sentiment; Optimized marketing
LAW ENFORCEMENT & DEFENSE: Threat analysis, Social media monitoring, Photo analysis
EDUCATION & RESEARCH: Experiment sensor analysis
LIFE SCIENCES: Clinical trials; Genomics
AUTOMOTIVE: Auto sensors reporting location, problems
COMMUNICATIONS: Location-based advertising
HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
UTILITIES: Smart Meter analysis for network capacity
OIL & GAS: Drilling exploration sensor analysis
19. 19
Ensuring Cloudera runs best on Intel Architecture
Software & Silicon co-evolve to deliver dramatic gains
1 Push compute-intensive work down to the silicon
• Encryption (AES-NI)
• Compression (SSE 4.2)
• Math (MKL)
2 Increase main memory utilization up to 20X
• Improve Disk:Memory from 200:1 to 10:1
3 Design for rack-scale architecture
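As a rough illustration of the first point, the sketch below checks whether a Linux host advertises the CPU features this slide names for encryption and compression. It assumes the `/proc/cpuinfo` text format, where AES-NI appears as the `aes` flag and SSE 4.2 as `sse4_2`. The function names and the sample text are hypothetical, and MKL is left out because it is a library rather than a CPU flag.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def accelerated_features(cpuinfo_text):
    """Map the slide's feature names to whether the CPU reports them."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "AES-NI": "aes" in flags,       # hardware-assisted encryption
        "SSE 4.2": "sse4_2" in flags,   # instructions used for compression/checksums
    }


# Made-up sample text so the sketch runs on any platform.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse sse2 sse4_1 sse4_2 aes avx\n"
report = accelerated_features(SAMPLE)
```

On a real Linux machine the same function could be fed the contents of `/proc/cpuinfo`, for example `accelerated_features(open("/proc/cpuinfo").read())`.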
20. 20
Faster Insights, Better Security, & Less Complexity
Accelerate innovation via open source software
• Maintain an open horizontal platform for big data
• Continue to enhance Apache Hadoop and related projects
Enable Hadoop to run best on IA
• Optimize performance across compute, storage, & network
• Ensure platform security, enhanced by hardware
Foster evolution of big data ecosystem
• Establish usage models and industry standard benchmarks
• Develop reference architectures and industry-wide solutions