3. Data
30 million networked sensors, growing at 30% a year
Computing
1 trillion devices connected to the Internet by 2015
Experience
500 million smartphone users, increasing 20% a year
Social
Machine Generated
User Generated
Feedback loops driving exponential growth
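The growth rates quoted on this slide compound year over year. As a rough sketch, assuming the rates stay constant (which the slide does not claim), the figures can be projected forward. The function name `project` and the five-year horizon are illustrative choices, not taken from the deck.

```python
def project(initial: float, annual_rate: float, years: int) -> float:
    """Compound `initial` at `annual_rate` (0.30 means 30%) for `years` years."""
    return initial * (1.0 + annual_rate) ** years

# Starting figures and rates come from the slide.
# The 5-year horizon is an assumption made only for illustration.
sensors = project(30e6, 0.30, 5)
phones = project(500e6, 0.20, 5)

print(f"networked sensors after 5 years: {sensors / 1e6:.1f} million")
print(f"smartphone users after 5 years: {phones / 1e6:.1f} million")
```

Under these assumptions the sensor count grows by a factor of about 3.7 in five years, and the smartphone base by about 2.5.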
4. Evolving towards end-to-end real-time analytics
Decade / Paradigm / Architecture / Platform
90s
• Paradigm: Reporting / Data Mining; High Cost / Isolated Use
• Architecture: Batch ("sales reports"); sequential SQL queries
• Platform: RDBMS (diagram: scale on a single multi-core node)
2000s
• Paradigm: Model-based discovery; High Cost / Dept Use
• Architecture: Batch, i.e. correlated buying pattern; No SQL, parallel analysis; shared disk/memory
• Platform: Proprietary MPP / DW Appliance (diagram: scale across nodes)
Today
• Paradigm: Unbounded MapReduce query; Low Cost / Enterprise Use; arrival of vast amounts of unstructured data
• Architecture: Real-time, i.e. recommendation engine; process @ storage node; built-in data replication/reliability; shared nothing, in memory
• Platform: Open source SW loosely coupled to commodity HW; No SQL RDBMS (diagram: unlimited linear scale through distributed node addition)
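The "process @ storage node" and "shared nothing" ideas can be illustrated with a toy MapReduce in plain Python. This is a minimal single-process sketch, not Hadoop code: the partitions, the purchase records, and the function names `map_phase`, `shuffle`, and `reduce_phase` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(records: Iterable[str]) -> List[Tuple[str, int]]:
    """Runs against one node's local records: emit (product, 1) per purchase."""
    return [(r.strip().lower(), 1) for r in records if r.strip()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group the intermediate values by key across all node outputs."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for node_output in mapped:
        for key, value in node_output:
            groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical data: one list of purchase records per storage node.
partitions = [["book", "phone"], ["phone", "Book", "tablet"], ["phone"]]

# Each partition is mapped independently (no shared state between nodes);
# only the small intermediate pairs are brought together for the reduce.
counts = reduce_phase(shuffle(map_phase(p) for p in partitions))
print(counts)
```

In a real cluster each `map_phase` call would run on the node that stores the partition, which is why adding nodes adds both storage and processing capacity.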
5. Make big data work for you
Amount of data your enterprise will need to ingest: 50X
Proportion of data that is useful to you: 10%
Projected increase in your IT budget: 10%
=> Business as usual is not an option
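A back-of-the-envelope calculation shows why the conclusion follows. The sketch below takes the three figures from the slide and expresses everything as a multiple of today's data volume and budget. The function name `ingest_pressure` and the interpretation of "useful" as a share of the future volume are assumptions.

```python
def ingest_pressure(data_growth: float, budget_growth: float,
                    useful_fraction: float) -> dict:
    """Relate data growth to budget growth (all values relative to today)."""
    return {
        # Data volume that each unit of budget must now handle.
        "data_per_budget_unit": data_growth / budget_growth,
        # Volume worth keeping, as a multiple of today's total volume.
        "useful_volume": data_growth * useful_fraction,
        # Volume that must be ingested only to be filtered out.
        "filtered_volume": data_growth * (1.0 - useful_fraction),
    }


# Slide figures: 50X data, 10% budget increase, 10% of data useful.
result = ingest_pressure(data_growth=50.0, budget_growth=1.10,
                         useful_fraction=0.10)
for name, value in result.items():
    print(f"{name}: {value:.1f}X")
```

Read this way, each budget unit has to cover roughly 45 times more data than today, most of which is filtered rather than kept.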
8. Fabricating silicon for big data
22nm: A Revolutionary Leap in Process Technology
37% Performance Gain at Low Voltage¹
>50% Active Power Reduction at Constant Performance¹
Intel lead vs. Industry: 3.5 years
2007: 45 nm
2009: 32 nm
2011: 22 nm
High-k Metal Gate / Tri-Gate
Intel lead vs. Industry: 4 years
9. Pumping the heart of the open datacenter
Intel® Xeon® Processor E5-4600 Product Family
• Density-optimized
• Cost-optimized
• Improved HPC performance
Intel® Xeon® Processor E7-4800 Product Family
• Highest reliability & scalability
• Highest memory capacity
• Highest enterprise & database performance
1 Source: Published results as of 8 May 2012. See http://www.intel.com/performance/server/xeonE7/summary.htm for full list of benchmarks and configuration details.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
10. Enabling open source solutions
Optimize software to take advantage of Intel® architecture
AES-NI / SSD, 10GbE / TXT / MCA / VT-*
3x performance in 3 years
Mission Critical deployments
Accelerates Crypto in JBoss
30x throughput
Trusted Compute Pools
11. Contributing to Apache Hadoop
• File-based encryption for Hadoop jobs
• ACLs for HDFS and HBase at cell level
• Flash storage for MapReduce shuffle data
• Caching and non-volatile memory for increased throughput
• HDFS adaptive replication of hot-files
• HBase distributed tables across data centers
• HDFS data replication across data centers
• Archival storage support for cold data on HDFS
• SSE Instructions
• JVM Enhancements
• InfiniBand RDMA Support
12. Supporting Intel Distribution for Apache Hadoop
Data Mining
Graph Analytics
Full Text Search
Full SQL
Batch Analytics
Security
13. Intel® Distribution for Apache Hadoop* software
Granular access control in HBase
Up to 20X faster crypto with AES-NI*
30X faster Terasort on Intel® Xeon processors, Intel 10GbE, and SSD
Up to 8.5X faster queries in Hive*
Job profiling and configuration, automated by Intel® Active Tuner
*Based on internal testing
Rhino
Cloud
HPC
Common authentication, access control, auditing
Bringing MapReduce to data on Lustre FS
Enabling real-time 100% SQL on Hadoop
Optimizing Hadoop for virtualization & cloud
14. Backed by portfolio of datacenter products
Software
Network
Storage & Memory
Server
Cache Acceleration Software
15. With broad support from the ecosystem
* Other names and brands may be claimed as the property of others.
16. Proven in the enterprise
Using the Intel® Distribution to gain tremendous results
IT
17. Putting advanced capabilities at work…
• Expose new data
• Dashboard/historical reporting
• Real-time campaigns
• Vertical apps
• Predictive data services
• Graph visualization
• Log analysis
to solve real use cases
• Fraud & threat detection
• Life sciences research
• Behavioral analysis
• Warranty analysis
• Customer segmentation
• Infrastructure optimization
From Hype to High Performance
18. Data-Driven Business: Customer Service
Value
• Enable subscriber access to billing data
• 30X gain in performance; lower TCO
Analytics
• Provides real-time retrieval of 6 months of data
• Supports new BI with 15 types of queries
• Enables targeted ad serving and promotions
Data Management
• 30 TB/month of billing data
• 300K reads/second; 800K inserts/second
• 133-node cluster / Intel Xeon E5 processors
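To get a feel for the load behind these figures, they can be broken down per node. The sketch below assumes the read and insert rates are cluster-wide totals spread evenly over all 133 nodes, a 30-day month, and decimal terabytes. None of these assumptions is stated on the slide, and `per_node` is a hypothetical helper.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month
BYTES_PER_TB = 10 ** 12             # assumption: decimal terabytes


def per_node(cluster_total: float, nodes: int) -> float:
    """Average share per node, assuming perfectly even distribution."""
    return cluster_total / nodes


nodes = 133  # from the slide
reads_per_node = per_node(300_000, nodes)    # slide: 300K reads/second
inserts_per_node = per_node(800_000, nodes)  # slide: 800K inserts/second

# slide: 30 TB/month of billing data, expressed as an average byte rate
ingest_bytes_per_s = 30 * BYTES_PER_TB / SECONDS_PER_MONTH

print(f"reads/s per node:   {reads_per_node:,.0f}")
print(f"inserts/s per node: {inserts_per_node:,.0f}")
print(f"average ingest:     {ingest_bytes_per_s / 1e6:.2f} MB/s")
```

Real clusters rarely balance this evenly, so the per-node numbers are averages rather than peak loads.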
CDR
Subscriber Self Service
Intel Distribution
19. Data-Intensive Discovery: Genomics
Value
• Enable researchers to discover biomarkers and drug targets by correlating genomic data sets
• 90% gain in throughput; 6X data compression
Analytics
• Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers)
• Provide APIs for applications to combine and analyze public and private data sets
Data Management
• Use Hive and Hadoop for query and search
• Dynamically partition and scale HBase
• 10-node cluster / Intel Xeon E5 processors / 10GbE
Intel Distribution
20. Data-Rich Communities: Smart City
Value
• Enforce traffic laws and detect license fraud
• Monitor and predict traffic patterns
• In a city of 31 million people
Analytics
• Detect traffic law violations automatically
• Detect driver license fraud by data mining
• Forecast traffic with predictive analytics
Data Management
• 30,000 cameras
• 6 Mb/s stream rate per camera
• 15 PB of images in use / 2B records in HBase
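The camera figures imply a large aggregate stream. As a rough worked example, the sketch below reads "Mb/s" as megabits per second and assumes every camera streams continuously at the quoted rate, using decimal units throughout. The slide does not describe retention or compression, so the last figure is only an order-of-magnitude comparison.

```python
CAMERAS = 30_000       # from the slide
MBIT_PER_CAMERA = 6    # slide: 6 Mb/s, read here as megabits per second
SECONDS_PER_DAY = 86_400

# Aggregate stream rate across all cameras.
aggregate_gbit_s = CAMERAS * MBIT_PER_CAMERA / 1_000   # gigabits per second
aggregate_gbyte_s = aggregate_gbit_s / 8               # gigabytes per second

# Raw volume per day if nothing is dropped or compressed (assumption).
petabytes_per_day = aggregate_gbyte_s * SECONDS_PER_DAY / 1_000_000

# How many days of raw stream would add up to the 15 PB on the slide.
days_for_15_pb = 15 / petabytes_per_day

print(f"aggregate rate: {aggregate_gbit_s:.0f} Gb/s ({aggregate_gbyte_s:.1f} GB/s)")
print(f"raw volume:     {petabytes_per_day:.3f} PB/day")
print(f"15 PB equals about {days_for_15_pb:.1f} days of raw stream")
```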
Detection Prevention
Regional
Local