Confidential © 2014 Actian Corporation1
SQL + Hadoop: The High Performance Advantage
Turn Hadoop into a High Performance Analytics Platform
Emma McGrattan, Actian
Jim Hare, Actian
8 July 2014
Agenda
1. Introduction
2. Hadoop Challenges
3. Actian Analytics Platform – Hadoop SQL Edition
4. Industrialized, High Performance SQL in Hadoop
5. Questions
All lines are muted
To ask a question, use Chat or Q&A panel
Recording will be made available
We'll be running a few polling questions
Actian is Delivering Transformational Value
$140M Revenues + Profitable
10,000+ Customers
Global Presence: 8 world-wide offices, 7x24 multinational support model
“Actian is now very powerfully positioned in the big data and analytics markets.” Robin Bloor
“Actian has assembled all of the next generation IPs into a single analytics platform, allowing users a level of flexibility in data interaction that competitors have not been able to match.” siliconANGLE
Big Data Offers Significant Opportunities
Personalized Experience
New Products/Services
Reduce Risk
Predictive Analytics
Many Data Sources
Low Cost Storage
Improve Decision-Making
…But only for those who embrace it
Enter Hadoop as the Big Data Enabler for Low Cost Storage
DW Offload
Landing Zone
Data Reservoir
But It Isn’t Easy with Hadoop
Batch performance
Time to Value
Expensive Skills
Silo’d Data Access
Data preparation
Hadoop Complexity Forcing Organizations to Move Data in order to Analyze it
[Diagram labels: DW Offload, Landing Zone, Hadoop Data Reservoir; Data Transfer; Data Management, Analytics Processing, Visualization & Data Science Workbench]
Result: duplicate storage & infrastructure costs, more IT resources, network bandwidth usage, and complexity
CIOs Challenged by Big Data Costs
One in three CIOs pays between 21 and 30 cents per gigabyte per month.
Translation: it costs a company $3.12 million per year to store 500,000 gigabytes at an average cost of 26 cents per gigabyte per month.
Source: http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html
-- CIO Insight
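The annual figure quoted above can be checked as volume × monthly rate × 12. The sketch below is only an arithmetic check; the function name is illustrative and does not come from the slide or from CIO Insight. With the stated inputs (500,000 GB at 26 cents per GB per month) it gives $1.56 million per year, so the quoted $3.12 million corresponds to 1,000,000 GB at the same rate.

```python
def annual_storage_cost_dollars(gigabytes: int, cents_per_gb_month: int) -> float:
    """Yearly cost of storing `gigabytes` at a flat monthly per-GB rate.

    Works in whole cents to avoid floating-point rounding, then converts to dollars.
    """
    total_cents = gigabytes * cents_per_gb_month * 12
    return total_cents / 100


# Inputs as stated on the slide: 500,000 GB at 26 cents per GB per month.
stated = annual_storage_cost_dollars(500_000, 26)
# Volume at which the same rate yields the quoted $3.12 million.
doubled = annual_storage_cost_dollars(1_000_000, 26)

print(f"500,000 GB:   ${stated:,.0f} per year")
print(f"1,000,000 GB: ${doubled:,.0f} per year")
```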
CIOs Challenged by Types of Big Data
73% of CIOs say up to 50% of their data will be unstructured within two years.
Source: http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html
-- CIO Insight
Instead, what if you could move the analytic processing to the Hadoop data?
Data Science Workbench
Analytic Processing
Data Management
SQL User Access
… And transform Hadoop from a data lake into a high performance, fully functional analytics platform
Introducing the Actian Analytics Platform – Hadoop SQL Edition
What is it?
• Patented X100 vector processing engine plus visual data and analytics workflow, all running natively in Hadoop via YARN
• Turns Hadoop into a High-Performance, Fully-Functional Analytics Database
How is this unique?
• Highest performing, most industrialized SQL access to Hadoop data
• Only end-to-end analytic processing natively in Hadoop
• Most consumable, accessible, manageable Hadoop analytics
What does this mean to you?
• Removes all barriers for business access to big data analytics
• Enables SQL users with no constraints on Hadoop data
• Accelerates time to value
The Industry’s Abuzz – about Actian!
“Deploying on Hadoop enables the Actian Analytics Platform to scale to massively
parallel scale without having to modify the underlying engine. For Actian, Hadoop
is a means to an end; it provides an opening for Actian to introduce a fast SQL
engine that operates at scale.”
Tony Baer, Principal Analyst, Software, Ovum
“Actian’s platform now makes Hadoop data repositories accessible to the entire
enterprise by empowering millions of business-savvy SQL users and business
analysts to conduct advanced analytics directly on data in the Hadoop
Distributed File System (HDFS). Companies investing in Hadoop now can
broaden the scope of data discovery, increase the accuracy of decisions, and
speed time to value.”
Daniel Gutierrez, Inside Big Data
“The latest version of the Actian Analytics Platform provides end-to-end analytic
processing natively in Hadoop. This will make the Hadoop Big Data framework
more accessible by offering high-performance ELT (extract, load and transform)
and SQL analytics on Hadoop with no need for MapReduce skills. This is a big
deal because data scientists with Hadoop skills are in short supply, while SQL
skills are relatively abundant.”
Actian Analytics Platform – Hadoop SQL Edition
Removes all barriers for business access to big data analytics
Expansive Connectivity → Data Blending & Enrichment → Discovery → Data Science → Analytics → Operational BI
[Diagram labels: Libraries of Analytics; Visual Data and Analytic Workbench; High Performance Data Flow Engine; Industrialized SQL Analytics Database; Natively in Hadoop; Connections to Access Any Data]
[Diagram labels: Enterprise Data, Machine Data, Social Data, Data Warehouse, SaaS Data, Amazon Redshift]
[Diagram labels: Business Processes, Users, Machines, Applications]
Actian Analytics Platform – Hadoop SQL Edition
• Lightning fast and industrial strength SQL in Hadoop – Up to 30X faster than Impala
• Full end-to-end analytic processing platform - all native in Hadoop
• Packaged with “real world” solution blueprints
Visual Data Science & Analytics Workbench
• Drag/drop interface with 100’s of data prep and analytic functions
• Connect, blend, & enrich data and perform discovery & data science
• Build and test predictive models
• Running on top of a high performance data flow engine
• All natively within Hadoop via YARN
Unleash millions of business-savvy SQL users with no constraints on Hadoop data
Ubiquitous Skills
■ 1 Million+ SQL Users
■ $ Lower cost
■ Easy to find, in most companies
■ Embedded in the business
Specialty Skills
■ 150K MapReduce Programmers
■ $$$ Expensive
■ 170K Shortage, hard to find
■ Separate from the business
Actian Analytics Platform™: Connect, Analyze, Act
Accelerate time to value and turn Hadoop data into transformational value
Actian Analytics Platform = 25 Minutes
[Workflow steps: Log Reader, Filter Rows, Group, Load Vector, k-Means]
Coding MapReduce = 4 Weeks
[Workflow steps, each written as MapReduce code: Log Reader, Filter Rows, Group, Load Vector, Avro Writer, k-Means]
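To make the workflow idea concrete, here is a minimal sketch of the first three steps (Log Reader, Filter Rows, Group) composed as small reusable operators. It is plain Python, not Actian DataFlow or MapReduce code; the log format, the function names, and the "status >= 500" filter are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, int]  # (user, HTTP status); a hypothetical log schema


def read_log(lines: Iterable[str]) -> Iterator[Record]:
    """'Log Reader': parse each 'user status' line into a record."""
    for line in lines:
        user, status = line.split()
        yield user, int(status)


def filter_rows(records: Iterable[Record]) -> Iterator[Record]:
    """'Filter Rows': keep only server-error records (status >= 500)."""
    return (record for record in records if record[1] >= 500)


def group_by_user(records: Iterable[Record]) -> Counter:
    """'Group': count the remaining records per user."""
    return Counter(user for user, _ in records)


SAMPLE_LOG = ["alice 200", "bob 500", "alice 503", "bob 404", "bob 502"]

# Operators are chained lazily, so each record flows through the whole pipeline.
error_counts = group_by_user(filter_rows(read_log(SAMPLE_LOG)))
print(error_counts)
```

Each operator takes and returns an iterable, so steps can be reordered or replaced without touching the others, which is the property a drag-and-drop workflow tool relies on.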
Vendor Approaches to “SQL on Hadoop”
SQL Outside Hadoop (“marketing jobs”)
• Connector approach
• MPP DB → need 2 clusters
• Expensive, hard to manage
Mature but non-Integrated (“wrapped legacy”)
• Legacy engine (e.g. Postgres) + top layer
• Store data outside HDFS (local files)
• Separate Failover Management (tools)
Integrated but Immature (“from scratch”)
• No trickle updates
• Immature/poor optimizers + engines
• I18N, security, workload mgmt, access control?
“SQL on Hadoop” Vendor Landscape
[Chart: Maturity (SQL support, ACID, reliability, security, connectivity, performance), Low to High, against Hadoop Integration, Low to Native; plotted groups: “marketing jobs”, “wrapped legacy”, “from scratch”, and Mature & Integrated]
Actian Vector Hadoop Edition
Actian Analytics Platform – Hadoop SQL Edition
Actian Analytics Platform
[Diagram: a NameNode and eight DataNodes, with Standard SQL Interfaces]
Connect: Connect to any data via Actian DataConnect
Prepare: Prepare, enrich, and analyze any data with Actian DataFlow
Orchestrate: Manage dataflow across the entire analytic process
Running natively in Hadoop via YARN
6 POINTS OF INNOVATION:
• Vector Processing
• On Chip Cache
• Fast Real-time Updates
• Smart Compression
• Storage Indexes
• Multi-Core Parallelism
NEXT GENERATION DATABASE TECHNOLOGY:
• Columnar
• Compressed
• Storage Indexes
Actian Vector – Unmatched Innovation
[Chart: Time/Cycles to Process against Data Processed]
DISK: millions of cycles, 40-400MB
RAM: 150-250 cycles, 2-3GB
CHIP: 2-20 cycles, 10GB
• Vector Processing: Single Instruction Multiple Data
• Exploiting Chip Cache: process data on chip, not in RAM
• 2nd Gen Column Store: limit I/O; efficient real time updates
• Smarter Compression: maximize throughput; vectorized decompression
• Multi-core Parallelism: maximize system resource utilization
• Storage Indexes: quickly identify candidate data blocks; minimize IO
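The slide does not say how Actian Vector implements its storage indexes. As a general illustration of the idea (identify candidate blocks quickly, minimize IO), the sketch below keeps a minimum and maximum value per block of a column and skips any block whose range cannot match a range predicate. The block size, class, and function names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of one column plus its min/max summary (the 'storage index' entry)."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # The summary alone proves this block has no qualifying value: skip it.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


column = list(range(1, 101))            # 100 values stored in insertion order
blocks = build_blocks(column, block_size=10)
matches, blocks_read = scan_range(blocks, 42, 47)
print(matches, f"read {blocks_read} of {len(blocks)} blocks")
```

The benefit depends on how well the data is clustered: on sorted or time-ordered columns most blocks are skipped, while on randomly ordered data nearly every block's range overlaps the predicate.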
TPC-H 1TB – Faster, Less Hardware
Fastest TPC-H QphH@1TB Benchmark (non-clustered)
QphH by system:
• Actian Vector: 445,529
• Actian Vector: 436,788
• SQL Server: 219,888
• Oracle: 209,534
• Oracle: 201,487
• SQL Server: 173,962
• Sybase IQ: 164,747
• Oracle: 140,181
• SQL Server: 134,117
Result dates, in the order listed on the slide: June ‘12, May ‘11, Aug ‘11, June ‘11, Sept ‘11, Apr ‘11, Dec ‘10, Apr ‘10, Dec ‘11
Hardware cost (excluding discounts), in the order listed on the slide: $57,146; $1,229,968; $460,869; $2,402,706; $753,392; $278,527; $85,621; $1,249,967; $258,880
Source: www.tpc.org
Actian Analytics Platform – Hadoop SQL Edition
Transform Hadoop into a High Performance Analytics Platform
[Diagram: a Hadoop cluster (YARN, HDFS) with a NameNode and DataNodes, each DataNode running HDFS and X100]
Standard SQL Interfaces
Actian Vector: High Performance, Industrialized SQL Database
Visual Data & Analytics Workflow (Read, Load, Blend & Enrich, Data Science & Analytics): High Performance, Parallelized Data Flow Engine
HDFS
• Original file format
• Standard block replication
Vector
• Column-based blocks
• Compressed
• Partitioned
Replicated Vector
• >=3 Replicated Copies of Vector Blocks
• Leveraged to co-locate data with various join keys
History of the TPC-DS Comparison
TPC-DS Benchmark Components
[Diagram labels: Operational Systems (Store, Web, Catalog, Inventory, Promotions); Set of Files; ETL; Refresh Process; DSS Database; Ad-hoc/Reporting Queries; User Queries; TPC-DS Reports]
Actian Hadoop SQL Performance
“Impala Subset” of TPC-DS Queries at Scale Factor 3000 (3TB)
[Chart: speedup factor vs Impala (y-axis, 0 to 35), Impala and Actian, for queries Q3, Q7, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89, Q98]
16x avg. speedup
Background to the “Impala Subset” of the TPC-DS benchmark can be found here:
http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/
Both executed on the same hardware and software environment: 5-node cluster with 64GB of RAM per node and 12x2TB hard disks.
Most Industrialized SQL in Hadoop
Comprehensive – covers full analytic process: data blending & enrichment, discovery & data science, analytics & operational BI
Accessible – standard ANSI SQL to support standard BI tools; plus key advanced analytics including cube, grouping sets and windowing functions
Optimized – mature, proven planner and optimizer; optimal use of every node, CPU, memory, and cache
Secure – native DBMS security including authentication, user and role-based security, data protection, and encryption
Reliable – fully ACID-compliant with multi-version read consistency, plus system-wide failover protection
Manageable – resources managed automatically in Hadoop via YARN
Consumable – now usable by millions of users with every SQL tool and application on the planet
Scalable – unlimited expansion to handle extreme #s of users, nodes, data
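As an illustration of the windowing functions mentioned under "Accessible", the sketch below computes a per-region running total with a standard SQL `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` clause. It runs against SQLite (through Python's built-in `sqlite3` module, which needs SQLite 3.25 or later for window functions) purely as a stand-in; it is not Actian's engine, and the `sales` table and its contents are made up for the example.

```python
import sqlite3

# In-memory SQLite database used only as a convenient ANSI-style SQL stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 1, 100), ("east", 2, 150), ("east", 3, 50),
        ("west", 1, 200), ("west", 2, 100),
    ],
)

# Window function: running total of amount within each region, ordered by month.
rows = conn.execute(
    """
    SELECT region, month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

Unlike a `GROUP BY`, the window function keeps every input row and adds the aggregate alongside it, which is why BI tools lean on it for running totals and rankings.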
Actian Director for Management
Actian Analytics Platform – Hadoop SQL Edition
Industrialized, High-Performance SQL in Hadoop
Only end-to-end analytic processing natively in Hadoop
Highest performing, most industrialized SQL in Hadoop
Removes all barriers for business access to big data analytics
Unleashes millions of business-savvy SQL users on Hadoop data
Outperforms Cloudera’s Impala by up to 30x
Actian transforms Hadoop from a data lake into a high-performance analytics platform.
Transform Hadoop – Transform your Business
Get started today! www.actian.com/hadoop
• Pre-register for an evaluation copy of Actian’s SQL in Hadoop: bigdata.actian.com/sql-in-hadoop
• Register for a Sand Hill Hadoop Survey Results webinar on July 24, 2014: bigdata.actian.com/SandHill-Hadoop-Results

Mais conteúdo relacionado

Mais procurados

Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive EnterpriseSmart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
DataWorks Summit
 

Mais procurados (20)

Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
 
Break Free From Oracle with Attunity and Microsoft
Break Free From Oracle with Attunity and MicrosoftBreak Free From Oracle with Attunity and Microsoft
Break Free From Oracle with Attunity and Microsoft
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming Era
 
Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...
Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...
Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...
 
Apache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance UpdateApache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance Update
 
Azure for SAP Solutions - Use Cases and Migration Options
Azure for SAP Solutions - Use Cases and Migration OptionsAzure for SAP Solutions - Use Cases and Migration Options
Azure for SAP Solutions - Use Cases and Migration Options
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive EnterpriseSmart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
 
How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Driving Better Products with Customer Intelligence

Driving Better Products with Customer Intelligence
Driving Better Products with Customer Intelligence

Driving Better Products with Customer Intelligence

 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
 
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
 
Cloudera, Azure and Big Data at Cloudera Meetup '17
Cloudera, Azure and Big Data at Cloudera Meetup '17Cloudera, Azure and Big Data at Cloudera Meetup '17
Cloudera, Azure and Big Data at Cloudera Meetup '17
 

Destaque

Jump start your analytics investments and accelerate analytics ROI
Jump start your analytics investments and accelerate analytics ROIJump start your analytics investments and accelerate analytics ROI
Jump start your analytics investments and accelerate analytics ROI
Actian Corporation
 

Destaque (6)

Drive Customer Loyalty with Big Data 2.0
Drive Customer Loyalty with Big Data 2.0Drive Customer Loyalty with Big Data 2.0
Drive Customer Loyalty with Big Data 2.0
 
7 Ingredients to Create Real Value From Hadoop
7 Ingredients to Create Real Value From Hadoop7 Ingredients to Create Real Value From Hadoop
7 Ingredients to Create Real Value From Hadoop
 
Transforming Healthcare Data Into Value
Transforming Healthcare Data Into ValueTransforming Healthcare Data Into Value
Transforming Healthcare Data Into Value
 
The Bank Job: How to stop ATM Fraud in Real Time
The Bank Job: How to stop ATM Fraud in Real TimeThe Bank Job: How to stop ATM Fraud in Real Time
The Bank Job: How to stop ATM Fraud in Real Time
 
Jump start your analytics investments and accelerate analytics ROI
Jump start your analytics investments and accelerate analytics ROIJump start your analytics investments and accelerate analytics ROI
Jump start your analytics investments and accelerate analytics ROI
 
Elevating customer analytics - how to gain a 720 degree view of your customer
Elevating customer analytics - how to gain a 720 degree view of your customerElevating customer analytics - how to gain a 720 degree view of your customer
Elevating customer analytics - how to gain a 720 degree view of your customer
 

Semelhante a SQL + Hadoop: The High Performance Advantage�

Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
DataWorks Summit
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter Analytics
Adrian Turcu
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
EMC
 
ds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suiteds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suite
Robin Fong 方俊强
 

Semelhante a SQL + Hadoop: The High Performance Advantage� (20)

Actian Analytics Platform - Hadoop SQL Edition
Actian Analytics Platform - Hadoop SQL EditionActian Analytics Platform - Hadoop SQL Edition
Actian Analytics Platform - Hadoop SQL Edition
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
 
Open Innovation with Power Systems
Open Innovation with Power Systems Open Innovation with Power Systems
Open Innovation with Power Systems
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter Analytics
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data Success
 
Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant Presentation
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Yahoo! Hack Europe
Yahoo! Hack EuropeYahoo! Hack Europe
Yahoo! Hack Europe
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
How Experian increased insights with Hadoop
How Experian increased insights with HadoopHow Experian increased insights with Hadoop
How Experian increased insights with Hadoop
 
OAC Workshop - Detroit 2019
OAC Workshop -  Detroit 2019OAC Workshop -  Detroit 2019
OAC Workshop - Detroit 2019
 
ds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suiteds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suite
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - Jaspersoft
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 

Último

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 

Último (20)

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf

SQL + Hadoop: The High Performance Advantage

  • 1. Confidential © 2014 Actian Corporation1 SQL + Hadoop: The High Performance Advantage Turn Hadoop into a High Performance Analytics Platform Emma McGrattan, Actian Jim Hare, Actian 8 July 2014
  • 2. Agenda: 1. Introduction 2. Hadoop Challenges 3. Actian Analytics Platform – Hadoop SQL Edition 4. Industrialized, High Performance SQL in Hadoop 5. Questions. All lines are muted. To ask a question, use Chat or Q&A panel. Recording will be made available. We’ll be running a few polling questions.
  • 3. Actian is Delivering Transformational Value: $140M Revenues + Profitable; 10,000+ Customers; Global Presence: 8 world-wide offices, 7x24 multinational support model. “Actian is now very powerfully positioned in the big data and analytics markets.” Robin Bloor. “Actian has assembled all of the next generation IPs into a single analytics platform, allowing users a level of flexibility in data interaction that competitors have not been able to match.” siliconANGLE
  • 4. Big Data Offers Significant Opportunities: Personalized Experience; New Products/Services; Reduce Risk; Predictive Analytics; Many Data Sources; Low Cost Storage; Improve Decision-Making. …But only for those who embrace it
  • 5. Enter Hadoop as the Big Data Enabler for Low Cost Storage: DW Offload; Landing Zone; Data Reservoir; ?
  • 6. But It isn’t Easy with Hadoop: Batch performance; Time to Value; Expensive Skills; Silo’d Data Access; Data preparation
  • 7. Hadoop Complexity Forcing Organizations to Move Data in order to Analyze it. Diagram labels: DW Offload; Landing Zone; Hadoop Data Reservoir; Data Transfer; Data Management; Analytics Processing; Visualization & Data Science Workbench. Result: duplicate storage & infrastructure costs, more IT resources, network bandwidth usage, and complexity
  • 8. CIOs Challenged by Big Data Costs: One in three CIOs pay between 21 and 30 cents per gigabyte a month. Translation: it costs a company $3.12 million per year to store 500,000 gigabytes at an average cost of 26 cents per gigabyte per month. Source: CIO Insight, http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html
  • 9. CIOs Challenged by Types of Big Data: 73% of CIOs say up to 50% of their data will be unstructured within two years. Source: CIO Insight, http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html
  • 10. Instead, what if you could move the analytic processing to the Hadoop data? … And transform Hadoop from a data lake into a high performance, fully functional analytics platform. Diagram labels: Data Science Workbench; Analytic Processing; Data Management; SQL User Access
  • 11. Introducing the Actian Analytics Platform – Hadoop SQL Edition. What is it? Patented X100 vector processing engine plus visual data and analytics work flow, all running natively in Hadoop via YARN. Turns Hadoop into a High-Performance, Fully-Functional Analytics Database. How is this unique? Highest performing, most industrialized SQL access to Hadoop data. Only end-to-end analytic processing natively in Hadoop. Most consumable, accessible, manageable Hadoop analytics. What does this mean to you? Removes all barriers for business access to big data analytics. Enables SQL users with no constraints on Hadoop data. Accelerates time to value.
  • 12. The Industry’s Abuzz – about Actian! “Deploying on Hadoop enables the Actian Analytics Platform to scale to massively parallel scale without having to modify the underlying engine. For Actian, Hadoop is a means to an end; it provides an opening for Actian to introduce a fast SQL engine that operates at scale.” Tony Baer, Principal Analyst, Software, Ovum. “Actian’s platform now makes Hadoop data repositories accessible to the entire enterprise by empowering millions of business-savvy SQL users and business analysts to conduct advanced analytics directly on data in the Hadoop Distributed File System (HDFS). Companies investing in Hadoop now can broaden the scope of data discovery, increase the accuracy of decisions, and speed time to value.” Daniel Gutierrez, Inside Big Data. “The latest version of the Actian Analytics Platform provides end-to-end analytic processing natively in Hadoop. This will make the Hadoop Big Data framework more accessible by offering high-performance ELT (extract, load and transform) and SQL analytics on Hadoop with no need for MapReduce skills. This is a big deal because data scientists with Hadoop skills are in short supply, while SQL skills are relatively abundant.”
  • 13. Actian Analytics Platform – Hadoop SQL Edition: Removes all barriers for business access to big data analytics. Diagram labels: Libraries of Analytics; Hadoop; Connections to Access Any Data; Visual Data and Analytic Workbench; High Performance Data Flow Engine; Industrialized SQL Analytics Database Natively in Hadoop; Business Processes; Users; Machines; Applications; Expansive Connectivity; Data Blending & Enrichment; Discovery; Data Science; Analytics; Operational BI; Enterprise Data; Machine Data; Social Data; Data Warehouse; SaaS Data; Amazon Redshift
  • 14. Actian Analytics Platform – Hadoop SQL Edition: Lightning fast and industrial strength SQL in Hadoop – Up to 30X faster than Impala. Full end-to-end analytic processing platform, all native in Hadoop. Packaged with “real world” solution blueprints.
  • 15. Visual Data Science & Analytics Workbench: Drag/drop interface with 100s of data prep and analytic functions. Connect, blend, & enrich data and perform discovery & data science. Build and test predictive models. Running on top of a high performance data flow engine. All natively within Hadoop via YARN.
  • 16. Ubiquitous Skills: 1 Million+ SQL Users; $ Lower cost; Easy to find, in most companies; Embedded in the business. Specialty Skills: 150K MapReduce Programmers; $$$ Expensive; 170K Shortage, hard to find; Separate from the business. Unleash millions of business-savvy SQL users with no constraints on Hadoop data. Actian Analytics Platform™: Connect, Analyze, Act
  • 17. Actian Analytics Platform = 25 Minutes (workflow steps: Log Reader; Filter Rows; Group; Load Vector; k-Means). Coding MapReduce = 4 Weeks (Log Reader; Filter Rows; Group; Load Vector; Avro Writer; k-Means; each shown as MapReduce Code). Accelerate time to value and turn Hadoop data into transformational value
  • 18. Vendor Approaches to “SQL on Hadoop”. “marketing jobs”, SQL Outside Hadoop: Connector approach; MPP DB → need 2 clusters; Expensive, hard to manage. “wrapped legacy”, Mature but non-Integrated: Legacy engine (e.g. Postgres) + top layer; Store data outside HDFS (local files); Separate Failover Management (tools). “from scratch”, Integrated but Immature: No trickle updates; Immature/poor optimizers+engines; I18N, security, workload mgmt, access control?
  • 19. “SQL on Hadoop” Vendor Landscape. Chart axes: Maturity (SQL support, ACID, reliability, security, connectivity, performance) and Hadoop Integration, with scale labels Low, Native, High. Chart labels: “marketing jobs”; “wrapped legacy”; “from scratch”; Mature & Integrated
  • 20. Actian Analytics Platform – Hadoop SQL Edition; Actian Vector Hadoop Edition. Cluster diagram: one NameNode and eight DataNodes. Connect: Connect to any data via Actian DataConnect. Prepare: Prepare, enrich, and analyze any data with Actian DataFlow. Orchestrate: Manage dataflow across the entire analytic process. Standard SQL Interfaces. Running natively in Hadoop via YARN. 6 POINTS OF INNOVATION: Vector Processing; On Chip Cache; Fast Real-time Updates; Smart Compression; Storage Indexes; Multi-Core Parallelism. NEXT GENERATION DATABASE TECHNOLOGY: Columnar; Compressed; Storage Indexes
  • 21. Actian Vector – Unmatched Innovation. Chart: Time/Cycles to Process against Data Processed; labels DISK, RAM, CHIP; 10GB, 2-3GB, 40-400MB; 2-20, 150-250, Millions. Vector Processing: Single Instruction Multiple Data. 2nd Gen Column Store: Limit I/O; Efficient real time updates. Smarter Compression: Maximize throughput; Vectorized decompression. Exploiting Chip Cache: Process data on chip, not in RAM. Multi-core Parallelism: Maximize system resource utilization… Storage Indexes: Quickly identify candidate data blocks; Minimize IO
  • 22. TPC-H 1TB – Faster, Less Hardware. Fastest TPC-H QphH@1TB Benchmark (non-clustered). QphH: Actian Vector 445,529; Actian Vector 436,788; SQL Server 219,888; Oracle 209,534; Oracle 201,487; SQL Server 173,962; Sybase IQ 164,747; Oracle 140,181; SQL Server 134,117. Result dates, in the order extracted from the chart: June ‘12; May ‘11; Aug ‘11; June ‘11; Sept ‘11; Apr ‘11; Dec ‘10; Apr ‘10; Dec ‘11. Hardware Cost (excluding discounts), in the order extracted from the chart: $57,146; $1,229,968; $460,869; $2,402,706; $753,392; $278,527; $85,621; $1,249,967; $258,880. Source: www.tpc.org
  • 23. Actian Analytics Platform – Hadoop SQL Edition: Transform Hadoop into a High Performance Analytics Platform. Diagram labels: HADOOP; YARN; HDFS; NameNode; DataNodes with HDFS and X100; Standard SQL Interfaces; Visual Data & Analytics Workflow (Read; Load Actian Vector; Blend & Enrich; Data Science & Analytics); High Performance, Industrialized SQL Database; High Performance, Parallelized Data Flow Engine. HDFS: Original file format; Standard block replication. Vector: Column-based blocks; Compressed; Partitioned. Replicated Vector: >=3 Replicated Copies of Vector Blocks; Leveraged to co-locate data with various join keys
  • 24. History of the TPC-DS Comparison
  • 25. TPC-DS Benchmark Components. Diagram labels: Operational Systems; Refresh Process; Ad-hoc Reporting Queries; User Queries; DSS Database; TPC-DS Reports; Store; Web; Catalog; Inventory; Promotions; Set of Files; ETL
  • 26. Actian Hadoop SQL Performance: “Impala Subset” of TPC-DS Queries at Scale Factor 3000 (3TB). Chart: Speedup Factor vs Impala (axis 0 to 35) for Q3, Q7, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89, Q98; series Impala and Actian; 16x avg. speedup. Both executed on the same hardware and software environment: 5-node cluster with 64GB of RAM per node and 12x2TB hard disks. Background to the “Impala Subset” of the TPC-DS benchmark can be found here: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/
  • 27. Most Industrialized SQL in Hadoop. Comprehensive – covers full analytic process: data blending & enrichment, discovery & data science, analytics & operational BI. Accessible – standard ANSI SQL to support standard BI tools; plus key advanced analytics including cube, grouping sets and windowing functions. Optimized – mature, proven planner and optimizer; optimal use of every node, CPU, memory, and cache. Secure – native DBMS security including authentication, user and role-based security, data protection, and encryption. Reliable – fully ACID-compliant with multi-version read consistency, plus system-wide failover protection. Manageable – resources managed automatically in Hadoop via YARN. Consumable – now usable by millions of users with every SQL tool and application on the planet. Scalable – unlimited expansion to handle extreme numbers of users, nodes, data.
  • 28. Actian Director for Management
  • 29. Actian Analytics Platform – Hadoop SQL Edition: Industrialized, High-Performance SQL in Hadoop. Only end-to-end analytic processing natively in Hadoop. Highest performing, most industrialized SQL in Hadoop. Removes all barriers for business access to big data analytics. Unleashes millions of business-savvy SQL users on Hadoop data. Outperforms Cloudera’s Impala by up to 30x. Actian transforms Hadoop from a data lake into a high-performance analytics platform.
  • 30. Transform Hadoop – Transform your Business
  • 31. Get started today! www.actian.com/hadoop. Pre-register for an evaluation copy of Actian’s SQL in Hadoop: bigdata.actian.com/sql-in-hadoop. Register for a Sand Hill Hadoop Survey Results webinar on July 24, 2014: bigdata.actian.com/SandHill-Hadoop-Results
  • 32. Get started today! www.actian.com/hadoop. Pre-register for an evaluation copy of Actian’s SQL in Hadoop: bigdata.actian.com/sql-in-hadoop. Register for a Sand Hill Hadoop Survey Results webinar on July 24, 2014: bigdata.actian.com/SandHill-Hadoop-Results
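
The storage cost quoted on slide 8 can be re-derived from the two inputs the slide gives. The sketch below assumes a flat monthly rate and a 12-month year; the function name is illustrative and does not come from the deck. With the quoted volume and rate the product comes to $1.56 million per year, which is half of the $3.12 million shown on the slide. The deck does not say what accounts for the difference.

```python
def annual_storage_cost_usd(gigabytes: int, cents_per_gb_month: int) -> float:
    """Yearly cost in dollars for a flat per-gigabyte monthly rate (assumed model)."""
    monthly_cents = gigabytes * cents_per_gb_month  # integer cents avoid rounding noise
    return monthly_cents * 12 / 100

# Figures as quoted on slide 8.
slide_volume_gb = 500_000
slide_rate_cents = 26
slide_claim_usd = 3_120_000

computed = annual_storage_cost_usd(slide_volume_gb, slide_rate_cents)
ratio = slide_claim_usd / computed

print(f"computed: ${computed:,.0f} per year")
print(f"slide figure / computed: {ratio:.1f}")
```

The slide's total would match this simple model at twice the volume or twice the rate, for example 1,000,000 gigabytes at 26 cents.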

Editor's Notes

  1. But it isn’t easy. Changing your company is not easy. Give examples: you’ve just invested $1M in a data warehouse, but the business now wants to … It will now cost you tenfold.
  2. We are announcing Vector on Hadoop: industrial-strength SQL on Hadoop with atom-smashing speed never before seen in the industry. This is a core part of our Actian Analytics Platform – Hadoop SQL Edition. Let me tell you about it (details below) and show you a few things. What are we announcing? The highest performing, most industrialized SQL in Hadoop. It turns Hadoop into a High-Performance, Fully-Functional Analytics Database. Actian Analytics Platform – Hadoop SQL Edition includes our hardened (patented) X100 vector processing engine, combined with Actian’s visual data and analytics work flow, all running natively in Hadoop via YARN. How is this unique? Highest performing, most industrialized SQL access to Hadoop data. Only end-to-end analytic processing natively in Hadoop (covers the full analytics processes: data blending & enrichment, discovery & data science, analytics & operational BI). Most consumable, accessible, manageable Hadoop analytics. What does this mean to our customers? It removes all barriers for business access to big data analytics. It unleashes millions of business-savvy SQL users with no constraints on Hadoop data to improve the accuracy of their analytical predictions and decision-making. It accelerates time to value and turns Hadoop data into transformational value: customer delight, competitive advantage, world-class risk management, disruptive business models.
  3. I’m going to show you three things: How fast it is, how easy it is to get started and how it can be used in real-world scenarios.
  4. internationalization
  5. 1: We use vectorized processing to exploit modern CPU architecture. We execute one operation at a time on a vector of data, which allows for tight inner code loops without branching. This way, we can use SIMD instructions and, because of the lack of branching, make sure the CPU pipelines are not thrashed. A vector is typically 1024 rows of a single column, so it’s a manageable amount of data while the overhead per row is still negligible. 2: A vector will fit in the CPU cache together with the code for a particular operation, so all execution is in-cache. 3: To feed this engine with enough data, we’re also applying the vectorized paradigm to the storage subsystem. First of all, we’re using a column store, so only relevant columns are read from disk. Data is stored in blocks of typically 512mb and a single block contains only data from a single column (there are exceptions). Blocks of different columns can be interleaved per block, but typically more than one block of the same column is grouped. To keep the stable storage fast and defragmented, we use in-memory overlays to store updates to the data. These overlays are automatically flushed to stable storage when needed. 4: The blocks are stored compressed on-disk. We’ve got a number of lightweight compression algorithms and the most efficient one is chosen per block, depending on the data characteristics. The decompression takes place per vector and can be done in the CPU cache, which neatly ties in with our in-cache execution. We have a buffer manager that predicts what blocks are needed when and makes sure no blocks that will be used in the near future are evicted from the buffer cache. 5: We have min-max indexes on the disk blocks, so when data is not completely random we can narrow down the ranges of blocks we need to read from disk, per column. All in all, the execution engine is able to do about 1.5GB/s per core, and high-end I/O subsystems are able to keep up with this.
  6. Execution: subset of TPC-DS as chosen by Impala; data size is 3TB (SF3000); executed on the 5-node “rushcluster” in Austin; both Impala and Vector numbers are on the same hardware. Comparison with Impala: verified that Impala plans are sensible; currently observed average speedup is 11x; optimal query plans (manually written) give us a 16x speedup. These are real numbers! We executed manual plans directly; changes in the cost model would get us to this performance. Performance improvements: cost model changes will get us to the 16x speedup; pipeline of query execution changes, well into H2, is estimated to get us a 2x improvement. So, the estimated speedup vs Impala would be ~30x (no guarantees). Planning to run TPC-H SF1000 and SF3000. With all planned improvements (end of the year) we should be able to beat the EXASOL cluster numbers.
  7. What are we announcing? Actian Analytics Platform – Hadoop SQL Edition, the first offering that turns Hadoop into a fully-functioning analytics platform. This new edition introduces the highest performing, most industrialized SQL in Hadoop, powered by our hardened (patented) X100 vector processing engine, combined with Actian’s visual data and analytics work flow, all running natively in Hadoop via YARN. How is this unique? It provides the only end-to-end analytic processing natively in Hadoop (covers the full analytics processes: data blending & enrichment, discovery & data science, analytics & operational BI). It delivers the highest performing, most industrialized SQL access to Hadoop data. It makes the entire analytic process more consumable, easier to access, and easier to manage than on any other. What does this mean to our customers? Industrialized SQL in Hadoop removes all barriers for business access to big data analytics. Broad SQL access unleashes millions of business-savvy SQL users with no constraints on Hadoop data to improve the accuracy of their analytical predictions and decision-making. Turbocharged Hadoop analytics and SQL in Hadoop accelerate time to value and turn Hadoop data into transformational value: customer delight, competitive advantage, world-class risk management, disruptive business models.
  8. We want to partner with you to identify the most obvious places where big data analytics could be applied in your organization.
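
Note 5 describes two ideas that work together: running one operation at a time over a vector of about 1024 values, and using min-max indexes to avoid reading blocks that cannot contain matches. The sketch below is a toy illustration of that control flow in plain Python, not Actian's engine. The names `make_blocks`, `select_greater_than` and `scan_greater_than` and the block size are assumptions made for the example, and a Python loop does not use SIMD or the CPU cache the way the real engine is described as doing.

```python
VECTOR_SIZE = 1024  # rows per vector, the typical size given in note 5

def make_blocks(column, rows_per_block):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(column), rows_per_block):
        values = column[start:start + rows_per_block]
        blocks.append({"min": min(values), "max": max(values), "values": values})
    return blocks

def select_greater_than(vector, threshold):
    """One primitive applied to a whole vector: a single tight loop, no per-row dispatch."""
    return [v for v in vector if v > threshold]

def scan_greater_than(blocks, threshold):
    """Return the matching values and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block["max"] <= threshold:
            continue  # the min-max index shows no row can match, so skip the block
        blocks_read += 1
        values = block["values"]
        for start in range(0, len(values), VECTOR_SIZE):
            vector = values[start:start + VECTOR_SIZE]
            matches.extend(select_greater_than(vector, threshold))
    return matches, blocks_read

# Sorted data is the favourable case for min-max indexes.
column = list(range(10_000))
blocks = make_blocks(column, rows_per_block=4096)  # block size is an arbitrary choice
matches, blocks_read = scan_greater_than(blocks, threshold=9_000)
print(len(matches), "matches;", blocks_read, "of", len(blocks), "blocks read")
```

With sorted input, only the last of the three blocks is read. With randomly ordered data every block's range would span the threshold and nothing would be skipped, which is the caveat note 5 makes with "when data is not completely random".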
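
Note 5 also says that the most efficient of several lightweight compression algorithms is chosen per block, depending on the data. The notes do not name the algorithms or the selection rule, so the sketch below substitutes two common lightweight schemes, run-length and dictionary encoding, and picks whichever has the smallest estimated size. The size formulas and all function names are assumptions made for illustration.

```python
def rle_encode(values):
    """Run-length encoding: a list of [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def dict_encode(values):
    """Dictionary encoding: the distinct values plus one small code per row."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def estimated_sizes(values, value_bytes=8):
    """Crude size estimates in bytes; these cost formulas are assumptions."""
    runs = rle_encode(values)
    dictionary, codes = dict_encode(values)
    code_bytes = 1 if len(dictionary) <= 256 else 4
    return {
        "raw": len(values) * value_bytes,
        "rle": len(runs) * (value_bytes + 4),  # value plus a 4-byte run length
        "dict": len(dictionary) * value_bytes + len(codes) * code_bytes,
    }

def choose_scheme(values):
    """Pick the scheme with the smallest estimated size for this block."""
    sizes = estimated_sizes(values)
    return min(sizes, key=sizes.get)

sorted_block = [1] * 500 + [2] * 500  # long runs
cyclic_block = [1, 2, 3, 4] * 250     # few distinct values, no runs
unique_block = list(range(1000))      # every value distinct

for name, block in [("sorted", sorted_block), ("cyclic", cyclic_block), ("unique", unique_block)]:
    print(name, "->", choose_scheme(block))
```

Each block gets a different answer because the decision is made per block: long runs favour run-length encoding, a small set of repeating values favours a dictionary, and fully distinct values are cheapest left uncompressed under these estimates.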