In this presentation, executives from Denodo preview the new Denodo Platform 6.0 release, which delivers the Dynamic Query Optimizer, a cloud offering on Amazon Web Services, and self-service data discovery and search. Over 30 analysts, led by Claudia Imhoff, provide input on the strategic direction and benefits of Denodo 6.0 for the data virtualization market and the broader data integration market.
This presentation is part of the Fast Data Strategy Conference; you can watch the video at goo.gl/DR6r3m.
5. HEADQUARTERS
Palo Alto, CA.
DENODO OFFICES, CUSTOMERS, PARTNERS
Global presence throughout North America,
EMEA, APAC, and Latin America.
CUSTOMERS
250+ customers, including many
F500 and G2000 companies across every
major industry have gained significant
business agility and ROI.
LEADERSHIP
Longest continuous focus on data
virtualization and data services.
Product leadership.
Solutions expertise.
THE LEADER IN DATA VIRTUALIZATION
Denodo provides agile, high performance data
integration and data abstraction across the broadest
range of enterprise, cloud, big data and unstructured
data sources, and real-time data services at half the
cost of traditional approaches.
6. Award-Winning Data Virtualization Leader
Forrester Wave: Enterprise Data Virtualization, Q1 2015
2015 Magic Quadrant
for Data Integration
Tools
2015 Leader in Forrester
Wave: Enterprise Data
Virtualization.
2015 Technology
Innovation Award for
Information
Management
2015 #1 Readers Choice
Awards For Data
Virtualization Platforms
2015 Rank
Companies that
Matter Most in
Data
2015 Big Data 50 –
Companies Driving
Innovation
2015 Leadership
Award in Big Data
For Denodo
Customer Autodesk
Trend-Setting Products in
Data and Information
Management for 2016
2016 Premier 100
Technology Leader
For Denodo Customer
CIT
8. The Business Need
Ready Access to Critical Information to Support Business Processes
Marketing, Sales, Executive, Support
Customers
Warranties
Channels
Products
Access to complete
information
Access to related information
Access in real-time
Cross-sell / Up-sell
9. The Challenge
Data Is Siloed Across Disparate Systems
Manually access different systems
Not productive – slows down response times
IT responds with point-to-point data integration
Marketing, Sales, Executive, Support
Database, Apps, Warehouse, Cloud, Big Data, Documents, Apps, NoSQL
10. The Solution
Data Abstraction Layer
Abstracts access to
disparate data sources
Acts as a single repository
(virtual)
Makes data available in
real-time to consumers
12. Benefits of Data Virtualization
Data Virtualization
Better Data Integration
Lower integration costs by 80%.
Flexibility to change.
Real-time (on-demand) data services.
Complete Information
Focus on business information needs.
Include web / cloud, big data,
unstructured, streaming.
Bigger volumes, richer/easier access to
data.
Better Business Outcome
Projects in 4-6 weeks.
ROI in <6 months.
Adds new IT and business capabilities
13. Case Study
Autodesk Successfully Changes Their Revenue Model and Transforms Business
Problem
• Autodesk was changing its business revenue model from a conventional perpetual license model to a subscription-based license model.
• Inability to deliver high quality data in a timely manner to business stakeholders.
• Evolution from traditional operational data warehouse to contemporary logical data warehouse deemed necessary for faster speed.
Solution
• General purpose platform to deliver data through logical data warehouse.
• Denodo Abstraction Layer helps live invoicing with SAP.
• Data virtualization enabled a culture of “see before you build”.
Results
• Successfully transitioned to subscription-based licensing.
• For the first time, Autodesk can do single point security enforcement and have a uniform data environment for access.
Autodesk, Inc. is an American multinational software corporation that makes software for the
architecture, engineering, construction, manufacturing, media, and entertainment industries.
15. Accelerate Your Fast Data Strategy With Denodo Platform 6.0
Dynamic Query Optimizer: Best Real-time Performance.
In the Cloud: Shortest Time to Data.
Self Service Data Discovery and Search: Rapid Decision Making.
16. Accelerate Your Fast Data Strategy with Denodo Platform 6.0
New Release of Denodo Platform Delivers Breakthrough Performance, Accelerates Adoption,
and Expedites Business Use of Data
Breakthrough
Performance
Dynamic Query Optimizer
delivers breakthrough performance
for big data, logical data
warehouse, and operational
scenarios
Data Virtualization
In the Cloud
Denodo Platform for AWS
accelerates adoption of data
virtualization
Self-service data discovery
and search
Self-service data discovery
and search expedites use of
data by business users
17. Dynamic Query Optimizer
Delivers Breakthrough Performance for Big Data, Logical Data Warehouse, and
Operational Scenarios
Dynamically determines lowest-cost query execution plan based on
statistics
Factors in all the special characteristics of big data sources such as
number of processing units and partitions
Can easily handle any number of incremental queries
Enables connectivity to the broadest array of big data sources such
as Redshift, Impala, Spark.
Best dynamic query optimization engine.
18. How Dynamic Query Optimizer Works
Example: Mining external dimensions with EDW
Total sales by retailer and product during the last month for the brand ACME
[Diagram: a fact table (sales) with Time, Product, and Retailer dimensions, held in the EDW and MDM sources]
SELECT retailer.name,
       product.name,
       SUM(sales.amount)
FROM sales
  JOIN retailer ON sales.retailer_fk = retailer.id
  JOIN product ON sales.product_fk = product.id
  JOIN time ON sales.time_fk = time.id
WHERE time.date < ADDMONTH(NOW(), -1)
  AND product.brand = 'ACME'
GROUP BY product.name, retailer.name
19. How Dynamic Query Optimizer Works
Example: Non-optimized
[Diagram: the four queries below are sent to the sources; the DV layer then performs three JOINs and the GROUP BY product.name, retailer.name. Rows retrieved: sales 1,000,000,000; retailer 100; product 10; time 30. Rows after the JOINs: 10,000,000.]
SELECT sales.retailer_fk, sales.product_fk, sales.time_fk, sales.amount
FROM sales

SELECT retailer.name, retailer.id
FROM retailer

SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'

SELECT time.date, time.id
FROM time
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
20. How Dynamic Query Optimizer Works
Step 1: Applies JOIN reordering to maximize delegation
[Diagram: the sales-time JOIN is delegated to the source; the DV layer performs the two remaining JOINs and the GROUP BY product.name, retailer.name. Rows retrieved: sales JOIN time 100,000,000; retailer 100; product 10. Rows after the JOINs: 10,000,000.]
SELECT sales.retailer_fk, sales.product_fk, sales.amount
FROM sales JOIN time ON sales.time_fk = time.id
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)

SELECT retailer.name, retailer.id
FROM retailer

SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'
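The grouping idea behind Step 1 can be sketched in a few lines of Python. This is only an illustration, not Denodo's algorithm: the table-to-source mapping (with `sales` and `time` in the same source, and the source names `edw`, `mdm`, and `retailer_db`) and the helper `group_by_source` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical mapping of each table to the system that holds it.
TABLE_SOURCE = {
    "sales": "edw",
    "time": "edw",
    "product": "mdm",
    "retailer": "retailer_db",
}

def group_by_source(tables, table_source):
    """Group tables so that each source receives a single delegated sub-query."""
    groups = defaultdict(list)
    for table in tables:
        groups[table_source[table]].append(table)
    return dict(groups)

# The query lists the tables in the order sales, retailer, product, time;
# grouping by source pairs sales with time so that JOIN can run in the EDW.
plan = group_by_source(["sales", "retailer", "product", "time"], TABLE_SOURCE)
for source, tables in plan.items():
    print(source, "->", " JOIN ".join(tables))
```

Tables that share a source end up in one sub-query, so only the JOINs that cross sources are left for the DV layer.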
21. How Dynamic Query Optimizer Works
Step 2
Since the JOIN is on foreign keys (1-to-many), and the GROUP BY is on attributes from the dimensions, it applies the partial aggregation push down optimization
[Diagram: the partially aggregated sales-time query is delegated to the source; the DV layer performs the two remaining JOINs and the GROUP BY product.name, retailer.name. Rows retrieved: sales JOIN time 100,000; retailer 100; product 10. Rows after the JOINs: 1,000.]
SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
FROM sales JOIN time ON sales.time_fk = time.id
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
GROUP BY sales.retailer_fk, sales.product_fk

SELECT retailer.name, retailer.id
FROM retailer

SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'
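A small Python sketch can show why partial aggregation push-down preserves the result: summing by foreign key at the source and then re-aggregating by dimension name gives the same totals as joining first and aggregating afterwards. The data and the function names below are made up for illustration and are not part of the Denodo Platform.

```python
from collections import defaultdict

# Made-up fact rows: (retailer_fk, product_fk, amount)
sales = [(1, 10, 5.0), (1, 10, 7.0), (2, 10, 3.0), (2, 11, 4.0), (1, 11, 1.0)]
retailer = {1: "North", 2: "South"}     # id -> name
product = {10: "Widget", 11: "Widget"}  # two product ids that share one name

def join_then_aggregate(sales, retailer, product):
    """Non-optimized plan: every fact row reaches the DV layer."""
    totals = defaultdict(float)
    for r_fk, p_fk, amount in sales:
        totals[(retailer[r_fk], product[p_fk])] += amount
    return dict(totals)

def partial_aggregate_at_source(sales):
    """Pushed-down step: GROUP BY the foreign keys inside the source."""
    partial = defaultdict(float)
    for r_fk, p_fk, amount in sales:
        partial[(r_fk, p_fk)] += amount
    return dict(partial)

def finish_in_dv_layer(partial, retailer, product):
    """Final step: join the partial sums to the dimensions and aggregate by name."""
    totals = defaultdict(float)
    for (r_fk, p_fk), subtotal in partial.items():
        totals[(retailer[r_fk], product[p_fk])] += subtotal
    return dict(totals)

partial = partial_aggregate_at_source(sales)
optimized = finish_in_dv_layer(partial, retailer, product)
baseline = join_then_aggregate(sales, retailer, product)
print(optimized == baseline, len(sales), "fact rows vs", len(partial), "transferred rows")
```

The second aggregation in the DV layer is still needed because several keys can map to the same name, as the two `Widget` ids do here.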
22. How Dynamic Query Optimizer Works
Step 3
Selects the right JOIN strategy based on costs for data volume estimations
[Diagram: the plan uses a NESTED JOIN and a HASH JOIN, followed by the GROUP BY product.name, retailer.name. Rows retrieved: sales JOIN time 10,000; retailer 100; product 10. Rows after the JOINs: 1,000.]
SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
FROM sales JOIN time ON sales.time_fk = time.id
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
GROUP BY sales.retailer_fk, sales.product_fk
Filter added to the query above: WHERE product.id IN (1,2,…)

SELECT retailer.name, retailer.id
FROM retailer

SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'
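The choice in Step 3 can be illustrated with a toy cost model in Python. It is a sketch under strong assumptions: cost is counted only as rows moved over the network, the two formulas and the `selectivity` parameter are invented for the example, and a real cost-based optimizer weighs many more factors (indexes, transfer rates, and so on).

```python
def hash_join_cost(small_rows, big_rows):
    """Assumed cost: both inputs are transferred in full to the DV layer."""
    return small_rows + big_rows

def nested_join_cost(small_rows, big_rows, selectivity):
    """Assumed cost: fetch the small side, send its keys to the other source
    as an IN filter, and fetch only the matching share of the big side."""
    return small_rows + small_rows + big_rows * selectivity

def choose_join(small_rows, big_rows, selectivity):
    """Return the cheaper strategy and the estimated cost of each option."""
    costs = {
        "HASH": hash_join_cost(small_rows, big_rows),
        "NESTED": nested_join_cost(small_rows, big_rows, selectivity),
    }
    return min(costs, key=costs.get), costs

# 10 ACME products against 100,000 pre-aggregated sales rows, assuming the
# IN filter keeps about a tenth of them.
product_choice, product_costs = choose_join(10, 100_000, 0.1)
# 100 retailers against the 10,000 remaining rows, where a filter removes nothing.
retailer_choice, retailer_costs = choose_join(100, 10_000, 1.0)
print(product_choice, retailer_choice)
```

Under these assumed numbers the selective product filter favors the nested strategy, while the unselective retailer join is cheaper as a hash join.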
23. How Dynamic Query Optimizer Works
Summary
■ Automatic JOIN reordering groups branches that go to the same source to maximize query delegation and reduce processing in the DV layer
  ■ End users don’t need to worry about the optimal “pairing” of the tables
■ The Partial Aggregation push-down optimization is key in those scenarios. Based on PK-FK restrictions, it pushes the aggregation (for the PKs) to the DW
  ■ Leverages the processing power of the DW, optimized for these aggregations
  ■ Significantly reduces the data transferred through the network (from 1 billion rows to 100 thousand)
■ The Cost-based Optimizer picks the right JOIN strategies based on estimations of data volumes, existence of indexes, transfer rates, etc.
  ■ Denodo estimates costs differently for parallel databases (Vertica, Netezza, Teradata) than for regular databases, to take into consideration the different way those systems operate (distributed data, parallel processing, different aggregation techniques, etc.)
24. How Dynamic Query Optimizer Works
Other relevant optimization techniques
■ Pruning of unnecessary JOIN branches (based on 1-to-many associations) when the attributes of the 1-side are not projected
  ■ Relevant for horizontal partitioning and “fat” semantic models when queries do not need attributes for all the tables
  ■ Unnecessary tables are removed from the query (even for single-source models)
■ Pruning of UNION branches based on incompatible filters
  ■ Enables detection of unnecessary UNION branches in vertical partitioning scenarios
■ Automatic data movement
  ■ Creation of temp tables in one of the systems to enable complete delegation of a federated branch
  ■ The target source needs to have the “data movement” option enabled for this option to be taken into account
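Pruning UNION branches on incompatible filters can be sketched as a simple range-overlap check. The example below is an assumption-laden illustration: the branch names, the per-branch year ranges, and the function `prune_union_branches` are hypothetical and do not describe how the Denodo Platform stores or evaluates this metadata.

```python
# Hypothetical metadata: each UNION branch declares the range of years it can hold.
BRANCHES = {
    "sales_2014": (2014, 2014),
    "sales_2015": (2015, 2015),
    "sales_2016": (2016, 2016),
}

def prune_union_branches(branches, year_from, year_to):
    """Keep only the branches whose year range overlaps the query filter."""
    return [
        name
        for name, (low, high) in branches.items()
        if low <= year_to and high >= year_from
    ]

# A filter such as "year BETWEEN 2015 AND 2016" is incompatible with sales_2014,
# so that branch never needs to be queried.
kept = prune_union_branches(BRANCHES, 2015, 2016)
print(kept)
```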
25. Performance Comparison
Logical Data Warehouse vs. Physical Data Warehouse
• Denodo has done extensive testing using queries from the standard benchmarking test TPC-DS* and the following scenario:
  Customer Dimension: 2 M rows
  Sales Facts: 290 M rows
  Items Dimension: 400 K rows
• The baseline was set using the same queries with all data in a Netezza appliance
* TPC-DS is the de-facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems.
26. Performance Comparison
Logical Data Warehouse vs. Physical Data Warehouse
Query Description | Returned Rows | Avg. Time Physical (Netezza) | Denodo Avg. Time Logical | Optimization Technique (automatically chosen)
Total sales by customer | 1.99 M | 21.0 sec | 21.5 sec | Full aggregation push-down
Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec | 59.1 sec | Full aggregation push-down
Total sales by item brand | 31.4 K | 4.7 sec | 5.3 sec | Partial aggregation push-down
Total sales by item where sale price less than current list price | 17.1 K | 3.5 sec | 5.2 sec | On the fly data movement
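To compare the two columns of timings, the relative overhead of the logical data warehouse can be computed directly from the figures on this slide. The helper name `overhead_pct` is introduced here for illustration only.

```python
# (query, physical seconds on Netezza, logical seconds on Denodo), from the slide
TIMINGS = [
    ("Total sales by customer", 21.0, 21.5),
    ("Total sales by customer and year between 2000 and 2004", 52.3, 59.1),
    ("Total sales by item brand", 4.7, 5.3),
    ("Total sales by item where sale price less than current list price", 3.5, 5.2),
]

def overhead_pct(physical_sec, logical_sec):
    """Extra time of the logical run, as a percentage of the physical run."""
    return (logical_sec - physical_sec) / physical_sec * 100

for query, physical, logical in TIMINGS:
    print(f"{query}: {logical - physical:.1f} sec slower ({overhead_pct(physical, logical):.1f}%)")
```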
27. Improved Cache Performance
Incremental Queries
Merge cached data and changed data to provide fully up-to-date results with minimum latency
[Diagram: a request to “Get Leads changed / added since 1:00 AM” is combined with the CACHE (Leads updated at 1:00 AM) to return up-to-date Leads data]
1. Salesforce ‘Leads’ data cached in VDP at 1:00 AM
2. Query needing Leads data arrives at 11:00 AM
3. Only new/changed leads are retrieved through the WAN
4. Response is up-to-date but query is much faster
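The merge step described above can be sketched in Python as an overlay of the changed rows on top of the cached rows. This is a simplified illustration, not the platform's implementation: the row layout, the `id` key, and the function `merge_incremental` are assumptions, and the sketch does not handle rows deleted at the source.

```python
def merge_incremental(cached_rows, changed_rows, key="id"):
    """Overlay rows changed since the cache load onto the cached rows."""
    merged = {row[key]: row for row in cached_rows}
    for row in changed_rows:
        # A changed row replaces its cached version; a new row is simply added.
        merged[row[key]] = row
    return sorted(merged.values(), key=lambda row: row[key])

# Leads as cached at 1:00 AM (made-up data).
cache = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
# Leads changed or added since 1:00 AM, the only rows fetched over the WAN.
delta = [{"id": 2, "status": "contacted"}, {"id": 3, "status": "new"}]

leads = merge_incremental(cache, delta)
print(leads)
```

Only the two rows in `delta` cross the WAN, yet the merged result reflects every lead.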
28. Big Data Connectivity
Big Data and Cloud Databases Connectivity
■ Redshift – enhanced adapter as data source, cache and data movement
target
■ Vertica – enhanced as source, cache and data movement target
■ Apache Spark – enhanced adapter
■ Impala – enhanced as cache and data movement target
29. Data Virtualization in the Cloud
Accelerate Adoption of Data Virtualization
Ready-to-use and available on AWS Marketplace
Dynamic and elastic infrastructure
Complete with all enterprise-grade features at the lowest cost
Zero set-up requirements
Flexible rent-by-the-hour options
A wide range of capacity options
Only data virtualization platform on AWS.
30. Buying a Subscription
• Customer must have an Amazon AWS account
• Choose configuration required (building block + Amazon VM)
• Building block by ‘sources’ or ‘number of concurrent queries & results’
• Click-Through license agreement
• Amazon provides monthly billing based on usage
• Annual subscriptions billed upfront
• Support included in final pricing
31. Self-Service Data Discovery and Search
Expedite Use of Data by Business Users
Search – Google-like search for data and metadata
Discover – Easy-to-use user interface to browse data and
metadata as well as data lineage
Explore – Ability to view the graphical representation of entities
and relationships
Advanced Query Wizard for users to create ad-hoc queries
Sandbox environment to explore the data before publishing
Data virtualization solution to search data from sources.
36. Managing Very Large Deployments
■ Establish limits on resource usage e.g.
■ Estimated memory, estimated cost, # of concurrent queries, limits to max. execution time and/or
max. # of rows
■ Assigned to user and/or roles
■ Limits can be individual or global e.g.
■ Individual: Each query of a user with role ‘marketing’ cannot use more than 100 MB
■ Global: All concurrent queries from users with role ‘marketing’ cannot use more than 300 MB
■ Possible actions if limits are surpassed:
■ Prevent execution
■ Allow execution with restricted resources
■ Allow execution; cancel if resources limit is surpassed
■ Can be dynamically assigned through custom policies
■ (e.g. assign different plans based on time of day)
New Resource Manager
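The difference between individual and global limits can be shown with a minimal Python sketch. The 100 MB and 300 MB figures come from the ‘marketing’ example above; the function `admit` and its return strings are hypothetical, and the sketch covers only the "prevent execution" action.

```python
INDIVIDUAL_LIMIT_MB = 100  # per-query limit for role 'marketing' (from the slide)
GLOBAL_LIMIT_MB = 300      # limit across all concurrent 'marketing' queries (from the slide)

def admit(query_mb, running_mb):
    """Decide whether a new query may run, given memory estimates in MB.

    query_mb is the estimate for the new query; running_mb lists the estimates
    of the queries already running for the same role.
    """
    if query_mb > INDIVIDUAL_LIMIT_MB:
        return "prevent: individual limit exceeded"
    if sum(running_mb) + query_mb > GLOBAL_LIMIT_MB:
        return "prevent: global limit exceeded"
    return "allow"

print(admit(80, [90, 90]))      # fits both limits
print(admit(120, []))           # a single query above 100 MB
print(admit(80, [90, 90, 90]))  # would push the role above 300 MB
```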
37. Managing Very Large Deployments
Monitor operation of the system, Diagnose
Problems and Analyze Usage Metrics
■ The new tool will also allow ‘after
the fact’ diagnosis of problems
■ Set the time when the problem
occurred and you will see everything
that was happening in an integrated,
graphical manner down to the
individual query level
Enhanced Monitoring and Diagnostic Tool
38. Unified Security and Governance
Enforcing Security and Governance Policies
■ Kerberos “Southbound” support for databases and Web Services
■ Kerberos pass-through support and Kerberos constrained delegation
■ API for accessing view dependencies information and data lineage
information
39. Agile Development
New Admin Tool
■ Multiple tabs and
databases
■ Resize and organize
all panels and dialogs
■ Manages several
open tasks at the
same time
■ VQL highlighting and
autocomplete
features
■ Graphical support for
GIT
41. Denodo 6.0 – Fast Data Strategy Summit
March 30 – US; March 31 – EMEA
9:00 Welcome: Fast Data Strategy Summit
Angel Vina, CEO, Denodo
9:30 Analyst Keynote: Accelerating Fast Data Strategy with Data Virtualization
Presenter: Noel Yuhanna, Principal Research Analyst, Forrester Research
10:00 Customer Case Study: Designing Fast Data Architectures with Data Virtualization and Big Data on
Cloud
Presenter: Kurt Jackson, Platform Architect, Autodesk
10:30 Experts Panel: Core Components of Fast Data Strategy – Big Data and Data Virtualization
Panelists: Noel Yuhanna, Principal Research Analyst, Forrester Research
Mark Eaton, Enterprise Architect, Autodesk
Matt Morgan, Vice President, Product and Partner Marketing, Hortonworks
Moderated by: Ravi Shankar, CMO, Denodo
11:00 Use cases: Where does Fast Data Strategy fit within IT Projects
Presenter: Ravi Shankar, CMO, Denodo
12:00 Demo: How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and Operational Scenarios
Presenter: Pablo Alvarez, Principal Technical Account Manager, Denodo
12:05 Closing: Fast Data Strategy Summit
Angel Vina, CEO, Denodo
42. Denodo 6.0 – Fast Data Strategy Summit
March 30 – US; March 31 – EMEA
Track: Case Studies
• Customer Case Study: SQLization of Hadoop – Increasing Business Adoption of Big Data
  Chuck DeVries, VP, Strategic Technology and Enterprise Architecture, Vizient
• Customer Case Study: Data Services – Rapid Application Development using Data Virtualization
  Jay Heydt, Manager, Database Technologies, DrillingInfo
• Customer Case Study: Data Virtualization in the Cloud
  Avinash Desphande, Big Data and Advanced Analytics, Logitech
• Customer Case Study: TBD
  TBD
• Customer Case Study: TBD
  TBD
• Customer Case Study: TBD
  TBD
Track: Intro to Data Virtualization
• Intro: Getting Started with Data Virtualization – What problems DV solves
  Richard Walker, VP, Sales, Denodo
• Demo: Getting Started with Data Virtualization – What problems DV solves
  Pablo Alvarez, Principal Technical Account Manager, Denodo
• Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
  Alberto Pan, CTO, Denodo
• Data Virtualization in the Cloud: Accelerating Data Virtualization Adoption
  Paul Moxon, Sr. Director, Strategic Technology Office, Denodo
• Data Virtualization Maturity: Enterprise Features in Denodo Platform 6.0
  Suresh Chandrasekaran, Sr. Vice President, Denodo
• Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence Brain Trust
  Claudia Imhoff, CEO, Intelligent Solutions
Track: Technical Deep-Dive
• Data Science: Expediting Use of Data by Business Users with Self-service Discovery and Search
  Mark Pritchard, Director, Sales Engineering
• Data Virtualization Reference Architectures: Correctly Architecting your Solutions for Analytical & Operational Uses
  Alberto Bengoa, Sr. Product Manager, Denodo
• Data Virtualization Deployments: How to Manage Very Large Deployments
  Juan Lozano, Sales Engineering Manager, Denodo
• Big Data: Architecture and Performance Considerations in Logical Data Lakes
  Alberto Pan, CTO, Denodo
• Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
  Alberto Bengoa, Sr. Product Manager, Denodo
• Partner Enablement: Architecting and Deploying Data Virtualization