Denodo TechTalks
Product Deep-Dive Series
A product deep-dive webinar series covering the critical capabilities of Denodo’s modern data virtualization
Leveraging Data Lake Capabilities: Hybrid Cloud, Summary Views and MPP Acceleration
Edwin Robbins
Senior Sales Engineer at Denodo
Agenda
1. Logical Data Fabric Introduction
2. 3 Key Data Fabric Capabilities
• Seamless data access across platforms
• AI-based Query Acceleration Optimization
• MPP Query Acceleration
3. Demonstrations
4. Q&A
Logical Data Fabric Introduction
Logical Data Fabric is the evolution of Denodo’s Data Virtualization Logical Architecture
Logical Data Fabric
Demystifying the Data Fabric,
September 2020
“The core of the matter is being able to consolidate many diverse data sources in an efficient manner by allowing trusted data to be delivered from all relevant data sources to all relevant data consumers through one common layer.”
3 Key Data Fabric Capabilities
A Logical Data Fabric
▪ Pillar 1 - Integrates data across hybrid environments
▪ Pillar 2 - Automates manual tasks using augmented intelligence
▪ Pillar 3 - Boosts performance of analytics with rapid data delivery
▪ Pillar 4 - Supports data discovery and data science initiatives
▪ Pillar 5 - Analyzes across data at rest and data in motion
▪ Pillar 6 - Catalogs all data for discovery, lineage, and associations
https://www.denodo.com/en/document/analyst-report/tdwi-checklist-report-six-critical-capabilities-logical-data-fabric (May 2020)
1. Source Abstraction
What’s the impact of a new marketing campaign for each country?
▪ Historical sales data offloaded to a Hadoop cluster (Presto) for cheaper storage
▪ Marketing campaigns managed in an external SaaS cloud app that returns data in JSON format
▪ Country is part of the customer table stored in the Oracle DW
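[Figure: Layered architecture for this example. Sources: Sales (2.8 million rows, Hadoop/Presto), Campaign (300 rows, SaaS app), Customer (100,000 rows, Oracle DW). Base views provide source abstraction; a Combine, Transform & Integrate layer joins the sources and applies a group and sum; results are consumed through data services and the data catalog. Cross-cutting capabilities: virtual tables (views), role-based security & masking, push-down optimization & caching.]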
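The logical operations in this example (join sales with customers to get the country, join with campaigns, then group and sum) can be sketched in plain Python. This is a minimal illustration of what the combined view computes, not of how Denodo executes it; the record layouts, field names (`customer_id`, `campaign_id`, `amount`) and sample rows are hypothetical stand-ins for the Presto, Oracle DW and SaaS JSON sources.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for sales offloaded to the Hadoop cluster (Presto).
SALES = [
    {"customer_id": 1, "campaign_id": "C1", "amount": 100.0},
    {"customer_id": 2, "campaign_id": "C1", "amount": 50.0},
    {"customer_id": 3, "campaign_id": "C2", "amount": 75.0},
    {"customer_id": 1, "campaign_id": "C2", "amount": 25.0},
]
# Hypothetical stand-in for the Oracle DW customer table (holds the country).
CUSTOMERS = {1: {"country": "ES"}, 2: {"country": "US"}, 3: {"country": "US"}}
# Hypothetical stand-in for the JSON returned by the SaaS campaign app.
CAMPAIGNS_JSON = '[{"id": "C1", "name": "Spring"}, {"id": "C2", "name": "Summer"}]'


def sales_by_campaign_and_country(sales, customers, campaigns_json):
    """Join sales -> customer (country) and sales -> campaign, then group and sum."""
    campaigns = {c["id"]: c["name"] for c in json.loads(campaigns_json)}
    totals = defaultdict(float)
    for row in sales:
        country = customers[row["customer_id"]]["country"]  # join on customer_id
        campaign = campaigns[row["campaign_id"]]  # join on campaign id
        totals[(campaign, country)] += row["amount"]  # group and sum
    return dict(totals)


if __name__ == "__main__":
    result = sales_by_campaign_and_country(SALES, CUSTOMERS, CAMPAIGNS_JSON)
    for (campaign, country), total in sorted(result.items()):
        print(campaign, country, total)
```

In a virtual layer the same view definition would be expressed once over the base views, and the optimizer decides which of these joins and aggregations to push down to each source.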
2. Automatic Recommendations for Smart Query Acceleration
Denodo v8 uses artificial intelligence to automatically recommend the selective materialization of datasets (summaries) to increase performance:
• AI algorithms using active metadata:
  • Usage history from the Denodo monitor
  • Data profiling statistics of the sources
  • Cost simulations to generate summary recommendations
• Optimization goals for each recommended summary:
  • Generic enough to cover multiple queries
  • Specific enough to keep it small and fast
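To make the "generic enough / specific enough" trade-off concrete, the sketch below scores candidate summaries against a query history. It is a hypothetical heuristic, not Denodo's recommendation algorithm: a summary (a set of group-by columns) is assumed to answer a query when all the query's grouping and filter columns are in the summary, its size is estimated from profiling statistics (product of distinct counts, capped at the fact table size), and its benefit is the number of fact rows avoided across covered executions, subject to a size budget. All column names, counts and history entries are made up.

```python
from itertools import combinations
from math import prod

# Hypothetical profiling statistics: number of distinct values per column.
DISTINCT = {"store_id": 400, "sold_date_id": 1_800, "item_id": 200_000}
FACT_ROWS = 300_000_000  # hypothetical row count of the fact table

# Hypothetical usage history: (columns a query groups or filters on, executions).
HISTORY = [
    ({"sold_date_id"}, 40),
    ({"store_id", "sold_date_id"}, 25),
    ({"store_id"}, 30),
    ({"item_id"}, 5),
]


def estimated_rows(columns):
    """Upper bound on the size of a summary grouped by `columns`."""
    return min(prod(DISTINCT[c] for c in columns), FACT_ROWS)


def covered_executions(columns, history):
    """Executions that only use columns available in the summary."""
    return sum(count for cols, count in history if cols <= set(columns))


def recommend(history, max_columns=2, max_rows=1_000_000):
    """Pick the column set that avoids the most fact-row reads within a size budget."""
    all_cols = sorted(set().union(*(cols for cols, _ in history)))
    best, best_benefit = None, 0
    for k in range(1, max_columns + 1):
        for cand in combinations(all_cols, k):
            rows = estimated_rows(cand)
            if rows > max_rows:  # "specific enough": skip summaries that are too large
                continue
            # "generic enough": benefit grows with every covered execution
            benefit = covered_executions(cand, history) * (FACT_ROWS - rows)
            if benefit > best_benefit:
                best, best_benefit = cand, benefit
    return best


print(recommend(HISTORY))
```

With this made-up history the widest in-budget candidate wins because it covers most executions; tightening the budget (for example `max_rows=1_000`) shifts the choice to a smaller, less general summary, which is the tension the two optimization goals describe.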
2. Query Acceleration with Aggregate Awareness Example
• TPC-DS data:
  • Distributed in 3 different systems
  • Tables with hundreds of millions of rows
• Summary: total sales by store id, sold_date_id
| Query | Execution time (no acceleration) | Execution time (acceleration) | Performance gain | Summary used |
|---|---|---|---|---|
| Total sales by year | 15.45 s | 2.38 s | 6.5x | summary_total_by_store_day |
| Total sales by quarter, store name and city | 22.49 s | 2.62 s | 8.57x | summary_total_by_store_day |
| Total sales by store and city for last quarter | 14.71 s | 0.47 s | 31.1x | summary_total_by_store_day |
| Total sales in a specific store | 14.36 s | 2.66 s | 5.39x | summary_total_by_store_day |
| Total sales in a specific store and year | 14.32 s | 3.18 s | 4.5x | summary_total_by_store_day |
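Every query in this table is answered from the same store-by-day summary, even though none of them groups by store and day. That is aggregate awareness: a query at a coarser grain (for example, total sales by year) can be rewritten to roll up the pre-computed summary instead of scanning the fact table. A minimal sketch of that rewrite, assuming a tiny hypothetical fact table and a date dimension keyed by `sold_date_id` (loosely following TPC-DS naming; the values are invented):

```python
from collections import defaultdict

# Hypothetical fact rows: (store_id, sold_date_id, amount).
STORE_SALES = [
    (1, 10, 5.0), (1, 10, 7.0), (2, 10, 3.0),
    (1, 20, 4.0), (2, 30, 6.0), (2, 30, 1.0),
]
# Hypothetical date dimension: sold_date_id -> year.
DATE_DIM = {10: 2001, 20: 2001, 30: 2002}


def build_summary(fact):
    """Materialize total sales by (store_id, sold_date_id)."""
    summary = defaultdict(float)
    for store_id, date_id, amount in fact:
        summary[(store_id, date_id)] += amount
    return dict(summary)


def sales_by_year_from_fact(fact, date_dim):
    """Unaccelerated plan: scan every fact row."""
    totals = defaultdict(float)
    for _store_id, date_id, amount in fact:
        totals[date_dim[date_id]] += amount
    return dict(totals)


def sales_by_year_from_summary(summary, date_dim):
    """Accelerated plan: roll the summary up to the year level."""
    totals = defaultdict(float)
    for (_store_id, date_id), total in summary.items():
        totals[date_dim[date_id]] += total
    return dict(totals)


summary = build_summary(STORE_SALES)
assert sales_by_year_from_summary(summary, DATE_DIM) == sales_by_year_from_fact(STORE_SALES, DATE_DIM)
print(len(STORE_SALES), "fact rows vs", len(summary), "summary rows")
```

The rewrite is valid because SUM can be re-aggregated at a coarser grain; a measure such as a distinct count of customers could not be rolled up from this summary in the same way.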
3. Denodo and MPPs
• Currently, Denodo can integrate with a variety of MPP data lake engines:
  • Hive, Impala, Presto, SparkSQL, Athena and Databricks/Delta Lake
• The integration with these systems covers multiple capabilities:
  • As a data source
  • As the storage for Denodo’s cache
  • As an additional external execution engine
    • Denodo can move data on the fly from other sources to the MPP for execution
    • Controlled by the cost-based optimizer; can also be triggered with manual hints
  • As a target for replication pipelines (remote tables)
    • Run a query in Denodo and put the results in the data lake
    • If the source and target are the same (e.g. the data lake), data is processed directly instead of first coming to Denodo
• Better support for incremental updates in data lake engines
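The remote-table behavior listed above (processing data directly when source and target are the same system) amounts to a routing decision. The sketch below is a hypothetical illustration, not Denodo's API: `View`, `plan_remote_table` and the system labels are invented names, and the generated statement is a generic CREATE TABLE ... AS SELECT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical view metadata: a name and the system that stores its data."""
    name: str
    system: str


def plan_remote_table(target_system, table, sql, views):
    """Return (strategy, statement) for materializing `sql` as `table` in `target_system`."""
    if views and all(v.system == target_system for v in views):
        # All inputs already live in the target: run a single CTAS there,
        # so no rows pass through the virtualization layer.
        return ("push_down", f"CREATE TABLE {table} AS {sql}")
    # Inputs span several systems: execute the query in the virtual layer
    # and stream the result rows into the target table.
    return ("stream", sql)


print(plan_remote_table("lake", "sales_by_zip",
                        "SELECT zip, SUM(amount) FROM sales GROUP BY zip",
                        [View("sales", "lake")]))
```

The design choice being illustrated is simply that data movement is avoided whenever every input already resides in the target engine.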
3. Massively Parallel Processing: Example
[Figure: Execution plan. Sales (300 million rows) is grouped by customer ID, producing 2M rows (sales by customer), which are joined with Customer (2M rows) and grouped by ZIP; with MPP integration, the join and group by ZIP can run in the MPP.]

| System | Execution time | Optimization techniques |
|---|---|---|
| Others | ~10 min | Basic |
| No MPP | 43 sec | Aggregation push-down |
| With MPP | 11 sec | Aggregation push-down + MPP integration (Impala, 8 nodes) |

1. Partial aggregation push-down: maximizes source processing and reduces network traffic.
2. Integrated with the cost-based optimizer: based on data volume estimation and the cost of these particular operations, the CBO can decide to move all or part of the execution tree to the MPP.
3. On-demand data transfer: for SQL-on-Hadoop systems, Denodo automatically generates and uploads Parquet files.
4. Integration with local and pre-cached data: the engine detects when data is cached or is a native table in the MPP.
5. Fast parallel execution: support for Spark, Presto and Impala for fast analytical processing in inexpensive Hadoop-based solutions.
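Partial aggregation push-down is what reduces the 300 million Sales rows to 2M rows (sales by customer) before the join. A minimal sketch with a few hypothetical rows shows that grouping by customer ID at the source, then joining and regrouping by ZIP, gives the same totals as joining first while moving fewer rows. It assumes a SUM measure, which can be re-aggregated; the data, ZIP codes and function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical stand-ins for the Sales and Customer tables in the example.
SALES = [(1, 10.0), (1, 5.0), (2, 7.0), (3, 2.0), (3, 4.0), (3, 1.0)]  # (customer_id, amount)
CUSTOMER = {1: "28001", 2: "28001", 3: "08001"}  # customer_id -> ZIP


def naive_plan(sales, customer):
    """Ship every sale, join it with its customer, then group by ZIP."""
    totals = defaultdict(float)
    for customer_id, amount in sales:
        totals[customer[customer_id]] += amount
    return dict(totals), len(sales)  # second value: rows transferred from the source


def partial_aggregation_plan(sales, customer):
    """Group by customer_id at the source, then join and regroup by ZIP."""
    per_customer = defaultdict(float)
    for customer_id, amount in sales:  # this step runs inside the source system
        per_customer[customer_id] += amount
    totals = defaultdict(float)
    for customer_id, subtotal in per_customer.items():  # join + group by ZIP
        totals[customer[customer_id]] += subtotal
    return dict(totals), len(per_customer)


print(naive_plan(SALES, CUSTOMER))
print(partial_aggregation_plan(SALES, CUSTOMER))
```

Both plans return the same totals; only the number of rows leaving the source differs, which on the slide's figures is the gap between 300 million and about 2 million rows.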
3. Future: Embedded MPP execution engine
▪ Customers whose existing data lakes offer equivalent capabilities are encouraged to keep using them
▪ However, an embedded engine has some advantages:
  ▪ High-performance MPP queries over data in distributed file systems without the need for additional software
  ▪ Out-of-the-box MPP options for caching and acceleration capabilities
  ▪ Efficient integrated store for large volumes of active metadata / query history to enable upcoming AI capabilities
  ▪ Integrated security, deployment configuration and management
3. MPP Integration Timeline
1. Preliminary phase (Q2 2020)
▪ Evaluate and choose engine: PrestoSQL
2. Phase I (Q3 2020)
▪ Scripts/templates to automatically create and manage a cluster running on Kubernetes (on-prem or in the cloud with EKS/AKS)
3. Phase II (Q1 2021)
▪ Automatic metadata integration: files in the distributed file system are automatically accessible from Denodo
▪ Automatically create Presto tables and Denodo base views from a path to Parquet files
▪ To access datasets from Presto, data files in AWS S3 (or Azure Blob Storage, Azure Data Lake, etc.) are mapped to tables in the Hive Metastore
4. Phase III
▪ Tighter UI integration (explore the file system graphically)
5. Final Phase (Denodo 9)
▪ Deployment, management and monitoring are fully integrated in Denodo's Solution Manager
This integration will be done using a phased approach, so that many of these capabilities can be released periodically before the next major version.
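Phase II automates mapping Parquet files in object storage to tables in the Hive Metastore. For reference, the kind of statement such automation would need to issue against Presto's Hive connector is generated below: `format` and `external_location` are Hive-connector table properties, while the catalog, schema, table, columns and bucket are hypothetical. In practice the column list would be read from the Parquet file schema; here it is passed in by hand, and names are not escaped, so treat this as a sketch only.

```python
def presto_external_table_ddl(catalog, schema, table, columns, location):
    """Build a Presto Hive-connector DDL mapping Parquet files at `location` to a table.

    `columns` is a list of (name, presto_type) pairs. Identifiers and the
    location are inserted verbatim (no quoting or escaping).
    """
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} (\n  {cols}\n)\n"
        f"WITH (format = 'PARQUET', external_location = '{location}')"
    )


print(presto_external_table_ddl(
    "hive", "lake", "sales",
    [("customer_id", "bigint"), ("amount", "double")],
    "s3://example-bucket/sales/",
))
```

Once such a table exists in the metastore, a base view over it can be created like any other Presto table; the roadmap above describes automating both steps from a single path.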
Demonstration
Q&A
Next Steps
Get Started Today
Denodo Standard Free Trial
Try Denodo Standard free for 30 days
on your choice of cloud environment
denodo.com/free-trials
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.

Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration

  • 1. Denodo TechTalks Product Deep-Dive Series: a product deep-dive webinar series covering the critical capabilities of Denodo’s modern data virtualization
  • 2. Leveraging Data Lake Capabilities: Hybrid Cloud, Summary Views and MPP Acceleration
    Edwin Robbins, Senior Sales Engineer at Denodo
  • 3. Agenda
    1. Logical Data Fabric Introduction
    2. 3 Key Data Fabric Capabilities
       ▪ Seamless data access across platforms
       ▪ AI-based Query Acceleration Optimization
       ▪ MPP Query Acceleration
    3. Demonstrations
    4. Q&A
  • 4. Logical Data Fabric Introduction
  • 5. Logical Data Fabric is the evolution of Denodo’s Data Virtualization Logical Architecture
    From "Demystifying the Data Fabric" (September 2020): "The core of the matter is being able to consolidate many diverse data sources in an efficient manner by allowing trusted data to be delivered from all relevant data sources to all relevant data consumers through one common layer."
  • 6. 3 Key Data Fabric Capabilities
  • 7. A Logical Data Fabric
    ▪ Pillar 1 - Integrates data across hybrid environments
    ▪ Pillar 2 - Automates manual tasks using augmented intelligence
    ▪ Pillar 3 - Boosts performance of analytics with rapid data delivery
    ▪ Pillar 4 - Supports data discovery and data science initiatives
    ▪ Pillar 5 - Analyzes across data at rest and data in motion
    ▪ Pillar 6 - Catalogs all data for discovery, lineage, and associations
    Source: https://www.denodo.com/en/document/analyst-report/tdwi-checklist-report-six-critical-capabilities-logical-data-fabric (May 2020)
  • 8. 1. Source Abstraction
    What’s the impact of a new marketing campaign for each country?
    ▪ Historical sales data offloaded to a Hadoop cluster (Presto) for cheaper storage
    ▪ Marketing campaigns managed in an external SaaS cloud app that returns data in JSON format
    ▪ Country is part of the customer table stored in the Oracle DW
    Diagram (Sources → Combine, Transform & Integrate → Consume): base views provide source abstraction over Sales (2.8 million rows), Campaign (300 rows, from the SaaS app) and Customer (100,000 rows). They are combined through joins and a group-and-sum into a virtual table (view), which is exposed through the Data Catalog and Data Services. Role-based security & masking and push-down optimization & caching are applied along the way.
  • 9. 2. Automatic Recommendations for Smart Query Acceleration
    Denodo v8 uses artificial intelligence to automatically recommend selective materialization of datasets (summaries) to increase performance.
    ▪ AI algorithms using active metadata:
      - Usage history from the Denodo Monitor
      - Data profiling statistics of the sources
      - Cost simulations to generate summary recommendations
    ▪ Optimization: recommended summaries are
      - Generic enough to cover multiple queries
      - Specific enough to keep them small and fast
  • 10. 2. Query Acceleration with Aggregate Awareness: Example
    ▪ TPC-DS data distributed across 3 different systems, with tables of hundreds of millions of rows
    ▪ Summary: total sales by store id and sold_date_id (summary_total_by_store_day), used by every query below
    Query: execution time without acceleration → with acceleration (performance gain)
    ▪ Total sales by year: 15.45 s → 2.38 s (6.5x)
    ▪ Total sales by quarter, store name and city: 22.49 s → 2.62 s (8.57x)
    ▪ Total sales by store and city for last quarter: 14.71 s → 0.47 s (31.1x)
    ▪ Total sales in a specific store: 14.36 s → 2.66 s (5.39x)
    ▪ Total sales in a specific store and year: 14.32 s → 3.18 s (4.0x)
  • 11. 3. Denodo and MPPs
    ▪ Denodo can currently integrate with a variety of MPP data lake engines: Hive, Impala, Presto, SparkSQL, Athena and Databricks/Delta Lake
    ▪ The integration with these systems covers multiple capabilities:
      - As a data source
      - As the storage for Denodo’s cache
      - As an additional external execution engine: Denodo can move data on the fly from other sources to the MPP for execution. This is controlled by the cost-based optimizer and can also be triggered with manual hints
      - As a target for replication pipelines (remote tables):
        · Run a query in Denodo and put the results in the data lake
        · If source and target are the same (e.g. the data lake), data is processed directly instead of coming to Denodo first
        · Better support for incremental updates in data lake engines
  • 12. 3. Massively Parallel Processing: Example
    Query plan: Sales (300 million rows) is grouped by customer ID, producing 2M rows (sales by customer). These are joined with Customer (2M rows) and then grouped by ZIP.
    Execution time by system:
    ▪ Others: ~10 min (basic optimization)
    ▪ No MPP: 43 s (aggregation push-down)
    ▪ With MPP: 11 s (aggregation push-down + MPP integration, Impala with 8 nodes)
    Optimization techniques:
    1. Partial aggregation push-down: maximizes source processing and reduces network traffic
    2. Integration with the cost-based optimizer: based on data volume estimation and the cost of these particular operations, the CBO can decide to move all or part of the execution tree to the MPP
    3. On-demand data transfer: for SQL-on-Hadoop systems, Denodo automatically generates and uploads Parquet files
    4. Integration with local and pre-cached data: the engine detects when data is cached or is a native table in the MPP
    5. Fast parallel execution: support for Spark, Presto and Impala for fast analytical processing in inexpensive Hadoop-based solutions
  • 13. 3. Future: Embedded MPP Execution Engine
    ▪ Customers with existing data lakes are encouraged to keep using them, since they offer equivalent capabilities
    ▪ However, an embedded engine has some advantages:
      - High-performance MPP queries over data in distributed file systems without the need for additional software
      - Out-of-the-box MPP options for caching and acceleration capabilities
      - An efficient integrated store for large volumes of active metadata and query history to enable upcoming AI capabilities
      - Integrated security, deployment configuration and management
  • 14. 3. MPP Integration Timeline
    1. Preliminary phase (Q2 2020): evaluate and choose an engine (PrestoSQL)
    2. Phase I (Q3 2020): scripts and templates to automatically create and manage a cluster running on Kubernetes (on-prem, or in the cloud with EKS/AKS)
    3. Phase II (Q1 2021): automatic metadata integration, so that files in the distributed file system are automatically accessible from Denodo
       ▪ Automatically create Presto tables and Denodo base views from the path to Parquet files
       ▪ To access datasets from Presto, data files in AWS S3 (or Azure Blob Storage, Azure Data Lake, etc.) are mapped to tables in the Hive Metastore
    4. Phase III: tighter UI integration (explore the file system graphically)
    5. Final phase (Denodo 9): deployment, management and monitoring fully integrated in Denodo's Solution Manager
    This integration will be done using a phased approach, so that many of these capabilities can be released periodically before the next major version.
  • 16. Q&A
  • 17. Next Steps: Get Started Today
    Denodo Standard Free Trial: try Denodo Standard free for 30 days on your choice of cloud environment (denodo.com/free-trials)
  • 20. Thanks! www.denodo.com | info@denodo.com
    © Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.
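The source-abstraction scenario on slide 8 (sales in Presto, campaigns from a SaaS app as JSON, country in the Oracle DW) can be pictured as a join of three inputs followed by a group-and-sum. The sketch below is illustrative only and makes several assumptions. All three sources are in-memory stand-ins. Sales rows carry a hypothetical `campaign_id` that links them to campaigns, which the slide does not specify. The function name `campaign_impact_by_country` is invented for illustration. In Denodo the same logic would be expressed as views over base views, not as Python.

```python
import json
from collections import defaultdict

# Hypothetical stand-ins for the three sources on slide 8.
# Sales rows: (customer_id, campaign_id, amount); the campaign_id link is assumed.
SALES = [
    (1, "C1", 100.0),
    (2, "C1", 50.0),
    (3, "C2", 75.0),
    (1, "C2", 25.0),
]
# Campaigns as a SaaS app might return them: a JSON array of objects.
CAMPAIGNS_JSON = '[{"id": "C1", "name": "Spring Promo"}, {"id": "C2", "name": "Summer Sale"}]'
# Customer table reduced to customer_id -> country.
CUSTOMER_COUNTRY = {1: "US", 2: "ES", 3: "US"}


def campaign_impact_by_country(sales, campaigns_json, customer_country):
    """Join sales with campaigns and customers, then sum amounts per (country, campaign)."""
    # Parse the JSON payload into a lookup, as a base view over the SaaS app would.
    campaign_name = {c["id"]: c["name"] for c in json.loads(campaigns_json)}
    totals = defaultdict(float)
    for customer_id, campaign_id, amount in sales:
        country = customer_country[customer_id]  # join with customer
        campaign = campaign_name[campaign_id]    # join with campaign
        totals[(country, campaign)] += amount    # group and sum
    return dict(totals)


if __name__ == "__main__":
    result = campaign_impact_by_country(SALES, CAMPAIGNS_JSON, CUSTOMER_COUNTRY)
    for (country, campaign), total in sorted(result.items()):
        print(f"{country:<3} {campaign:<13} {total:>8.2f}")
```

The consumer only sees one combined result and never deals with the JSON parsing or with where each table lives. That is the point of the abstraction layer.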
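Slide 9 describes summary recommendations that are "generic enough to cover multiple queries" yet "specific enough to keep it small and fast", driven by usage history, profiling statistics and cost simulations. The following is a deliberately simple heuristic illustrating that trade-off. It is not Denodo's actual algorithm. It assumes the following:
- Usage history is reduced to (grouping columns, frequency) pairs.
- Summary size is estimated as the product of column cardinalities, capped at the fact-table size.
- Only additive measures are involved.

The function name and all numbers are hypothetical.

```python
from itertools import combinations
from math import prod


def recommend_summary(history, cardinality, fact_rows, max_rows):
    """Toy summary recommender (illustrative heuristic, not Denodo's algorithm).

    history:     list of (grouping_columns, frequency) from a query log
    cardinality: estimated distinct values per column (profiling statistics)
    fact_rows:   estimated row count of the fact table
    max_rows:    largest summary we are willing to materialize
    Returns (sorted grouping columns, estimated rows), or None if nothing fits.
    """
    seen = {frozenset(cols) for cols, _ in history}
    # Candidates: each observed grouping plus the union of every pair, so a
    # single summary can serve several query shapes ("generic enough").
    candidates = seen | {a | b for a, b in combinations(seen, 2)}

    best = None
    for cand in candidates:
        # Cost simulation: estimated size = product of cardinalities, capped.
        size = min(fact_rows, prod(cardinality[c] for c in cand))
        if size > max_rows:
            continue  # too large to stay "small and fast"
        # For additive measures (SUM, COUNT) a summary can answer any query
        # whose grouping columns are a subset of the summary's columns.
        covered = sum(freq for cols, freq in history if frozenset(cols) <= cand)
        key = (covered, -size)  # most usage first, then smallest size
        if best is None or key > best[0]:
            best = (key, cand, size)
    return (sorted(best[1]), best[2]) if best else None


HISTORY = [  # hypothetical usage history: (grouping columns, times seen)
    (("sold_date_id",), 10),
    (("store_id",), 4),
    (("store_id", "sold_date_id"), 3),
    (("customer_id",), 2),
]
CARDINALITY = {"store_id": 100, "sold_date_id": 2_000, "customer_id": 1_000_000}
```

Called with a 100-million-row fact table and a 1-million-row budget, this toy picks grouping by `sold_date_id` and `store_id`. That is the same shape as the `summary_total_by_store_day` summary on slide 10: it serves 17 of the 19 logged queries at an estimated 200,000 rows. Tightening the budget pushes it toward a narrower summary that covers fewer queries.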
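The aggregate awareness shown on slide 10 relies on one property: a query such as "total sales by year" can be answered by rolling up the much smaller `summary_total_by_store_day` instead of rescanning the fact table. The sketch below demonstrates that equivalence on a handful of hypothetical rows. The helper names are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical base fact rows: (store_id, sold_date, amount).
STORE_SALES = [
    (1, date(2020, 1, 5), 10.0),
    (1, date(2020, 1, 5), 5.0),
    (2, date(2020, 3, 1), 7.5),
    (1, date(2021, 2, 9), 4.0),
    (2, date(2021, 2, 9), 3.0),
]


def build_summary(rows):
    """Materialize total sales per (store_id, day), like summary_total_by_store_day."""
    summary = defaultdict(float)
    for store_id, day, amount in rows:
        summary[(store_id, day)] += amount
    return dict(summary)


def total_by_year_from_base(rows):
    """Answer 'total sales by year' by scanning every fact row."""
    totals = defaultdict(float)
    for _, day, amount in rows:
        totals[day.year] += amount
    return dict(totals)


def total_by_year_from_summary(summary):
    """Aggregate-aware rewrite: roll the (store, day) totals up to years."""
    totals = defaultdict(float)
    for (_, day), amount in summary.items():
        totals[day.year] += amount
    return dict(totals)


def total_for_store_from_summary(summary, store_id):
    """Answer 'total sales in a specific store' from the summary alone."""
    return sum(amount for (s, _), amount in summary.items() if s == store_id)
```

This rewrite is only valid because SUM is additive: partial sums per store and day can be added again at any coarser grain. A measure like AVG or COUNT DISTINCT cannot be rolled up this way without storing extra information (for example, a count alongside the sum).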
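Slides 11 and 12 say the cost-based optimizer decides, from data volume estimates and operation costs, whether to move data on the fly and run all or part of the execution tree in the MPP. Denodo's real cost model is not described in the slides. The toy model below only shows the shape of such a decision: a fixed MPP startup cost plus transfer time plus parallel work, weighed against local processing. Every throughput and overhead figure is a made-up placeholder, and `choose_engine` is a hypothetical name.

```python
def choose_engine(rows_to_process, rows_to_move, mpp_nodes,
                  local_rows_per_s=1_000_000.0,
                  node_rows_per_s=1_000_000.0,
                  transfer_rows_per_s=2_000_000.0,
                  mpp_startup_s=2.0):
    """Toy cost model: run an operation locally or ship it to an MPP cluster.

    All throughput and startup defaults are invented placeholders; a real
    optimizer would use source statistics and calibrated per-engine costs.
    Returns (choice, estimated_local_seconds, estimated_mpp_seconds).
    """
    local_cost = rows_to_process / local_rows_per_s
    mpp_cost = (
        mpp_startup_s                                   # job scheduling overhead
        + rows_to_move / transfer_rows_per_s            # on-the-fly data movement
        + rows_to_process / (mpp_nodes * node_rows_per_s)  # parallel execution
    )
    choice = "mpp" if mpp_cost < local_cost else "local"
    return choice, local_cost, mpp_cost
```

With these placeholder numbers, a large join (20 million rows processed, 2 million moved, 8 nodes) is estimated at 20.0 s locally versus 5.5 s on the MPP, so the model ships it. A small query stays local because the fixed startup cost dominates. This is why the decision is left to the optimizer, with manual hints as an override.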
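The first technique on slide 12, partial aggregation push-down, is what reduces the 300 million sales rows to 2M rows (sales by customer) before the join and the final group by ZIP. The sketch compares a naive plan with the pushed-down plan on tiny hypothetical data. It counts how many rows would leave the sales source in each plan and checks that both plans produce the same totals.

```python
from collections import defaultdict


def naive_plan(sales, customer_zip):
    """Ship every sale row, join with customer, then group by ZIP."""
    totals = defaultdict(float)
    transferred = 0
    for customer_id, amount in sales:
        transferred += 1  # every raw row crosses the network
        totals[customer_zip[customer_id]] += amount
    return dict(totals), transferred


def partial_pushdown_plan(sales, customer_zip):
    """Pre-aggregate by customer at the source, then join and finish by ZIP."""
    # Step 1 (executed in the sales source): partial aggregation by customer.
    by_customer = defaultdict(float)
    for customer_id, amount in sales:
        by_customer[customer_id] += amount
    # Step 2: only one row per customer crosses the network.
    transferred = len(by_customer)
    # Step 3: join with customer and complete the aggregation by ZIP.
    totals = defaultdict(float)
    for customer_id, subtotal in by_customer.items():
        totals[customer_zip[customer_id]] += subtotal
    return dict(totals), transferred


SALES = [(1, 10.0), (1, 20.0), (2, 5.0), (3, 1.0), (3, 2.0), (3, 3.0)]  # hypothetical
CUSTOMER_ZIP = {1: "10001", 2: "10001", 3: "94105"}                     # hypothetical
```

On this data both plans return {"10001": 35.0, "94105": 6.0}, but the pushed-down plan moves 3 rows instead of 6. The rewrite is safe because SUM decomposes and each customer maps to exactly one ZIP, so summing per customer first loses no information needed for the final grouping.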
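Phase II on slide 14 maps data files in object storage to tables in the Hive Metastore so that Presto can query them. In PrestoSQL (now Trino), this is typically done through the Hive connector with a `CREATE TABLE ... WITH (format = 'PARQUET', external_location = '...')` statement. The helper below only builds that SQL text. The catalog, schema, table, column and bucket names in the usage example are hypothetical. The Denodo base-view side of the integration is not shown, since the slides do not describe its syntax.

```python
def presto_parquet_table_ddl(catalog, schema, table, columns, location):
    """Build a Presto/Trino Hive-connector CREATE TABLE statement that
    registers existing Parquet files under `location` as an external table.

    Identifiers and the location are inserted verbatim: they are assumed to be
    trusted, already-valid names (no quoting or escaping is performed).
    """
    column_defs = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} (\n"
        f"{column_defs}\n"
        ")\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{location}'\n"
        ")"
    )


if __name__ == "__main__":
    print(presto_parquet_table_ddl(
        "hive", "datalake", "store_sales",  # hypothetical catalog/schema/table
        [("ss_store_sk", "BIGINT"), ("ss_net_paid", "DECIMAL(7,2)")],
        "s3://example-bucket/store_sales/",  # hypothetical bucket path
    ))
```

Because `external_location` points at existing files, dropping such a table in the Hive connector leaves the Parquet data in place. That suits the slide's goal of exposing files already sitting in S3 or Azure storage.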