Unlock Your Data for ML & AI
using Data Virtualization
Mitesh Shah
Senior Cloud Product Manager
June 20, 2019
2
Source: Gartner 2018, Data Virtualization Market Guide
Through 2022, 60% of all organizations will implement
data virtualization as one key delivery style in their data
integration architecture.
3
Key Challenges for Data Integration
• Required expansion of Analytics by growing consumers of data
• Need for Agile Self-Service BI
• Increasing use of third-party data for Information Agility
• Big Data volumes continue to grow
• Security and Data Privacy implications becoming core to data strategy
• Reduce or eliminate Data Latency
• Providing data access irrespective of Storage Location
• Growth in Hybrid & Multi-Cloud Deployments
• Convergence of Application and Data Integration
4
What is Data Virtualization?
1. Connect to disparate data sources
2. Combine related data into views
3. Consume in business applications
DATA CONSUMERS: Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users
DISPARATE DATA SOURCES: Databases & Warehouses, Cloud/SaaS Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word...
Sources range from Analytical to Operational and from More Structured to Less Structured
CONNECT: Normalized views of disparate data
COMBINE: Discover, Transform, Prepare, Improve Quality, Integrate
CONSUME (PUBLISH): Share, Deliver, Publish, Govern, Collaborate
Delivery to consumers: Multiple Protocols, Formats; Query, Search, Browse; Request/Reply, Event Driven; Secure Delivery
Access to sources: SQL, MDX; Web Services; Big Data APIs; Web Automation and Indexing
“Data virtualization integrates disparate data sources in real time or near-real
time to meet demands for analytics and transactional data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform,
Forrester Research, Dec 16, 2015
5
Modern Data Architecture
DATA
VIRTUALIZATION
6
Challenges / Known Facts in Data Management!
✓ The current data landscape is fragmented.
✓ Data Lakes, IoT architectures, SaaS fuel the needs of modern analytics, ML and AI.
✓ Exploring and understanding the data available within your company is a
time-consuming task.
✓ Dealing with bureaucracy, different languages and protocols is not easy.
✓ A logical architecture based on a virtualization layer connects the different systems
and exposes them as one, hiding the underlying complexity.
7
Logical Architectures – Brief History
▪ Logical Architectures were first described by Mark Beyer, an analyst at Gartner,
in 2009 to describe the efforts to expand the current data warehouse architectures
▪ Since then, the term “Logical Data Warehouse” has been widely used to present the
natural evolution of analytical architectures
▪ For example, “Adopt the Logical Data Warehouse Architecture to Meet Your Modern
Analytical Needs”. Henry Cook, Gartner April 2018
▪ Other data architectures have also seen their logical counterparts:
• Logical Data Marts
• Logical Data Lakes
▪ In all these cases, a virtualization layer is a key component of the architecture
8
Data Lakes
A data lake is a storage repository that holds a vast
amount of raw data in its native format. The data
structure and requirements are not defined until the
data is needed
The current needs for sophisticated
data-driven intelligence and data
science favored this concept for its
simplicity and power
Hadoop and its ecosystem provided
the foundation that data lakes
required: vast storage and processing
muscle
It also favored the concept of ELT vs
ETL: load data first, (maybe) transform later
9
The Promise of Data Lakes
• Consolidate data in a single physical repository
• No more data integration issues
• Users can get the data they need from the
lake
• Store massive amounts of raw, unfiltered data
– maintain structure and fidelity of data
• Using cheap commodity hardware
• 100X cheaper than EDW appliance
• Take advantage of processing power of
Hadoop for data analysis
10
“…Data lakes lack semantic consistency and
governed metadata. Meeting the needs of
wider audiences require curated repositories
with governance, semantic consistency and
access controls.”
11
Data Lakes – Not a Perfect World
Physical Nature
▪ Based on Replication. Data Lakes require data to be copied to their physical storage
▪ Replication extends development cycles and costs
▪ Not all data is suitable for replication
• Real time needs: Cloud and SaaS APIs
• Large volumes: existing EDW
• Privacy laws and restrictions
Single Purpose
▪ Usage of the data lake is often monopolized by data scientists
▪ New data silo. No clear path to share insights with business users
▪ Lacks the governance, security and quality that business users are used to (e.g. in the EDW)
12
How Denodo Complements the Logical Data Lake in the Cloud
Denodo Architecture for Logical Data Lake
● Denodo does not replace data
warehouses, data lakes, ETLs...
● Denodo enables using all of them together,
plus other data sources
○ In a logical data warehouse
○ In a logical data lake
○ They are very similar; the only
difference is in the main objective
● There are also use cases where Denodo
can be used as a data source in an ETL flow
13
Data science project characteristics
❑ The bulk of the work in data science projects involves integrating many disparate data
sets to create extremely wide data
❑ Data science requires as many data sets as possible to be integrated in such
a way that the business context aligns with the goals of the project
❑ Data-savvy business analysts are knowledgeable about business systems’ data and
SQL but are not programmers
Extend the Reach of Data Science with Data Virtualization
14
Data Lakes as a Data Scientist’s Playground
The early data scientists saw Hadoop
as their personal supercomputer.
Hadoop-based Data Lakes helped
democratize access to state-of-the-art
supercomputing with off-the-shelf HW
(and later cloud)
The industry push for BI made
Hadoop–based solutions the standard
to bring modern analytics to any
corporation
15
The Key Ingredient for Data Science is…Data ☺
Data Lakes have acted as a Data Scientist’s Playground
Input data for a data science project may come in a
variety of systems and formats. Some examples:
• Files (CSV, logs, Parquet)
• Relational databases (EDW, operational systems)
• NoSQL systems (key-value pairs, document stores,
time series, etc.)
• SaaS APIs (Salesforce, Marketo, ServiceNow,
Facebook, Twitter, etc.)
In addition, the Big Data community has also embraced
data science as one of its pillars. Examples include Spark
and SparkML, and architectural patterns like the Data
Lake
Typical Data Science Workflow
16
Typical Data Science Workflow
80% of time – Finding and preparing the data
10% of time – Analysis
10% of time – Visualizing data
Reduce data prep time by 25% → increase data
analysis by 3X
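The arithmetic behind this claim can be checked with a short sketch. It assumes the 80/10/10 split above, reads "25%" as a quarter of the data prep share, and assumes all of the freed time moves to analysis (the slide does not state how the time is reallocated).

```python
# Time shares from the slide, in percentage points of total project time.
prep, analysis, visualization = 80, 10, 10

# Assumption: "reduce data prep time by 25%" means cutting the prep share by a quarter.
freed = prep * 25 // 100  # 20 points freed

# Assumption: all freed time is reallocated to analysis.
new_prep = prep - freed
new_analysis = analysis + freed

analysis_multiplier = new_analysis / analysis

print(f"prep: {new_prep}%, analysis: {new_analysis}%, visualization: {visualization}%")
print(f"analysis time multiplier: {analysis_multiplier:.0f}X")
```

Under these assumptions, prep drops from 80% to 60% and analysis grows from 10% to 30%, which is the 3X on the slide.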
17
Where Does the Time Go?
A large amount of time and effort goes into tasks not intrinsically related to data
science:
• Finding where the right data may be
• Getting access to the data
• Bureaucracy
• Understanding access methods and technology (NoSQL, REST APIs, etc.)
• Transforming data into a format easy to work with
• Combining data originally available in different sources and formats
• Profiling and cleansing data to eliminate incomplete or inconsistent data points
• Making this ‘data pipeline’ a repeatable, systematic process → Operationalize it
18
Benefits of a Virtual Data Layer
▪ A Virtual Layer improves decision making and shortens development cycles
• Surfaces all company data from multiple repositories without the need to replicate all data
into a lake
• Eliminates data silos: allows for on-demand combination of data from multiple sources
▪ A Virtual Layer broadens usage of data
• Improves governance and metadata management to avoid “data swamps”
• Decouples data source technology. Access normalized via SQL or web services
• Allows controlled access to the data with fine-grained security controls
▪ A Virtual Layer offers performant access
• Leverages the processing power of the existing sources controlled by Denodo’s optimizer
• Processing of data for sources with no processing capabilities (e.g. files)
• Caching and ingestion engine to persist data when needed
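As a rough illustration of on-demand combination, the sketch below joins a relational source with a file source at query time, without copying either into a lake. It is a toy model of the idea, not Denodo's engine: the table, the CSV content, and the function names (`sales_rows`, `customer_sales_view`) are all hypothetical.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a relational database, simulated with in-memory SQLite.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2 (hypothetical): a CSV file, which has no query engine of its own.
SALES_CSV = "customer_id,amount\n1,100\n1,50\n2,70\n"


def sales_rows():
    """Read the file source on demand; nothing is persisted elsewhere."""
    for row in csv.DictReader(io.StringIO(SALES_CSV)):
        yield int(row["customer_id"]), float(row["amount"])


def customer_sales_view():
    """A 'virtual view': both sources are combined when the view is queried."""
    names = dict(crm.execute("SELECT id, name FROM customer"))
    totals = {}
    for customer_id, amount in sales_rows():
        # The layer does the aggregation because the file cannot.
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return [(names[cid], total) for cid, total in sorted(totals.items())]


result = customer_sales_view()
print(result)
```

The consumer only sees `customer_sales_view()`; swapping the CSV for another source would change `sales_rows` but not the caller.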
19
Faster Data Science from data refreshes
Machine learning model training with supervised, reinforcement, and
unsupervised techniques
▪ Materialize training data from a virtual table, storing its results in another
database, for supervised machine learning training
▪ Access real-time data from a virtual table so the latest data can be used in
reinforcement learning training
▪ Cache data sets to alleviate performance bottlenecks
20
A Data Catalog and Exploration Tool?
Reporting tools are great to visualize data and
present it to business users.
But there is a gap between the reporting tool and the
data model underneath
How can end users…
• … browse tables through tags and categories?
• … understand the lineage and definitions of the
fields?
• … search the catalog and its content?
• … validate that data is trustworthy?
21
Data Catalog with Data Access
22
Prologis - World’s leading industrial real estate company
• $1.5 TRILLION is the economic value of goods flowing through
our distribution centers each year, representing:
  • 2.8% of GDP for the 19 countries where we do business
  • 2.0% of the World’s GDP
• 1983: Founded
• 100 GLOBAL: Most sustainable corporations
• 768 MSF: on four continents
• $87B: Assets under management
• 1.0 MILLION: employees under Prologis’ roofs
23
Step 1: Expose Data to Data Scientists
Prologis: Data Science Workflow
DATA
VIRTUALIZATION
Cache
Data Services
Application
Database
EDW
Cloud Data Lake
24
Step 2: Operationalization of Model Scoring
Prologis: Data Science Workflow
DATA
VIRTUALIZATION
Cache
Web Service
(Python Model Scoring)
AWS Lambda
Application
Database
EDW
Cloud Data Lake
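A Python scoring service on AWS Lambda could look roughly like the sketch below. Everything specific here is an assumption for illustration: the feature names, the fixed logistic-regression coefficients, and the event shape (a JSON string under `body`, as an HTTP proxy integration would send). The slides do not describe the actual Prologis model or payload.

```python
import json
import math

# Hypothetical model: logistic regression with coefficients fixed for illustration.
# A real deployment would load a trained model artifact instead.
WEIGHTS = {"building_age_years": -0.02, "occupancy_rate": 1.5}
BIAS = -0.5


def score(features):
    """Return a probability-like score in (0, 1) for one record."""
    z = BIAS + sum(WEIGHTS[name] * float(features[name]) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def lambda_handler(event, context):
    """Entry point with the (event, context) signature Lambda uses for Python."""
    features = json.loads(event["body"])  # assumed event shape
    return {"statusCode": 200, "body": json.dumps({"score": score(features)})}


# Local usage: call the handler directly with a sample event.
event = {"body": json.dumps({"building_age_years": 10, "occupancy_rate": 0.9})}
response = lambda_handler(event, None)
print(response)
```

Exposed this way, the data virtualization layer can treat the scoring service as one more web service source and publish scores next to the rest of the data.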
25
Enterprise Data Services Layer @ Large Mutual Funds Company
• Problem getting consistent data – including key metrics
• Developers ‘hunting down and interpreting data themselves’
• Management decided that they needed consistent data irrespective of channels
• IT tasked with providing consistent data to all users
• Implemented Data Services Layer for all data access
• No direct access to data sources – everything is obtained through Data Virtualization
layer
• Internal reports, web sites, front office/back office apps, IVR system, etc.
26
Enterprise Data Services Layer
Use Cases for Data Virtualization in Data Governance and Security
27
• Use Case 1: Single Source of Truth to avoid data inconsistencies, etc.
• Use Case 2: Unified Security layer with centralized authorization management and auditing
• Use Case 3: Data Catalog/Marketplace
– Single source of truth at CIT (to comply with stringent Basel III risk management regulations)
28
McCormick Spice
29
McCormick Spice (Cont’d)
Data Services
(Data Virtualization)
API Management and Runtime
Semantics & Discovery
Governance
Security
System 1 System n
External
API $
Governance
Security
30
McCormick Spice (Cont’d)
Approach
1. Model requests Specific Modifications/Full Information
2. Model incrementally or fully trains
[Diagram: Algorithms, Enterprise Data Services, Backend Systems, External Systems. Steps: 1 Request, 2 Collect, 3 Receive, 4 Train]
Benefits
✓ Timely Information
✓ No replication
✓ No need to validate information
✓ Better staging for learning
31
Key Takeaways
▪ A Virtual Data Lake improves decision making and
shortens development cycles
▪ Surfaces all company data without the need to replicate
▪ Eliminates data silos: allows for on-demand data access
▪ A Virtual Data Lake broadens adoption of the lake and
improves its ROI
▪ Improves governance and metadata management (avoid
“data swamps”)
▪ Enables faster ML model building and allows controlled access
▪ A Virtual Data Lake offers performance for the Big Data
World
32
Customer Stories
https://www.denodo.com/en/webinar/autodesk-data-virtualization-core-bi-20-architecture-powered-spark-and-aws
https://www.denodo.com/en/video/case-study/customer-case-study-schaeffler
“We can bring data into the data lake as needed,
for example IoT systems, but we also connect legacy
IT systems or even any server outside of Schaeffler”
Dr Jürgun Bohn, Director Data Architecture and Engineering at Schaeffler
“You check the market and identify new products
that work best for each use case, but your endpoint
doesn’t change, it’s your virtual layer”
Kurt Jackson, Platform Architect at Autodesk
33
Try it yourself
Access Denodo Platform in the Cloud!
Take Data Science Test Drive today!
www.denodo.com/TestDrive
GET STARTED TODAY
34
More Resources
▪ “Rethinking the data lake” blog series
▪ http://www.datavirtualizationblog.com/rethinking-data-lake-data-virtualization/
▪ Performance
▪ Optimization and performance are always a key ingredient when dealing with large data
volumes
▪ Denodo offers the most robust and mature data virtualization engine in the market
▪ Cost based optimization
▪ Rule based optimization tailored for federation scenarios
▪ Integrated use of external MPP engines like Spark, Impala, etc.
▪ Designed to perform in big data scenarios with billion-row tables
Thank you!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm,
without the prior written authorization from Denodo Technologies.
Q&A
37
Query Optimizer
SELECT c.id, SUM(s.amount) as total
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.id
How Denodo works compared with reporting tool federation engines
System                Execution Time   Data Transferred   Optimization Technique
Denodo                9 sec.           4 M                Aggregation push-down
Lead Reporting Tool   125 sec.         292 M              None: full scan
Plan without push-down: Sales (290 M) and Customer (2 M) are transferred, joined, then grouped.
Plan with aggregation push-down: Sales is grouped at the source (2 M transferred), then joined with Customer (2 M).
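The effect of aggregation push-down can be reproduced at toy scale. The sketch below is an assumption-laden stand-in, not Denodo's optimizer: two in-memory SQLite databases play the Sales and Customer sources, and "rows fetched" stands in for data transferred. Pushing the GROUP BY below the join is valid here because the join key is also the grouping key and `customer.id` is unique.

```python
import sqlite3
from collections import defaultdict

# Two independent sources with hypothetical data: 300 sales rows for 3 customers.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
sales_db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(cid, 10.0) for cid in range(1, 4) for _ in range(100)],
)
customer_db = sqlite3.connect(":memory:")
customer_db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
customer_db.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,), (3,)])


def join_totals(customer_ids, sales_pairs):
    """Join in the federation layer and sum amounts per customer."""
    known = set(customer_ids)
    totals = defaultdict(float)
    for cid, amount in sales_pairs:
        if cid in known:
            totals[cid] += amount
    return dict(totals)


def naive_plan():
    """Full scan: fetch every row from both sources, then join and group."""
    customers = [r[0] for r in customer_db.execute("SELECT id FROM customer")]
    sales = sales_db.execute("SELECT customer_id, amount FROM sales").fetchall()
    return join_totals(customers, sales), len(customers) + len(sales)


def pushdown_plan():
    """Push the GROUP BY to the sales source; fetch one row per customer."""
    customers = [r[0] for r in customer_db.execute("SELECT id FROM customer")]
    partial = sales_db.execute(
        "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
    ).fetchall()
    return join_totals(customers, partial), len(customers) + len(partial)


naive_result, naive_rows = naive_plan()
pushed_result, pushed_rows = pushdown_plan()
print(naive_rows, pushed_rows)
```

Both plans return the same totals, but the push-down plan fetches 6 rows instead of 303, which mirrors the 292 M versus 4 M contrast in the table above.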
38
Customer Centricity / MDM
✓ Complete View of Customer
Data Services
✓ Data as a Service
✓ Data Marketplace
✓ Data Services
✓ Application and Data Migration
Cloud Solutions
✓ Cloud Modernization
✓ Cloud Analytics
✓ Hybrid Data Fabric
Data Governance
✓ GRC
✓ GDPR
✓ Data Privacy / Masking
BI and Analytics
✓ Self-Service Analytics
✓ Logical Data Warehouse
✓ Enterprise Data Fabric
Big Data
✓ Logical Data Lake
✓ Data Warehouse Offloading
✓ IoT Analytics
Denodo ‘Solution’ Categories

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 

Último (20)

Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Unlock Your Data for ML & AI using Data Virtualization

  • 1. Unlock Your Data for ML & AI using Data Virtualization . Mitesh Shah Senior Cloud Product Manager June 20, 2019
  • 2. 2 Source: Gartner 2018, Data Virtualization Market Guide Through 2022, 60% of all organizations will implement data virtualization as one key delivery style in their data integration architecture.
  • 3. 3 Key Challenges for Data Integration • Required expansion of Analytics by growing consumers of data • Need for Agile Self-Service BI • Increasing use of third-party data for Information Agility • Big Data volumes continue to grow • Security and Data Privacy implications becoming core to data strategy • Reduce or eliminate Data Latency • Providing data access irrespective of Storage Location • Growth in Hybrid & Multi-Cloud Deployments • Convergence of Application and Data Integration
  • 4. 4 What is Data Virtualization? 1 Connect to disparate data sources 2 Combine related data into views 3 Consume in business applications DATA CONSUMERS Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users DISPARATE DATA SOURCES Databases & Warehouses, Cloud/SaaS Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word... Analytical Operational Less Structured More Structured CONNECT COMBINE PUBLISH Multiple Protocols, Formats Query, Search, Browse Request/Reply, Event Driven Secure Delivery SQL, MDX Web Services Big Data APIs Web Automation and Indexing CONNECT COMBINE CONSUME Share, Deliver, Publish, Govern, Collaborate Discover, Transform, Prepare, Improve Quality, Integrate Normalized views of disparate data “Data virtualization integrates disparate data sources in real time or near-real time to meet demands for analytics and transactional data.” – Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
  • 6. 6 Challenges / Known Facts in Data Management! ✓ The current data landscape is fragmented. ✓ Data Lakes, IoT architectures, SaaS fuel the needs of modern analytics, ML and AI. ✓ Exploring and understanding the data available within your company is a time-consuming task. ✓ Dealing with bureaucracy, different languages and protocols is not easy. ✓ A logical architecture based on a virtualization layer connects the different systems and exposes them as one, hiding the underlying complexity.
  • 7. 7 Logical Architectures – Brief History ▪ Logical Architectures were first described by Mark Beyer, an analyst from Gartner, in 2009 to describe the efforts to expand the current data warehouse architectures ▪ Since then, the term “Logical Data Warehouse” has been widely used to present the natural evolution of analytical architectures ▪ For example, “Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs”. Henry Cook, Gartner April 2018 ▪ Other data architectures have also seen their logical counterpart: • Logical Data Marts • Logical Data Lakes ▪ In all these cases, a virtualization layer is a key component of the architecture
  • 8. 8 Data Lakes A data lake is a storage repository that holds a vast amount of raw data in its native format. The data structure and requirements are not defined until the data is needed The current needs for sophisticated data-driven intelligence and data science favored this concept for its simplicity and power Hadoop and its ecosystem provided the foundation that data lakes required: vast storage and processing muscle It also favored the concept of ELT vs ETL: load data first, (maybe)
  • 9. 9 The Promise of Data Lakes • Consolidate data in a single physical repository • No more data integration issues • Users can get the data they need from the lake • Store massive amounts of raw, unfiltered data – maintain structure and fidelity of data • Using cheap commodity hardware • 100X cheaper than EDW appliance • Take advantage of processing power of Hadoop for data analysis
  • 10. 10 …Data lakes lack semantic consistency and governed metadata. Meeting the needs of wider audiences require curated repositories with governance, semantic consistency and access controls.”
  • 11. 11 Data Lakes – Not a Perfect World Physical Nature ▪ Based on Replication. Data Lakes require data to be copied to its physical storage ▪ Replication extends development cycles and costs ▪ Not all data is suitable for replication ▪ Real time needs: Cloud and SaaS APIs ▪ Large volumes: existing EDW ▪ Privacy laws and restrictions Single Purpose ▪ Usage of the data lake is often monopolized by data scientists ▪ New data silo. No clear path to share insights with business users ▪ Lacks the governance, security and quality that business users are used to (e.g. in the EDW)
  • 12. 12 How Denodo Complements Logical Data Lake in Cloud Denodo Architecture for Logical Data Lake ● Denodo does not substitute data warehouses, data lakes, ETLs... ● Denodo enables the use of all of them together, plus other data sources ○ In a logical data warehouse ○ In a logical data lake ○ They are very similar, the only difference is in the main objective ● There are also use cases where Denodo can be used as a data source in an ETL flow
  • 13. 13 Data science project characteristics ❑ Bulk of work in data science projects involves integrating many disparate data sets to create extremely wide data ❑ Data science data requires as many data sets as possible to be integrated in such a way that the business context aligns with the goals of the project ❑ Data-savvy business analysts are knowledgeable with business systems’ data and SQL but are not programmers Extend the Reach of Data Science with Data Virtualization
  • 14. 14 Data Lakes as a Data Scientists Playground The early data scientists saw Hadoop as their personal supercomputer. Hadoop-based Data Lakes helped democratize access to state of the art supercomputing with off-the-shelf HW (and later cloud) The industry push for BI made Hadoop–based solutions the standard to bring modern analytics to any corporation
  • 15. 15 The Key Ingredient for Data Science is…Data ☺ Data Lakes have acted as a Data Scientist's Playground Input data for a data science project may come in a variety of systems and formats. Some examples: • Files (CSV, logs, Parquet) • Relational databases (EDW, operational systems) • NoSQL systems (key-value pairs, document stores, time series, etc.) • SaaS APIs (Salesforce, Marketo, ServiceNow, Facebook, Twitter, etc.) In addition, the Big Data community has also embraced data science as one of their pillars. For example Spark and SparkML, and architectural patterns like the Data Lake Typical Data Science Workflow
  • 16. 16 Typical Data Science Workflow 80% of time – Finding and preparing the data 10% of time – Analysis 10% of time – Visualizing data Reduce data prep time by 25% → increase data analysis by 3X (cutting prep from 80% to 60% of project time frees 20 points, which would raise analysis from 10% to 30%)
  • 17. 17 Where Does the Time Go? A large amount of time and effort goes into tasks not intrinsically related to data science: • Finding where the right data may be • Getting access to the data • Bureaucracy • Understanding access methods and technology (NoSQL, REST APIs, etc.) • Transforming data into a format easy to work with • Combining data originally available in different sources and formats • Profiling and cleansing data to eliminate incomplete or inconsistent data points • Making this ‘data pipeline’ a repeatable, systematic process → Operationalize it
  • 18. 18 Benefits of a Virtual Data Layer ▪ A Virtual Layer improves decision making and shortens development cycles • Surfaces all company data from multiple repositories without the need to replicate all data into a lake • Eliminates data silos: allows for on-demand combination of data from multiple sources ▪ A Virtual Layer broadens usage of data • Improves governance and metadata management to avoid “data swamps” • Decouples data source technology. Access normalized via SQL or web services • Allows controlled access to the data with fine-grained security controls ▪ A Virtual Layer offers performant access • Leverages the processing power of the existing sources controlled by Denodo’s optimizer • Processing of data for sources with no processing capabilities (e.g. files) • Caching and ingestion engine to persist data when needed
  • 19. 19 Faster Data Science from data refreshes Machine learning model training, supervised reinforcement, and unsupervised techniques ▪ Materialize training data from a virtual table that stores its results in another database for machine learning supervised training ▪ Access real-time data from a virtual table for the latest data to be used in machine learning reinforcement training ▪ Cache data sets to alleviate performance bottlenecks
  • 20. 20 A Data Catalog and Exploration Tool? Reporting tools are great to visualize data and present it to business users. But there is a gap between the reporting tool and the data model underneath How can end users… • … browse tables through tags and categories ? • … understand the lineage and definitions of the fields? • … search the catalog and its content? • … validate that data is trustworthy?
  • 21. 21 Data Catalog with Data Access
  • 22. 22 Prologis - World’s leading industrial real estate company $1.5 TRILLION is the economic value of goods flowing through our distribution centers each year, representing: 2.8% of GDP for the 19 countries where we do business; 2.0% of the World’s GDP • Founded 1983 • Global 100 Most sustainable corporations • 768 MSF • $87B Assets under management • On four continents • 1.0 MILLION employees under Prologis’ roofs
  • 23. 23 Step 1: Expose Data to Data Scientists Prologis: Data Science Workflow DATA VIRTUALIZATION Cache Data Services Application Database EDW Cloud Data Lake
  • 24. 24 Step 2: Operationalization of Model Scoring Prologis: Data Science Workflow DATA VIRTUALIZATION Cache Web Service (Python Model Scoring) AWS Lambda Application Database EDW Cloud Data Lake
  • 25. 25 Enterprise Data Services Layer @ Large Mutual Funds Company • Problem getting consistent data – including key metrics • Developers ‘hunting down and interpreting data themselves’ • Management decided that they needed consistent data irrespective of channels • IT tasked with providing consistent data to all users • Implemented Data Services Layer for all data access • No direct access to data sources – everything is obtained through Data Virtualization layer • Internal reports, web sites, front office/back office apps, IVR system, etc.
  • 27. Use Cases for Data Virtualization in Data Governance and Security 27 • Use Case 1: Single Source of Truth to avoid data inconsistencies, etc. • Use Case 2: Unified Security layer with centralized authorization management and auditing • Use Case 3: Data Catalog/Marketplace – Single source of truth at CIT (to comply with stringent Basel III risk management regulations)
  • 29. 29 McCormick Spice (Cont’d) Data Services (Data Virtualization) API Management and Runtime Semantics & Discovery Governance Security System 1 System n External API $ Governance Security
  • 30. 30 McCormick Spice (Cont’d) Approach 1. Model requests Specific Modifications/Full Information 2. Model incrementally or fully trains Algorithms Backend Systems External Systems 1 Request Enterprise Data Services 2 Collect train 4 3 Receive Benefits ✓Timely Information ✓No replication ✓No need to validate information ✓Better staging for learning
  • 31. 31 Key Takeaways ▪ A Virtual Data Lake improves decision making and shortens development cycles ▪ Surfaces all company data without the need to replicate ▪ Eliminates data silos: allows for on-demand data access ▪ A Virtual Data Lake broadens adoption of the lake and improves its ROI ▪ Improves governance and metadata management (avoid “data swamps”) ▪ Enables faster ML model building and allows controlled access ▪ A Virtual Data Lake offers performance for the Big Data World
  • 32. 32 Customer Stories https://www.denodo.com/en/webinar/autodesk-data-virtualization-core-bi-20-architecture-powered-spark-and-aws https://www.denodo.com/en/video/case-study/customer-case-study-schaeffler “We can bring data into the data lake as needed, for example IoT systems, but we also connect legacy IT systems or even any server outside of Schaeffler” Dr Jürgun Bohn, Director Data Architecture and Engineering at Schaeffler “You check the market and identify new products that work best for each use case, but your endpoint doesn’t change, it’s your virtual layer” Kurt Jackson, Platform Architect at Autodesk
  • 33. 33 Try it yourself Access Denodo Platform in the Cloud! Take Data Science Test Drive today! www.denodo.com/TestDrive GET STARTED TODAY
  • 34. 34 More Resources ▪ “Rethinking the data lake” blog series ▪ http://www.datavirtualizationblog.com/rethinking-data-lake-data-virtualization/ ▪ Performance ▪ Optimization and performance are always a key ingredient when dealing with large data volumes ▪ Denodo offers the most robust and mature data virtualization engine in the market ▪ Cost based optimization ▪ Rule based optimization tailored for federation scenarios ▪ Integrated use of external MPP engines like Spark, Impala, etc. ▪ Designed to perform in big data scenarios with billion-row tables
  • 35. Thank you! www.denodo.com info@denodo.com © Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
  • 36. Q&A
  • 37. 37 Query Optimizer: How Denodo works compared with reporting tool federation engines
      Query: SELECT c.id, SUM(s.amount) as total FROM customer c JOIN sales s ON c.id = s.customer_id GROUP BY c.id
      System | Execution Time | Data Transferred | Optimization Technique
      Denodo | 9 sec. | 4 M | Aggregation push-down
      Lead Reporting Tool | 125 sec. | 292 M | None: full scan
      Diagram labels: Sales 290 M, Customer 2 M, join, group by (full scan); Sales 2 M, Customer 2 M, join, group by (push-down)
  • 38. 38 Customer Centricity / MDM ✓ Complete View of Customer Data Services ✓ Data as a Service ✓ Data Marketplace ✓ Data Services ✓ Application and Data Migration Cloud Solutions ✓ Cloud Modernization ✓ Cloud Analytics ✓ Hybrid Data Fabric Data Governance ✓ GRC ✓ GDPR ✓ Data Privacy / Masking BI and Analytics ✓ Self-Service Analytics ✓ Logical Data Warehouse ✓ Enterprise Data Fabric Big Data ✓ Logical Data Lake ✓ Data Warehouse Offloading ✓ IoT Analytics Denodo ‘Solution’ Categories
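
The Query Optimizer comparison on slide 37 can be illustrated with a small sketch. It is not Denodo's implementation; it only mimics the idea of aggregation push-down with in-memory lists, assuming the sales source can run the GROUP BY itself. The function names, the sample rows and the "rows transferred" counter are all hypothetical. In the slide's figures the same effect takes the transfer from 290 M + 2 M rows down to 2 M + 2 M.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the two remote sources.
CUSTOMERS = [{"id": 1}, {"id": 2}, {"id": 3}]
SALES = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 1, "amount": 5},
    {"customer_id": 2, "amount": 7},
    {"customer_id": 2, "amount": 3},
    {"customer_id": 2, "amount": 1},
    {"customer_id": 1, "amount": 4},
]


def naive_federation(customers, sales):
    """Pull every row from both sources, then join and group in the engine."""
    transferred = len(customers) + len(sales)
    known_ids = {c["id"] for c in customers}
    totals = defaultdict(int)
    for s in sales:
        if s["customer_id"] in known_ids:  # inner join on customer id
            totals[s["customer_id"]] += s["amount"]
    return dict(totals), transferred


def aggregate_at_source(sales):
    """Stand-in for the sales source executing GROUP BY customer_id itself."""
    partial = defaultdict(int)
    for s in sales:
        partial[s["customer_id"]] += s["amount"]
    return dict(partial)


def pushdown_federation(customers, sales):
    """Transfer one pre-aggregated row per customer, then join in the engine."""
    partial = aggregate_at_source(sales)
    transferred = len(customers) + len(partial)
    known_ids = {c["id"] for c in customers}
    totals = {cid: total for cid, total in partial.items() if cid in known_ids}
    return totals, transferred


if __name__ == "__main__":
    print(naive_federation(CUSTOMERS, SALES))
    print(pushdown_federation(CUSTOMERS, SALES))
```

Both functions return the same totals per customer; only the number of rows that cross the network differs, which is the point the slide's execution-time column makes.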