2. Agenda
1. Data Virtualization: An Introduction
2. Data Virtualization Platforms – Key Capabilities, through customer success stories @BoW, @Intel and @Asurion
3. Demo C3: Connect, Combine and Consume
3. Denodo – The Leader in Data Virtualization
DENODO OFFICES, CUSTOMERS, PARTNERS
Palo Alto, CA. Global presence throughout North America, EMEA, APAC, and Latin America.
LEADERSHIP
Longest continuous focus on data virtualization – since 1999.
Leader in the 2017 Forrester Wave – Enterprise Data Virtualization.
Winner of numerous awards.
CUSTOMERS
~500 customers, including many F500 and G2000 companies across every major industry, have gained significant business agility and ROI.
FINANCIALS
Backed by a $4B+ private equity firm.
50%+ annual growth; profitable.
4. Data Virtualization – An Introduction
Why Data Virtualization? Challenges, Solution and Benefits
5. IT & Business Data Dilemma
IT focuses on data collection, storage & security.
Business focuses on data consumption, analysis & strategic decisions.
[Diagram: data sources – Product Data, Billing System, Cloud/SaaS, System Usage, Inventory System, CRM, Product Catalog, Customer Voice, PLM, Web/Social – on the IT side; consumers – Analytics & Reporting, Enterprise Apps, Digital Transformation, Risks & Compliance, Decision Making, Data Science, M&A, Enterprise APIs – on the business side.]
6. IT & Business Data Dilemma (IT "Spaghetti" Architecture)
IT focuses on data collection, storage & security.
Business focuses on data consumption, analysis & strategic decisions.
[Diagram: the same sources and consumers as slide 5, now wired together point-to-point.]
7. IT & Business Data Dilemma – But…
IT focuses on data collection, management & security.
Business focuses on data consumption, analysis & strategic decisions.
[Diagram: the sources and consumers from slide 5, with Big Data added as a source and Customer Support and Revenue Collection added as consumers.]
8. IT & Business Data Dilemma – But…
[Same diagram as slide 7.]
Can you retrieve all the data? There is no single data container solution for all of it.
Can you consume all the data? There is no single way of consuming it.
9. IT & Business Data Dilemma – But…
[Same diagram as slide 7, with each integration path annotated "So much replication (security, data, license, storage, hardware, €, etc.)".]
75% of the data stored is not used; 90% of requests need current data (2016).
10. IT & Business Data Dilemma – But…
[Same diagram as slide 7.]
Not time-to-market oriented: between a data request and its consumption, the unit of measure is a couple of months – limited agility and DevOps readiness, limited value to the business.
11. IT & Business Data Dilemma – But…
[Same diagram as slide 7.]
Innovation & digital initiatives (MVPs): API-first, independent data, fast & agile products.
12. IT & Business Data Dilemma – But…
[Same diagram as slide 7.]
Data governance: who can access what and where; filtering, hiding, encrypting; compliance; lineage, origin, transformations; auditability (for both IT and the business).
13. IT & Business Need…
[Same diagram as slide 7, with a single layer sitting between the sources and the consumers.]
A governed & secured enterprise data delivery platform, with the best time-to-market… and cost effective.
14. Data Virtualization – Key Capabilities
Customer Success stories @BoW, @Intel and @Asurion
15. Essential Capabilities of Data Virtualization
1. Data abstraction – ease of use & standardization
2. Zero replication – on-demand access & ease of use
3. Performance – intelligent algorithms
4. Data catalog – self-service & search
5. Multi-location – ease of deployment
6. Governance, security & compliance
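As a concrete illustration of capability 1 (data abstraction), here is a minimal sketch in generic SQL – not actual Denodo VQL, and with hypothetical source, view and column names – of how a virtual layer maps a physical table into a business-friendly view without copying any data:

```sql
-- Hypothetical base view: a 1:1 logical mapping onto a physical CRM table (no replication).
CREATE VIEW bv_crm_customer AS
SELECT cust_id, cust_nm, cntry_cd, crt_dt
FROM crm_prod.tbl_customer;

-- Derived "semantic" view: standardized names and types exposed to business consumers.
CREATE VIEW dv_customer AS
SELECT cust_id              AS customer_id,
       cust_nm              AS customer_name,
       UPPER(cntry_cd)      AS country_code,
       CAST(crt_dt AS DATE) AS created_date
FROM bv_crm_customer;
```

Consumers query dv_customer through standard SQL or an exposed service, while the underlying mapping stays in the virtual layer.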
16. "With Denodo, we saw about a 30 to 40% increase in our ability to deliver projects to our consumers."
– Michael Norton, VP Data Architecture, Bank of the West
25. Challenges
• Intel's data is globally distributed across heterogeneous tools & technologies
• New data sources (e.g., big data) & consumers (e.g., the emergence of SaaS)
• New information exchange channels (e.g., mobility)
• Web services and API management
• M&A
• BI users want fresh, easily accessible data
26. Results
[Same bullet list as slide 25.]
27. "Our Denodo rollout was one of the easiest and most successful rollouts of critical enterprise software I have seen."
– Larry Dawson, Enterprise Architect, Asurion
28. Challenges
• Need for big data and predictive analytics
• Move to the cloud
• International initiative + Personally Identifiable Information = geographic, client-based constraints
Asurion Premium Support Services
29. Solution
[Same challenge list as slide 28.]
A governed & secured enterprise data delivery platform, with the best time-to-market and cost effective:
• A single enterprise data access layer
• Hybrid environment: Denodo on premises and Denodo in the cloud
• Fine-grained authorization: row, column, encryption
Asurion Premium Support Services
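To make the fine-grained authorization bullet concrete, here is a minimal sketch in generic SQL – not Denodo's actual security configuration, and with hypothetical view, role and column names – of the kind of row- and column-level restriction a virtual layer can enforce on top of the sources:

```sql
-- Hypothetical restricted view layered over a dv_customer view:
-- rows are filtered by geography and the PII column is masked for unauthorized users.
CREATE VIEW dv_customer_restricted AS
SELECT customer_id,
       country_code,
       CASE
           WHEN SESSION_USER IN ('pii_analyst_us', 'pii_analyst_eu')
               THEN customer_name          -- authorized roles see the clear value
           ELSE '***MASKED***'             -- everyone else sees a masked value
       END AS customer_name
FROM dv_customer
WHERE country_code = 'US';  -- row-level filter for a US-scoped consumer group
```

Because all consumers go through the virtual layer, the policy is defined once instead of being re-implemented in every source.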
30. Solution
[Same as slide 29, adding:]
• APIs: expose data as JSON
31. Solution
[Same as slide 30, adding:]
• Performance: push-down optimization
33. Scenario
What's the impact of a new marketing campaign for each country?
• Historical sales data offloaded to a Hadoop cluster for cheaper storage
• Marketing campaigns managed in an external cloud app
• Country is part of the customer details table, stored in the Oracle DW
[Diagram: source abstraction (base views over Historical Sales, Campaign and Customer) → combine, transform & integrate (join, group by, join) → consume.]
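For illustration only – this is not Denodo VQL, and the base-view and column names are hypothetical – the combined view in this scenario could be expressed in SQL roughly as follows:

```sql
-- Hypothetical base views:
--   bv_sales_hist : historical sales in the Hadoop cluster (Cloudera Impala)
--   bv_campaign   : marketing campaigns from the external cloud web service
--   bv_customer   : customer details (including country) in the Oracle DW
CREATE VIEW iv_campaign_impact_by_country AS
SELECT cp.campaign_name,
       cu.country,
       SUM(s.amount) AS total_sales
FROM bv_sales_hist s
JOIN bv_campaign   cp ON s.campaign_id = cp.campaign_id
JOIN bv_customer   cu ON s.customer_id = cu.customer_id
GROUP BY cp.campaign_name, cu.country;
```

A BI tool such as Tableau or Power BI then simply queries this one view, unaware that three heterogeneous sources sit behind it.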
34. Performance Benchmark: no cache, no MPP
Denodo has done extensive testing using queries from the standard TPC-DS* benchmark, comparing a federated approach in Denodo against an MPP system into which all the data has been replicated via ETL.
Benchmarks: federating large data sets
[Diagram: federated scenario – Customer Dim. (2 M rows), Sales Facts (290 M rows) and Items Dim. (400 K rows) spread across separate sources – vs. the same three tables preloaded into a single data warehouse.]
* TPC-DS is the de-facto industry-standard benchmark for measuring the performance of decision support solutions, including, but not limited to, big data systems.
35. Performance Benchmark: no cache, no MPP
Benchmarks: federating large data sets. For each query: returned rows, Netezza time, Denodo time (federated Oracle, Netezza & SQL Server), and the Denodo optimization technique selected automatically.
• Total sales by customer – 1.99 M rows – 20.9 sec vs. 21.4 sec – full aggregation push-down
• Total sales by customer and year between 2000 and 2004 – 5.51 M rows – 52.3 sec vs. 59.0 sec – full aggregation push-down
• Total sales by item brand – 31.35 K rows – 4.7 sec vs. 5.0 sec – partial aggregation push-down
• Total sales by item where sale price is less than current list price – 17.05 K rows – 3.5 sec vs. 5.2 sec – on-the-fly data movement
Execution times are comparable with single-source execution, based only on automatic optimizer decisions.
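To show the general idea behind aggregation push-down (a sketch with hypothetical table and column names; the optimizer's actual rewrites are more involved), consider the first benchmark query:

```sql
-- Query as issued against the virtual layer: total sales by customer.
SELECT c.customer_name, SUM(s.amount) AS total_sales
FROM sales_facts s
JOIN customer_dim c ON s.customer_id = c.customer_id
GROUP BY c.customer_name;

-- Aggregation push-down (sketch): the expensive aggregation runs inside the source
-- holding the 290 M-row fact table, so only ~2 M pre-aggregated rows cross the
-- network before the virtual layer joins them with the customer dimension.
SELECT customer_id, SUM(amount) AS total_sales
FROM sales_facts
GROUP BY customer_id;
```

Whether the grouping can be pushed down entirely (full) or must be finished in the virtual layer (partial) depends on the query, which is why the results above list both variants.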
36. Denodo mentioned on itone.lu (European Commission – blockchain and data virtualisation)
"We were very impressed when our DV specialists were able to set up the connections to the data sources, the data hashing, as well as the services exposed as an API for our Blockchain engine, all with tight security and in much less time than we thought would be needed."
http://www.itone.lu/actualites/blockchain-and-data-virtualisation-european-commission
37. Immediate Access, Higher Impact
More agile: 3–10x faster
Lower TCO: up to 75% savings
"Through 2020, 50% of enterprises will implement some form of data virtualization as one enterprise production option for data integration."
– Gartner's Guide to Data Virtualization, Dec. 2017
Editor's Notes
Thank you Lydie.
Good morning ladies and gentlemen, and welcome to our session on Data Virtualization. My name is Sihem Merah, I work as a Senior Sales engineer with Denodo, and I am honored to be presenting this session to you, with the collaboration of our partner Oktopus Consulting.
This is the agenda that we are going to cover today.
First of all, we will briefly look at an introduction to data virtualization: what is it? How does it work? How is it different from your traditional approaches to data integration?
Then we’ll look at some key capabilities that you should look for in a DV platform, and we will do this through some enterprise implementations at some of our major customers like Asurion, Bank of the West and Intel
Finally, we will show you Denodo in action with a demo on connectivity, building views and consuming them
But before that, a little bit more about Denodo.
In 1999, a fellow from Galicia, in north-west Spain, Angel Vina, had a vision. Early on, he saw that modern data integration was headed toward pervasive, real-time data access, and he built the best data virtualization solution on the market. Now data virtualization has tremendous momentum, and Denodo is a big success. We have over 500 happy customers around the globe, served by 300 Denodians, mainly in Europe, even though we are now headquartered in Palo Alto, where we raised a $150 million investment.
Most of our customers are Fortune 500 and Global 2000 companies, across all industries, including FSI.
So, let’s get started with an introduction to DV:
What is it?
Why is it different from traditional data integration?
If you think about the problems of data integration today it would look like this:
On one side, we have the data and data management: how is data stored? In what form? Applications running in our Data Centers? Or do we have data in the Cloud, such as CRM.
And on the right: how is that data used from the business point of view?
And the whole dilemma we are still witnessing today is: how can I expose data to my business units?
Usually, first approaches look something like this: organizations implement scripts and feeds; we extract data, transform it and replicate it, sometimes generating redundancy, all with the objective of making data available to end users.
Obviously, all of this can be enhanced, standardized and industrialized, but organizations still face two major problems:
Data sources and data volumes keep on increasing (Forbes predicted 500% growth in the data and device avalanche by 2020).
And secondly, the business is speeding up. PwC predicted that to remain competitive, business decision speed and analysis must increase by 300%.
Business needs to be able to consume all enterprise data that is necessary to remain efficient and competitive
And IT needs to be able to expose all that data in a simple and agile manner all the while minimizing risks.
As applications are envisioned and delivered throughout any organization, the desire to use data replication as a "quick-fix" integration strategy often wins favor. This path of least resistance breeds redundancy and inconsistency.
This would be fine if that replicated data were used. But according to TDWI, 75% of the data stored is not used, and 90% of the requests coming from the business concern current data.
Another question arises: how fast am I delivering data to my users? Most organizations can go anywhere from a few weeks to several months: this is not Time-to-Market Oriented.
Innovation and digital initiatives need to be fast and agile today; ways of working are changing, with agile methodologies and implementations delivered in one- or two-week sprints, and data cannot be the bottleneck here.
And finally, what about data governance? Data is distributed across different systems and data is replicated, but we still need to know how we got to the specific record we are accessing. Where is the data coming from? What transformations has it gone through? Who is accessing it and how? Am I protecting sensitive data, like personal information, the way I should?
What organizations need is a data delivery platform that is fast and efficient. Data virtualization sits between all your data and everything and everyone that consumes that data. Denodo accesses, integrates and delivers data 10x faster and 10x cheaper than any other middleware solution by moving from a physical data model to a logical data model.
Next, some key capabilities that you should look for in a DV platform, illustrated through enterprise implementations at some of our major customers: Asurion, Bank of the West and Intel.
As rules? "Rule number 2… I like this one."
Zero replication: we are not moving data around, we are just doing a logical mapping.
BoW is owned by BNPP, headquartered in San Francisco, with 10,000 employees and 600 branches, managing $80.7 billion in assets
+ strong digital and online presence
Lines of Business include Retail Banking, Personal Finance, Commercial Banking, and Wealth Management
Standard enterprise data environment for a bank: classic EDW design
Accelerated growth from 2012 to 2018: the environment has grown explosively in size, to 560 TB now, and they expect to double that before the end of 2018.
To access data: data marts, plus a lot of data published on an ESB for real-time use.
Emerging Big Data environment: initiated a big data approach
But they found that as the data warehouse grew in complexity, the ability to pull data out through standard SQL became more and more challenging for the analyst community.
They had a regulatory and compliance project at the beginning of 2017. If they had implemented it using their standard ETL processes, they would not have been able to meet the deadline to produce the required output.
For migrations, when they know that either a source or a destination of data is going to change because of an upgrade or a replacement, they needed an alternative to unwiring and rewiring a tightly coupled integration, which would impact the business.
Big data was coming up: how to adopt it without impacting the business?
Denodo to the rescue! At BoW, Denodo usage came on very quickly: within a year of purchase, they implemented it in over half a dozen projects.
First thing: solve the complex data warehouse problem and accelerate data usage by the analyst community by publishing a semantic layer powered by Denodo.
Denodo allowed them to very quickly simplify what they were doing by exposing a much simpler semantic layer to end users, so that they could use the building blocks to quickly assemble the data extracts they needed.
They have also used it to implement a tighter security and access management protocol across the data within the data warehouse.
They had a regulatory and compliance project at the beginning of 2017. If they had implemented it using their standard ETL processes, they would not have been able to meet the deadline to produce the required output. So instead, they used Denodo to let the development staff and the analysts define more quickly how that data would be transformed and consumed by the consuming system, so it really accelerated things. In that case, they saw about a 30 to 40% increase in their ability to deliver projects to their consumers.
Denodo became a Development Accelerator.
They have used it on multiple occasions to do what they call a "lift and shift": when they know that either a source or a destination of data is going to change because of an upgrade or a replacement, rather than unwire and rewire a tightly coupled integration, they use Denodo to isolate the source systems from the consumer systems, so that they can literally lift the access off, shift the underlying system, and then replace it with the new system. They have done quite a number of lift and shifts.
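A minimal sketch of the lift-and-shift idea in generic SQL, with hypothetical view and schema names (not BoW's actual objects): the consumer-facing view stays stable while the base view underneath it is re-pointed at the new system.

```sql
-- Consumers only ever query dv_orders; they never reference the physical source.
CREATE VIEW dv_orders AS
SELECT order_id, customer_id, order_total
FROM bv_orders_source;

-- Before the migration, the base view maps onto the legacy system.
CREATE VIEW bv_orders_source AS
SELECT ord_id AS order_id, cust_id AS customer_id, ord_amt AS order_total
FROM legacy_erp.orders;

-- "Lift and shift": after the upgrade, only the base view definition is re-pointed;
-- dv_orders and every consumer of it remain untouched.
CREATE OR REPLACE VIEW bv_orders_source AS
SELECT id AS order_id, customer_ref AS customer_id, total_amount AS order_total
FROM new_erp.orders;
```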
They are moving towards a big data environment, have done a couple of prototypes and POCs with some fantastic results so far, and are moving things into production.
As we move towards a data lake and a data hub, we are benefitting from what we have learnt.
We learnt, for example, quickly and painfully, that moving data is expensive. We pay for it in doing our development, we pay for it in doing the analysis and figuring out how we are going to move that data, we pay for it in storage by duplicating that data, and then, most importantly, we actually pay for it in maintenance. So as we continue to move things forward, we are still looking back and revising what we have done in order to accommodate new products, new systems and so on.
So if we can minimise the data movement that we do, we can maximise ROI in our data projects and really make data a more valuable asset to the bank.
As we move forward, we are seeing that more and more of the applications we wish to run in our big data environment, whether analytics or simple storage facilities, are going to be married with the existing relational database systems.
For example, we have our customer master sitting in a relational database, and we know that much of the analytics we want to do around our customers' behaviour is going to reside in a big data environment.
So marrying that relational customer data to the big data that we have was a concern for us as we looked forward. Our concern was around how we were going to parallelise the processing and how we were going to scale what we were doing, in particular because we planned to use Denodo to publish and make available that semantic layer out of our big data environment.
Coincidentally, the roadmap for Denodo 7.0 aligns very well with the roadmap that we have in place.
And finally, one of the most exciting things they have done in the last year was to implement the service bus exposure for their data warehouse using Denodo. Previously they were writing SQL stored procedures and populating information to the service bus. With Denodo, they found that it is much easier and much more efficient to publish data onto the service bus.
Big corporation.
6,300 IT employees = 6,300 different opinions.
Serving 104,000 people around the world.
61 different data centers = 61 different geographic points to source data from.
160k possible data sources, not including cloud and SaaS ecosystems.
Growing 25% YoY, and Denodo is really helping there.
Denodo gives Intel the flexibility to deliver on all those different channels.
Larry Dawson, Enterprise Architect at Asurion
Let’s talk about Asurion’s journey with DV, in particular in terms of security
A bit about Asurion first: they have about 290 million customers globally, but mostly in the US. For over 20 years, they have been supplying the telecommunications industry and retailers with insurance and extended warranties.
As they expanded, they started to move more globally, and one of the effects this has had on their data infrastructure is that it has become more difficult to manage as a single set of silos.
One of their then-new product sets is called PSS (Premium Support Services), which extends beyond insurance and extended warranties. Some of these functionalities can be served with traditional data support and others are a lot more analytical, so there was a strong need for predictive analytics based on big data. That was a big push towards the cloud for them. In order to innovate fast in this space, they needed a place where they could spin up new machines and new products very quickly, within days instead of weeks.
When you start to look at these innovations, you are changing the way you do analytics.
They were also moving internationally, and every time they moved to a different country they hit a different set of data rules. For example, they have constraints that are geographic, client-based rules. Multiple constraints in a single cloud structure: how would they implement these different constraints?
And they have PII; they cannot afford to have their customers' Personally Identifiable Information get outside their environment, or even spread to groups that do not have the correct access. So they needed some way to enforce those security constraints.
Moreover, all of their data is encrypted, and for PII data they needed column-level encryption: some people are allowed to see PII data in particular geographies, some people are not, and some are not even allowed to see the encrypted values. So they have a lot of competing rules that define what their data infrastructure needs to do.
And this would be fine if they had a static data structure, but when you look on the right, they have a big data environment where data scientists have this thirst to use the next big thing: Hive, Presto, Spark, etc.
So if they were to take those security constraints and implement them for every single new source that arises, they would not be agile: it would be a multi-month process to bring the new product in, make sure it supports their Active Directory and row-based authorization, and actually implement it.
Fine-grained authorization: column-based, row-based, controlling the encryption and decryption of data. Instead of many-to-many, one-to-many: all users go through the DV layer.
Voltage encryption: their standard for encryption is HP Voltage.
Integrate their metadata repository
Integration with their ETL environment
API usage was not something they looked at initially; when they put data virtualization in place, they were completely focused on security. But Denodo makes it so easy to deploy: for example, instead of a two-week cycle where someone implements a JavaScript abstraction layer that talks to the API management technology and makes sure it is secured correctly, a simple right-click gives you a new web service. You would still need to work with the API management team, but it cuts two thirds of the cycle time.
They have push-down optimization: with data scientists looking at data in a 100-node cluster, they wanted everything they were doing to be pushed down to that cluster, and they wanted to avoid federation and joins being processed at the Denodo layer.
Federating data: for optimization reasons, they really did not want Denodo clusters doing the joins, since they are already deploying new clusters with big processing power – why redo that in Denodo? But when new data sits outside those clusters, federation becomes necessary. So data scientists have access to Denodo to federate data, but when they want to publish it, it goes through the big data optimization piece.
So, let’s get started with an introduction to DV:
What is=t is
Why it is different from your traditionnal data integration
For our demo, we will use the following scenario: we want to analyze the impact of our marketing campaigns, per country.
To do that, we will need to access:
The historical sales data, stored in a Hadoop cluster with Cloudera Impala.
The marketing campaign data, available through a web service in the cloud.
And finally the customer data, which gives us the country field, stored in a more traditional database – Oracle in this case.
The business view exposed and consumed through a BI tool such as Tableau or Power BI will therefore be a combination of these three heterogeneous sources, with filters and aggregates per country.
We performed extensive testing using queries from the standard TPC-DS* benchmark: the goal is to compare a logical data warehouse with a physical data warehouse.
We compare a federated approach in Denodo against an MPP system where all the data has been replicated via ETL.
We will start by looking at a performance comparison, comparing the same set of queries run against the logical data warehouse scenario you can see on the left, where the sales fact table is stored in a data warehouse but the dimension tables, in this case the customer dimension, are actually stored in a separate physical store. So we are federating three data sources and comparing the performance of these queries against the same queries executed against a single source where all of the dimension and fact tables have been preloaded into a data warehouse. In both instances we are using Netezza as the data warehouse.
Now in this scenario, we are running some queries that come from the TPC-DS standard set of queries. TPC is the Transaction Processing Performance Council, and DS stands for Decision Support. So these are typical queries for reporting scenarios, e.g., give me customer sales per country.
The results are here, with:
Full aggregation push-down
Partial aggregation push-down
And on-the-fly data movement
If we go and have a look at the performance comparison, you can see the queries on the left-hand side and the data volumes they return.
Then we can see the performance for the physical data warehouse, where all the data has been moved to Netezza, and the comparative times in the logical data warehouse scenario, where the data stayed in the original three data sources.
And you can see that the results are very, very similar.
So the overheads involved here are minimal: the DV platform does a great job of optimizing these queries, and that without the effort and time needed to move and preload all the data into one data source, Netezza in this case.
And one can wonder: why are they so close?
Denodo works with any data source type, even recent technologies.
Early on, our founder saw that modern data integration was headed toward pervasive, real-time data access, and he built the best data virtualization solution on the market. Now data virtualization has tremendous momentum, as you can see here with an example from Gartner, and we look forward to partnering with you. (HGGC)
Denodo is the leader in data virtualization providing agile, high performance data integration and data abstraction across the broadest range of enterprise, cloud, big data, unstructured data sources and real-time data services at half the cost of traditional approaches. Denodo’s customers across every major industry have gained significant business agility and ROI by enabling faster and easier access to unified business information for agile BI, big data analytics, Web and cloud integration, single-view applications, and enterprise data services. Denodo is well-funded, profitable and privately held.