Watch the full webinar: Data Ninja Webinar Series by Denodo: https://goo.gl/QDVCjV
The expanding volume and variety of data from sources both internal and external to the enterprise make it hard for businesses to harness their big data for actionable insights. To overcome these challenges, organizations are exploring data lakes as consolidated repositories of massive volumes of raw, detailed data of various types and formats. But creating a physical data lake presents its own hurdles.
Attend this session to learn how to effectively manage data lakes for improved agility in data access and enhanced governance.
This is session 5 of the Data Ninja Webinar Series organized by Denodo. If you want to learn more about some of the solutions enabled by data virtualization, click here to watch the entire series: https://goo.gl/8XFd1O
8. Limitations of Physical Data Lakes
Duplicate sets of data in source system and data lake
  Creates data quality challenges because data must be updated in two places.
  Expensive since the data needs to be stored and maintained twice.
Governance of such large amounts of data can be challenging
  Restrictions on data access must be maintained as data is brought into the data lake as well as on new data created within the data lake.
Data lakes themselves can become silos
  Often built for a specific department, data from the data lake must be integrated with other enterprise data to create a complete picture.
9. Overcoming the Limitations of Physical Data Lakes
Integrate your Data Lake with other Enterprise Data Architectures
  Provides a way to access data from separate systems through an abstraction layer that makes it appear as if the data were in a single data lake.
  Improves the enterprise functionality of data lakes by combining one or more physical data lakes with other enterprise data.
  Improves an organization’s ability to govern and extract more value from its data lakes by extending them as logical data lakes.
Implement a Single Logical Data Lake Using Data Virtualization
10. Denodo Platform Bridges Distinct Data Architectures
A Single Governed Logical Data Lake
Data Virtualization combines one or more physical data lakes with other enterprise data to create a “virtual” or “logical” data lake.
[Diagram: Logical Data Lake/Big Data Fabric]
  Data Lakes: Marketing, Research, Healthcare
  Other Data Sources: MDM, Cloud Apps
  Data Virtualization: Semantic Model, Data Discovery, Metadata Catalog, Data Governance
  Self-Service Analytics, Operational Apps, BI/Analytical Tools, Excel, Reports
  Discover, Prepare, Curate, Orchestrate, Integrate, Publish
“Rely on Data Integration Infrastructure to make the Data Lake Work.”
Philip Russom, Analyst, TDWI
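To make the idea of a "logical" data lake more concrete, here is a minimal Python sketch. It is not Denodo's implementation. It assumes two hypothetical sources (a marketing data lake and an MDM system), modeled as plain functions, and a virtual view that joins them at query time without copying either one into a new store. All names and sample rows are illustrative.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

# Hypothetical sources: in practice these would be connectors to a data lake
# and to an enterprise system. Here they simply return in-memory rows.
def marketing_lake() -> List[Row]:
    return [
        {"customer_id": 1, "campaign": "spring", "clicks": 12},
        {"customer_id": 2, "campaign": "spring", "clicks": 3},
        {"customer_id": 3, "campaign": "autumn", "clicks": 7},
    ]

def mdm_customers() -> List[Row]:
    return [
        {"customer_id": 1, "name": "Acme", "region": "EMEA"},
        {"customer_id": 2, "name": "Globex", "region": "APAC"},
    ]

def logical_view(left: Callable[[], Iterable[Row]],
                 right: Callable[[], Iterable[Row]],
                 key: str) -> Iterator[Row]:
    """Inner-join two sources on `key` at query time; nothing is persisted."""
    lookup = {row[key]: row for row in right()}
    for row in left():
        match = lookup.get(row[key])
        if match is not None:
            yield {**match, **row}

result = list(logical_view(marketing_lake, mdm_customers, "customer_id"))
for row in result:
    print(row)
```

The sources are read only when the view is consumed, so the combined rows exist only for the duration of the query. Consumers see one view over both systems.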
11. The Governed Data Lake
Data Governance
  Data Lineage
  Structure and organization for your data lake
  Data Masking
Data Quality
  Functions for validating, cleansing, enriching, and standardizing data.
  SDK to integrate with external DQ tools and Big Data systems
Enterprise Access Point
  Enterprise-level access controls – table, row, column
  Authentication/Authorization
  Roles
  Audit all access
  Encryption/Decryption
Universal Semantic Model
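The access controls listed on this slide (row, column, masking, roles, audit) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Denodo's API: the `POLICIES` table, the `governed_read` function, and the masking rule are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical policy per role: visible columns, masked columns, and a row filter.
POLICIES = {
    "analyst": {
        "columns": ["patient_id", "region", "diagnosis"],
        "masked": ["patient_id"],
        "row_filter": lambda row: row["region"] == "EMEA",
    },
    "admin": {
        "columns": ["patient_id", "region", "diagnosis"],
        "masked": [],
        "row_filter": lambda row: True,
    },
}

AUDIT_LOG: List[str] = []

def mask(value: object) -> str:
    """Keep the last two characters and hide the rest."""
    text = str(value)
    return "*" * max(len(text) - 2, 0) + text[-2:]

def governed_read(role: str, rows: List[Row]) -> List[Row]:
    policy = POLICIES[role]
    AUDIT_LOG.append(f"role={role} rows_scanned={len(rows)}")  # audit all access
    out = []
    for row in rows:
        if not policy["row_filter"](row):                 # row-level control
            continue
        visible = {c: row[c] for c in policy["columns"]}  # column-level control
        for c in policy["masked"]:
            visible[c] = mask(visible[c])                 # data masking
        out.append(visible)
    return out

rows = [
    {"patient_id": "P10023", "region": "EMEA", "diagnosis": "A", "notes": "n/a"},
    {"patient_id": "P10024", "region": "APAC", "diagnosis": "B", "notes": "n/a"},
]
analyst_rows = governed_read("analyst", rows)
admin_rows = governed_read("admin", rows)
print(analyst_rows)
```

Applying the policy at a single access point means every consumer goes through the same rules, whichever underlying source holds the data.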
17. Performance
Denodo’s unique query optimizer
Denodo’s optimizer borrows many techniques from traditional RDBMSs
  Query plans based on statistics and indexes
  Multiple JOIN methods
  Query rewriting to generate more optimal SQL
However, given the distributed execution of a query in a processing fabric, Denodo has designed unique techniques to maximize performance in this environment
  Dynamic rewriting focused on maximizing execution at source and reducing network traffic
  Cost estimates also factor in:
    Processing power of the sources (e.g. number of nodes in a Hadoop cluster)
    Network and transfer rates
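The trade-off described on this slide can be illustrated with a toy cost comparison in Python. It is only a sketch: the cost formula, the `Source` fields, and every number are assumptions made for illustration, not Denodo's actual cost model. It compares aggregating at the source (and shipping a small result) with shipping every row and aggregating in the virtualization layer.

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    rows: int                      # rows the query must scan
    nodes: int                     # e.g. number of nodes in a Hadoop cluster
    rows_per_sec_per_node: float   # assumed processing power per node
    network_rows_per_sec: float    # assumed transfer rate to the virtual layer

def cost_push_down(src: Source, result_rows: int) -> float:
    """Aggregate at the source, then ship only the (small) result."""
    compute = src.rows / (src.nodes * src.rows_per_sec_per_node)
    transfer = result_rows / src.network_rows_per_sec
    return compute + transfer

def cost_pull_up(src: Source, local_rows_per_sec: float) -> float:
    """Ship every row over the network, then aggregate locally."""
    transfer = src.rows / src.network_rows_per_sec
    compute = src.rows / local_rows_per_sec
    return transfer + compute

def choose_plan(src: Source, result_rows: int, local_rows_per_sec: float) -> str:
    push = cost_push_down(src, result_rows)
    pull = cost_pull_up(src, local_rows_per_sec)
    return "push_down" if push <= pull else "pull_up"

hadoop = Source("sales_lake", rows=1_000_000_000, nodes=20,
                rows_per_sec_per_node=1_000_000, network_rows_per_sec=500_000)
tiny = Source("small_db", rows=1_000, nodes=1,
              rows_per_sec_per_node=10, network_rows_per_sec=1_000_000)

plan_hadoop = choose_plan(hadoop, result_rows=100, local_rows_per_sec=5_000_000)
plan_tiny = choose_plan(tiny, result_rows=10, local_rows_per_sec=5_000_000)
print(plan_hadoop, plan_tiny)
```

With these assumed numbers, the large cluster favors execution at the source because moving a billion rows dominates the cost, while the small, slow source is cheaper to read in full and process centrally.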
18. Managing Data Lakes
Data virtualization is a practical strategy for managing data lakes
1. A logical data lake prevents the data lake from becoming a silo and provides access to all the information an organization needs to power its analytics.
2. Data virtualization improves agility in big data activities. Users can quickly combine sources of information without spending time installing and configuring new databases or clusters to store the consolidated information.
3. This ease of use encourages exploration of data since the cost or effort to access the information is lower.
4. Data virtualization eliminates the cost of storing information twice and the need to update information in multiple places since information is not duplicated.
19. Why do Enterprises need Denodo’s Big Data Fabric to succeed?
New actionable insights with minimal effort
Information Self-service for business users
Secures big data end-to-end
Real-time integrated data across the business
Ability to aggregate, transform, cleanse, and integrate data from multiple big data sources, which can then be presented in dashboards, reporting tools, and web applications.
Allows any application, process, dashboard, tool, or user to access any integrated data, regardless of where the data is physically or logically located and regardless of the data format.
Offers consistent, timely, and trusted data for internal and external users.
Enables centralized data access and control, and supports data-at-rest and data-in-motion security measures.
Remediates security risks with masking, auditing, and encryption across the fabric.
Provides self-service data discovery and search capabilities.
Virtual Sandboxing for Citizen Users
21. “By 2018, organizations with data virtualization capabilities will spend 40% less on building and managing data integration processes for connecting distributed data assets.”
-Source: “Predicts 2017: Data Distribution and Complexity Drive Information Infrastructure Modernization”
23. Data Virtualization combines new and legacy data sources
Developers and users don’t need to learn special languages. They can leverage the Denodo graphical user interface to model, unify, and deliver the data to multiple consumers.
24. Data Virtualization provides ease of use
How the data goes in… How it gets back out…
Denodo’s Big Data Fabric provides easy access to big data without having to decipher various data formats.
25. “Denodo’s key strength is delivering a unified and centralized data services fabric with security and real-time integration across multiple traditional and big data sources, including Hadoop, NoSQL, cloud, and software-as-a-service (SaaS).”
-Source: “Forrester Wave™: Big Data Fabric Q4 2016”
26. Today, several enterprises are leveraging Denodo to support big data fabric deployments, such as virtual big data marts, big data analytics, real-time analytics, and IoT data processing, in various vertical industries. Customers like its easy-to-use, simple yet sophisticated data modeling capabilities, search, and support for various big data sources.
-Source: “Forrester Wave™: Big Data Fabric Q4 2016”
29. Leading Construction Manufacturer - Telematics & Predictive Maintenance
[Diagram: Dealer, Maintenance, Parts Inventory, OSI PI, Hadoop Cluster; Tableau: Dealer / Customer Dashboard]
30. Enrich Machine Data and Combine with Other Data
[Diagram: Ingest, Integrate & Deliver]
  Machine-generated/Event data: Message Queue; Streams (specific time window); Persisted (In-memory, Hadoop)
  Data Virtualization: Enrich and Combine IoT Data with Other Data
  Other data: Historians, Streams, ERP/SCM, DW, Analytical DB, MDM, Apps, Data Marts, Hadoop, NoSQL
  Consumers: Alerts, Workflows, Operational Processes, Analytical Processes, Visualization
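As a rough illustration of the enrichment step in this diagram, the Python sketch below keeps machine events that fall inside a time window, attaches reference attributes from another system, and flags readings that could feed an alert. The event fields, the `machines` lookup, and the temperature threshold are hypothetical and are not taken from the customer deployment.

```python
from datetime import datetime, timedelta

# Hypothetical machine events, as they might arrive from a message queue.
events = [
    {"machine_id": "M1", "ts": datetime(2017, 1, 10, 12, 0), "temp_c": 88.0},
    {"machine_id": "M1", "ts": datetime(2017, 1, 10, 12, 4), "temp_c": 97.5},
    {"machine_id": "M2", "ts": datetime(2017, 1, 10, 11, 0), "temp_c": 70.0},
]

# Hypothetical reference data held in another system (e.g. dealer records).
machines = {
    "M1": {"dealer": "North Dealer", "model": "X200"},
    "M2": {"dealer": "South Dealer", "model": "X100"},
}

def enrich_window(events, reference, now, window):
    """Keep events inside [now - window, now] and attach reference attributes."""
    start = now - window
    enriched = []
    for event in events:
        if start <= event["ts"] <= now:
            enriched.append({**event, **reference.get(event["machine_id"], {})})
    return enriched

def alerts(enriched, max_temp_c):
    """Return enriched events whose temperature exceeds the threshold."""
    return [e for e in enriched if e["temp_c"] > max_temp_c]

now = datetime(2017, 1, 10, 12, 5)
recent = enrich_window(events, machines, now, timedelta(minutes=10))
hot = alerts(recent, max_temp_c=95.0)
print(hot)
```

The enriched rows carry both the sensor reading and the dealer context, which is the kind of combined record a dashboard or alerting workflow would consume.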
31. Business Benefits
Improved asset performance and proactive maintenance.
Reduced warranty costs due to proactive maintenance of parts preventing parts failure.
Optimized pricing for services and parts among global service providers.
New Business Model opportunities based on real-time analysis of detailed sensor data.
33. Next Steps
Forrester Wave™: Big Data Fabric Q4 2016
http://www.denodo.com/en/page/forrester-wave-big-data-fabric-q4-2016
Get Started!
Download Denodo Express: www.denodoexpress.com
Access Denodo Platform on AWS: www.denodo.com/en/denodo-platform/denodo-platform-for-aws
34. Data Ninja Webinar Series
Sessions covering data virtualization solutions for driving business value
Next Series: Packed Lunch Series
January - 2017