The document discusses creating a modern data architecture using a data lake. It describes Zaloni as a provider of data lake management solutions, including a data lake management and governance platform and self-service data platform. It outlines key features of a data lake such as storing different types of data, creating standardized datasets, and providing shorter time to insights. The document also discusses Zaloni's data lake maturity model and reference architecture.
Strata San Jose 2017 - Ben Sharma Presentation
1. Creating a modern data architecture
March 15th, 2017
Ben Sharma | CEO
ben@zaloni.com
2. • Award-winning provider of enterprise data lake management solutions:
§ Data lake management and governance platform
§ Self-service data platform
• Data Lake Design and Implementation:
§ POC, Pilot, Production, Operations, Training
• Data Science Professional Services
4. The data lake promise
Zaloni Proprietary
• Store all types of data in its raw format
• Create Refined, Standardized, Trusted datasets for various use cases
• Store data for longer periods of time to enable historical analysis
• Query and Access the data using a variety of methods
• Manage streaming and batch data in a converged platform
• Provide shorter time-to-insight with proper data management and governance
5. Data architecture modernization
[Diagram: traditional vs. modern data architecture]
Traditional: Sources, ETL, EDW; consumers: Data Discovery, Analytics, BI
Modern: Various Sources, Streaming, Unstructured Data; Data Lake with Derived (Transformed) data and Discovery Sandbox, alongside the EDW; consumers: Data Science, Data Discovery, Analytics, BI
6. Zaloni Confidential and Proprietary - Provided under NDA
Zaloni Big Data Maturity Model
[Axis label: Value Realized]

Stage: Ignore
Descriptor: Data Warehouse
Stage Today: 66% of market
Characteristics:
• Emphasis on structured data
• Limited ability to leverage data at scale
• Business emphasis on retrospective reporting and analysis
• Strong governance and security policies
Business Impact:
• Slow to accommodate business changes

Stage: Store
Descriptor: Data Swamp
Stage Today: 22% of market
Characteristics:
• Hadoop on premises or in the Cloud
• Limited visibility and usability of data
• Limited corporate oversight & governance
• Sandbox or Dev Environments
• Ad hoc and incremental growth of big data applications
Business Impact:
• Ad hoc and exploratory insights for individual use cases

Stage: Manage
Descriptor: Managed Data Lake
Stage Today: 10% of market
Characteristics:
• Acquire useful data from across the enterprise
• Improved visibility and understanding via managed ingestion of data and metadata
• Ensure security and privacy of sensitive data
• Operationalize data at scale
• Leverage enterprise governance & security policies
Business Impact:
• Scalable production data lake for new and improved business insights

Stage: Automate
Descriptor: Responsive Data Lake
Stage Today: 2% of market
Characteristics:
• Self-Service Ingestion & Provisioning
• 360 View of Customer, Product, etc.
• Enterprise Data Discovery
• Operationalize analytical models into business fabric
Business Impact:
• Enables immediate data impact on business operations

Stage: Optimize
Descriptor: Self-Organizing Data Lake
Stage Today: 0% of market
Characteristics:
• Self-improving data lake via machine learning algorithms
• True democratization of big data and analytics
• Intelligent data remediation and curation
• Recommended Data Security and Governance policies
Business Impact:
• Lights-out business operations optimized for business success
7. Data Lake Reference Architecture
Sources:
• Sensors (or other time series data)
• Relational Data Stores (OLTP/ODS/DW)
• Logs (or other unstructured data)
• Social and shared data
Data Lake zones:
Transient Landing Zone
• Temporary store of source data
• Consumers are IT, Data Stewards
• Implemented in highly regulated industries
Raw Zone
• Original source data ready for consumption
• Consumers are ETL developers, data stewards, some data scientists
• Single source of truth with history
Refined Zone
• Data required for LOB-specific views, transformed from existing certified data
• Consumers are anyone with appropriate role-based access
Trusted Zone
• Standardized on corporate governance/quality policies
• Consumers are anyone with appropriate role-based access
• Single version of truth
Sandbox
8. Data Lake 360°: Zaloni’s integrated platform for data lakes
1. Enable the lake
• Leverage the full power of a scale-out architecture with an actionable, scalable data lake
2. Govern the data
• Improve data visibility, reliability and quality to reduce time-to-insight
• Safeguard sensitive data and enable regulatory compliance
3. Engage the business
• Foster a data-driven business through self-service data discovery and preparation
9. Data Governance
1. Based on a foundation of metadata management
2. Lightweight and distributed
3. Hybrid – top down and bottom up approach
• Centrally governed, critical data elements
• Regionally governed, departmental data sets
• Locally governed, data used in specific applications
Gartner Data and Analytics 2017
10. Metadata Management
1. Central to a well-managed data lake – provides visibility and reliability, and enables data governance
2. Capture and manage operational, technical and business metadata
3. Reduced time to insight for analytics
Types of metadata:
§ Where it resides, and how it was ingested
§ What it means, and how it should be interpreted
§ What governance policies apply to it
§ What it's worth, and how its value can be expressed
§ Who accesses and consumes it
§ Which downstream business processes consume it
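As a rough illustration of capturing operational, technical and business metadata together, here is a minimal Python sketch. The `DatasetMetadata` class, its field names and the `summarize` helper are hypothetical and are not taken from any Zaloni product; they only show how one catalog entry could answer the questions listed on this slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetMetadata:
    """Hypothetical catalog entry; all field names are illustrative."""
    name: str
    # Technical metadata: where the data resides and how it is structured.
    location: str
    schema: dict[str, str]
    # Operational metadata: how and when the data was ingested.
    ingested_by: str
    ingested_at: datetime
    # Business metadata: meaning, applicable policies, and consumers.
    description: str = ""
    governance_policies: list[str] = field(default_factory=list)
    consumers: list[str] = field(default_factory=list)


def summarize(meta: DatasetMetadata) -> dict:
    """Answer the slide's questions for a single dataset."""
    return {
        "where": meta.location,
        "how_ingested": f"{meta.ingested_by} at {meta.ingested_at.isoformat()}",
        "meaning": meta.description,
        "policies": list(meta.governance_policies),
        "consumers": list(meta.consumers),
    }
```

Keeping the three kinds of metadata on one record is only one possible design; a real catalog would more likely store them separately and join them on the dataset name.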
11. Metadata Exchange Framework
1. Metadata sharing is critical for an integrated approach
2. Federated approach for metadata collection
3. Two-way metadata exchange between the Data Lake and other Enterprise repositories
[Diagram: Data Lake and Enterprise Metadata repository connected by a two-way exchange]
12. Managed Ingestion
1. Ability to ingest vast amounts of data
2. Ability to handle a wide variety of formats (streaming, files, custom) and sources
3. Build in repeatability via automation to pick up incoming data and apply pre-defined processing
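The idea of repeatable ingestion that picks up incoming data and applies pre-defined processing can be sketched as below. This is a simplified, file-only illustration under stated assumptions: the `landing` and `raw` directories, the `HANDLERS` table and the `ingest` function are hypothetical names, and streaming sources are out of scope.

```python
import csv
import shutil
from pathlib import Path


def count_csv_rows(path: Path) -> int:
    """Pre-defined processing for CSV: count data rows, excluding the header."""
    with path.open(newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


def count_json_lines(path: Path) -> int:
    """Pre-defined processing for JSON Lines: count non-empty lines."""
    with path.open() as f:
        return sum(1 for line in f if line.strip())


# Hypothetical registry mapping a file format to its processing step.
HANDLERS = {".csv": count_csv_rows, ".jsonl": count_json_lines}


def ingest(landing: Path, raw: Path) -> list:
    """Move every recognized file from landing to raw and log what was done."""
    raw.mkdir(parents=True, exist_ok=True)
    log = []
    for path in sorted(landing.iterdir()):
        handler = HANDLERS.get(path.suffix)
        if handler is None or not path.is_file():
            continue  # unknown formats stay in landing for manual review
        records = handler(path)
        shutil.move(str(path), str(raw / path.name))
        log.append({"file": path.name, "format": path.suffix, "records": records})
    return log
```

Running `ingest` on a schedule gives the repeatability the slide describes; the returned log doubles as simple operational metadata for each run.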
13. Data Lineage
1. See how data moves and how it is consumed in the data lake
2. Safeguard data and reduce risk, always knowing where data has come from, where it is, and how it is being used
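One common way to model lineage is a directed graph of dataset-to-dataset movements. The sketch below assumes that model; the `LineageGraph` class and the dataset names in the usage note are hypothetical. It answers the two questions on this slide: where a dataset came from, and where it is used downstream.

```python
from collections import defaultdict


class LineageGraph:
    """Directed graph of data movements between datasets (illustrative only)."""

    def __init__(self):
        self._inputs = defaultdict(set)   # dataset -> datasets it was built from
        self._outputs = defaultdict(set)  # dataset -> datasets built from it

    def record(self, source, target):
        """Register that `target` was produced from `source`."""
        self._inputs[target].add(source)
        self._outputs[source].add(target)

    @staticmethod
    def _walk(edges, start):
        """Collect every dataset reachable from `start` along `edges`."""
        seen, stack = set(), [start]
        while stack:
            for neighbor in edges.get(stack.pop(), ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen

    def origins(self, dataset):
        """All datasets that `dataset` was derived from, directly or indirectly."""
        return self._walk(self._inputs, dataset)

    def consumers(self, dataset):
        """All datasets that depend on `dataset`, directly or indirectly."""
        return self._walk(self._outputs, dataset)
```

For example, after recording `crm.raw -> crm.trusted` and `crm.trusted -> sales.refined`, calling `origins("sales.refined")` returns both upstream datasets, which is the kind of question an impact or audit review would ask.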
14. Data Quality
1. Rules-based data validation
2. Integration with the managed data pipeline
3. Stats and metrics for reporting and actions
4. Automation, Remediation, Notifications
Future:
• ML-based classification
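Rules-based validation with stats and notifications might look like the following sketch. The rule names, the `validate` function and the `max_failure_rate` threshold are assumptions made for illustration; they do not describe a specific product feature.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]

# Hypothetical rules: each returns True when the record passes.
RULES: Dict[str, Rule] = {
    "customer_id present": lambda r: bool(r.get("customer_id")),
    "amount non-negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def validate(records: List[dict], rules: Dict[str, Rule],
             max_failure_rate: float = 0.1) -> dict:
    """Split records into good and rejected, and collect per-rule stats."""
    good, rejected = [], []
    failures = {name: 0 for name in rules}
    for record in records:
        broken = [name for name, rule in rules.items() if not rule(record)]
        for name in broken:
            failures[name] += 1
        (rejected if broken else good).append(record)
    rate = len(rejected) / len(records) if records else 0.0
    return {
        "good": good,
        "rejected": rejected,      # candidates for remediation
        "failures": failures,      # stats for reporting
        "failure_rate": rate,
        "alert": rate > max_failure_rate,  # trigger for a notification
    }
```

Placed inside an ingestion pipeline, the `alert` flag could stop a load or notify a data steward, while the `rejected` list feeds remediation.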
15. Data Security and Privacy
1. Secure infrastructure for data in motion and data at rest
2. Role-based access control for metadata and data
3. Mask or tokenize data before it is published in the lake for consumption
4. Audit, access logs, alerts and notifications
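Masking and tokenizing sensitive fields before publication can be sketched as follows. Note the assumption: tokenization here is approximated with a truncated keyed hash (HMAC-SHA256), which is one-way and deterministic; vault-based, reversible tokenization is a different technique. The `protect` function and the policy format are hypothetical.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Replace a value with a deterministic keyed hash (a stand-in for a token)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def protect(record: dict, policy: dict, key: bytes) -> dict:
    """Apply a per-column policy ('mask' or 'tokenize') to a copy of the record."""
    out = dict(record)
    for column, action in policy.items():
        if out.get(column) is None:
            continue
        value = str(out[column])
        if action == "tokenize":
            out[column] = tokenize(value, key)
        elif action == "mask":
            out[column] = mask(value)
        else:
            raise ValueError(f"unknown action: {action}")
    return out
```

Because the token is deterministic for a given key, the same email always maps to the same token, so protected datasets can still be joined on that column.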
16. Data Lifecycle Management
1. Hot -> Warm -> Cold on an entity level based on policies/SLAs
2. Provide data management features to automate scheduling and orchestration of data movement between heterogeneous storage environments
3. Across on-premises and cloud environments
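A policy-driven Hot -> Warm -> Cold decision per entity could be sketched as below. The assumption is that the policy is expressed as days since last access; real policies or SLAs may use other criteria. `TierPolicy`, `target_tier` and `plan_moves` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TierPolicy:
    """Per-entity thresholds, in days since last access (illustrative)."""
    hot_days: int
    warm_days: int


def target_tier(last_accessed: date, today: date, policy: TierPolicy) -> str:
    """Pick the storage tier an entity should live in today."""
    age = (today - last_accessed).days
    if age <= policy.hot_days:
        return "hot"
    if age <= policy.warm_days:
        return "warm"
    return "cold"


def plan_moves(entities: list, policies: dict, today: date) -> list:
    """Return (entity, current_tier, target_tier) for every entity that must move."""
    moves = []
    for entity in entities:
        tier = target_tier(entity["last_accessed"], today, policies[entity["name"]])
        if tier != entity["tier"]:
            moves.append((entity["name"], entity["tier"], tier))
    return moves
```

The output of `plan_moves` is only a plan; a scheduler or orchestration layer would carry out the actual data movement between storage environments.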
17. Data Catalog
1. See what data is available across your enterprise
2. Contribute valuable business information to improve search and usage
3. Use a shopping cart experience to create a sandbox for ad hoc and exploratory analytics
18. Self-service Data Preparation
1. Blend data in the lake without a costly IT project
2. Perform interactive data-driven transformations
19. Considerations for data lake in the cloud
CLOUD and HYBRID ENVIRONMENTS
• How do you create a cloud-agnostic data lake platform?
• How do you deploy a cost-effective compute layer?
§ Elastic compute layer
§ Batch and near real-time
• How do you optimize storage?
§ Support polyglot persistence
§ DLM
• How do you optimize network connectivity between ground and cloud?
• How do you meet enterprise security requirements?
20. Building your blueprint
1. Questions
Business Drivers AND Business Questions, e.g. Where is fraud occurring? How do I optimize inventory?
2. Inputs
• Data: Subject Areas, Source System
• Use Cases: Capabilities, Process
• Platform: Ingest, Organize, Enrich, Explore
3. Outcomes
• Roadmap
• Managed Data Lake
• Analytics Strategy
[Diagram: Data + Use Cases + Platform = Outcomes]
21. New Buyer’s Guide on Data Lake Management and Governance
• Zaloni and industry analyst firm Enterprise Strategy Group collaborated on a guide to help you:
1. Define evaluation criteria and compare common options
2. Set up a successful proof of concept (PoC)
3. Develop an implementation that is future-proofed
22. Visit Booth #923 for these giveaways!
• Free Starbucks card with demo or survey response
• Complimentary copy of new book “Understanding Metadata”
Speaking Sessions:
• Data Monetization Use Case
Dirk Jungnickel, SVP of BA and Big Data, Emirates Integrated Telecommunications Company (du)
Tuesday, 11:00 a.m. – LL20 A
• Building a Modern Data Architecture
Ben Sharma, CEO and Founder, Zaloni
Wednesday, 11:50 a.m. – LL21 A