The First Step in Information Management
looker.com
Produced by:
MONTHLY SERIES
In partnership with:
Data Lake Architecture
October 5, 2017
Topics for Today’s Analytics Webinar
 Benefits and Risks of a Data Lake
 Data Lake Reference Architecture
 Lab and the Factory
 Base Environment for Batch Analytics, Streaming and Real-Time Data
 Critical Governance Components
 Key Take-Aways
 Q&A
pg 2© 2017 First San Francisco Partners www.firstsanfranciscopartners.com
Polling Questions
 Do you have a data lake?
− Yes
− No
− Unsure
 If yes, is it:
− Operational and regularly used for analytics
− Informally used, like a lab or sandbox
− Unsure
Defining the Data Lake
 A data lake is a collection of storage instances of various data assets additional to the
originating data sources. These assets are stored in a near-exact, or even exact, copy
of the source format.
 The purpose of a data lake is to present an unrefined view of data to only the most
highly skilled analysts, to help them explore their data refinement and analysis
techniques independent of any of the system-of-record compromises that may exist
in a traditional analytic data store (such as a data mart or data warehouse).
 A data lake can support exploratory analytics, operational uses of data, or both.
Source: Gartner IT Glossary
Benefits and Risks of the Data Lake
Benefits of the Data Lake
 Enables “productionizing” advanced
analytics
 Cost-effective scalability and flexibility
 Derives value from unlimited data types
(including raw data)
 Reduces long-term cost of ownership across
entire spectrum of data use
Risks of the Data Lake
 Loss of trust
 Loss of relevance and momentum
 Increased risk
 Long-term excessive cost
Data Lake Reference Architecture
Modern Reality of the Data Lake
 The data lake has changed due to storage availability, data management tools
and the ease with which data can be managed.
 Today’s data lake comprises:
‒ Landing Zone
‒ Standardization Zone
‒ Analytics Sandbox
[Diagram, built up over several slides: DATA SOURCES, LANDING ZONE, STANDARDIZATION ZONE and ANALYTICS SANDBOX, with DATA MANAGEMENT, DATA GOVERNANCE and DATA OPERATIONS alongside, plus DATA SCIENTISTS and DATA CONSUMERS]
 Landing Zone: Closest to the original data lake conception, where raw data is stored and available for consumption
 Standardization Zone: Standardized, cleaned data – the preferred version for downstream consumers and the Analytics Sandbox
 Analytics Sandbox: Where Data Scientists work to create new models
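The split between a Landing Zone and a Standardization Zone can be illustrated with a small Python sketch. It is not taken from the webinar: the directory names, the `init_lake`, `land` and `standardize` helpers, and the cleaning rules (trim whitespace, lower-case field names) are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical zone names; a real lake would map these to storage locations.
ZONES = ("landing", "standardized", "sandbox")


def init_lake(root: Path) -> None:
    """Create one directory per zone under the lake root."""
    for zone in ZONES:
        (root / zone).mkdir(parents=True, exist_ok=True)


def land(root: Path, name: str, raw_text: str) -> Path:
    """Store source data unchanged in the landing zone."""
    path = root / "landing" / name
    path.write_text(raw_text, encoding="utf-8")
    return path


def standardize(root: Path, name: str) -> Path:
    """Write a cleaned copy to the standardized zone; the raw file is kept."""
    records = json.loads((root / "landing" / name).read_text(encoding="utf-8"))
    cleaned = [
        {
            key.strip().lower(): value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        for record in records
    ]
    path = root / "standardized" / name
    path.write_text(json.dumps(cleaned), encoding="utf-8")
    return path


def demo() -> list:
    """Land one raw file, standardize it, and return the cleaned records."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        init_lake(root)
        land(root, "customers.json", '[{" Name ": " Ada ", "Age": 36}]')
        out = standardize(root, "customers.json")
        return json.loads(out.read_text(encoding="utf-8"))


if __name__ == "__main__":
    print(demo())
```

The raw file is never modified, so downstream consumers read the standardized copy while the landing copy stays available for exploration.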
Reminder: Two Lenses to Derive an Effective Architecture
 Form: Developing the architecture so all stakeholders can actually understand and develop it
 Progression: Develop architectures that are best fit for purpose and effective, no matter how simple or complex
Lab and the Factory
Why is This Topic Important?
 A key to successful data lake management is understanding if it is a lab,
a factory or both.
 There are architectural, governance and organizational impacts.
 You must clearly identify if you are evolving from a lab to a factory or intend
to keep them separate.
First Progression – Lab Elements
[Diagram] Element groups: Organization Elements, Functional Elements, Technology Elements
Layers: Data Consumption, Data Supply Chain/Logistics, Data Management
Elements shown: Landing/Staging; ETL; Data Analysts; Access – Publish, Subscribe, Notify; Access Tools – BI, Analytics; Analytics – Descriptive, Predictive, Prescriptive; HDFS, Columnar and Graph
Operational Elements
[Diagram] Element groups: Organization Elements, Functional Elements, Technology Elements
Layers: Data Consumption, Data Supply Chain/Logistics, Data Management
Elements shown: Pedigree and Preparation; Landing/Staging; Model/Metrics Management; Data Reduction; Glossary Management; Machine Learning/AI; Data Governance; Data Operations; Data Ingestion; Reference and Master Data; Competency Centers; Self-Service/Data Citizens; ETL/Virtualization; Distributed Processing; Metadata; Data Quality/Hygiene; Lake, Pond, Warehouse; HDFS, Columnar and Graph; Data Streaming; Data Glossary; Data Lake Management; Taxonomy/Ontology; Web Services; Policy and Process; Data Analysts and Scientists; Collaboration, Decision-Making; Access – Publish, Subscribe, Notify; Access Tools – BI, Analytics; Applications; Analytics – Descriptive, Predictive, Prescriptive; Business/Tech. Planning; Security, Privacy; Business Continuity
The Lab – Characteristics
 Allows for experimentation, testing new models, proofs of concept
 Technical
− Flexible architectures, even ad hoc or non-persistent
− Rarely documented
− Schema on read
 Organizational
− Run by the main users, hence informal or departmental
 Functional
− By nature, results should be evaluated for relevance
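"Schema on read" means the raw data is stored as it arrived and structure is applied only when someone reads it. The sketch below illustrates the idea with the Python standard library. The sample records, the `SCHEMA` mapping and the `read_with_schema` helper are hypothetical, and turning unconvertible values into `None` is just one possible policy.

```python
import json

# Raw events as they might sit in the lake: no schema was enforced on write.
RAW_LINES = [
    '{"user": "a1", "amount": "19.90", "ts": "2017-10-05"}',
    '{"user": "b2", "amount": 5}',
    '{"user": "c3", "amount": "n/a", "note": "refund?"}',
]

# Hypothetical schema chosen by the analyst at read time: field -> converter.
SCHEMA = {"user": str, "amount": float}


def read_with_schema(lines, schema):
    """Parse each raw line and coerce fields; unconvertible values become None."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            try:
                row[field] = convert(record[field])
            except (KeyError, TypeError, ValueError):
                row[field] = None
        yield row


rows = list(read_with_schema(RAW_LINES, SCHEMA))
```

A different analyst could read the same raw lines with a different schema, which is what makes the approach flexible and also why lab results need to be evaluated before anyone relies on them.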
The Factory – Characteristics
 Addressing directed requirements, producing regular outputs
associated with a business service, product or action
 Technical
− Architecture needs to be defined so its use and limits are understood
 Organizational
− Published rules of engagement
 Functional
− Data quality is monitored and known
− Lineage and metadata support navigation and use of content
− May need scheduled access and loading
− Publishing results will require some form of quality control and approval
− Models that are executed on a scheduled basis will require some sort of
administrative and maintenance capabilities
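As a rough illustration of "data quality is monitored and known" combined with quality control and approval before publishing, the following Python sketch measures completeness and gates publication on it. The `QualityReport` class, the `measure` and `can_publish` helpers, the completeness metric and the thresholds are assumptions for the example, not something the slides prescribe.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    total: int
    complete: int

    @property
    def completeness(self) -> float:
        """Share of rows with all required fields populated."""
        return self.complete / self.total if self.total else 0.0


def measure(rows, required):
    """Count rows in which every required field is present and not None."""
    complete = sum(
        all(row.get(field) is not None for field in required) for row in rows
    )
    return QualityReport(total=len(rows), complete=complete)


def can_publish(report, threshold, approved_by):
    """Publish only if quality meets the threshold and someone signed off."""
    return report.completeness >= threshold and bool(approved_by)


ROWS = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 7.5},
    {"id": 4, "amount": 1.0},
]
report = measure(ROWS, required=("id", "amount"))
```

Here `report.completeness` is 0.75, so `can_publish(report, 0.7, "data steward")` passes while a 0.9 threshold or a missing approver blocks publication.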
Base Environment for Batch Analytics,
Streaming and Real-Time Data
A Base Environment
[Diagram labels: Data Governance, Data Operations]
 Rapid ingestion – stream, low latency or batch updating
 Ease of access – find it, use it, know what it means
 Effective data supply chain – data of the correct quality needs to be where it is supposed to be
 Flexibility – Data Scientists need to be able to experiment, but without polluting the lake
Additional Components for Real-time Analytics
and Ingesting Streaming Data
 Are you replacing the Operational Data Store (ODS)?
 Will you be doing full CRUD operations (Create, Read, Update, Delete)?
 How fast do you need to go? Latencies should match your real needs.
 Vendors – Hortonworks, Attunity, Splice
− Ingest
− Process
− Consumption
 Technologies you will hear about
− Apache Kafka, Storm (real-time streaming components)
− Apache Spark (fast batches)
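The latency question above comes down to when a result becomes visible: after every event, or only after each batch. The following Python sketch contrasts the two styles using only the standard library. It deliberately does not use Kafka, Storm or Spark APIs, and the helper names and the sample event values are made up for illustration.

```python
from itertools import islice


def micro_batches(events, size):
    """Group an event stream into lists of at most `size` events."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def running_totals_per_event(events):
    """Streaming style: the total is updated after every event."""
    total = 0
    for amount in events:
        total += amount
        yield total


def running_totals_per_batch(events, size):
    """Batch style: the total is only visible after each batch is processed."""
    total = 0
    for batch in micro_batches(events, size):
        total += sum(batch)
        yield total


EVENTS = [5, 3, 7, 1, 4]
per_event = list(running_totals_per_event(EVENTS))
per_batch = list(running_totals_per_batch(EVENTS, 2))
```

Both approaches end at the same total; the per-event version simply exposes more intermediate states, which is only worth the extra machinery if the business actually needs them.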
Critical Governance Components
Major Areas of Data Governance Concern in the Data Lake
[Diagram: DATA SOURCES, LANDING ZONE, STANDARDIZATION ZONE, ANALYTICS SANDBOX, DATA GOVERNANCE, DATA OPERATIONS, DATA SCIENTISTS and DATA CONSUMERS, annotated with six numbered areas]
1 Data Acquisition
2 Data Catalog
3 Data Decisions
4 Analytics Governance
5 Data Usage
6 Model Productionalization
Some Data Governance approaches are new, and
others are applications of traditional approaches
Major Areas of Data Governance Concern in the Data Lake
 Data is cataloged/mapped so it’s easily found
 Data is described adequately to permit reuse for any need
 Decisions about data are logged and communicated
 Flow of data (data lineage) is documented, so users/regulators can understand where it came from
 Staff who know and understand the data are identified
Data Governance defines the information you need to maintain your data, develops the processes to do this, trains staff and provides the environments to manage the knowledge, while monitoring and ensuring compliance.
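One way to picture these governance concerns is as fields on a catalog record. The Python sketch below is a hypothetical data structure, not a real catalog product's API: the `CatalogEntry` fields, the `lineage` helper and the sample dataset names are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str                                       # how the data is found
    description: str                                # what it means, to permit reuse
    steward: str                                    # who knows and understands it
    upstream: list = field(default_factory=list)    # direct sources (lineage)
    decisions: list = field(default_factory=list)   # logged decisions


def lineage(catalog, name):
    """Return every upstream source of `name`, nearest first, without repeats."""
    seen = []
    queue = list(catalog[name].upstream)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        if current in catalog:
            queue.extend(catalog[current].upstream)
    return seen


catalog = {
    "crm_export": CatalogEntry("crm_export", "Nightly CRM dump", "sales ops"),
    "customers_raw": CatalogEntry(
        "customers_raw", "Landing copy of the CRM dump", "data ops",
        upstream=["crm_export"],
    ),
    "customers_clean": CatalogEntry(
        "customers_clean", "Standardized customer records", "data steward",
        upstream=["customers_raw"],
        decisions=["Drop rows without an email address"],
    ),
}
trace = lineage(catalog, "customers_clean")
```

Walking the `upstream` links answers the question of where a dataset came from, and the `decisions` list keeps the reasoning next to the data it affected.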
Evolution of Critical Governance Components
 While flexible, governance is required to ensure appropriate use
 While operational, governance will ensure legitimacy and compliance and verify alignment with business needs
 To move to operational, governance should supply a road map, new policies, training and organization management
Key Take-Aways
 Make sure you offer up business benefits in addition to
traditional “access to data” – such as lower costs, more
nimble reactions.
 Avoid additional data risks by providing oversight of data
quality and sources. Do not take a casual approach to
managing the data lake assets.
 Understand that the architectural aspect of the data lake
(as it is evolving) is becoming a standard, much like the data
warehouse.
 Maintain an open mind for supporting technologies, because
they are changing every day.
 Implement Data Governance. It is a critical success factor, no
matter how you view the data lake.
Questions?
Thank you for joining today!
Please join us Thursday, Nov. 2 for the
Keys to Effective Data Visualization webinar.
John Ladley @jladley
john@firstsanfranciscopartners.com
Kelle O’Neal @kellezoneal
kelle@firstsanfranciscopartners.com

Mais conteúdo relacionado

Mais procurados

Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Cathrine Wilhelmsen
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?James Serra
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Data Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data IntelligenceData Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data IntelligenceAlation
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
Modernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationModernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationDenodo
 

Mais procurados (20)

Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Data mesh
Data meshData mesh
Data mesh
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Data Mesh 101
Data Mesh 101Data Mesh 101
Data Mesh 101
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data IntelligenceData Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data Intelligence
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Modernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationModernizing Integration with Data Virtualization
Modernizing Integration with Data Virtualization
 

Semelhante a Data Lake Architecture

DI&A Slides: Data Insights and Analytics Frameworks
DI&A Slides: Data Insights and Analytics FrameworksDI&A Slides: Data Insights and Analytics Frameworks
DI&A Slides: Data Insights and Analytics FrameworksDATAVERSITY
 
Advanced Databases and Knowledge Management
Advanced Databases and Knowledge ManagementAdvanced Databases and Knowledge Management
Advanced Databases and Knowledge ManagementDATAVERSITY
 
Data Insights and Analytics: Simplifying Data Lake and Modern BI Architecture
Data Insights and Analytics: Simplifying Data Lake and Modern BI ArchitectureData Insights and Analytics: Simplifying Data Lake and Modern BI Architecture
Data Insights and Analytics: Simplifying Data Lake and Modern BI ArchitectureDATAVERSITY
 
Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?DATAVERSITY
 
DI&A Slides: Data-Centric Development
DI&A Slides: Data-Centric DevelopmentDI&A Slides: Data-Centric Development
DI&A Slides: Data-Centric DevelopmentDATAVERSITY
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data ArchitecturesGuido Schmutz
 
The Missed Promise of Hadoop and New and Emerging Technologies
The Missed Promise of Hadoop and New and Emerging TechnologiesThe Missed Promise of Hadoop and New and Emerging Technologies
The Missed Promise of Hadoop and New and Emerging TechnologiesDATAVERSITY
 
DI&A Webinar: Big Data Analytics
DI&A Webinar: Big Data AnalyticsDI&A Webinar: Big Data Analytics
DI&A Webinar: Big Data AnalyticsDATAVERSITY
 
SAP Data Hub e SUSE Container as a Service Platform
SAP Data Hub e SUSE Container as a Service PlatformSAP Data Hub e SUSE Container as a Service Platform
SAP Data Hub e SUSE Container as a Service PlatformSUSE Italy
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchSheetal Pratik
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesDataWorks Summit
 
Data at the corner of SAP and AWS
Data at the corner of SAP and AWSData at the corner of SAP and AWS
Data at the corner of SAP and AWSOcean9, Inc.
 
Oracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingOracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingGuido Schmutz
 
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...Matt Stubbs
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
Trends in Data Analytics - From Database to Analyst
Trends in Data Analytics - From Database to AnalystTrends in Data Analytics - From Database to Analyst
Trends in Data Analytics - From Database to AnalystDATAVERSITY
 
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data FederationNRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data FederationNRB
 
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation NRB
 
TechEvent DWH Modernization
TechEvent DWH ModernizationTechEvent DWH Modernization
TechEvent DWH ModernizationTrivadis
 
Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBMongoDB
 

Semelhante a Data Lake Architecture (20)

DI&A Slides: Data Insights and Analytics Frameworks
DI&A Slides: Data Insights and Analytics FrameworksDI&A Slides: Data Insights and Analytics Frameworks
DI&A Slides: Data Insights and Analytics Frameworks
 
Advanced Databases and Knowledge Management
Advanced Databases and Knowledge ManagementAdvanced Databases and Knowledge Management
Advanced Databases and Knowledge Management
 
Data Insights and Analytics: Simplifying Data Lake and Modern BI Architecture
Data Insights and Analytics: Simplifying Data Lake and Modern BI ArchitectureData Insights and Analytics: Simplifying Data Lake and Modern BI Architecture
Data Insights and Analytics: Simplifying Data Lake and Modern BI Architecture
 
Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?
 
DI&A Slides: Data-Centric Development
DI&A Slides: Data-Centric DevelopmentDI&A Slides: Data-Centric Development
DI&A Slides: Data-Centric Development
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data Architectures
 
The Missed Promise of Hadoop and New and Emerging Technologies
The Missed Promise of Hadoop and New and Emerging TechnologiesThe Missed Promise of Hadoop and New and Emerging Technologies
The Missed Promise of Hadoop and New and Emerging Technologies
 
DI&A Webinar: Big Data Analytics
DI&A Webinar: Big Data AnalyticsDI&A Webinar: Big Data Analytics
DI&A Webinar: Big Data Analytics
 
SAP Data Hub e SUSE Container as a Service Platform
SAP Data Hub e SUSE Container as a Service PlatformSAP Data Hub e SUSE Container as a Service Platform
SAP Data Hub e SUSE Container as a Service Platform
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Data at the corner of SAP and AWS
Data at the corner of SAP and AWSData at the corner of SAP and AWS
Data at the corner of SAP and AWS
 
Oracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingOracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream Processing
 
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Trends in Data Analytics - From Database to Analyst
Trends in Data Analytics - From Database to AnalystTrends in Data Analytics - From Database to Analyst
Trends in Data Analytics - From Database to Analyst
 
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data FederationNRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
 
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation
 
TechEvent DWH Modernization
TechEvent DWH ModernizationTechEvent DWH Modernization
TechEvent DWH Modernization
 
Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDB
 

Mais de DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...DATAVERSITY
 
Empowering the Data Driven Business with Modern Business Intelligence
Empowering the Data Driven Business with Modern Business IntelligenceEmpowering the Data Driven Business with Modern Business Intelligence
Empowering the Data Driven Business with Modern Business IntelligenceDATAVERSITY
 

Data Lake Architecture

  • 1. The First Step in Information Management looker.com Produced by: MONTHLY SERIES In partnership with: Data Lake Architecture October 5, 2017
  • 2. Topics for Today’s Analytics Webinar
      • Benefits and Risks of a Data Lake
      • Data Lake Reference Architecture
      • Lab and the Factory
      • Base Environment for Batch Analytics, Streaming and Real-Time Data
      • Critical Governance Components
      • Key Take-Aways
      • Q&A
    © 2017 First San Francisco Partners www.firstsanfranciscopartners.com
  • 3. Polling Questions
      • Do you have a data lake?
          − Yes
          − No
          − Unsure
      • If yes, is it:
          − Operational and regularly used for analytics
          − Informally used, like a lab or sandbox
          − Unsure
  • 4. Defining the Data Lake (Source: Gartner IT Glossary)
      • A data lake is a collection of storage instances of various data assets additional to the originating data sources. These assets are stored in a near-exact, or even exact, copy of the source format.
      • The purpose of a data lake is to present an unrefined view of data to only the most highly skilled analysts, to help them explore their data refinement and analysis techniques independent of any of the system-of-record compromises that may exist in a traditional analytic data store (such as a data mart or data warehouse).
      • A data lake can support either/or exploratory analytics and operational uses of data.
  • 6. Benefits of the Data Lake
      • Enables “productionizing” advanced analytics
      • Cost-effective scalability and flexibility
      • Derives value from unlimited data types (including raw data)
      • Reduces long-term cost of ownership across entire spectrum of data use
  • 7. Risks of the Data Lake
      • Loss of trust
      • Loss of relevance and momentum
      • Increased risk
      • Long-term excessive cost
  • 9. Modern Reality of the Data Lake
      • The data lake has changed due to storage availability, data management tools and the ease with which data can be managed.
      • Today’s data lake comprises:
          − Landing Zone
          − Standardization Zone
          − Analytics Sandbox
  • 10. Modern Reality of the Data Lake [Diagram: Data Sources, Landing Zone]
      • Landing Zone: Closest to the original data lake conception, where raw data is stored and available for consumption
  • 11. Modern Reality of the Data Lake [Diagram: Data Sources, Landing Zone, Standardization Zone]
      • Standardization Zone: Standardized, cleaned data – the preferred version for downstream consumers and the Analytics Sandbox
  • 12. Modern Reality of the Data Lake [Diagram: Data Sources, Landing Zone, Standardization Zone, Analytics Sandbox]
      • Analytics Sandbox: Where Data Scientists work to create new models
  • 13. Modern Reality of the Data Lake [Diagram: Data Sources, Landing Zone, Standardization Zone, Analytics Sandbox, Data Management]
  • 14. Modern Reality of the Data Lake [Diagram adds: Data Governance, Data Operations]
  • 15. Modern Reality of the Data Lake [Diagram adds: Data Scientists]
  • 16. Modern Reality of the Data Lake [Diagram adds: Data Consumers]
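The three zones on slides 9 to 16 can be illustrated with a small sketch. This is a toy in-memory model, not the presenters' implementation: the `DataLake` class, its method names, the `_source` field and the cleaning rule (trim and lowercase keys, strip string values) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Toy in-memory stand-in for the three zones named on the slides."""
    landing: list = field(default_factory=list)       # raw copies, as received
    standardized: list = field(default_factory=list)  # cleaned, preferred version
    sandbox: dict = field(default_factory=dict)       # per-analyst working copies

    def ingest(self, source: str, record: dict) -> None:
        # Landing Zone: keep an exact copy of the record plus its source name.
        self.landing.append({"source": source, "raw": dict(record)})

    def standardize(self) -> None:
        # Standardization Zone: hypothetical cleaning rule; real rules are
        # project-specific and would be agreed through governance.
        self.standardized = []
        for item in self.landing:
            clean = {
                key.strip().lower(): value.strip() if isinstance(value, str) else value
                for key, value in item["raw"].items()
            }
            clean["_source"] = item["source"]
            self.standardized.append(clean)

    def checkout(self, analyst: str) -> list:
        # Analytics Sandbox: hand out copies so experiments never alter
        # the shared zones.
        self.sandbox[analyst] = [dict(row) for row in self.standardized]
        return self.sandbox[analyst]


lake = DataLake()
lake.ingest("crm", {" Name ": " Ada ", "Age": 36})
lake.standardize()
rows = lake.checkout("analyst_a")
rows[0]["name"] = "changed in sandbox"
```

After the last line, the sandbox copy differs from the standardized row, while the Landing Zone still holds the record exactly as it arrived.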
  • 17. Reminder: Two Lenses to Derive an Effective Architecture
      • Form: Developing the architecture so all stakeholders can actually understand and develop it
      • Progression: Develop architectures that are best fit for purpose and effective, no matter how simple or complex
  • 19. Why is This Topic Important?
      • A key to successful data lake management is understanding if it is a lab, a factory or both.
      • There are architectural, governance and organizational impacts.
      • You must clearly identify if you are evolving from a lab to a factory or intend to keep them separate.
  • 20. First Progression – Lab Elements [Diagram labels: Organization Elements; Functional Elements; Technology Elements; Data Consumption; Data Supply Chain/Logistics; Data Management; Landing/Staging; ETL; Data Analysts; Access – Publish, Subscribe, Notify; Access Tools – BI, Analytics; Analytics – Descriptive, Predictive, Prescriptive; HDFS, Columnar and Graph]
  • 21. Operational Elements [Diagram labels: Organization Elements; Functional Elements; Technology Elements; Data Consumption; Data Supply Chain/Logistics; Data Management; Pedigree and Preparation; Landing/Staging; Model/Metrics Management; Data Reduction; Glossary Management; Machine Learning/AI; Data Governance; Data Operations; Data Ingestion; Reference and Master Data; Competency Centers; Self-Service/Data Citizens; ETL/Virtualization; Distributed Processing; Metadata; Data Quality/Hygiene; Lake, Pond, Warehouse; HDFS, Columnar and Graph; Data Streaming; Data Glossary; Data Lake Management; Taxonomy/Ontology; Web Services; Policy and Process; Data Analysts and Scientists; Collaboration, Decision-Making; Access – Publish, Subscribe, Notify; Access Tools – BI, Analytics; Applications; Analytics – Descriptive, Predictive, Prescriptive; Business/Tech. Planning; Security, Privacy; Business Continuity]
  • 22. The Lab – Characteristics
      • Allows for experimentation, testing new models, proof of concepts
      • Technical
          − Flexible architectures, even ad hoc or non-persistent
          − Rarely documented
          − Schema on read
      • Organizational
          − Run by the main users, hence informal or departmental
      • Functional
          − By nature, results should be evaluated for relevance
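"Schema on read" on slide 22 means that raw records are stored untouched and a structure is imposed only when someone reads them. A minimal sketch, assuming JSON-lines input; `RAW_LINES`, `read_with_schema` and the field names are hypothetical and not taken from the webinar.

```python
import json

# Raw records are kept exactly as they arrived: every value is a string
# and one record has an unusable amount.
RAW_LINES = [
    '{"id": "1", "amount": "19.90", "note": "first"}',
    '{"id": "2", "amount": "5"}',
    '{"id": "3", "amount": "n/a"}',
]


def read_with_schema(lines, schema):
    """Apply `schema` (field name -> converter) while reading.

    Nothing was enforced at write time, so records that do not fit the
    schema are set aside rather than silently dropped.
    """
    rows, rejects = [], []
    for line in lines:
        record = json.loads(line)
        try:
            rows.append({name: convert(record[name]) for name, convert in schema.items()})
        except (KeyError, ValueError):
            rejects.append(line)
    return rows, rejects


# Two analysts can read the same raw lines through different schemas.
amount_rows, amount_rejects = read_with_schema(RAW_LINES, {"id": int, "amount": float})
note_rows, note_rejects = read_with_schema(RAW_LINES, {"id": int, "note": str})
```

The same three raw lines yield two valid rows under the first schema and one under the second, which is the flexibility, and the documentation risk, that the slide attributes to the lab.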
  • 23. The Factory – Characteristics
      • Addressing directed requirements, producing regular outputs associated with a business service, product or action
      • Technical
          − Architecture needs to be defined so its use and limits are understood
      • Organizational
          − Published rules of engagement
      • Functional
          − Data quality is monitored and known
          − Lineage and metadata support navigation and use of content
          − May need scheduled access and loading
          − Publishing results will require some form of quality control and approval
          − Models that are executed on a scheduled basis will require some sort of administrative and maintenance capabilities
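The factory requirements that "data quality is monitored and known" and that publishing needs "some form of quality control" could be sketched as a simple gate. The completeness metric, the 0.95 threshold and the function names below are assumptions chosen for illustration; the slides do not prescribe a metric.

```python
def completeness(rows, required):
    """Share of rows in which every required field is present and non-empty."""
    if not rows:
        return 0.0
    complete = sum(
        all(row.get(name) not in (None, "") for name in required) for row in rows
    )
    return complete / len(rows)


def publish(rows, required, threshold=0.95):
    """Report the measured quality and whether it clears the (hypothetical) threshold."""
    score = completeness(rows, required)
    return {"published": score >= threshold, "completeness": score, "rows": len(rows)}


orders = [
    {"id": 1, "customer": "a"},
    {"id": 2, "customer": "b"},
    {"id": 3, "customer": ""},   # missing customer, so the row is incomplete
    {"id": 4, "customer": "d"},
]
strict = publish(orders, required=["id", "customer"])
lenient = publish(orders, required=["id", "customer"], threshold=0.7)
```

Returning the score alongside the decision keeps quality "known" even when a batch is held back.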
  • 24. Base Environment for Batch Analytics, Streaming and Real-Time Data
  • 25. A Base Environment [Diagram labels: Data Governance, Data Operations]
      • Rapid ingestion – stream, low latency or batch updating
      • Ease of access – find it, use it, know what it means
      • Effective data supply chain – data of the correct quality needs to be where it is supposed to be
      • Flexibility – Data Scientists need to be able to experiment, but without polluting the lake
  • 26. Additional Components for Real-time Analytics and Ingesting Streaming Data
      • Are you replacing the Operational Data Store (ODS)?
      • Will you be doing full CRUD operations (Create, Read, Update, Delete)?
      • How fast do you need to go? Latencies should match your real needs.
      • Vendors – Hortonworks, Attunity, Splice
          − Ingest
          − Process
          − Consumption
      • Technologies you will hear about
          − Apache Kafka, Storm (real-time streaming components)
          − Apache Spark (fast batches)
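The contrast on slide 26 between record-at-a-time streaming and "fast batches" comes down to how many events are grouped before processing. The sketch below is plain Python and does not use the Kafka, Storm or Spark APIs; `micro_batches` is a hypothetical helper showing only the grouping idea.

```python
from itertools import islice


def micro_batches(events, size):
    """Group an event stream into lists of at most `size` items.

    size=1 behaves like record-at-a-time streaming; larger sizes trade
    latency for fewer, bigger processing steps.
    """
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


per_event = list(micro_batches(range(5), 1))
small_batches = list(micro_batches(range(5), 2))
```

Picking the batch size is one concrete form of the slide's advice that latencies should match real needs.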
  • 28. Major Areas of Data Governance Concern in the Data Lake [Diagram: Data Sources, Landing Zone, Standardization Zone, Analytics Sandbox, Data Governance, Data Operations, Data Scientists, Data Consumers, with numbered callouts]
      • 1 Data Acquisition
      • 2 Data Catalog
      • 3 Data Decisions
      • 4 Analytics Governance
      • 5 Data Usage
      • 6 Model Productionalization
      • Some Data Governance approaches are new, and others are applications of traditional approaches
  • 29. Major Areas of Data Governance Concern in the Data Lake
      • Data is cataloged/mapped so it’s easily found
      • Data is described adequately to permit reuse for any need
      • Decisions about data are logged and communicated
      • Flow of data (data lineage) is documented, so users/regulators can understand where it came from
      • Staff who know and understand the data are identified
      • Data Governance defines the information you need to maintain your data, develops the processes to do this, trains staff and provides the environments to manage the knowledge, while monitoring and ensuring compliance.
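The catalog, description, lineage and "identified staff" points on slide 29 can be pictured as fields on a catalog entry. This is a minimal sketch under assumed names (`CatalogEntry`, `lineage`, the dataset names and steward labels are all hypothetical); it does not represent any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    description: str          # enough context to permit reuse
    steward: str              # the person who knows and understands the data
    upstream: list = field(default_factory=list)  # datasets this one is derived from


def lineage(catalog, name):
    """Return every dataset `name` derives from, nearest first, each listed once."""
    seen, queue = [], list(catalog[name].upstream)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        # Sources that were never cataloged still appear in the lineage,
        # but cannot be traced any further.
        if current in catalog:
            queue.extend(catalog[current].upstream)
    return seen


catalog = {
    "crm_raw": CatalogEntry("crm_raw", "CRM export as landed", "steward_a"),
    "crm_clean": CatalogEntry("crm_clean", "Standardized CRM records", "steward_a", ["crm_raw"]),
    "churn_features": CatalogEntry(
        "churn_features", "Model inputs for churn", "steward_b", ["crm_clean", "web_logs_raw"]
    ),
}
trace = lineage(catalog, "churn_features")
```

In this example `web_logs_raw` shows up in the trace even though it has no catalog entry, which is the kind of gap a governance review would want surfaced.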
  • 30. Evolution of Critical Governance Components
      • While flexible, governance is required to ensure appropriate use
      • While operational, governance will ensure legitimacy, compliance and verify alignment with business needs
      • To move to operational, governance should supply road map, new policies, training and organization management
  • 32. Key Take-Aways
      • Make sure you offer up business benefits in addition to traditional “access to data” – such as lower costs, more nimble reactions.
      • Avoid additional data risks by providing oversight of data quality and sources. Do not take a casual approach to managing the data lake assets.
      • Understand that the architectural aspect of the data lake (as it is evolving) is becoming a standard, much like the data warehouse.
      • Maintain an open mind for supporting technologies, because they are changing every day.
      • Implement Data Governance. It is a critical success factor, no matter how you view the data lake.
  • 33. Questions?
  • 34. Thank you for joining today! Please join us Thursday, Nov. 2 for the Keys to Effective Data Visualization webinar.
      • John Ladley, @jladley, john@firstsanfranciscopartners.com
      • Kelle O’Neal, @kellezoneal, kelle@firstsanfranciscopartners.com