Got data… now what?
An introduction to modern data architecture
Susan Pierce
Product Manager, Data Analytics
Google Cloud
Topics
● Industry context
● History lesson: data warehouses and data lakes
● Data lakehouse
● Data mesh
● Data vault
● Considerations in determining a data strategy
Why you should care about data
Data
Organizations see data and AI/ML potential but are struggling to make it a reality
Technology is Altering the Way Organizations Operate
● Big data/analytics: 34%
● Artificial intelligence / machine learning: 30%
● Cloud infrastructure: 30%
● Identity and access management (IAM): 19%
● Cloud databases: 17%
● SaaS: 14%
● Internet of Things / M2M: 11%
● Security Orchestration Automation and Response (SOAR): 10%
● Serverless computing: 9%
● Next-generation WiFi: 9%
Q: Which of these technologies has the most potential to significantly alter the way your business operates over the next 3 to 5 years?
CIO Magazine, Oct 2021
10%
of organizations achieve significant
financial benefits from AI
Boston Consulting Group
Are You Making the Most of Your Relationship with AI?, Oct 2020
Challenge #1
Data is big and
multi-format.
● Structured and unstructured
● Real-time streams and at-rest
● Across clouds and on-premise
181ZB
expected by 2025
- STATISTA, FEB 2022
Challenge #2
Data requires more
than SQL.
● Machine learning & AI
● Stream analytics and events
● Data-driven applications
75%
of enterprises will shift from
piloting to operationalizing
artificial intelligence
Gartner ®, Streaming Analytics in the
Cloud: A Comparative Analysis of
Amazon, Microsoft and Google, Sumit
Pal, Shaurya Rana, 14 December, 2021
Challenge #3
Data reaches
everyone.
● Mission critical
● Accessed by everyone
● Shareable asset
73%
of data leaders feel
real-time access to data is
extremely important
- HBR, 2021
Proprietary + Confidential
68% of companies are unable to realize
measurable value from data.
More data copies
Data is big and multi-format.
Data requires more than SQL.
Data reaches everyone.
More tech islands
More integrations
High costs
Low productivity
Limited access
More capacity
More security risk
More data silos
Constant capacity planning
Data unavailable
Poor SLAs
Unclear compliance
Accenture, Closing the Data Value Gap, 2019
The data warehouse
● Timeframe: 1980s/1990s
● Purpose: Bridge the gap between operational data and business intelligence
● Data type(s): Structured, cleaned data with a known schema. Operational data is
aggregated, cleaned/pre-processed, and inserted into the data warehouse in batches
● Good for: Forcing consistency (quality and integrity), querying known dimensions, large
queries
● Typical users: “The Business”
● Other info: Can store data from multiple source databases and doesn’t require a strict 1-1
mapping with transactional databases
● A data mart is a smaller, application-specific data warehouse, usually tied to a specific
team or line of business (example: marketing-specific data)
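The batch flow described above (aggregate, clean, insert into a known schema) can be sketched roughly as follows. This is only an illustration: SQLite stands in for a warehouse, and the table name `fact_orders`, the sample records, and the cleaning rules are all assumptions, not part of the original deck.

```python
import sqlite3

# Hypothetical operational records, as they might arrive from a source system.
operational_orders = [
    {"order_id": 1, "region": " emea ", "amount": "120.50"},
    {"order_id": 2, "region": "EMEA", "amount": "79.50"},
    {"order_id": 3, "region": "amer", "amount": "200.00"},
    {"order_id": 4, "region": "AMER", "amount": None},  # fails cleaning
]

def clean(record):
    """Return a typed, normalized row, or None if the record is unusable."""
    if record["amount"] is None:
        return None
    return (record["order_id"], record["region"].strip().upper(), float(record["amount"]))

def load_batch(conn, records):
    """Clean one batch and insert it into a table with a fixed schema."""
    rows = [row for row in map(clean, records) if row is not None]
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
# Schema on write: structure and types are declared before any data is loaded.
conn.execute(
    "CREATE TABLE fact_orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)
loaded = load_batch(conn, operational_orders)

# Query a known dimension (region) with plain SQL.
revenue_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM fact_orders GROUP BY region")
)
print(loaded, revenue_by_region)
```

The record with a missing amount is rejected before loading, which is the consistency the warehouse enforces at write time.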
Data warehouse - single integrated repository of atomic data
● Inmon – Enterprise Data Warehouse: Data sources → ETL → EDW → Datamarts → BI tools and applications
● Kimball – Dimensional Data Warehouse: Data sources → ETL → EDW with logical datamarts → BI tools and applications
Data warehouses are often painful to manage
● Cost challenges: Legacy systems bust customer budgets, and renewing the license is a hassle.
● Maintaining the operations of the EDW takes too much time.
● Modernization challenges: They don’t support machine learning and AI initiatives or streaming use cases.
● Scaling challenges: Systems can’t keep up with forecasted usage and data growth. It’s hard to scale compute or storage on demand.
● Data freshness: Data is not fresh or current enough.
BigQuery Architecture
Decoupled storage and compute for maximum flexibility.
● Compute (Dremel): high-availability cluster, SQL:2011 compliant
● Storage: replicated, distributed storage (high durability)
● Petabit network; distributed memory shuffle tier
● Ingest: streaming ingest, free bulk loading
● Access: REST API, client libraries in 7 languages, web UI, CLI, Storage API
● BI Engine compute (stateful workers), BQML
The data lake
● Timeframe: 2000s/2010s
● Purpose: Storage for data of any type in its native format without pre-processing
● Data type(s): Structured, unstructured, semi-structured in a flat architecture, ingested by
streaming, micro batch, or batch
● Good for: Flexibility across use cases, high volumes of data, data exploration,
granular/low level data
● Typical users: Data scientists
● Other info: Optimized for lower storage cost (cheaper hardware, open source tools), and
considered highly configurable because data is not restricted to a set schema
● Data lakes are usually queried in a programmatic fashion due to the large amounts of
high-variability data they contain
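A rough sketch of what "native format, no set schema, queried programmatically" can look like in practice. A temporary local directory stands in for object storage, and the file names and fields are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for object storage: a local temporary directory (assumption).
lake = Path(tempfile.mkdtemp())

# Raw data lands in its native format, with no shared schema enforced.
(lake / "clicks.jsonl").write_text(
    '{"user": "a", "page": "/home", "ms": 120}\n'
    '{"user": "b", "page": "/pricing"}\n'
)
(lake / "app.log").write_text("2022-02-01 ERROR payment timeout\n2022-02-01 INFO ok\n")

def read_clicks(path):
    """Schema on read: decide which fields matter only when querying."""
    for line in path.read_text().splitlines():
        event = json.loads(line)
        # Missing fields are tolerated and defaulted at read time.
        yield {"user": event["user"], "page": event["page"], "ms": event.get("ms", 0)}

clicks = list(read_clicks(lake / "clicks.jsonl"))
errors = [l for l in (lake / "app.log").read_text().splitlines() if " ERROR " in l]
print(clicks, errors)
```

Nothing rejected the second click event for lacking a field; the reader decides how to interpret it, which is both the flexibility and the governance risk of a lake.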
Data lake, a simplified view
(Diagram: business applications, operational databases, documents, and log data feed object storage through ETL and APIs, alongside replicated data. A metadata store (catalog), governance, security, and operations surround the storage. ETL, discovery, exploration, analytics tools, and reporting/BI tools consume it.)
On-premises data lakes are struggling to deliver value
● TCO challenges: Resource utilization and overall TCO of on-premises data lakes become unmanageable.
● Governance challenges: Data governance and security issues open up compliance concerns.
● Scaling challenges: Resource-intensive data and analytics processing can lead to missed SLAs.
● Agility challenges: Analytics experimentation is slow due to resource provisioning time.
Data Warehouse
● Schema on write
● Difficult to change
● Structured data
● Strong business context
● Batch ingestion
● Inherent security
Data Lake
● Schema on read
● Easy to change
● Raw data
● Geared for exploration
● Supports streaming
● Difficult to govern
The big big data decision: data warehouse or data lake?
Data Warehouse (TB scale)
● Use case characteristics: Understanding your business. Answer “known” questions, access “known” data.
● Data type and access: Structured data. SQL access and manipulation.
Data Lake (PB scale)
● Use case characteristics: Exploring your business. Answer “unknown” questions, access “unknown” data.
● Data type and access: Unstructured (raw) and structured data. Code-involved access and exploration.
Paper: Build a modern, unified analytics data platform; Blog: Bringing data lakes and data warehouses together
With more data comes more responsibility
(Diagram: data silos. Databases, data marts, data lakes, and data warehouses each carry their own metadata, security, and governance. They hold transactions, unstructured files, logs, LoB-specific data, and real-time app data, and feed BI, machine learning, ETL/ELT tools, and consumer apps. Cross-cutting concerns: security, connecting tools, standardization, financial governance.)
Closing the data/value gap
Maturity
● Teams: analysts vs. engineers
● Model: self service vs. centralized
● Technology: BI, AI, and data fabric
“80% of analytics work is still descriptive” (MIT, 2020)
Silos
● Multiple clouds, on-premises legacy
● No consistent governance
● Duplication: data and definitions
“90% of employees say that their work is slowed by unreliable data sources” (Dimensional Research, 2020)
Complexity
● Volume, velocity, and variety
● Data marts, EDW, data lakes, and lakehouses
● Experimentation and production
“86% of analysts struggle with data that's out of date” (Dimensional Research, 2020)
Empowering technology and people
Enabling data
Data lakehouse removes the overhead of data lakes and data warehouses
● Data warehouse gets the capabilities of the data lake
● Data lake gets the capabilities of the data warehouse
Provides:
● Multimodal data access with higher volumes of data
● Schema on read
● The governance that data lakes lack but data warehouses provide
Enabling teams
Data mesh removes the organizational barriers that become the bottleneck
● Emphasizes the data domain/team first, then technology
● Agile teams/more insights
Teams own their data and technology
● Provides API access to other teams
● Decentralized raw and processed data
Provides:
● Well-defined, governed, and secured data meshes
● Still able to leverage several domains with no data movement
Data lakehouse
Data lakehouse building blocks
● Consistent user experience: all users work through one user experience
● Choice of processing and analytic engines: Spark, SQL, Beam, AI/ML, across streaming and batch
● Decoupled data storage: object storage (low-cost semi-unstructured data store) and structured storage (highly optimized analytical store), spanning data warehouse, data lake, and cloud storage
● Centralized management: automated data discovery, unified permissioning, integrated data catalog
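To make the building blocks concrete, here is a toy sketch, not a real lakehouse: a dict stands in for decoupled storage, a second dict for the catalog with unified permissions, and two small functions play the role of a SQL engine and a code-based engine reading the same data. All names (`storage`, `catalog`, `open_table`, the roles) are hypothetical. A real engine would read the stored data in place rather than load it into SQLite.

```python
import csv
import io
import sqlite3

# Decoupled storage: one copy of the data, held outside any engine.
# A dict stands in for an object store here (assumption).
storage = {"sales.csv": "region,amount\nEMEA,100\nAMER,250\nEMEA,50\n"}

# Centralized management: one catalog entry carries location and permissions.
catalog = {"sales": {"path": "sales.csv", "readers": {"analyst", "data_scientist"}}}

def open_table(name, role):
    """Resolve a table through the catalog and enforce unified permissions."""
    entry = catalog[name]
    if role not in entry["readers"]:
        raise PermissionError(f"{role} may not read {name}")
    return list(csv.DictReader(io.StringIO(storage[entry["path"]])))

def sql_engine_total(role):
    """SQL-style access: load the shared rows into SQLite and aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (:region, :amount)", open_table("sales", role)
    )
    return dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

def code_engine_total(role):
    """Programmatic access: aggregate the same rows in plain Python."""
    totals = {}
    for row in open_table("sales", role):
        totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    return totals

print(sql_engine_total("analyst"), code_engine_total("data_scientist"))
```

Both engines go through the same catalog lookup, so a permission change made once applies to every access path.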
Google Cloud Data lakehouse building blocks
● Consistent user experience
● Choice of processing and analytic engines: Dataproc (Spark, Flink, Presto, Hive), BigQuery (SQL), Data Fusion, Dataflow (Beam), Vertex AI
● Decoupled data storage: Cloud Storage (low-cost semi-unstructured data store) and BigQuery Storage (highly optimized analytical store)
● Centralized management: Dataplex (Governance, Management), Data Catalog (Discovery, Search, Metadata)
Data lakehouse layers
Data mesh
Data mesh enables a successful data culture
Through the provision of:
● Distributed ownership: Federated data domain teams are responsible for maintaining their data and making it useful for others, and are given the freedom to choose the best course of action.
● Focus on value of data: Teams are incentivized to maximize the value of their data products while staying within policies, and to make their data available and useful for others.
● Central support: A central data platform team supports distributed data domains with tooling, processes, standards, guardrails, and automated policy enforcement.
Five targeted outcomes
1. Value of data is measured and recognized
2. Teams empowered to generate value from data
3. No central bottleneck
4. Each domain equipped with relevant skills and knowledge
to be successful
5. Distributed ownership for data governance
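One way to picture distributed ownership with central guardrails is the sketch below. It is illustrative only: `DataProduct`, `CentralPlatform`, and the two policies it enforces are hypothetical names and rules, not something the deck prescribes.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """A dataset owned and published by one domain team."""
    name: str
    owner_team: str
    rows: list
    contains_pii: bool = False
    allowed_teams: set = field(default_factory=set)

class CentralPlatform:
    """Central support: shared tooling and automated policy checks."""

    def __init__(self):
        self.registry = {}

    def publish(self, product):
        # Guardrail (hypothetical policy): every product needs a named owner,
        # and PII products must restrict their audience explicitly.
        if not product.owner_team:
            raise ValueError("data product must have an owning team")
        if product.contains_pii and not product.allowed_teams:
            raise ValueError("PII products must list allowed teams")
        self.registry[product.name] = product

    def read(self, name, team):
        """API access for other teams; the data stays with the owning domain."""
        product = self.registry[name]
        if product.allowed_teams and team not in product.allowed_teams | {product.owner_team}:
            raise PermissionError(f"{team} cannot read {name}")
        return product.rows

platform = CentralPlatform()
platform.publish(DataProduct("orders", "sales", [{"order_id": 1}]))
print(platform.read("orders", "marketing"))
```

The platform never edits the data itself. It only checks policy at publish and read time, which keeps it from becoming the central bottleneck the list above warns about.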
Data mesh data architecture and capability model
One way to build a data mesh
Data vault
What is Data Vault? An “add-only” model
● The data model can grow by adding new Links, and from there new Hubs and Satellites
○ Flexible changes with fewer adjustments to ETL and reporting (minimal impact analysis)
○ Parallel loading
○ Minimal regression testing
Rules
● No direct connections between Hubs
● Hubs don’t reference other entities
● Links reference Hubs
● Satellites reference Hubs or Links
Data Vault modelling - how does it work?
A Data Vault model consists of 3 basic entity types:
○ The Hub represents a business object and separates the business keys from the rest of the model. It holds no descriptive data and no relationships, so it has no reason to change.
○ The Link stores relationships between Hubs (using the business keys). It is modelled as a many-to-many relationship.
○ Satellites store the context (the attributes of a business key or relationship). They can be added without changing Hubs and Links.
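The three entity types and the reference rules can be sketched as a small schema. This is a simplified illustration in SQLite: the table and column names are hypothetical, and the `loaded_at` column used to version Satellite rows is an assumption added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Hubs hold business keys only and reference nothing else.
CREATE TABLE hub_customer (customer_key TEXT PRIMARY KEY);
CREATE TABLE hub_product  (product_key  TEXT PRIMARY KEY);

-- A Link records a many-to-many relationship between Hubs.
CREATE TABLE link_purchase (
    customer_key TEXT NOT NULL REFERENCES hub_customer(customer_key),
    product_key  TEXT NOT NULL REFERENCES hub_product(product_key),
    PRIMARY KEY (customer_key, product_key)
);

-- A Satellite stores context for a Hub (or a Link), versioned by load time.
CREATE TABLE sat_customer_details (
    customer_key TEXT NOT NULL REFERENCES hub_customer(customer_key),
    loaded_at    TEXT NOT NULL,
    name         TEXT,
    PRIMARY KEY (customer_key, loaded_at)
);
""")

conn.execute("INSERT INTO hub_customer VALUES ('C1')")
conn.execute("INSERT INTO hub_product VALUES ('P1')")
conn.execute("INSERT INTO link_purchase VALUES ('C1', 'P1')")
# Add-only: a changed attribute becomes a new Satellite row, not an update.
conn.executemany(
    "INSERT INTO sat_customer_details VALUES (?, ?, ?)",
    [("C1", "2022-01-01", "Ada"), ("C1", "2022-02-01", "Ada L.")],
)
# New context arrives later: add a Satellite, leave Hubs and Links untouched.
conn.execute(
    "CREATE TABLE sat_customer_loyalty ("
    "customer_key TEXT NOT NULL REFERENCES hub_customer(customer_key), "
    "loaded_at TEXT NOT NULL, tier TEXT, "
    "PRIMARY KEY (customer_key, loaded_at))"
)
history = conn.execute(
    "SELECT name FROM sat_customer_details WHERE customer_key = 'C1' ORDER BY loaded_at"
).fetchall()
print(history)
```

Adding `sat_customer_loyalty` required no change to any existing table, which is why impact analysis and regression testing stay small.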
Data Vault architecture (diagram)
● Flow: Source Systems → Staging → Raw Vault → Business Vault → Information Mart → Presentation Layer, loaded with ELT and SQL
● Staging checks datatypes; the Raw Vault applies semantic integration and hard business rules; the Business Vault applies soft business rules; Information Marts serve flexible requirements driven by business domains
● Cross-cutting: Identity and Access Management; Scheduling, Logging & Monitoring; Version Control, Continuous Integration
● Sources: public data, other clouds, data lake, relational databases, legacy systems, streaming data, NoSQL databases (for example sales, customer, supplier)
● Data vault: staging area, raw vault, operational vault, metrics vault, metadata; Data Governance, Data Life Cycle Management
● Information marts: report collection, Meta Mart, Metrics Mart, Error Mart
● Outputs: sheets, flat files, cubes
People
Technology
Process
Where to start?
Strategic vs Tactical
Further reading
Paper: Build a modern distributed Data Mesh with Google Cloud
Blog: Building a Unified Analytics Data Platform
Paper: Build a modern, unified analytics data platform with Google Cloud
Blog: Data lake and data warehouse convergence
Paper: Converging Architectures: Bringing Data Lakes and Data Warehouses Together
Blog: Data driven transformation using Google's unified analytics platform
Paper: What type of data processing organization are you?
Blog: Open data lakehouse on Google Cloud
Paper: Building a data lakehouse on Google Cloud Platform
Blog: Building the data science driven organization
Blog: Building the data engineering driven organization
Blog: Building the data analyst driven organization from the first principles
Blog: Announcing BigQuery Migration Service
Thank you.
How do you know where to start?

Mais conteúdo relacionado

Mais procurados

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
 
Customer-Centric Data Management for Better Customer Experiences
Customer-Centric Data Management for Better Customer ExperiencesCustomer-Centric Data Management for Better Customer Experiences
Customer-Centric Data Management for Better Customer ExperiencesInformatica
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief OverviewHal Kalechofsky
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as ProductDATAVERSITY
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptxTarekHamdi8
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureLorenzo Nicora
 

Mais procurados (20)

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Customer-Centric Data Management for Better Customer Experiences
Customer-Centric Data Management for Better Customer ExperiencesCustomer-Centric Data Management for Better Customer Experiences
Customer-Centric Data Management for Better Customer Experiences
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptx
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and Future
 

Semelhante a Got data?… now what? An introduction to modern data platforms

Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationDenodo
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisNetAppUK
 
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...DataScienceConferenc1
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricNathan Bijnens
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarioskcmallu
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesDATAVERSITY
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeMicrosoft
 
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...Denodo
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothAdaryl "Bob" Wakefield, MBA
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Analytics in a day
Analytics in a day Analytics in a day
Analytics in a day Peter Ward
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)Syaifuddin Ismail
 

Semelhante a Got data?… now what? An introduction to modern data platforms (20)

Benefits of a data lake
Benefits of a data lake Benefits of a data lake
Benefits of a data lake
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis Kapsalis
 
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLake
 
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Analytics in a day
Analytics in a day Analytics in a day
Analytics in a day
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)
 



Got data?… now what? An introduction to modern data platforms

  • 1. Got data… now what? An introduction to modern data architecture Susan Pierce Product Manager, Data Analytics Google Cloud
  • 2. ● Industry context ● History lesson: data warehouses and data lakes ● Data lakehouse ● Data mesh ● Data vault ● Considerations in determining a data strategy Topics
  • 3. Why you should care about data
  • 5. Organizations see data and AI/ML potential but are struggling to make it a reality. Technology is Altering the Way Organizations Operate. Q: Which of these technologies has the most potential to significantly alter the way your business operates over the next 3 to 5 years? Big data/analytics 34%; Artificial intelligence / machine learning 30%; Cloud infrastructure 30%; Identity and access management (IAM) 19%; Cloud databases 17%; SaaS 14%; Internet of Things / M2M 11%; Security Orchestration Automation and Response (SOAR) 10%; Serverless computing 9%; Next-generation WiFi 9% (CIO Magazine, Oct 2021). 10% of organizations achieve significant financial benefits from AI (Boston Consulting Group, Are You Making the Most of Your Relationship with AI?, Oct 2020)
  • 6. Challenge #1 Data is big and multi-format. ● Structured and unstructured ● Real-time streams and at-rest ● Across clouds and on-premise 181ZB expected by 2025 - STATISTA, FEB 2022
  • 7. Challenge #2 Data requires more than SQL. ● Machine learning & AI ● Stream analytics and events ● Data-driven applications 75% of enterprises will shift from piloting to operationalizing artificial intelligence Gartner ®, Streaming Analytics in the Cloud: A Comparative Analysis of Amazon, Microsoft and Google, Sumit Pal, Shaurya Rana, 14 December, 2021
  • 8. 73% of data leaders feel real-time access to data is extremely important Challenge #3 Data reaches everyone. ● Mission critical ● Accessed by everyone ● Shareable asset - HBR, 2021
  • 9. Proprietary + Confidential 68% of companies are unable to realize measurable value from data. More data copies Data is big and multi-format. Data requires more than SQL. Data reaches everyone. More tech islands More integrations High costs Low productivity Limited access More capacity More security risk More data silos Constant Capacity Planning Data unavailable Poor SLAs Unclear compliance Accenture, Closing the Data Value Gap, 2019
  • 10. The data warehouse ● Timeframe: 1980s/1990s ● Purpose: Bridge the gap between operational data and business intelligence ● Data type(s): Structured, cleaned data with a known schema. Operational data is aggregated, cleaned/pre-processed, and inserted into the data warehouse in batches ● Good for: Forcing consistency (quality and integrity), querying known dimensions, large queries ● Typical users: “The Business” ● Other info: Can store data from multiple source databases and doesn’t require a strict 1-1 mapping with transactional databases ● A data mart is a smaller, application-specific data warehouse, usually tied to a specific team or line of business (example: marketing-specific data)
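The batch flow described on this slide (aggregate, clean, then insert into a table with a known schema) can be sketched roughly as follows. This is a minimal illustration using Python's built-in `sqlite3` as a stand-in warehouse; the `operational_orders` records, the `revenue_by_region` table, and the `transform`/`load` helpers are hypothetical names that do not come from the slides.

```python
import sqlite3

# Hypothetical operational records; names and values are illustrative only.
operational_orders = [
    {"order_id": 1, "region": " emea ", "amount": "120.50"},
    {"order_id": 2, "region": "EMEA", "amount": "79.50"},
    {"order_id": 3, "region": "amer", "amount": "200.00"},
    {"order_id": 4, "region": "amer", "amount": None},  # dirty row, dropped
]


def transform(rows):
    """Clean rows and aggregate revenue per region before loading."""
    totals = {}
    for row in rows:
        if row["amount"] is None:
            continue  # cleaning step: skip rows that fail validation
        region = row["region"].strip().upper()
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(conn, batch):
    """Insert one pre-processed batch into a table with a known schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_region ("
        "region TEXT PRIMARY KEY, revenue REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO revenue_by_region VALUES (?, ?)", batch)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
load(conn, transform(operational_orders))
result = conn.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(result)
```

The point of the sketch is the ordering: cleaning and aggregation happen before the load, so the stored table only ever contains rows that match its schema.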
  • 11. Inmon – Enterprise Data Warehouse: Data sources -> ETL -> EDW -> Datamarts -> Applications. Kimball – Dimensional Data Warehouse: Data sources -> ETL -> EDW -> BI tools / Application. Diagram labels: Datamarts (logical); Data warehouse - single integrated repository of atomic data
  • 12. Data warehouses are often painful to manage. Legacy systems bust customer budgets and it’s a hassle to renew the license. Maintaining the operations of EDW takes too much time. Doesn’t support machine learning and AI initiatives as well as streaming use cases. Data is not fresh or current enough. Systems can’t keep up with forecasted usage and data growth. It’s hard to scale on compute or storage on-demand. Challenge areas: Cost challenges, Modernization challenges, Scaling challenges, Data freshness
  • 13. BigQuery Architecture NDA SQL:2011 Compliant Petabit Network High-Available Cluster Compute (Dremel) Streaming Ingest Free Bulk Loading Replicated, Distributed Storage (high durability) REST API Client Libraries In 7 languages Web UI, CLI Distributed Memory Shuffle Tier Decoupled storage and compute for maximum flexibility. Storage API BigQuery BI Engine Compute (Stateful workers) BQML
  • 14. The data lake ● Timeframe: 2000s/2010s ● Purpose: Storage for data of any type in its native format without pre-processing ● Data type(s): Structured, unstructured, semi-structured in a flat architecture, ingested by streaming, micro batch, or batch ● Good for: Flexibility across use cases, high volumes of data, data exploration, granular/low level data ● Typical users: Data scientists ● Other info: Optimized for lower storage cost (cheaper hardware, open source tools), and considered highly configurable because data is not restricted to a set schema ● Data lakes are usually queried in a programmatic fashion due to the large amounts of high-variability data they contain
  • 15. Business applications Operational databases Documents Log data Object Storage Metadata Storage (Catalog) Replicated data Governance Security Operations ETL Discovery Exploration Analytics tools Reporting/BI tools Data lake, a simplified view ETL APIs
  • 16. On-premises data lakes are struggling to deliver value. Resource utilization and overall TCO of on-premises data lakes become unmanageable. Data governance and security issues open up compliance concerns. Resource-intensive data and analytics processing can lead to missed SLAs. Analytics experimentation is slow due to resource provisioning time. Challenge areas: TCO challenges, Agility challenges, Scaling challenges, Governance challenges
  • 17. Data Warehouse ● Schema on write ● Difficult to change ● Structured data ● Strong business context ● Batch ingestion ● Inherent security Data Lake ● Schema on read ● Easy to change ● Raw data ● Geared for exploration ● Supports streaming ● Difficult to govern
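The contrast between schema on write and schema on read can be made concrete with a small sketch. It assumes a hypothetical two-column `WAREHOUSE_SCHEMA` and JSON event payloads; plain Python lists stand in for the warehouse table and the lake, and none of the names below come from the slides.

```python
import json

WAREHOUSE_SCHEMA = {"user_id": int, "event": str}  # hypothetical schema


def write_to_warehouse(table, record):
    """Schema on write: reject records that do not match the schema."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(record)}")
    for column, expected_type in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[column], expected_type):
            raise ValueError(f"{column} must be {expected_type.__name__}")
    table.append(record)


def write_to_lake(lake, raw_line):
    """Schema on read: store the raw payload untouched."""
    lake.append(raw_line)


def read_from_lake(lake, fields):
    """Apply a schema only at query time; missing fields become None."""
    for raw_line in lake:
        parsed = json.loads(raw_line)
        yield {name: parsed.get(name) for name in fields}


warehouse, lake = [], []
raw_events = [
    '{"user_id": 1, "event": "click"}',
    '{"user_id": "2", "event": "view", "device": "mobile"}',
]

rejected = 0
for line in raw_events:
    write_to_lake(lake, line)  # the lake accepts everything as-is
    try:
        write_to_warehouse(warehouse, json.loads(line))
    except ValueError:
        rejected += 1  # the warehouse refuses the mismatched record

devices = [row["device"] for row in read_from_lake(lake, ["device"])]
print(len(warehouse), rejected, devices)
```

In this sketch the second event is rejected by the warehouse but remains available in the lake, where a later reader can still ask for a `device` field that nobody planned for. That is the "easy to change" versus "difficult to govern" trade-off in miniature.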
  • 18. The big big data decision: data warehouse or data lake? Compared by use case characteristics and by data type and access. Data Warehouse (TB scale): Understanding your business ● Answer “known” questions ● Access “known” data ● Structured data ● SQL access and manipulation. Data Lake (PB scale): Exploring your business ● Answer “unknown” questions ● Access “unknown” data ● Unstructured (raw) and structured data ● Code-involved access and exploration. Paper: Build a modern, unified analytics data platform; Blog: Bringing data lakes and data warehouses together
  • 19. With more data comes more responsibility. Data silos: Databases, Data Marts, Data Lakes, Data Warehouses, each with its own Metadata, Security, and Governance. Data: Transactions, Unstructured Files, Logs, LoB Specific Data, Real-time App Data. Consumers: BI, Machine Learning, ETL/ELT Tools, Consumer App. Cross-cutting concerns: Security, Connecting Tools, Standardization, Financial Governance
  • 20. Closing the data/value gap. “80% of analytics work is still descriptive” (MIT, 2020). Maturity ● Teams: analysts vs. engineers ● Model: self service vs. centralized ● Technology: BI, AI, and data fabric. Silos ● Multiple clouds, on-premises legacy ● No consistent governance ● Duplication: data and definitions. “90% of employees say that their work is slowed by unreliable data sources” (Dimensional Research, 2020). Complexity ● Volume, velocity, and variety ● Data marts, EDW, data lakes, and lake houses ● Experimentation and production. “86% of analysts struggle with data that's out of date” (Dimensional Research, 2020)
  • 21. Empowering technology and people. Enabling data: Data lakehouse removes the overhead of data lakes and data warehouses. The data warehouse gets the capabilities of the data lake; the data lake gets the capabilities of the data warehouse. Provides: ● Multimodal data access with higher volumes of data ● Schema on read ● The governance that data lakes lack but data warehouses provide. Enabling teams: Data mesh removes the organizational barriers that become the bottleneck ● Emphasizes the data domain/team first, then technology ● Agile teams/more insights. Teams own their data and technology ● Provide API access to other teams ● Decentralized raw and processed data. Provides: ● Well defined, governed, and secured data meshes ● Still able to leverage several domains with no data movement
  • 23. Data Warehouse / Data Lake Cloud Storage Object Storage (low-cost semi-unstructured data store) Structured Storage (highly optimized analytical store) Spark Streaming SQL Beam Users User Experience Batch Data lakehouse building blocks Consistent User experience Choice of processing and analytic engines Decoupled data storage Data Warehouse / Data Lake Cloud Storage Automated Data Discovery Unified Permissioning Integrated Data Catalog Centralized Management AI/ML
  • 24. Data Warehouse / Data Lake Cloud Storage Cloud Storage (low-cost semi-unstructured data store) BigQuery Storage (highly optimized analytical store) Dataproc Spark, Flink, Presto, Hive BigQuery SQL Data Fusion Beam Users User Experience Vertex AI Google Cloud Data lakehouse building blocks Consistent User experience Choice of processing and analytic engines Decoupled data storage Data Warehouse / Data Lake Cloud Storage Dataplex (Governance, Management) Data Catalog (Discovery, Search, Metadata) Centralized Management Dataflow …..
  • 27. Data mesh enables a successful data culture through the provision of: Distributed ownership: Federated data domain teams are responsible for maintaining their data and making it useful for others, and are provided the freedom to choose the best course of action. Focus on value of data: Teams are incentivized to maximize the value of their data products while staying within policies, and making their data available and useful for others. Central support: Central data platform team supports distributed data domains with tooling, processes, standards, guardrails, and automation in policy enforcement.
  • 29. Five targeted outcomes 1. Value of data is measured and recognized 2. Teams empowered to generate value from data 3. No central bottleneck 4. Each domain equipped with relevant skills and knowledge to be successful 5. Distributed ownership for data governance
  • 30. Data mesh data architecture and capability model
  • 31. One way to build a data mesh
  • 33. What is Data Vault? An “add-only” model ● The data model can grow by adding new Links and, from there, new Hubs/Satellites ○ Flexible changes with fewer adjustments to ETL and reporting (minimal impact analysis) ○ Parallel loading ○ Minimal regression testing. Rules ● No direct connections between Hubs ● Hubs don’t reference other entities ● Links reference Hubs ● Satellites reference Hubs or Links
  • 34. Data Vault modelling - how does it work? A Data Vault model consists of 3 basic entity types ○ The Hub is a business object and separates the business keys from the rest of the model. No data (!), no relationships -> no reasons to change. ○ The Link stores relationships between hubs (using the business keys). It is modelled as a many-to-many relationship. ○ Satellites store the context (the attributes of a business key or relationship). Can be added without changing Hubs and Links.
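The three entity types and the “add-only” idea can be sketched in a few lines. This is a simplified, in-memory illustration and not a real Data Vault implementation: the `DataVault` class, its method names, and the sample keys and dates are all hypothetical, and satellites here attach only to Hubs (the slides also allow satellites on Links).

```python
from dataclasses import dataclass, field


@dataclass
class DataVault:
    hubs: dict = field(default_factory=dict)        # hub name -> set of business keys
    links: dict = field(default_factory=dict)       # link name -> set of key tuples
    satellites: dict = field(default_factory=dict)  # satellite name -> list of rows

    def add_hub_key(self, hub, business_key):
        """Hubs hold business keys only: no attributes, no relationships."""
        self.hubs.setdefault(hub, set()).add(business_key)

    def add_link(self, link, hub_keys):
        """Links relate hubs; hub_keys maps hub name -> existing business key."""
        for hub, key in hub_keys.items():
            if key not in self.hubs.get(hub, set()):
                raise KeyError(f"{key!r} is not in hub {hub!r}")
        self.links.setdefault(link, set()).add(tuple(sorted(hub_keys.items())))

    def add_satellite_row(self, satellite, hub, business_key, load_date, attributes):
        """Satellites carry context; rows are appended, never updated."""
        if business_key not in self.hubs.get(hub, set()):
            raise KeyError(f"{business_key!r} is not in hub {hub!r}")
        self.satellites.setdefault(satellite, []).append(
            {"key": business_key, "load_date": load_date, **attributes}
        )


vault = DataVault()
vault.add_hub_key("customer", "C-1")
vault.add_hub_key("product", "P-9")
vault.add_link("purchase", {"customer": "C-1", "product": "P-9"})

# A change of city is a new satellite row; the hub and link stay untouched.
vault.add_satellite_row("customer_details", "customer", "C-1", "2024-01-01", {"city": "Lisbon"})
vault.add_satellite_row("customer_details", "customer", "C-1", "2024-02-01", {"city": "Porto"})

history = [row["city"] for row in vault.satellites["customer_details"]]
print(history)
```

Because new context only ever appends satellite rows, the hub and link structures have no reason to change, which is the property the slide summarizes as "no reasons to change".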
  • 35. Identity and Access Management Scheduling, Logging & Monitoring Version Control, Continuous Integration Source Systems ELT SQL SQL SQL Check datatypes Semantic integration hard business rules Soft business rules Flexible requirements Driven business domains Staging Raw Vault Business Vault Information Mart Presentation Layer Public data Other clouds sales customer supplier Data Lake Relational databases, Legacy systems Streaming data No SQL Databases Staging area Raw vault Data vault Operational vault Metrics vault Metadata Data Governance, Data Life Cycle Management Information marts Report collection Meta Mart Metrics Mart Error Mart Sheets Flat files Cubes
  • 37. Further reading. Paper: Build a modern distributed Data Mesh with Google Cloud; Blog: Building a Unified Analytics Data Platform; Paper: Build a modern, unified analytics data platform with Google Cloud; Blog: Data lake and data warehouse convergence; Paper: Converging Architectures: Bringing Data Lakes and Data Warehouses Together; Blog: Data driven transformation using Google's unified analytics platform; Paper: What type of data processing organization are you?; Blog: Open data lakehouse on Google Cloud; Paper: Building a data lakehouse on Google Cloud Platform; Blog: Building the data science driven organization; Blog: Building the data engineering driven organization; Blog: Building the data analyst driven organization from the first principles; Blog: Announcing BigQuery Migration Service
  • 39. How do you know where to start?