7. Data Warehouse
Main features
• Stores data from multiple data sources
• Used for historical and trend analysis reporting
• Central repository for many subject areas
• Contains a single version of the truth
• Not to be used for OLTP applications
Use cases
• Reduce stress on production systems
• Optimized for read access
• Keep historical records
• Restructure / rename tables, fields, model data
• Protect against source system upgrades
• Use Master Data Management
• Easy to create BI solutions on top of it (e.g. Azure Analysis Services cubes)
• Reduces the need to grant security access to multiple production systems
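The read-optimized, history-keeping design described on this slide can be sketched with Python's built-in sqlite3 module standing in for a real warehouse. The table names (dim_product, fact_sales) and the function name are hypothetical, chosen only for illustration; a production warehouse would use a dedicated engine rather than SQLite.

```python
import sqlite3


def monthly_sales_trend(rows):
    """Load (date, product, amount) rows into a tiny star schema
    and return total revenue per month, oldest month first."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name TEXT
        );
        CREATE TABLE fact_sales (
            sale_date TEXT,      -- ISO date, retained as history
            product_id INTEGER REFERENCES dim_product(product_id),
            amount REAL
        );
        """
    )
    # Dimension table: one row per distinct product name.
    products = sorted({name for _, name, _ in rows})
    conn.executemany(
        "INSERT INTO dim_product (name) VALUES (?)", [(p,) for p in products]
    )
    ids = {
        name: pid
        for pid, name in conn.execute("SELECT product_id, name FROM dim_product")
    }
    # Fact table: one row per sale, pointing at the dimension.
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(day, ids[name], amount) for day, name, amount in rows],
    )
    # Trend query: aggregate the history by calendar month.
    result = conn.execute(
        "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) "
        "FROM fact_sales GROUP BY month ORDER BY month"
    ).fetchall()
    conn.close()
    return result
```

Calling monthly_sales_trend with a handful of dated sales returns one (month, total) pair per month, which is the kind of trend report the slide has in mind.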
12. Azure Data Lake Storage
Main features
• Petabyte-scale storage
• Hierarchical namespace
• Hadoop-compatible access with the ABFS driver
ADLS best practices
• Use service principals
• Use security groups over individual users
• Enable the Gen 2 firewall with Azure services access
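For the ABFS access mentioned on this slide, paths in an ADLS Gen 2 account are addressed with URIs of the form abfs[s]://<container>@<account>.dfs.core.windows.net/<path>. The helper below is a small sketch that only builds such a string; the function name and the sample account and container names are hypothetical, and authentication (for example with a service principal) is out of scope here.

```python
def abfs_uri(account, container, path="", secure=True):
    """Build an ABFS URI for a path in an ADLS Gen 2 container.

    secure=True yields the TLS variant (abfss://).
    """
    scheme = "abfss" if secure else "abfs"
    clean_path = path.lstrip("/")  # avoid a double slash after the host
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical account and container names, for illustration only.
print(abfs_uri("mydatalake", "raw", "/sales/2023/"))
```

A Hadoop-compatible engine such as Spark can be pointed at a URI of this shape once the storage credentials are configured.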
13. Azure Data Factory
Main features
• Cloud ETL service
• Scale-out serverless data integration & data transformation
• Code-free UI
• Monitoring & management
21. Azure Synapse – SQL Analytics focus areas
• Best-in-class price per performance – up to 94% less expensive than competitors
• Developer productivity – use preferred tooling for SQL data warehouse development
• Workload-aware query execution – manage heterogeneous workloads through workload priorities and isolation
• Data flexibility – ingest a variety of data sources to derive the maximum benefit; query all data
• Industry-leading security – defense-in-depth security and 99.9% financially backed availability SLA
Credits: James Serra
22. Data Lake with Data Warehouse use cases
Data Lake – staging & preparation
• Data scientists / power users
• Batch processing
• Data refinement / cleaning
• ETL workloads
• Store older / backup data
• Sandbox for data exploration
• One-time reports
• Quick access to data
• Unknown questions
Data Warehouse – serving, security & compliance
• Business people
• Low latency
• Complex joins
• Interactive ad-hoc queries
• High number of users
• Additional security
• Large support for tools
• Dashboards
• Self-service BI
• Known questions
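The split described on this slide, with refinement in the lake and serving from the warehouse, can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the raw data is a small CSV string rather than files in a lake, SQLite stands in for the warehouse, and the function names and the region/amount columns are hypothetical.

```python
import csv
import io
import sqlite3


def refine(raw_text):
    """Data lake side: drop incomplete or malformed rows and
    normalize casing and types."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        region = (row.get("region") or "").strip().title()
        amount = (row.get("amount") or "").strip()
        if not region or not amount:
            continue  # incomplete record
        try:
            cleaned.append((region, float(amount)))
        except ValueError:
            continue  # amount is not numeric
    return cleaned


def serve(cleaned):
    """Warehouse side: load the refined rows into a table and
    answer a known question (total amount per region)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    totals = conn.execute(
        "SELECT region, SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return totals
```

Keeping refine and serve separate mirrors the slide: exploratory cleaning can change freely on the lake side, while the serving table stays stable for dashboards and self-service BI.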
23. Data Lakehouse concerns / limitations
Problems
• Reliability – keeping the data lake and warehouse consistent
• Data staleness – older data in the warehouse
• Limited support for advanced analytics – top ML systems don’t work well on warehouses
• TCO – extra cost for data copies in the warehouse
Concerns with respect to RDBMS
• Speed – RDBMS is faster, especially MPP
• Security – no row-level security (RLS), column-level security, or dynamic data masking
• Complexity – metadata is separate from data, file-based
• Missing features – referential integrity, TDE, workload management; many features require Spark lock-in
25. References
Data Warehouse
❖ Why you need a data warehouse
❖ James Serra website
Azure
❖ ADLS Gen 2 Storage Account
❖ Azure Data Factory
❖ Azure Databricks
❖ Azure Data Factory Mapping Data Flows
❖ Azure Synapse Analytics
❖ Azure Synapse Link for SQL
Delta Lake & Data Mesh
❖ Databricks Delta Lake
❖ Delta
❖ Data Mesh Architecture
❖ ThoughtWorks – Data Mesh
26. Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and Strive for Excellence”
nileshgule
@nileshgule Nilesh Gule
NileshGule
www.handsonarchitect.com
https://bit.ly/youtube-nileshgule