Designing a Modern Data Warehouse in Azure
Antonios Chatzipavlis
Data Solutions Consultant & Trainer
1988 Beginning of my professional career
1996 I started working with SQL Server 6.0
1998 Certified as MCSD (3rd in Greece)
1999 Became an MCT
2010 Microsoft MVP on Data Platform
Created www.sqlschool.gr
2012 Became MCT Regional Lead by Microsoft Learning
2013 Certified as MCSE: Data Platform and MCSE: Business Intelligence
2016 Certified as MCSE: Data Management & Analytics
2018 Certified as MCSA: Machine Learning
Recertified as MCSE: Data Management & Analytics
What we are doing
• Articles
• SQL Server in Greek
• SQL Nights
• Webcasts
• SQL Server News
• Downloads
• Resources
Follow us
fb/sqlschoolgr
fb/groups/sqlschool
@antoniosch
@sqlschool
yt/c/SqlschoolGr
SQLschool.gr Group
A community for Greek professionals who use the Microsoft Data Platform
Ask your question at help@sqlschool.gr
Explore everything PASS has to offer
Free Online Resources
Newsletters
PASS.org
Get involved
Free online webinar events
Local user groups around the world
Free 1-day local training events
Online special interest user groups
Business analytics training
bit.ly/AAB2019Evaluation
WHAT IS A DATA WAREHOUSE?
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.
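To make the four properties in the definition above concrete, here is a minimal Python sketch. The class names, the customer subject area and the two source feeds are hypothetical and only serve as illustration; a real warehouse would hold this in tables, not in memory.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row is never modified (non-volatile)
class CustomerSnapshot:
    customer_id: int   # subject-oriented: rows are organised around the customer
    city: str
    valid_from: date   # time-variant: every row is tied to a point in time


class CustomerHistory:
    """Append-only history for one subject area, fed by several sources."""

    def __init__(self):
        self._rows = []

    def load(self, row):
        # Loads only append; existing rows are never updated or deleted.
        self._rows.append(row)

    def as_of(self, customer_id, when):
        """Return the latest snapshot of a customer valid on the given date."""
        candidates = [r for r in self._rows
                      if r.customer_id == customer_id and r.valid_from <= when]
        return max(candidates, key=lambda r: r.valid_from, default=None)


history = CustomerHistory()
# Integrated: rows from two hypothetical source systems share one schema.
history.load(CustomerSnapshot(1, "Athens", date(2018, 1, 1)))        # CRM feed
history.load(CustomerSnapshot(1, "Thessaloniki", date(2019, 6, 1)))  # billing feed

print(history.as_of(1, date(2018, 12, 31)).city)
print(history.as_of(1, date(2019, 12, 31)).city)
```

Because old rows are kept rather than overwritten, the same question can be answered for any past date.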
TRADITIONAL DATA WAREHOUSE
SELF-SERVICE DATA WAREHOUSE
TRADITIONAL DW LIMITATIONS
Data
sources
User
Competition
Scaling up
Data
platforms
CURRENT DW CHALLENGES
Timeliness Flexibility Quality Findability
RECENT RESEARCH SURVEYS
+50% of respondents report that they will replace their primary DW platform and analytics tools within 3 years
The data tsunami
WHY YOU NEED A MODERN DATA WAREHOUSE
Customer experience
Quality assurance
Operational efficiency
Innovation
THE CRITERIA FOR SELECTING A MODERN DW
Meets Current and Future Needs
ON-PREMISES VS. CLOUD DW
• Evaluating Time to Value
• Accounting for Storage and Computing Costs
• Sizing, Balancing and Tuning
• Considering Data Preparation and ETL Costs
• Cost of Specialized Business Analytic Tools
• Scaling and Elasticity
• Delays and Downtime
• Cost of Security Breaches
• Data Protection and Recovery
STEPS TO GETTING STARTED WITH CLOUD DW
• Evaluate your data warehousing needs.
• Migrate or start fresh.
• Establish success criteria.
• Evaluate cloud data warehouse solutions.
• Calculate your total cost of ownership.
• Set up a proof of concept (POC).
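The "calculate your total cost of ownership" step can be sketched as simple arithmetic. The cost categories below are borrowed from the on-premises vs. cloud list; the function name, the 3-year horizon and every figure are made-up placeholders, not benchmarks or price quotes.

```python
from dataclasses import dataclass


@dataclass
class YearlyCosts:
    storage_and_compute: float
    data_prep_and_etl: float
    analytic_tools: float
    downtime: float

    def total(self):
        return (self.storage_and_compute + self.data_prep_and_etl
                + self.analytic_tools + self.downtime)


def tco(upfront, yearly, years):
    """Total cost of ownership: upfront spend plus running costs over the period."""
    return upfront + yearly.total() * years


# Placeholder figures for illustration only.
on_prem = tco(upfront=500_000,
              yearly=YearlyCosts(60_000, 30_000, 20_000, 10_000),
              years=3)
cloud = tco(upfront=20_000,
            yearly=YearlyCosts(150_000, 20_000, 25_000, 5_000),
            years=3)

print(f"on-premises: {on_prem:,.0f}  cloud: {cloud:,.0f}")
```

Swap in your own estimates for each category; the comparison is only as good as the inputs.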
Azure Modern Data Warehouse
MODERN DATA WAREHOUSE
MODERN DATA WAREHOUSE IN AZURE
ADVANCED ANALYTICS ON BIG DATA
REAL-TIME ANALYTICS
SQL SERVER 2019 BIG DATA CLUSTERS
INGEST DATA
ADF
• PaaS
• Mapping Data Flow transforms data (ETL)
• Copy Data tool easily copies from source to destination
• Templates
• Any new project
• Converting SSIS packages
• Row-by-row ETL can be slower
• Data needs to be moved to Databricks – limited by compute size
• Mapping Data Flow takes time to start up
SSIS
• SSDT – Visual Studio
• Very popular product
• Used for on-prem ETL for many years
• Too big of an effort to migrate existing packages
• Skillset staying on-prem
• Change to IR in ADF
• Row-by-row ETL can be slower
• Data needs to be moved to the IR
• Limited by node size/number of SSIS IR
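Both lists above note that row-by-row ETL can be slower. The sketch below contrasts the two styles using Python's built-in sqlite3 as a stand-in engine; it is not ADF or SSIS code, and the table and column names are hypothetical. The row-by-row loop pulls each row to the client and issues one INSERT per row, while the set-based version lets the engine do the same work in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, quantity INTEGER, unit_price_cents INTEGER)")
conn.execute("CREATE TABLE target_rowwise (id INTEGER, total_cents INTEGER)")
conn.execute("CREATE TABLE target_setbased (id INTEGER, total_cents INTEGER)")
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)",
                 [(i, i % 7, 250) for i in range(1000)])

# Row by row: every row travels to the client, is transformed there,
# and goes back as its own INSERT statement.
for row_id, quantity, price in conn.execute(
        "SELECT id, quantity, unit_price_cents FROM staging").fetchall():
    conn.execute("INSERT INTO target_rowwise VALUES (?, ?)",
                 (row_id, quantity * price))

# Set-based: one statement, the engine transforms all rows in place.
conn.execute("""
    INSERT INTO target_setbased
    SELECT id, quantity * unit_price_cents FROM staging
""")

rowwise = conn.execute(
    "SELECT id, total_cents FROM target_rowwise ORDER BY id").fetchall()
setbased = conn.execute(
    "SELECT id, total_cents FROM target_setbased ORDER BY id").fetchall()
print(rowwise == setbased, len(setbased))
```

Both targets end up identical; the difference is the number of round trips, which is what tends to dominate as row counts grow.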
STORE DATA
ADLS Gen 2
• PaaS
• Best features of blob storage
• Not all features are available yet
• Some products do not support it yet
• 5 TB file size limit
Blob Storage
• PaaS
• Original storage
• Most popular
• Don’t use for new projects
• Account limit 2 PB for US and Europe
• 4.75 TB file size limit
SQL Server 2019 Big Data Cluster
• IaaS
• Combines SQL Server database engine, Spark, HDFS (ADLS Gen2) into a unified data platform
• Deployed as containers on Kubernetes
• PolyBase
• Hybrid cloud
• Data virtualization
• AI Platform
PREP DATA
Azure Databricks
• PaaS
• Processing massive amounts of data
• Training & deploying models
• Managing workflows
• Spark & notebooks
• Integration with ADLS, SQL DW, PBI
• Writing code
• Steep learning curve
Azure HDInsight
• PaaS
• Deploys & provisions Apache Hadoop clusters
• No integration with SQL DW
• Always running and incurring cost
• Hortonworks merged with Cloudera
PolyBase & Stored Procedures in SQL DW
• IaaS
• T-SQL queries via external tables
• Tuning queries
• Increase storage space
Power BI Dataflow
• Power BI service
• Power Query
• Self-service data prep
• Individual solution
• Small workloads
• Don’t use this to replace a DW or ADF
MODEL & SERVE DATA
Azure SQL DW
• PaaS
• Fully managed petabyte scale cloud DW
• Can scale compute and storage independently
• Can be paused
• MPP
Azure Analysis Services
• PaaS
• Tabular model
• Fast queries
• High concurrency
• Semantic layer
• Vertical scale-out
• High availability
• Advanced time calculations
• Time to process the cube
Azure SQL Database
• PaaS
• Suitable for small DW
• Size limits/tier
• Optimized for OLTP
SQL Server in VM
• IaaS
• MDX models
Cosmos DB
• PaaS
• Globally distributed
• Multi-model database service
• Spark to Cosmos DB connector for DW aggregations
ETL vs ELT
Time – Load
• ETL: Uses a staging area and system; extra time to load data
• ELT: All in one system; load only once
Time – Transformation
• ETL: Need to wait, especially for big data sizes; as data grows, transformation time increases
• ELT: All in one system; speed is not dependent on data size
Time – Maintenance
• ETL: High maintenance; must choose which data to load and transform, and must do it again if data is deleted or the main data repository needs to be enhanced
• ELT: Low maintenance; all data is always available
Implementation complexity
• ETL: At an early stage, requires less space and the result is clean
• ELT: Requires in-depth knowledge of tools and expert design of the main large repository
Analysis & Processing style
• ETL: Based on multiple scripts to create the views; deleting a view means deleting data
• ELT: Ad hoc views can be created; low cost for building and maintaining
Data limitation or restriction
• ETL: By presuming and choosing data a priori
• ELT: By hardware (none) and data retention policy
DW Support
• ETL: Prevalent legacy model used for on-premises, relational, structured data
• ELT: Tailored for use in scalable cloud infrastructure to support structured and unstructured big data sources
Data Lake Support
• ETL: Not part of the approach
• ELT: Enables use of a data lake, with unstructured data supported
Usability
• ETL: Fixed tables, fixed timeline, used mainly by IT
• ELT: Ad hoc, agility, flexibility; usable by everyone from developer to citizen integrator
Cost-effective
• ETL: Not cost-effective for small and medium businesses
• ELT: Scalable and available to all business sizes using online SaaS solutions
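The difference between the two approaches can be sketched in a few lines of Python, with the built-in sqlite3 module standing in for the warehouse. The sample orders, table names and cleaning rules are hypothetical. In the ETL function the data is cleaned and aggregated before loading, so only the result lands in the warehouse; in the ELT function the raw rows are loaded once and the transformation is a view defined inside the warehouse.

```python
import sqlite3

# Hypothetical raw extract: (country, amount); None marks a missing amount.
raw_orders = [("gr", 10), ("GR", 5), ("de", 7), ("de", None)]

QUERY = "SELECT country, total FROM sales_by_country ORDER BY country"


def etl(rows):
    """Transform outside the warehouse, then load only the cleaned result."""
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE sales_by_country (country TEXT, total INTEGER)")
    totals = {}
    for country, amount in rows:
        if amount is None:
            continue  # rows dropped here never reach the warehouse
        key = country.upper()
        totals[key] = totals.get(key, 0) + amount
    dw.executemany("INSERT INTO sales_by_country VALUES (?, ?)",
                   sorted(totals.items()))
    return dw


def elt(rows):
    """Load the raw rows once, then transform inside the warehouse with SQL."""
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE raw_orders (country TEXT, amount INTEGER)")
    dw.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
    dw.execute("""
        CREATE VIEW sales_by_country AS
        SELECT UPPER(country) AS country, SUM(amount) AS total
        FROM raw_orders
        WHERE amount IS NOT NULL
        GROUP BY UPPER(country)
    """)
    return dw


etl_result = etl(raw_orders).execute(QUERY).fetchall()
elt_dw = elt(raw_orders)
elt_result = elt_dw.execute(QUERY).fetchall()
raw_count = elt_dw.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(etl_result, elt_result, raw_count)
```

Both paths give the same aggregate, but only the ELT warehouse still holds all four raw rows, so a new view can be added later without reloading anything.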
LAMBDA ARCHITECTURE
LAMBDA ARCHITECTURE IN AZURE
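Only the slide titles survive here, so what follows is a rough sketch of the general Lambda pattern as it is commonly described (a batch layer, a speed layer and a serving layer), not a reconstruction of the diagrams. The page-view events and all names are hypothetical, and plain Python stands in for the Azure services.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute page-view counts from the full, immutable dataset."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally counts events not yet covered by a batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def ingest(self, event):
        self.realtime_view[event["page"]] += 1


def serve(batch, realtime, page):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[page] + realtime[page]


# Events already stored in the master dataset and processed by the last batch run.
master = [{"page": "/home"}, {"page": "/home"}, {"page": "/pricing"}]
batch = batch_view(master)

# An event that arrived after the batch run and is only known to the speed layer.
speed = SpeedLayer()
speed.ingest({"page": "/home"})

print(serve(batch, speed.realtime_view, "/home"))
```

After the next batch run absorbs the new event, the real-time view for that period would be discarded and rebuilt from newer events.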
COMMON DATA MODEL
Antonios Chatzipavlis
Data Solutions Consultant & Trainer
./sqlschoolgr - ./groups/sqlschool
@antoniosch - @sqlschool
yt/c/SqlschoolGr
SQLschool.gr Group
Thank you!
A community for Greek professionals who use the Microsoft Data Platform
Copyright © 2018 SQLschool.gr. All rights reserved. PRESENTER MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION
