ETL Technologies
Gaurav Bhatnagar
Draft v.01
Introduction
ETL (EXTRACT, TRANSFORM, LOAD)
• The ETL process collects raw data from various data sources (your CRM, ad accounts, ERP,
email servers, …) and saves it to the staging area.
• Before the data can be loaded into the target data warehouse or database of your choice, it
undergoes extensive transformations.
• Depending on your business logic, you might mask sensitive personal information, remove
outliers, or aggregate metrics to make your analysts’ lives easier, before finally loading the data into
the data storage.
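The three ETL steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the sample records, the `max_ratio` outlier rule, the table names, and the in-memory SQLite database standing in for the warehouse are all assumptions made for the example.

```python
import sqlite3
import statistics

# Hypothetical raw records, standing in for rows pulled from a CRM or ERP.
RAW_ORDERS = [
    {"email": "ana@example.com", "region": "north", "amount": 120.0},
    {"email": "bob@example.com", "region": "north", "amount": 80.0},
    {"email": "cy@example.com", "region": "south", "amount": 100.0},
    {"email": "di@example.com", "region": "south", "amount": 90.0},
    {"email": "ed@example.com", "region": "south", "amount": 110.0},
    {"email": "fay@example.com", "region": "north", "amount": 9000.0},
]


def extract():
    """Extract: copy raw records into a staging area (here, a plain list)."""
    return [dict(row) for row in RAW_ORDERS]


def mask_email(email):
    """Mask the local part of an e-mail address, keeping only the domain."""
    _, _, domain = email.partition("@")
    return "***@" + domain


def transform(staged, max_ratio=10.0):
    """Transform: mask personal data, drop outliers, aggregate per region."""
    median = statistics.median(row["amount"] for row in staged)
    # Assumed outlier rule: discard amounts above max_ratio times the median.
    cleaned = [
        {**row, "email": mask_email(row["email"])}
        for row in staged
        if row["amount"] <= max_ratio * median
    ]
    totals = {}
    for row in cleaned:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return cleaned, totals


def load(conn, cleaned, totals):
    """Load: only the transformed data reaches the target database."""
    conn.execute("CREATE TABLE orders (email TEXT, region TEXT, amount REAL)")
    conn.execute("CREATE TABLE revenue_by_region (region TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (:email, :region, :amount)", cleaned
    )
    conn.executemany(
        "INSERT INTO revenue_by_region VALUES (?, ?)", sorted(totals.items())
    )
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    cleaned, totals = transform(extract())
    load(conn, cleaned, totals)
    return conn
```

Note that the unmasked e-mail addresses and the outlier row never reach the target: everything is cleaned in the staging step first.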
ELT (EXTRACT, LOAD, TRANSFORM)
• A variant of ETL wherein the extracted data is first loaded into the target system.
• Transformations are performed after the data is loaded into the data warehouse.
• ELT typically works well when the target system is powerful enough to handle transformations.
• Analytical databases like Amazon Redshift and Google BigQuery are often used in ELT pipelines because
they are highly efficient at performing transformations.
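For contrast, here is a minimal ELT sketch in Python. The raw rows are loaded untouched and the transformation runs afterwards as SQL inside the target. An in-memory SQLite database stands in for an analytical warehouse such as Redshift or BigQuery; the sample rows and the table names `raw_events` and `revenue_by_region` are assumptions made for the example.

```python
import sqlite3

# Hypothetical extracted rows: (day, region, amount).
RAW_EVENTS = [
    ("2024-01-01", "north", 120.0),
    ("2024-01-01", "south", 100.0),
    ("2024-01-02", "north", 80.0),
    ("2024-01-02", "south", None),  # raw data is loaded as-is, gaps included
]


def load_raw(conn, rows):
    """Load: copy the extracted rows into the target without changing them."""
    conn.execute("CREATE TABLE raw_events (day TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", rows)


def transform_in_target(conn):
    """Transform: run SQL inside the target, using its own compute."""
    conn.execute(
        """
        CREATE TABLE revenue_by_region AS
        SELECT region, SUM(amount) AS total, COUNT(amount) AS valid_rows
        FROM raw_events
        WHERE amount IS NOT NULL
        GROUP BY region
        """
    )


def run_elt():
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    load_raw(conn, RAW_EVENTS)
    transform_in_target(conn)
    return conn
```

Because `raw_events` stays in the target, a different transformation can be written later without extracting the data again.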
ETL VS ELT
Workflow
Differences

1) Support for Data Warehouse
• ETL: Yes. ETL is the traditional process for transforming and integrating structured or relational data into a cloud-based or on-premises data warehouse.
• ELT: Yes. ELT is the modern process for transforming and integrating structured or unstructured data into a cloud-based data warehouse.

2) Support for Data Lake/Mart/Lakehouse
• ETL: No. ETL is generally not considered an appropriate process for data lakes, data marts or data lakehouses.
• ELT: Yes. The ELT process is tailored to provide a data pipeline for data lakes, data marts or data lakehouses.

3) Size/type of data set
• ETL: Most appropriate for processing smaller, relational data sets which require complex transformations and have been predetermined as being relevant to the analysis goals.
• ELT: Can handle any size or type of data and is well suited for processing both structured and unstructured big data. Since the entire data set is loaded, analysts can choose at any time which data to transform and use for analysis.
Differences

4) Implementation
• ETL: The ETL process has been around for decades and there is a mature ecosystem of ETL tools and experts readily available to help with implementation.
• ELT: The ELT process is a newer approach and the ecosystem of tools and experts needed to implement it is still growing.

5) Transformation
• ETL: Data transformation is performed in a staging area outside of the data warehouse, and the entire data set must be transformed before loading. As a result, transforming larger data sets can take a long time up front, but analysis can take place immediately once the ETL process is complete.
• ELT: Data transformation is performed on an as-needed basis in the target system itself. As a result, the transformation step takes little time but can slow down the querying and analysis processes if there is not sufficient processing power.
Differences

6) Loading
• ETL: The ETL loading step requires data to be loaded into a staging area before being loaded into the target system. This multi-step process takes longer than the ELT process.
• ELT: The full data set is loaded directly into the target system. Since there is only one step, and it only happens one time, loading in the ELT process is faster than in ETL.

7) Cost
• ETL: Can be cost-prohibitive for many small and medium businesses.
• ELT: Benefits from a robust ecosystem of cloud-based platforms which offer much lower costs and a variety of plan options to store and process data.

8) Compliance
• ETL: Better suited for compliance with GDPR, HIPAA, and CCPA standards given that users can omit any sensitive data prior to loading in the target system.
• ELT: Carries more risk of exposing private data and not complying with GDPR, HIPAA, and CCPA standards given that all data is loaded into the target system.
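The compliance point can be made concrete with a small Python sketch of a pre-load scrubbing step, as an ETL pipeline might run it in the staging area. It is an illustration only: the field names in `SENSITIVE_FIELDS` and `PSEUDONYMIZED_FIELDS` are hypothetical, and which fields must be dropped or pseudonymized under GDPR, HIPAA, or CCPA depends on the actual data and legal advice.

```python
import hashlib
import hmac

SENSITIVE_FIELDS = {"ssn", "diagnosis"}  # hypothetical: dropped entirely
PSEUDONYMIZED_FIELDS = {"email"}         # hypothetical: replaced by a keyed hash


def pseudonymize(value, secret):
    """Replace a value with a keyed SHA-256 digest (HMAC) of that value."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(record, secret):
    """Return a copy of the record that is safe to load into the target."""
    clean = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            continue  # omitted before loading, so it never reaches the target
        if key in PSEUDONYMIZED_FIELDS:
            clean[key] = pseudonymize(value, secret)
        else:
            clean[key] = value
    return clean
```

The keyed hash is deterministic for a given secret, so records belonging to the same person can still be joined in the warehouse without storing the original address.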
Use Case

TRANSFORM TECHNOLOGIES
• ETL: Scripting languages, SQL procedures
• ELT: Data-warehouse-specific solutions

PHYSICAL SPACE REQUIRED TO STORE DATA
• ETL: Lower
• ELT: Higher

MATURITY
• ETL: Tested and proven
• ELT: Novel and (sometimes) experimental

ENGINEERING EXPERTISE REQUIRED
• ETL: Medium
• ELT: High

DATA TYPE
• ETL: All, but best for structured (relational) data
• ELT: All, but excels at unstructured data

PROS
• ETL: Simpler to deploy and maintain. A lot of (human and technical) resources available.
• ELT: Can handle massive amounts of data. Best for unstructured data.

CONS
• ETL: Scaling. Becomes increasingly complex for large data deployments.
• ELT: Needs a higher level of expertise to deploy and maintain. Edge cases are not always polished for reliability.
AWS GLUE
• AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move,
and integrate data from multiple sources. You can use it for analytics, machine learning, and application
development. It also includes additional productivity and data ops tooling for authoring, running jobs, and
implementing business workflows.
• With AWS Glue, you can discover and connect to more than 70 diverse data sources and manage your data in a
centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to
load data into your data lakes. You can also immediately search and query cataloged data using Amazon Athena,
Amazon EMR, and Amazon Redshift Spectrum.
AWS GLUE COMPONENTS:
• AWS Glue console
• AWS Glue Data Catalog
• AWS Glue crawlers and classifiers
• AWS Glue ETL operations
• Streaming ETL in AWS Glue
• The AWS Glue jobs system
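To make the roles of crawlers, classifiers, and the Data Catalog more tangible, here is a toy Python sketch of the idea. It does not use the AWS Glue API: the `classify` and `crawl` functions, the dictionary used as a catalog, and the type names are all stand-ins invented for illustration, and a real crawler samples far more than one row.

```python
import csv
import io


def classify(value):
    """Classifier: guess a column type from one sample value."""
    for caster, name in ((int, "bigint"), (float, "double")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "string"


def crawl(table_name, csv_text, catalog):
    """Crawler: read the header and first data row, then record the schema."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    first_row = next(reader)
    catalog[table_name] = {
        column: classify(value) for column, value in zip(header, first_row)
    }
    return catalog[table_name]
```

For example, crawling the text `"id,amount,region\n1,9.5,north\n"` under the table name `orders` records an integer column, a floating-point column, and a string column in the catalog, which later jobs and query engines can look up instead of re-inspecting the files.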
DATA FLOW IN AWS GLUE
AZURE DATA FACTORY
• Azure Data Factory (ADF) is a cloud-based data integration service in the Microsoft Azure service catalog.
• It orchestrates and automates the movement and transformation of data.
• Because data arrives from many different products, analyzing and storing all of it requires a powerful
tool, and Azure Data Factory fills that role.
How ADF helps:
• Storing the data with the help of Azure Data Lake Storage
• Analyzing the data
• Transforming the data with the help of pipelines
• Publishing the organized data
• Handing the data to third-party applications such as Apache Spark and Hadoop for further processing and visualization
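The orchestration idea, a pipeline of activities that run in dependency order, can be sketched in plain Python. This is a conceptual toy and not the Azure Data Factory SDK or its JSON pipeline format: the activity functions, the `DEPENDS_ON` mapping, and the shared `context` dictionary are all assumptions made for the example.

```python
from graphlib import TopologicalSorter


def copy_data(context):
    """Copy activity: move source data into a staging location."""
    context["staged"] = list(context["source"])


def transform_data(context):
    """Transform activity: derive curated values from the staged data."""
    context["curated"] = [value * 2 for value in context["staged"]]


def publish_data(context):
    """Publish activity: expose the organized result to consumers."""
    context["published"] = sorted(context["curated"])


ACTIVITIES = {
    "copy": copy_data,
    "transform": transform_data,
    "publish": publish_data,
}

# Each activity maps to the set of activities that must finish before it.
DEPENDS_ON = {
    "copy": set(),
    "transform": {"copy"},
    "publish": {"transform"},
}


def run_pipeline(activities, depends_on, context):
    """Run every activity once, after all of its dependencies."""
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        activities[name](context)
    return order
```

Calling `run_pipeline(ACTIVITIES, DEPENDS_ON, {"source": [3, 1, 2]})` runs copy, then transform, then publish. A real orchestrator adds scheduling, retries, and monitoring on top of this ordering.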
FLOW PROCESS OF DATA FACTORY
BUSINESS OBJECTS DATA SERVICES (BODS)
• SAP BODS (Business Objects Data Services) is an ETL tool for extracting data from disparate systems, transforming it into
meaningful information, and loading it into a data warehouse. It is designed to deliver enterprise-class
solutions for data integration, data quality, data processing and data profiling.
• The Repository, Management Console, Designer, Job Server, and Access Server are important
components of the SAP BODS architecture.
• SAP BusinessObjects offers strong data profiling, partly owing to capabilities gained through its many acquisitions of other companies.
Conclusion
We need to look at the business and technical problems first, decide what our reference
data model architecture should be, and then come up with roadmaps towards it.
• ETL is best suited for fast analytics in small-to-medium data environments,
where the source data and data operations are well-controlled and do not
evolve constantly (that is, they do not need flexibility).
• ELT, in contrast, is best suited for working with semi-structured or
unstructured data in big data environments, where changing data
operation requirements call for a lot of flexibility.


Editor's Notes

  1. We are building up a base of integrated expertise through data transfer. A key use case is information sharing via the warehouse: we can easily reuse the data in other systems, thereby enabling collaboration with various ecosystems and empowering customer-centric thinking across the Southern Water IT landscape. To do this, digital first is the key.
  2. Essentially, we are trying to establish an ecosystem with a foundation for sharing data.