SlideShare uma empresa Scribd logo
1 de 19
Kettle – ETL Tool
Sreenivas K
Agenda

Introduction
− ETL Process
− Pentaho's Kettle

Data Integration Challenges

Prerequisites and Recent Releases

Pentaho DI Components

Spoon
− Transformations
− Jobs
Introduction – ETL Process

Major Components
− Extracting

Gathering raw data from source systems and storing it in ETL staging
environment

Data Profiling

Identifying data that changed since last load.
− Transforming- Cleaning and Conforming

Processing data to improve its quality, format it, merge from multiple
sources, enforce conformed dimensions

Data cleansing

Recording error events

Audit dimensions

Creating and maintaining conformed dimensions and facts
Introduction – ETL Process
− Loading

Loading data into data warehouse tables

Managing hierarchies in dimensions

Managing special dimensions such as date and time, junk, mini, shrunken,
small static, and user-maintained dimensions

Fact table loading

Building and maintaining bridge dimension tables

Handling late arriving data

Management of conformed dimensions

Administration of fact tables

Building aggregations

Building OLAP cubes

Transferring DW data to other environment for specific purposes
Data Transformation and
Integration Examples

Data filtering
− Is not null, greater than, less than, includes

Field manipulation
− Trimming, padding, upper and lowercase conversion

Data calculations
− + - X / , average, absolute value, arctangent, natural logarithm

Date manipulation
− First day of month, Last day of month, add months, week of year, day of year

Data type conversion
− String to number, number to string, date to number

Merging fields & splitting fields

Looking up date
− Look up in a database, in a text file, an excel sheet, …
Introduction – Pentaho Kettle

Kettle – Kettle Extraction Transformation Transportation &
Loading tool

Its open source business intelligence suite for powerful
data integration by Pentaho. Founded in 2004.

Products of Pentaho
− Mondrain – OLAP server written in Java
− Kettle – ETL tool
Data Integration - Challenges

Data is everywhere

Data is inconsistent
− Records are different in each system

Performance issues
− Running queries to summarize data for stipulated
long period takes operating system for task

Data is never all in Data Warehouse
− Excel sheet, acquisition, new application
Prerequisites Recent Releases

Java Runtime Environment
1.5 and above

Compatible with almost any
platform

Compatible with wide range
of Databases technologies.

4/25 Data Integration 3.0.3 GA

4/18 Data Integration 3.1 Milestone

2/8 Data Integration 3.0.2 GA

12/12 Data Integration 3.0.1 GA

11/15 Data Integration 3.0 GA

10/31 Data Integration 3.0 RC2

10/24 Data Integration 2.5.2 GA

10/08 Data Integration 3.0 RC1

08/24 Data Integration 2.5.1 GA
Pentaho Components

Spoon
− GUI that allows you to design transformations and jobs that can
be run with the Kettle tools — Pan and Kitchen
− Transformations and Jobs can describe themselves using an XML
file or can be put in a Kettle database repository.
− Spoon is available as executable script and batch file to make use
of tool in heterogeneous environment.

Pan
− A program to execute transformations designed by Spoon in XML or
database repository.
− Transformations are scheduled in batch mode to be run automatically at
regular intervals

Kitchen
− Execute jobs designed by Spoon in XML or database repository

Repository Connection establishment

Auto login
− By setting manually KETTLE_REPOSITORY,
KETTLE_USER and KETTLE_PASSWORD
environmental variables.

Login
− By default PDI provides login username and
password ad admin.

Transformation
− Value: Values are part of a row
and can contain any type of data
− Row: a row exists of 0 or more
values
− Output stream: an output
stream is a stack of rows that
leaves a step.
− Input stream: an input stream is
a stack of rows that enters a
step.
− Hop: A hop is a graphical
representation of one or more
data streams between 2 steps.
− Note: A note is a piece of
information that can be added to
a transformation
Engine capable of performing a
multitude of functions such as reading,
manipulating and writing data to and
from various data sources.

Jobs
− Job Entry: A job entry is
one part of a job and
performs a certain
− Hop: A hop is a graphical
representation of one or
more data streams
between 2 steps
− Note: a note is a piece of
information that can be added to
a job
A way of calling transformations and
controlling the sequence of their
execution. Usually jobs are
scheduled in batch mode to be run
automatically at regular intervals.
Input Steps
Output Steps
Lookup Steps
Transformation
Steps
Join Steps
DW Steps
Mapping Steps
Job Steps
ETL Tool Kettle - A Guide to Pentaho's Data Integration Software
ETL Tool Kettle - A Guide to Pentaho's Data Integration Software
ETL Tool Kettle - A Guide to Pentaho's Data Integration Software

Mais conteúdo relacionado

Mais procurados

Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Project Presentation on Data WareHouse
Project Presentation on Data WareHouseProject Presentation on Data WareHouse
Project Presentation on Data WareHouseAbhi Bhardwaj
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingPrithwis Mukerjee
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for DinnerKent Graziano
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Roland Bouman
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
Informatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools ComparisonInformatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools ComparisonRoberto Espinosa
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdfBOSupport
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachDatabricks
 
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Edureka!
 
Wallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation RoadmapWallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation RoadmapDavid Walker
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data FabricAlan McSweeney
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
 
Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Denodo
 

Mais procurados (20)

Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Project Presentation on Data WareHouse
Project Presentation on Data WareHouseProject Presentation on Data WareHouse
Project Presentation on Data WareHouse
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
Informatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools ComparisonInformatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools Comparison
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdf
 
Kettle – Etl Tool
Kettle – Etl ToolKettle – Etl Tool
Kettle – Etl Tool
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
 
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
 
Wallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation RoadmapWallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation Roadmap
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes
 

Destaque

Spatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data SharingSpatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data SharingSafe Software
 
Hybrid & Logical Data Warehouse
Hybrid & Logical Data WarehouseHybrid & Logical Data Warehouse
Hybrid & Logical Data WarehouseHeungsoon Yang
 
Pentaho ETL ハンズオン
Pentaho ETL ハンズオンPentaho ETL ハンズオン
Pentaho ETL ハンズオンTeruo Kawasaki
 
Open Source Reporting Tool Comparison
Open Source Reporting Tool ComparisonOpen Source Reporting Tool Comparison
Open Source Reporting Tool ComparisonRogue Wave Software
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoAshnikbiz
 
TeraStream for ETL
TeraStream for ETLTeraStream for ETL
TeraStream for ETL치민 최
 
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)Channy Yun
 
빅데이터 기술 현황과 시장 전망(2014)
빅데이터 기술 현황과 시장 전망(2014)빅데이터 기술 현황과 시장 전망(2014)
빅데이터 기술 현황과 시장 전망(2014)Channy Yun
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?Xpand IT
 

Destaque (13)

Pentaho PDI
Pentaho PDIPentaho PDI
Pentaho PDI
 
Spatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data SharingSpatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data Sharing
 
vertica_tmp_4.5
vertica_tmp_4.5vertica_tmp_4.5
vertica_tmp_4.5
 
Penatho
PenathoPenatho
Penatho
 
Hire Pentaho Developer | BI Tools
Hire Pentaho Developer | BI ToolsHire Pentaho Developer | BI Tools
Hire Pentaho Developer | BI Tools
 
Hybrid & Logical Data Warehouse
Hybrid & Logical Data WarehouseHybrid & Logical Data Warehouse
Hybrid & Logical Data Warehouse
 
Pentaho ETL ハンズオン
Pentaho ETL ハンズオンPentaho ETL ハンズオン
Pentaho ETL ハンズオン
 
Open Source Reporting Tool Comparison
Open Source Reporting Tool ComparisonOpen Source Reporting Tool Comparison
Open Source Reporting Tool Comparison
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
 
TeraStream for ETL
TeraStream for ETLTeraStream for ETL
TeraStream for ETL
 
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
 
빅데이터 기술 현황과 시장 전망(2014)
빅데이터 기술 현황과 시장 전망(2014)빅데이터 기술 현황과 시장 전망(2014)
빅데이터 기술 현황과 시장 전망(2014)
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?
 

Semelhante a ETL Tool Kettle - A Guide to Pentaho's Data Integration Software

Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01jade_22
 
Skills Portfolio
Skills PortfolioSkills Portfolio
Skills Portfoliorolee23
 
Datawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot fasterDatawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot fasterJohn Leonard
 
Building the DW - ETL
Building the DW - ETLBuilding the DW - ETL
Building the DW - ETLganblues
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsSingleStore
 
ELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffJeff McQuigg
 
Dan Querimit - BI Portfolio
Dan Querimit - BI PortfolioDan Querimit - BI Portfolio
Dan Querimit - BI Portfolioquerimit
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021ssuser8ccb5a
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxcamyla81
 
Datastage to ODI
Datastage to ODIDatastage to ODI
Datastage to ODINagendra K
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_ISRamesh Ch
 
Amit Kumar_Resume
Amit Kumar_ResumeAmit Kumar_Resume
Amit Kumar_ResumeAmit Kumar
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_ISRamesh Ch
 
Informatica overview
Informatica overviewInformatica overview
Informatica overviewSwetha Naveen
 
Hadoop first ETL on Apache Falcon
Hadoop first ETL on Apache FalconHadoop first ETL on Apache Falcon
Hadoop first ETL on Apache FalconDataWorks Summit
 

Semelhante a ETL Tool Kettle - A Guide to Pentaho's Data Integration Software (20)

Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
 
Skills Portfolio
Skills PortfolioSkills Portfolio
Skills Portfolio
 
Datawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot fasterDatawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot faster
 
Building the DW - ETL
Building the DW - ETLBuilding the DW - ETL
Building the DW - ETL
 
ETL (1).ppt
ETL (1).pptETL (1).ppt
ETL (1).ppt
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
 
ELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_Jeff
 
Dan Querimit - BI Portfolio
Dan Querimit - BI PortfolioDan Querimit - BI Portfolio
Dan Querimit - BI Portfolio
 
ETL
ETL ETL
ETL
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
 
Datastage to ODI
Datastage to ODIDatastage to ODI
Datastage to ODI
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_IS
 
Data migration
Data migrationData migration
Data migration
 
Amit Kumar_Resume
Amit Kumar_ResumeAmit Kumar_Resume
Amit Kumar_Resume
 
Data automation 101
Data automation 101Data automation 101
Data automation 101
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_IS
 
AIRflow at Scale
AIRflow at ScaleAIRflow at Scale
AIRflow at Scale
 
Informatica overview
Informatica overviewInformatica overview
Informatica overview
 
Hadoop first ETL on Apache Falcon
Hadoop first ETL on Apache FalconHadoop first ETL on Apache Falcon
Hadoop first ETL on Apache Falcon
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

ETL Tool Kettle - A Guide to Pentaho's Data Integration Software

  • 1. Kettle – ETL Tool Sreenivas K
  • 2. Agenda  Introduction − ETL Process − Pentaho's Kettle  Data Integration Challenges  Prerequisites and Recent Releases  Pentaho DI Components  Spoon − Transformations − Jobs
  • 3. Introduction – ETL Process  Major Components − Extracting  Gathering raw data from source systems and storing it in ETL staging environment  Data Profiling  Identifying data that changed since last load. − Transforming- Cleaning and Conforming  Processing data to improve its quality, format it, merge from multiple sources, enforce conformed dimensions  Data cleansing  Recording error events  Audit dimensions  Creating and maintaining conformed dimensions and facts
  • 4. Introduction – ETL Process − Loading  Loading data into data warehouse tables  Managing hierarchies in dimensions  Managing special dimensions such as date and time, junk, mini, shrunken, small static, and user-maintained dimensions  Fact table loading  Building and maintaining bridge dimension tables  Handling late arriving data  Management of conformed dimensions  Administration of fact tables  Building aggregations  Building OLAP cubes  Transferring DW data to other environment for specific purposes
  • 5. Data Transformation and Integration Examples  Data filtering − Is not null, greater than, less than, includes  Field manipulation − Trimming, padding, upper and lowercase conversion  Data calculations − + - X / , average, absolute value, arctangent, natural logarithm  Date manipulation − First day of month, Last day of month, add months, week of year, day of year  Data type conversion − String to number, number to string, date to number  Merging fields & splitting fields  Looking up date − Look up in a database, in a text file, an excel sheet, …
  • 6. Introduction – Pentaho Kettle  Kettle – Kettle Extraction Transformation Transportation & Loading tool  Its open source business intelligence suite for powerful data integration by Pentaho. Founded in 2004.  Products of Pentaho − Mondrain – OLAP server written in Java − Kettle – ETL tool
  • 7. Data Integration - Challenges  Data is everywhere  Data is inconsistent − Records are different in each system  Performance issues − Running queries to summarize data for stipulated long period takes operating system for task  Data is never all in Data Warehouse − Excel sheet, acquisition, new application
  • 8. Prerequisites Recent Releases  Java Runtime Environment 1.5 and above  Compatible with almost any platform  Compatible with wide range of Databases technologies.  4/25 Data Integration 3.0.3 GA  4/18 Data Integration 3.1 Milestone  2/8 Data Integration 3.0.2 GA  12/12 Data Integration 3.0.1 GA  11/15 Data Integration 3.0 GA  10/31 Data Integration 3.0 RC2  10/24 Data Integration 2.5.2 GA  10/08 Data Integration 3.0 RC1  08/24 Data Integration 2.5.1 GA
  • 9. Pentaho Components  Spoon − GUI that allows you to design transformations and jobs that can be run with the Kettle tools — Pan and Kitchen − Transformations and Jobs can describe themselves using an XML file or can be put in a Kettle database repository. − Spoon is available as executable script and batch file to make use of tool in heterogeneous environment.  Pan − A program to execute transformations designed by Spoon in XML or database repository. − Transformations are scheduled in batch mode to be run automatically at regular intervals  Kitchen − Execute jobs designed by Spoon in XML or database repository
  • 10.  Repository Connection establishment  Auto login − By setting manually KETTLE_REPOSITORY, KETTLE_USER and KETTLE_PASSWORD environmental variables.  Login − By default PDI provides login username and password ad admin.
  • 11.
  • 12.
  • 13.
  • 14.  Transformation − Value: Values are part of a row and can contain any type of data − Row: a row exists of 0 or more values − Output stream: an output stream is a stack of rows that leaves a step. − Input stream: an input stream is a stack of rows that enters a step. − Hop: A hop is a graphical representation of one or more data streams between 2 steps. − Note: A note is a piece of information that can be added to a transformation Engine capable of performing a multitude of functions such as reading, manipulating and writing data to and from various data sources.
  • 15.  Jobs − Job Entry: A job entry is one part of a job and performs a certain − Hop: A hop is a graphical representation of one or more data streams between 2 steps − Note: a note is a piece of information that can be added to a job A way of calling transformations and controlling the sequence of their execution. Usually jobs are scheduled in batch mode to be run automatically at regular intervals.
  • 16. Input Steps Output Steps Lookup Steps Transformation Steps Join Steps DW Steps Mapping Steps Job Steps