Treasure Data Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Agenda
● Introduction
● Treasure Data Workflow
● Overview of Slowly Changing Dimensions
● Window Functions
● Handling Type 2 SCDs using Treasure Data
Introduction
• Scott Mitchell
• Senior Solution Engineer
• Work with Enterprise clients to maximize the activation of the client data
• smitchell@treasure-data.com
Introduction
Treasure Data is a Customer Data Platform
“Customer Data Platform (CDP) is a marketer-based management system
that creates a persistent, unified customer database that is accessible to
other systems. Data is pulled from multiple sources, cleaned, and combined
to create a single customer view. This structured data is then made available
to other marketing systems. CDP provides real-time segmentation for
sophisticated personalized marketing.”
https://en.wikipedia.org/wiki/Customer_Data_Platform
Our Customer Data Platform: Foundation
Data Management
● 1st party data (your data): Web, Mobile, Apps, CRMs, Offline
● 2nd & 3rd party DMPs (enrichment)
Tool Integration
● Campaigns
● Advertising
● Social media
● Reporting
● BI & data science
Other diagram labels: ID Unification, Persistent Storage, Workflow Orchestration, All Your Data, Segmentation, Profiles, Segments, Activation, Measurement
Treasure Data Workflow
DATA ORCHESTRATION AND WORKFLOW MANAGEMENT
• Workflow management across data input, processing and output
• Supports both scheduled & trigger-based execution
• Cloud-based and client-hosted. The client-hosted version can run custom code.
• The cloud-based version has both a web UI & a REST API
The core engine is built on our open source project, Digdag.
Treasure Workflow allows users to build repeatable data processing pipelines that consist of Treasure Data jobs.
Overview
Why use Treasure Workflow?
1. Enhanced Organization
• Organize your processing workflows into groups of similarly-purposed tasks
2. Reduce Errors
• No longer need to manage dependencies by scheduled time alone
3. Ease Error Handling
• Split large scripts & queries into smaller, more manageable, jobs
4. Improve Collaboration
• Organize your job flows into projects
Benefits
WORKFLOW DEFINITION: CLOSER LOOK
timezone: Asia/Tokyo

schedule:
  daily>: 07:00:00

_export:
  td:
    database: nishi

+load:
  td_load>: import/s3_load.yml
  database: nishi
  table: monthly_goods_sales

+daily:
  td>: queries/daily_open.sql
  create_table: daily_open

+monthly:
  td>: queries/monthly_open.sql
  result_connection: nishi_s3
  result_settings:
    bucket: nishitetsu-test
    path: /monthly_open.csv

• The file extension should be “.dig” to be recognized as a workflow
• Standard YAML
• Task names are prefixed by “+”
• Operators are suffixed by “>”
• Schedules can be set with schedule
• Variables are supported via ${variable_name}
REPRESENTATIVE OPERATORS
Category               Name              Description
Control Flow           call>:            Call another workflow
                       loop>:            Repeat tasks a specified number of times
                       for_each>:        Loop through a specified list
                       if>:              if/else control flow
Treasure Data          td>:              Run a specified TD query
                       td_run>:          Run a saved query
                       td_ddl>:          Create, delete, rename, truncate tables
                       td_load>:         Invoke an input data transfer
                       td_for_each>:     Loop through a query result row by row
AWS                    s3_wait>:         Wait for new files in S3 & download
                       redshift>:        Run Redshift query
                       redshift_load>:   Load data into Redshift
                       redshift_unload>: Unload data from Redshift
Google Cloud Platform  bq>:              Run BigQuery query
                       bq_extract>:      Unload data from BigQuery to GCS
Slowly Changing Dimensions
Slowly Changing Dimensions
• Particular dimensions within a dataset that are prone to change
unpredictably
• Example: the phone number or email field of a CRM dataset
• Data available from a CRM usually represents the current, up-to-date value
of each field for each customer
• Storing a history of this customer data requires managing these slowly
changing dimensions (SCDs)
Different Ways to Handle SCDs
• Type 1
• Type 2
• Type 3
• Type 4
Type 1: Overwrite the field
Old Record:
company_id  company_name     company_state
123         Sterling Cooper  New York
New Record:
company_id  company_name     company_state
123         Sterling Cooper  California
SCD Type 1:
company_id  company_name     company_state
123         Sterling Cooper  California
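The Type 1 overwrite can be sketched in a few lines. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for the warehouse; the helper name `apply_type1` is hypothetical, and whether a given query engine supports in-place `UPDATE` is a separate question.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (company_id INTEGER, company_name TEXT, company_state TEXT)"
)
conn.execute("INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York')")


def apply_type1(conn, company_id, new_state):
    """Hypothetical helper: overwrite the state in place; the old value is lost."""
    conn.execute(
        "UPDATE company SET company_state = ? WHERE company_id = ?",
        (new_state, company_id),
    )


apply_type1(conn, 123, "California")
type1_rows = conn.execute("SELECT * FROM company").fetchall()
print(type1_rows)
```

After the update only one row exists, so no history of the New York value remains.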
Type 2: Keep both records, flag the “current” row
Old Record:
company_id  company_name     company_state
123         Sterling Cooper  New York
New Record:
company_id  company_name     company_state
123         Sterling Cooper  California
SCD Type 2:
company_id  company_name     company_state  is_current
123         Sterling Cooper  New York       0
123         Sterling Cooper  California     1
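A rough sketch of the Type 2 behavior, again assuming SQLite as a stand-in engine. The helper `apply_type2` is hypothetical: it retires the row currently flagged as current and appends the new version.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (company_id INTEGER, company_name TEXT, "
    "company_state TEXT, is_current INTEGER)"
)
conn.execute("INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York', 1)")


def apply_type2(conn, company_id, company_name, new_state):
    """Hypothetical helper: keep the old row, flag only the newest as current."""
    # Retire the row that is current right now.
    conn.execute(
        "UPDATE company SET is_current = 0 WHERE company_id = ? AND is_current = 1",
        (company_id,),
    )
    # Append the new version and mark it as current.
    conn.execute(
        "INSERT INTO company VALUES (?, ?, ?, 1)",
        (company_id, company_name, new_state),
    )


apply_type2(conn, 123, "Sterling Cooper", "California")
type2_rows = conn.execute(
    "SELECT company_state, is_current FROM company ORDER BY is_current"
).fetchall()
print(type2_rows)
```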
Type 3: Store the latest two values in one row
Old Record:
company_id  company_name     company_state
123         Sterling Cooper  New York
New Record:
company_id  company_name     company_state
123         Sterling Cooper  California
SCD Type 3:
company_id  company_name     company_state_current  company_state_previous
123         Sterling Cooper  California             New York
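Type 3 can be sketched as a single update that shifts the current value into the "previous" column. This assumes SQLite as a stand-in engine and a hypothetical helper `apply_type3`; it relies on SQL evaluating the right-hand sides of `SET` against the row as it was before the update.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (company_id INTEGER, company_name TEXT, "
    "company_state_current TEXT, company_state_previous TEXT)"
)
conn.execute("INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York', NULL)")


def apply_type3(conn, company_id, new_state):
    """Hypothetical helper: move current to previous, then store the new value."""
    conn.execute(
        "UPDATE company "
        "SET company_state_previous = company_state_current, "
        "    company_state_current = ? "
        "WHERE company_id = ?",
        (new_state, company_id),
    )


apply_type3(conn, 123, "California")
type3_rows = conn.execute(
    "SELECT company_state_current, company_state_previous FROM company"
).fetchall()
print(type3_rows)
```

Only the latest two values survive: a third change would push New York out of the row.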
Type 4: Use a separate history table
SCD Type 4:
company
company_id  company_name     company_state
123         Sterling Cooper  California
company_history
company_id  company_name     company_state  last_modified_date
123         Sterling Cooper  New York       2007-06-19
123         Sterling Cooper  California     2008-10-12
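A minimal sketch of Type 4, assuming SQLite as a stand-in engine. The helper `apply_type4` is hypothetical: it appends every version to `company_history` and keeps only the latest value in `company`.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (company_id INTEGER, company_name TEXT, company_state TEXT);
    CREATE TABLE company_history (company_id INTEGER, company_name TEXT,
                                  company_state TEXT, last_modified_date TEXT);
    INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York');
    INSERT INTO company_history VALUES (123, 'Sterling Cooper', 'New York', '2007-06-19');
    """
)


def apply_type4(conn, company_id, company_name, new_state, modified_date):
    """Hypothetical helper: log the new version, then overwrite the current table."""
    conn.execute(
        "INSERT INTO company_history VALUES (?, ?, ?, ?)",
        (company_id, company_name, new_state, modified_date),
    )
    conn.execute(
        "UPDATE company SET company_state = ? WHERE company_id = ?",
        (new_state, company_id),
    )


apply_type4(conn, 123, "Sterling Cooper", "California", "2008-10-12")
type4_current = conn.execute("SELECT * FROM company").fetchall()
type4_history = conn.execute(
    "SELECT company_state, last_modified_date FROM company_history "
    "ORDER BY last_modified_date"
).fetchall()
print(type4_current, type4_history)
```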
Window Functions
Type 2: Keep both records, flag the “current” row
Old Record:
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
New Record:
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
SCD Type 2:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
123         Sterling Cooper  California     2008-10-12        1
Window Functions
• Window functions perform calculations across rows of the query result
• They run after the ‘HAVING’ clause but before the ‘ORDER BY’ clause
• They are written in the ‘SELECT’ clause and display results in their own
column
• They have three parts:
Window Functions
rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC)
• function: rank()
• partition specification: PARTITION BY company_id
• ordering specification: ORDER BY lastmodifieddate DESC
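To see the three parts working together, the expression can be assembled from named pieces and run against the deck's sample rows. This is a sketch assuming SQLite (3.25 or later, for window function support) as a stand-in for the TD query engine; the constant names and the `recency_rank` alias are illustrative only.

```python
import sqlite3

# The three parts of the window expression, named as on the slide.
FUNCTION = "rank()"
PARTITION_SPEC = "PARTITION BY company_id"
ORDERING_SPEC = "ORDER BY lastmodifieddate DESC"
window_expr = f"{FUNCTION} OVER ({PARTITION_SPEC} {ORDERING_SPEC})"

# In-memory SQLite stands in for the query engine (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (company_id INTEGER, company_name TEXT, "
    "company_state TEXT, lastmodifieddate TEXT)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "New York", "2007-06-19"),
        (123, "Sterling Cooper", "California", "2008-10-12"),
        (124, "CGC", "New York", "2010-08-22"),
        (124, "CGC", "Connecticut", "2018-05-22"),
    ],
)

# The rank restarts at 1 inside each company_id partition.
ranked = conn.execute(
    f"SELECT company_id, company_state, {window_expr} AS recency_rank "
    "FROM company ORDER BY company_id, recency_rank"
).fetchall()
print(ranked)
```

Because the dates are ISO-formatted strings, ordering them as text gives the same result as ordering them as dates.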
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
FROM company
company (input):
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
123         Sterling Cooper  California     2008-10-12
Result:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        2
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
FROM company
Result (the rank restarts for each company_id):
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        2
124         CGC              Connecticut    2018-05-22        1
124         CGC              New York       2010-08-22        2
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate ASC) AS is_current
FROM company
Result (with ASC ordering, the oldest row ranks first):
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        2
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              Connecticut    2018-05-22        2
124         CGC              New York       2010-08-22        1
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  CASE WHEN rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) = 1 THEN 1 ELSE 0 END AS is_current
FROM company
Result (only the most recent row per company_id is flagged):
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        0
124         CGC              Connecticut    2018-05-22        1
124         CGC              New York       2010-08-22        0
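The flag query can be checked end to end on the sample rows. This is a sketch assuming SQLite (3.25 or later) as a stand-in for the TD query engine; it ranks newest-first so that rank 1 maps to `is_current = 1`.

```python
import sqlite3

# In-memory SQLite stands in for the query engine (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (company_id INTEGER, company_name TEXT, "
    "company_state TEXT, lastmodifieddate TEXT)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "New York", "2007-06-19"),
        (123, "Sterling Cooper", "California", "2008-10-12"),
        (124, "CGC", "New York", "2010-08-22"),
        (124, "CGC", "Connecticut", "2018-05-22"),
    ],
)

FLAG_QUERY = """
SELECT
  company_id,
  company_state,
  lastmodifieddate,
  CASE
    WHEN rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) = 1
    THEN 1 ELSE 0
  END AS is_current
FROM company
ORDER BY company_id, lastmodifieddate DESC
"""

flagged = conn.execute(FLAG_QUERY).fetchall()
print(flagged)
```

One caveat: `rank()` gives tied rows the same rank, so two versions with an identical `lastmodifieddate` would both be flagged as current. Where ties are possible, `row_number()` with a tie-breaking column is one alternative.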
Implementation in Treasure Data
1. Load incremental data from a data source to a staging table
2. Drop the target table that contains outdated SCD information
3. Window over the staging table, rebuilding the target table with the latest SCD information
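The drop-and-rebuild steps above can be sketched as follows, assuming SQLite (3.25 or later) as a stand-in for Treasure Data and assuming the incremental load into `staging_company` has already happened. The helper `rebuild_target` is hypothetical; in a workflow these statements would presumably live in separate tasks, which the slides do not show.

```python
import sqlite3

# In-memory SQLite stands in for Treasure Data (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_company (company_id INTEGER, company_name TEXT,
                                  company_state TEXT, lastmodifieddate TEXT);
    -- Step 1 (assumed already done): staging holds old and newly loaded rows.
    INSERT INTO staging_company VALUES
      (123, 'Sterling Cooper', 'New York',    '2007-06-19'),
      (124, 'CGC',             'New York',    '2010-08-22'),
      (123, 'Sterling Cooper', 'California',  '2008-10-12'),
      (124, 'CGC',             'Connecticut', '2018-05-22');
    -- The target still reflects the previous run.
    CREATE TABLE target_company (company_id INTEGER, company_name TEXT,
                                 company_state TEXT, lastmodifieddate TEXT,
                                 is_current INTEGER);
    INSERT INTO target_company VALUES
      (123, 'Sterling Cooper', 'New York', '2007-06-19', 1),
      (124, 'CGC',             'New York', '2010-08-22', 1);
    """
)


def rebuild_target(conn):
    """Hypothetical helper covering steps 2 and 3: drop, then rebuild with a window."""
    conn.execute("DROP TABLE IF EXISTS target_company")
    conn.execute(
        """
        CREATE TABLE target_company AS
        SELECT company_id, company_name, company_state, lastmodifieddate,
               CASE WHEN rank() OVER (PARTITION BY company_id
                                      ORDER BY lastmodifieddate DESC) = 1
                    THEN 1 ELSE 0 END AS is_current
        FROM staging_company
        """
    )
    return conn.execute(
        "SELECT company_id, company_state, is_current FROM target_company "
        "ORDER BY company_id, lastmodifieddate DESC"
    ).fetchall()


rebuilt = rebuild_target(conn)
print(rebuilt)
```

Because the target is rebuilt from scratch on every run, the staging table has to retain the full history.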
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
123         Sterling Cooper  California     2008-10-12
124         CGC              Connecticut    2018-05-22
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
123         Sterling Cooper  California     2008-10-12
124         CGC              Connecticut    2018-05-22
target_company (dropped)
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
123         Sterling Cooper  California     2008-10-12
124         CGC              Connecticut    2018-05-22
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        0
124         CGC              Connecticut    2018-05-22        1
124         CGC              New York       2010-08-22        0
Thank You and Questions
SCD Type 2 Workflow with Persistent Architecture
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
1. Store a temp table of the current rows that will not be current after the new data is ingested
tmp_no_longer_current
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
2. Delete from the data lake any current rows that have a matching id in the new data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
tmp_no_longer_current
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
3. Insert the temp rows into the target table
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
tmp_no_longer_current
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
4. Insert the new data into the target table
staging_company (empty after the insert)
company_id  company_name     company_state  lastmodifieddate
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
123         Sterling Cooper  New York       2007-06-19        0
123         Sterling Cooper  California     2008-10-12        1
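The four steps of the persistent approach can be sketched as one function. This assumes SQLite as a stand-in for Treasure Data, a hypothetical helper named `apply_scd2_increment`, and at most one new row per `company_id` in each staging batch. The final clearing of the staging table is inferred from the slide showing it empty.

```python
import sqlite3

# In-memory SQLite stands in for Treasure Data (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_company (company_id INTEGER, company_name TEXT,
                                  company_state TEXT, lastmodifieddate TEXT);
    CREATE TABLE target_company (company_id INTEGER, company_name TEXT,
                                 company_state TEXT, lastmodifieddate TEXT,
                                 is_current INTEGER);
    INSERT INTO staging_company VALUES (123, 'Sterling Cooper', 'California', '2008-10-12');
    INSERT INTO target_company VALUES
      (123, 'Sterling Cooper', 'New York', '2007-06-19', 1),
      (124, 'CGC',             'New York', '2010-08-22', 1);
    """
)


def apply_scd2_increment(conn):
    """Hypothetical helper: fold one staging batch into the persistent target."""
    # 1. Copy the current rows that are about to be superseded, already flagged 0.
    conn.execute(
        """
        CREATE TEMP TABLE tmp_no_longer_current AS
        SELECT company_id, company_name, company_state, lastmodifieddate,
               0 AS is_current
        FROM target_company
        WHERE is_current = 1
          AND company_id IN (SELECT company_id FROM staging_company)
        """
    )
    # 2. Delete those current rows from the target.
    conn.execute(
        """
        DELETE FROM target_company
        WHERE is_current = 1
          AND company_id IN (SELECT company_id FROM staging_company)
        """
    )
    # 3. Put the retired copies back into the target.
    conn.execute("INSERT INTO target_company SELECT * FROM tmp_no_longer_current")
    # 4. Insert the new data, flagged as current.
    conn.execute(
        """
        INSERT INTO target_company
        SELECT company_id, company_name, company_state, lastmodifieddate, 1
        FROM staging_company
        """
    )
    # Clean up: drop the temp table and empty staging for the next batch.
    conn.execute("DROP TABLE tmp_no_longer_current")
    conn.execute("DELETE FROM staging_company")


apply_scd2_increment(conn)
persistent_rows = conn.execute(
    "SELECT company_id, company_state, is_current FROM target_company "
    "ORDER BY company_id, lastmodifieddate"
).fetchall()
staging_left = conn.execute("SELECT COUNT(*) FROM staging_company").fetchone()[0]
print(persistent_rows, staging_left)
```

Unlike the drop-and-rebuild approach, this touches only the companies present in the new batch, so staging does not need to keep the full history.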
Contact Information
• Scott Mitchell
• Senior Solution Engineer
• smitchell@treasure-data.com

 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to Redshift
 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the Cloud
 
Fluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker container
 

Último

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Último (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 

Hands-On: Managing Slowly Changing Dimensions Using TD Workflow

  • 1. Treasure Data Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
  • 2. Agenda ● Introduction ● Treasure Data Workflow ● Overview of Slowly Changing Dimensions ● Window Functions ● Handling Type 2 SCDs using Treasure Data
  • 3. Introduction • Scott Mitchell • Senior Solution Engineer • Work with Enterprise clients to maximize the activation of the client data • smitchell@treasure-data.com
  • 4. Introduction Treasure Data is a Customer Data Platform “Customer Data Platform (CDP) is a marketer-based management system that creates a persistent, unified customer database that is accessible to other systems. Data is pulled from multiple sources, cleaned, and combined to create a single customer view. This structured data is then made available to other marketing systems. CDP provides real-time segmentation for sophisticated personalized marketing.” https://en.wikipedia.org/wiki/Customer_Data_Platform
  • 5. Our Customer Data Platform: Foundation Data Management 1st party data (Your data) ● Web ● Mobile ● Apps ● CRMs ● Offline 2nd & 3rd party DMPs (enrichment) Tool Integration ● Campaigns ● Advertising ● Social media ● Reporting ● BI & data science ID Unification Persistent Storage Workflow Orchestration Activation All Your Data Segmentation Profiles Segments Measurement
  • 7. DATA ORCHESTRATION AND WORKFLOW MANAGEMENT • Workflow management across data input, processing and output • Supports both scheduled & trigger-based execution • Cloud-based and client-hosted; the client-hosted version can run custom code • Cloud-based version has both web UI & REST API The core engine is built on our open source project, Digdag
  • 8. Overview: Treasure Workflow allows users to build repeatable data processing pipelines that consist of Treasure Data jobs.
  • 9. Why use Treasure Workflow? 1. Enhanced Organization • Organize your processing workflows into groups of similarly-purposed tasks 2. Reduce Errors • No longer must manage dependencies by scheduled-time alone 3. Ease Error Handling • Split large scripts & queries into smaller, more manageable, jobs 4. Improve Collaboration • Organize your job flows into projects Benefits
  • 10. WORKFLOW DEFINITION: CLOSER LOOK
      timezone: Asia/Tokyo
      schedule:
        daily>: 07:00:00
      _export:
        td:
          database: nishi
      +load:
        td_load>: import/s3_load.yml
        database: nishi
        table: monthly_goods_sales
      +daily:
        td>: queries/daily_open.sql
        create_table: daily_open
      +monthly:
        td>: queries/monthly_open.sql
        result_connection: nishi_s3
        result_settings:
          bucket: nishitetsu-test
          path: /monthly_open.csv
      • File extension should be “.dig” to be recognized as a workflow
      • Standard YAML
      • Task names are prefixed by “+”
      • Operators are suffixed by “>”
      • Schedules can be set with schedule
      • Variables are supported via ${variable_name}
  • 11. REPRESENTATIVE OPERATORS
      Category               Name              Description
      Control Flow           call>:            Call another workflow
                             loop>:            Repeat tasks a specified # of times
                             for_each>:        Loop through a specified list
                             if>:              if/else control flow
      Treasure Data          td>:              Run a specified TD query
                             td_run>:          Run a saved query
                             td_ddl>:          Create, delete, rename, truncate tables
                             td_load>:         Invoke an input data transfer
                             td_for_each>:     Loop through a query result row by row
      AWS                    s3_wait>:         Wait for new files in S3 & download
                             redshift>:        Run Redshift query
                             redshift_load>:   Load data into Redshift
                             redshift_unload>: Unload data from Redshift
      Google Cloud Platform  bq>:              Run BigQuery query
                             bq_extract>:      Unload data from BigQuery to GCS
  • 13. Slowly Changing Dimensions • Particular dimensions within a dataset that are prone to change unpredictably • Example: the phone number or email field of a CRM dataset • Data available from a CRM usually represents the current, up-to-date value of each field for each customer • Storing a history of this customer data requires managing these slowly changing dimensions (SCDs)
  • 14. Different Ways to Handle SCDs • Type 1 • Type 2 • Type 3 • Type 4
  • 17. Type 1: Overwrite the field
      Old Record:
        company_id  company_name     company_state
        123         Sterling Cooper  New York
      New Record:
        company_id  company_name     company_state
        123         Sterling Cooper  California
      SCD Type 1:
        company_id  company_name     company_state
        123         Sterling Cooper  California
  • 18. Type 2: Keep both records, flag the “current” row
      Old Record:
        company_id  company_name     company_state
        123         Sterling Cooper  New York
      New Record:
        company_id  company_name     company_state
        123         Sterling Cooper  California
      SCD Type 2:
        company_id  company_name     company_state  is_current
        123         Sterling Cooper  New York       0
        123         Sterling Cooper  California     1
  • 19. Type 3: Store the latest two values in one row
      Old Record:
        company_id  company_name     company_state
        123         Sterling Cooper  New York
      New Record:
        company_id  company_name     company_state
        123         Sterling Cooper  California
      SCD Type 3:
        company_id  company_name     company_state_current  company_state_previous
        123         Sterling Cooper  California             New York
  • 20. Type 4: Use a separate history table
      SCD Type 4:
        company
          company_id  company_name     company_state
          123         Sterling Cooper  California
        company_history
          company_id  company_name     company_state  last_modified_date
          123         Sterling Cooper  New York       2007-06-19
          123         Sterling Cooper  California     2008-10-12
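The four strategies above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the talk: the function names (`scd_type1` through `scd_type4`) are hypothetical, and records are plain dictionaries that reuse the column names and sample values from the slides.

```python
# Sample records from the slides.
OLD = {"company_id": 123, "company_name": "Sterling Cooper", "company_state": "New York"}
NEW = {"company_id": 123, "company_name": "Sterling Cooper", "company_state": "California"}


def scd_type1(old, new):
    """Type 1: overwrite the field, so only the latest value survives."""
    return [dict(new)]


def scd_type2(old, new):
    """Type 2: keep both rows and flag the current one."""
    return [dict(old, is_current=0), dict(new, is_current=1)]


def scd_type3(old, new, field="company_state"):
    """Type 3: one row holding the current and the previous value of the field."""
    row = {key: value for key, value in new.items() if key != field}
    row[field + "_current"] = new[field]
    row[field + "_previous"] = old[field]
    return [row]


def scd_type4(old, new, old_date, new_date):
    """Type 4: a current table plus a separate, dated history table."""
    company = [dict(new)]
    company_history = [
        dict(old, last_modified_date=old_date),
        dict(new, last_modified_date=new_date),
    ]
    return company, company_history


type2_rows = scd_type2(OLD, NEW)
type3_rows = scd_type3(OLD, NEW)
company, company_history = scd_type4(OLD, NEW, "2007-06-19", "2008-10-12")
```

Type 2 is the only variant here that keeps an unbounded history in the same table as the current row, which is why the rest of the deck focuses on it.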
  • 22. Type 2: Keep both records, flag the “current” row
      Old Record:
        company_id  company_name     company_state
        123         Sterling Cooper  New York
      New Record:
        company_id  company_name     company_state
        123         Sterling Cooper  California
      SCD Type 2:
        company_id  company_name     company_state  is_current
        123         Sterling Cooper  New York       0
        123         Sterling Cooper  California     1
  • 23. Type 2: Keep both records, flag the “current” row
      Old Record:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
      New Record:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      SCD Type 2:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        0
        123         Sterling Cooper  California     2008-10-12        1
  • 25. Window Functions • Window functions perform calculations across rows of the query result • They run after the ‘HAVING’ clause but before the ‘ORDER BY’ clause • They are written in the ‘SELECT’ clause and display results in their own column • They have three parts:
  • 26. Window Functions
      rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC)
      function: rank()
      partition specification: PARTITION BY company_id
      ordering specification: ORDER BY lastmodifieddate DESC
  • 27. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
             rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
      FROM company
      company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
        123         Sterling Cooper  California     2008-10-12
      Result:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        1
        123         Sterling Cooper  New York       2007-06-19        2
  • 28. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
             rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
      FROM company
      Result:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        1
        123         Sterling Cooper  New York       2007-06-19        2
        124         CGC              Connecticut    2018-05-22        1
        124         CGC              New York       2010-08-22        2
  • 29. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
             rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate ASC) AS is_current
      FROM company
      Result:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        2
        123         Sterling Cooper  New York       2007-06-19        1
        124         CGC              Connecticut    2018-05-22        2
        124         CGC              New York       2010-08-22        1
  • 30. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
             rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate ASC) AS is_current
      FROM company
      Desired result (1 for the latest row of each company, 0 otherwise), which a bare rank() does not produce:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        1
        123         Sterling Cooper  New York       2007-06-19        0
        124         CGC              Connecticut    2018-05-22        1
        124         CGC              New York       2010-08-22        0
  • 31. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
             CASE WHEN rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) = 1
                  THEN 1 ELSE 0 END AS is_current
      FROM company
      Result:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        1
        123         Sterling Cooper  New York       2007-06-19        0
        124         CGC              Connecticut    2018-05-22        1
        124         CGC              New York       2010-08-22        0
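A minimal, runnable sketch of the flagging query, using Python's built-in `sqlite3` module as a stand-in for the Treasure Data query engine. That substitution is an assumption made so the example can run locally (SQLite 3.25 or later is needed for window functions). The helper name `flag_current` is hypothetical; the sample rows are the ones shown on the slides. The window is ordered by `lastmodifieddate DESC` so that rank 1, and therefore `is_current = 1`, lands on the newest row of each company.

```python
import sqlite3

# Sample rows from the slides: (company_id, company_name, company_state, lastmodifieddate)
ROWS = [
    (123, "Sterling Cooper", "New York", "2007-06-19"),
    (123, "Sterling Cooper", "California", "2008-10-12"),
    (124, "CGC", "New York", "2010-08-22"),
    (124, "CGC", "Connecticut", "2018-05-22"),
]

QUERY = """
SELECT company_id, company_name, company_state, lastmodifieddate,
       CASE WHEN rank() OVER (PARTITION BY company_id
                              ORDER BY lastmodifieddate DESC) = 1
            THEN 1 ELSE 0 END AS is_current
FROM company
ORDER BY company_id, lastmodifieddate DESC
"""


def flag_current(rows):
    """Load rows into an in-memory table and flag the newest row per company."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE company ("
        "company_id INTEGER, company_name TEXT, "
        "company_state TEXT, lastmodifieddate TEXT)"
    )
    conn.executemany("INSERT INTO company VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


flagged = flag_current(ROWS)
```

The dates are stored as ISO 8601 strings, which sort correctly as text; a real table would more likely use a date or timestamp column.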
  • 32. Implementation in Treasure Data
      1. Load incremental data from a data source to a staging table
      2. Drop the target table that contains outdated SCD information
      3. Window over the staging table, rebuilding the target table with the latest SCD information
  • 35. Implementation in Treasure Data
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
        124         CGC              New York       2010-08-22
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        1
        124         CGC              New York       2010-08-22        1
  • 36. Implementation in Treasure Data
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
        124         CGC              New York       2010-08-22
        123         Sterling Cooper  California     2008-10-12
        124         CGC              Connecticut    2018-05-22
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        1
        124         CGC              New York       2010-08-22        1
  • 37. Implementation in Treasure Data
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
        124         CGC              New York       2010-08-22
        123         Sterling Cooper  California     2008-10-12
        124         CGC              Connecticut    2018-05-22
      target_company: (dropped)
  • 38. Implementation in Treasure Data
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
        124         CGC              New York       2010-08-22
        123         Sterling Cooper  California     2008-10-12
        124         CGC              Connecticut    2018-05-22
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        1
        123         Sterling Cooper  New York       2007-06-19        0
        124         CGC              Connecticut    2018-05-22        1
        124         CGC              New York       2010-08-22        0
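The load, drop, and rebuild sequence illustrated above can be sketched as follows. This is a local approximation under stated assumptions, not the workflow from the talk: Python's `sqlite3` stands in for Treasure Data, the helper names `load_staging` and `rebuild_target` are hypothetical, and the incremental load is simulated by inserting the slide's sample rows by hand.

```python
import sqlite3

# Steps 2 and 3: drop the outdated target, then rebuild it from staging.
REBUILD_SQL = """
DROP TABLE IF EXISTS target_company;
CREATE TABLE target_company AS
SELECT company_id, company_name, company_state, lastmodifieddate,
       CASE WHEN rank() OVER (PARTITION BY company_id
                              ORDER BY lastmodifieddate DESC) = 1
            THEN 1 ELSE 0 END AS is_current
FROM staging_company;
"""


def load_staging(conn, rows):
    """Step 1: append incremental rows to the staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_company ("
        "company_id INTEGER, company_name TEXT, "
        "company_state TEXT, lastmodifieddate TEXT)"
    )
    conn.executemany("INSERT INTO staging_company VALUES (?, ?, ?, ?)", rows)


def rebuild_target(conn):
    """Run steps 2 and 3, then return the rebuilt target table."""
    conn.executescript(REBUILD_SQL)
    return conn.execute(
        "SELECT * FROM target_company ORDER BY company_id, lastmodifieddate DESC"
    ).fetchall()


conn = sqlite3.connect(":memory:")

# First run: only the original records exist, so both are current.
load_staging(conn, [(123, "Sterling Cooper", "New York", "2007-06-19"),
                    (124, "CGC", "New York", "2010-08-22")])
first_run = rebuild_target(conn)

# Second run: the changed records arrive and the target is rebuilt from scratch.
load_staging(conn, [(123, "Sterling Cooper", "California", "2008-10-12"),
                    (124, "CGC", "Connecticut", "2018-05-22")])
second_run = rebuild_target(conn)
```

Because the target is always derived from the full staging history, the rebuild is idempotent, at the cost of rescanning the staging table on every run.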
  • 40. SCD Type 2 Workflow with Persistent Architecture
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        1
        124         CGC              New York       2010-08-22        1
  • 41. SCD Type 2 Workflow with Persistent Architecture
      1. Store a temp table of the current rows that will not be current after the new data is ingested
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        1
        124         CGC              New York       2010-08-22        1
      tmp_no_longer_current:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        0
  • 43. SCD Type 2 Workflow with Persistent Architecture
      2. Delete from the data lake any current rows that have a matching id in the new data
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        124         CGC              New York       2010-08-22        1
      tmp_no_longer_current:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        0
  • 44. SCD Type 2 Workflow with Persistent Architecture
      3. Insert the temp rows into the target table
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        124         CGC              New York       2010-08-22        1
      tmp_no_longer_current:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        0
  • 45. SCD Type 2 Workflow with Persistent Architecture
      3. Insert the temp rows into the target table
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        124         CGC              New York       2010-08-22        1
        123         Sterling Cooper  New York       2007-06-19        0
  • 47. SCD Type 2 Workflow with Persistent Architecture
      4. Insert the new data into the target table
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        124         CGC              New York       2010-08-22        1
        123         Sterling Cooper  New York       2007-06-19        0
  • 48. SCD Type 2 Workflow with Persistent Architecture
      4. Insert the new data into the target table
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        (no rows)
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        124         CGC              New York       2010-08-22        1
        123         Sterling Cooper  New York       2007-06-19        0
        123         Sterling Cooper  California     2008-10-12        1
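The four persistent-architecture steps can be sketched end to end as below. This is a hedged local approximation, not the talk's actual workflow: Python's `sqlite3` stands in for Treasure Data, the helper name `apply_increment` is hypothetical, and the sketch assumes the staging table holds at most one new row per `company_id` (several staged rows for one company would all be flagged current). Emptying the staging table at the end mirrors the final slide and is likewise an assumption about how it is cleared.

```python
import sqlite3

APPLY_SQL = """
-- Step 1: copy current rows that are about to be superseded, flagged as not current
CREATE TABLE tmp_no_longer_current AS
SELECT company_id, company_name, company_state, lastmodifieddate, 0 AS is_current
FROM target_company
WHERE is_current = 1
  AND company_id IN (SELECT company_id FROM staging_company);

-- Step 2: delete current rows that have a matching id in the new data
DELETE FROM target_company
WHERE is_current = 1
  AND company_id IN (SELECT company_id FROM staging_company);

-- Step 3: insert the temp rows back into the target table
INSERT INTO target_company SELECT * FROM tmp_no_longer_current;

-- Step 4: insert the new data into the target table as the current rows
INSERT INTO target_company
SELECT company_id, company_name, company_state, lastmodifieddate, 1
FROM staging_company;

-- Clean up the work tables (assumed housekeeping, not a numbered step)
DROP TABLE tmp_no_longer_current;
DELETE FROM staging_company;
"""


def apply_increment(conn):
    """Apply the staged rows to target_company without rebuilding it."""
    conn.executescript(APPLY_SQL)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target_company (company_id INTEGER, company_name TEXT,
    company_state TEXT, lastmodifieddate TEXT, is_current INTEGER);
CREATE TABLE staging_company (company_id INTEGER, company_name TEXT,
    company_state TEXT, lastmodifieddate TEXT);
INSERT INTO target_company VALUES
    (123, 'Sterling Cooper', 'New York', '2007-06-19', 1),
    (124, 'CGC', 'New York', '2010-08-22', 1);
INSERT INTO staging_company VALUES
    (123, 'Sterling Cooper', 'California', '2008-10-12');
""")

apply_increment(conn)
target_rows = conn.execute(
    "SELECT * FROM target_company ORDER BY company_id, lastmodifieddate"
).fetchall()
staging_count = conn.execute("SELECT COUNT(*) FROM staging_company").fetchone()[0]
```

Unlike the full rebuild, this variant touches only the companies present in the new batch, so the historical rows in the target table are never rewritten.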
  • 50. Contact Information • Scott Mitchell • Senior Solution Engineer • smitchell@treasure-data.com