Let’s Build a Service Oriented Data Pipeline!
June 2016
Software Developer | Hootsuite
Yasha Podeswa
Me!
● Before: Oceanographer
● Now: Software Developer at Hootsuite
This Talk
● Introduce a problem that requires a new data pipeline
● Design it in a service oriented style
● Build it on stage!
The Problem
● Passive Aggressive Inc. just cancelled their subscription!
● Desperate Dan in trouble!
Want to Build a Tool Like This
What We’re Starting With
● Things Users Did
● Things Organizations Did
● Crap
High Level Plan
JSON files → Calculate stats about organizations → DB
● Extract: JSON files
● Transform: Calculate stats about organizations
● Load: DB
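The extract, transform, load split can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the talk: the real apps are Scala, while this sketch uses Python. The `org_id` field, the `org_stats` table, and the use of an in-memory SQLite database as the "DB" are all hypothetical stand-ins.

```python
import json
import sqlite3


def extract(json_lines):
    """Extract: parse one JSON document per line, skipping blank lines."""
    return [json.loads(line) for line in json_lines if line.strip()]


def transform(events):
    """Transform: count events per organization (a stand-in statistic)."""
    counts = {}
    for event in events:
        org = event["org_id"]  # hypothetical field name
        counts[org] = counts.get(org, 0) + 1
    return counts


def load(conn, counts):
    """Load: write one row per organization into the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS org_stats "
        "(org_id TEXT PRIMARY KEY, event_count INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO org_stats VALUES (?, ?)",
        sorted(counts.items()),
    )
    conn.commit()


def run_etl(json_lines, conn):
    """Chain the three stages in order."""
    load(conn, transform(extract(json_lines)))
```

Calling `run_etl(lines, sqlite3.connect(":memory:"))` runs the whole chain in one process, which is exactly the coupling the rest of the plan goes on to break apart.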
● Clean and organize data
● Calculate stats per organization
Clean and organize data: useful for lots of things!
Calculate stats per organization: shouldn’t run until dependent job done
Need a “Service” Communication and Orchestration Layer!
Let’s build it!
First App: Event Cleaning and Loading
Read logs from S3, clean and sort into different types of events, load into data warehouse
● Vanilla Scala app
● AWS Lambda
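A minimal sketch of the "clean and sort" step, assuming each log line is one JSON object. The talk's app is written in Scala; this is a Python illustration only, and the required field names (`type`, `timestamp`) are hypothetical. Reading from S3 and loading into the warehouse are left out.

```python
import json
from collections import defaultdict

# Hypothetical schema: every usable event must carry these fields.
REQUIRED_FIELDS = ("type", "timestamp")


def clean_and_sort(raw_lines):
    """Parse raw log lines, drop unusable ones, group the rest by event type.

    Returns (events_by_type, rejected_count).
    """
    by_type = defaultdict(list)
    rejected = 0
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            rejected += 1  # not valid JSON
            continue
        if not isinstance(event, dict) or any(
            field not in event for field in REQUIRED_FIELDS
        ):
            rejected += 1  # valid JSON, but not a usable event
            continue
        by_type[event["type"]].append(event)
    return dict(by_type), rejected
```

Counting rejected lines rather than silently dropping them gives the monitoring layer something to watch.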
Second App: Organization Stat Calculation
Read cleaned/sorted events from data warehouse, calculate stats about organization, load stats to data warehouse
● Vanilla Scala app
● AWS Lambda
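The stat calculation can be sketched as a single pass over cleaned events. This is a Python illustration rather than the talk's Scala code, and the chosen statistics (event count and distinct active users) and field names (`org_id`, `user_id`) are assumptions; the slides do not say which stats are computed.

```python
from collections import defaultdict


def organization_stats(events):
    """Return per-organization event counts and distinct active users."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for event in events:
        org = event["org_id"]  # hypothetical field name
        counts[org] += 1
        if "user_id" in event:  # hypothetical field name
            users[org].add(event["user_id"])
    return {
        org: {"event_count": counts[org], "active_users": len(users[org])}
        for org in counts
    }
```

Because the function only sees already-cleaned events, it can stay small; malformed input is the first app's problem.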
Third App: Airflow
Hook up the Lambda apps in a dependency graph
● Scheduling
● Retries
● Monitoring
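To make the dependency-graph and retry ideas concrete, here is a toy orchestrator in plain Python. It is not Airflow's API and is not taken from the talk; it only illustrates running each task after its upstream tasks and retrying failures. Scheduling and monitoring are not modeled, and the task names in the usage note are hypothetical. It relies on the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run each task after the tasks it depends on, retrying failures.

    tasks: name -> zero-argument callable
    dependencies: name -> set of task names that must finish first
    Returns the task names in the order they completed.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up; downstream tasks never run
        completed.append(name)
    return completed
```

For example, `run_pipeline({"load_events": f, "organization_stats": g}, {"organization_stats": {"load_events"}})` always calls `f` before `g`, and `g` is never called if `f` keeps failing.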
Steal my code!
https://github.com/yashap/etl-load-events
https://github.com/yashap/etl-organization-stats
https://github.com/yashap/airflow
Questions?
