Rim Zaydullin (zaydullinr@seagroup.com)

Platform Engineering Group (PEG), Shopee 2018
BUILDING DATA PIPELINES IN SHOPEE
WITH DEC
WHY?
*LONG INTRO
Before diving in, we need to understand the context and reasoning.

As some of you are from outside the company, I need to give a bit more detail on how things work.

So, bear with me.
Behind the scenes of any internet company
Any project begins with real life.

And real life shows that every company has a mess of some scale.

Separate parts or subsystems can be very clean and pretty, but progress never stops, and even clean systems deteriorate with time due to project evolution, new features, higher loads that require new architectures, etc.

Engineers are both the creators and the cleaners of this mess. Today we'll talk about cleaning up.

One example would be:
Shopee app
We're Shopee, we're doing e-commerce :D

People buy and sell stuff, and when they do, they see these useful info numbers on their “orders” page: To ship, etc.

Now, we found out that having these numbers can cause some nasty pain during sale events.

Why?



CORE SERVER*
[Diagram: mobile app & web clients, CORE SERVER, DB1 / DB2 / DB3]
*INSANELY SIMPLIFIED VIEW
Some intro about core server
[Diagram: a core server transaction (create transaction, a series of queries including one SLOW (locking) query, commit transaction); mobile app & web clients, CORE SERVER, DB1 / DB2 / DB3]
BOTTLENECKS!
When people buy stuff, the “to_ship” number changes for the seller.

All those numbers, to_ship, to_receive (returns), etc., are a bunch of values in a single row in a table.

When a lot of people buy stuff, this row gets many simultaneous updates, which leads to row locks, which leads to transactions timing out, and we get an avalanche effect.

The avalanche: when users can't make a purchase, they retry the whole big transaction again and again. We can't serve new users, they accumulate, everyone is retrying their purchase, and the whole system is brought to a crawl.

Shopee users are not happy, our DBAs are not happy, we gotta do something.
[Diagram: mobile app & web clients, CORE SERVER, DB1 / DB2 / DB3, with an unknown component marked “?”]
Let's process slow (locking) queries in the background, asynchronously
These info numbers are not critical in the big scheme of things, so they can be processed in the background.

They can even be a bit delayed, that's no problem.

So we need some new system, outside of the core server, that can handle these requests in the background.
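
A minimal sketch of the idea, assuming nothing about the real implementation: the purchase path only records that a counter should change, and a background worker applies the change later. The names `pending`, `seller_counters`, `record_purchase` and `drain` are hypothetical, and an in-memory deque stands in for whatever queue a real system would use.

```python
from collections import deque

# Hypothetical in-memory stand-ins: a queue of pending counter changes
# and the seller counter rows they eventually land in.
pending = deque()
seller_counters = {}


def record_purchase(shop_id):
    """Purchase path: enqueue the change instead of updating the hot
    seller row inside the purchase transaction."""
    pending.append((shop_id, "to_ship", 1))


def drain(max_items=100):
    """Background worker: apply up to max_items queued changes."""
    applied = 0
    while pending and applied < max_items:
        shop_id, counter, delta = pending.popleft()
        row = seller_counters.setdefault(shop_id, {})
        row[counter] = row.get(counter, 0) + delta
        applied += 1
    return applied


# Three purchases from the same shop never touch the counter row directly.
for _ in range(3):
    record_purchase(shop_id=42)

applied = drain()
print(applied, seller_counters)
```

The counters lag behind the purchases until `drain` runs, which is the "a bit delayed" trade-off accepted above.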

In fact, we don't need the core server to care about this logic at all. An external system could track buyer actions from DB changes and update seller records accordingly.
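
To make "track buyer actions from DB changes" concrete, here is a hedged sketch that turns one order row change into seller counter deltas. The event shape (`before` / `after` rows), the status names and the `seller_deltas` function are assumptions made for illustration, not the actual schema.

```python
# Hypothetical order statuses and the seller counter each one counts toward.
STATUS_TO_COUNTER = {
    "to_ship": "to_ship",
    "to_receive": "to_receive",
}


def seller_deltas(change):
    """Turn one order row change into seller counter deltas.

    `change` is an assumed event shape:
    {"before": row or None, "after": row or None}
    """
    before, after = change.get("before"), change.get("after")
    old = STATUS_TO_COUNTER.get(before["status"]) if before else None
    new = STATUS_TO_COUNTER.get(after["status"]) if after else None
    deltas = {}
    if old == new:
        return deltas  # the status did not move between counters
    if old:
        deltas[old] = -1
    if new:
        deltas[new] = 1
    return deltas


# An order moving from "to ship" to "to receive" for one seller.
shipped = {
    "before": {"shopid": 27045752, "status": "to_ship"},
    "after": {"shopid": 27045752, "status": "to_receive"},
}
print(seller_deltas(shipped))
```

The core server never calls this code; it only writes the order row, and the external system reacts to the change.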
[Diagram: Source DB → Magic Data Pipeline?? → Destination DB]
Looks like we need something like
this? General solution
[Diagram: mobile app & web clients, CORE SERVER, DB1 / DB2 / DB3, plus SERVICE, redis queue A, redis queue B and a transformation server]
CODE / INFRA BLOAT!
Let's continue cleaning things up! Another example!

Explain what's going on.

It's already outside the core server, but it requires the core server to have additional code (which needs support and monitoring, and is not a general solution).

The external system can be a complicated mess that's reinvented over and over again by different teams.
[Diagram: mobile app & web clients, CORE SERVER, DB1 / DB2 / DB3]
CODE / INFRA BLOAT
This magic piece of infra is a reinvention of the bicycle every time. It needs servers and maintenance, and it's a custom solution every time.
SERVICE, redis queue A, redis queue B, transformation server
[Diagram: mobile app & web clients, CORE SERVER, SERVICE, DB1 / DB2 / DB3, with an unknown component marked “?” labelled “Data transformation”]
CODE / INFRA BLOAT
[Diagram: Source DB → Magic Data Pipeline?? → Destination Service]
Again, looks like we need something
like this? General solution
HOW?
EXISTING DB TOOLS?
TRIGGERS?

FUNCTIONS?
- Triggers can only modify the storage itself, using a set of predefined functions. They react to insert/update/delete queries and execute before or after the query
- Work only on the DB host itself
- Limited in data processing capabilities
- Bound to a specific DB (MySQL, Oracle, etc.)
- Cannot send requests to outside systems or queues
- Extending functionality is pretty much impossible
All problems in computer science can be solved
by another level of indirection. © David Wheeler
* the guy who invented
subroutines in software
He knows a lot about indirection!
[Diagram: Data Source → Magic Data Pipeline?? → Data Destination]
Again, looks like we need something
like this? General solution
- Works as an independent service
- Has flexible data processing capabilities
- Not bound to specific data sources or destinations
- Connects completely unrelated systems in a generic way
- Is easily extensible to support new systems
- It's like DB functions/triggers taken to another level
[Diagram: Data Source → Magic Data Pipeline?? → Data Destination]
But, the requirements are tough
- No additional point of failure
- Source consistency preservation
- Zero loss, low latency
- Highly available, scalable
INITIAL IDEA(S)
• REPLICATION
• SIMPLE TRANSFORMS
SIMPLE WEB INTERFACE
[UI mockup with sections “source”, “transformation” and “destination”. Visible labels: Table, Type, Sharding Key, Operations, Description, Enabled, Source column, Destination column, Source table, Destination table, Columns mapping, Operations mapping]
OTHERS
LinkedIn's Change Data Capture Pipeline
SOCC 2012
We looked at other systems; there are not many in open source. They are mostly internal systems never shared with the outside world.

This specific system is closely tied to the Oracle DB used at LinkedIn.
DEC
[Diagram: SOURCE (DATABASE, QUEUE) → MAPPING & SIMPLE TRANSFORM → DESTINATION (DATABASE, QUEUE, K/V)]
• HARDCODED FUNCTIONS
• SIMPLE JSON CONFIGS
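
As an illustration of what "mapping + simple transform driven by a JSON config" could look like, here is a small sketch. Every field name in `CONFIG` (`source_table`, `destination_table`, `operations`, `columns`, `enabled`) is invented for this example and is not DEC's actual config format.

```python
import json

# Hypothetical config; the field names are invented for this sketch.
CONFIG = json.loads("""
{
  "source_table": "orders",
  "destination_table": "orders_copy",
  "operations": ["insert", "update"],
  "columns": [
    {"enabled": true,  "source": "orderid", "destination": "id"},
    {"enabled": true,  "source": "shopid",  "destination": "shop_id"},
    {"enabled": false, "source": "note",    "destination": "note"}
  ]
}
""")


def apply_mapping(config, operation, row):
    """Return (destination_table, mapped_row), or None when the
    operation is not mapped by this config."""
    if operation not in config["operations"]:
        return None
    mapped = {
        col["destination"]: row[col["source"]]
        for col in config["columns"]
        if col["enabled"] and col["source"] in row
    }
    return config["destination_table"], mapped


result = apply_mapping(CONFIG, "insert", {"orderid": 1, "shopid": 7, "note": "x"})
print(result)
```

A config like this covers renames and column filtering, but anything beyond that needs real code.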
INITIAL DESIGN EXPLANATION

WE NEED A SIMPLE SYSTEM

OH WAIT…

By nature, the more complex the
system is, the more prone it is to
breaking.

BUT
REAL CASES ARE MORE COMPLEX
WAY MORE COMPLEX
Sometimes we need a transformation function that generates a request to Celery, for example. How are we going to do that?
DEC
[Diagram: SOURCE (DATABASE, QUEUE) → MAPPING & TRANSFORM → DESTINATION (DATABASE, QUEUE, K/V)]
• HARDCODED FUNCTIONS
• SIMPLE JSON CONFIGS
• SCRIPTABLE ENGINE
• REPLICATION + SHARDING
• MAPPING + SIMPLE TRANSFORMS
• SCRIPTABLE ENGINE (LUA!)
• HA, LOW LATENCY, ZERO DATA LOSS
TRACKING DB EVENTS
- GDS connects directly to a MySQL instance as a slave

- Receives the logical replication log (modifications only)

- Converts received events to JSON

- Pushes those JSON events onto Kafka topic(s)

- Highly configurable
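
The "convert to JSON and push onto a topic" steps above can be sketched roughly as follows. The JSON field names are assumptions, not GDS's real event format, and a plain Python list stands in for a Kafka topic so the example runs without a broker.

```python
import json


def to_json_event(database, table, kind, before, after):
    """Convert one replicated row modification into a JSON string.

    The field names are assumptions for this sketch, not GDS's format.
    """
    if kind not in ("insert", "update", "delete"):
        raise ValueError(f"unsupported event type: {kind}")
    event = {
        "database": database,
        "table": table,
        "type": kind,
        "before": before,  # row image before the change (None for inserts)
        "after": after,    # row image after the change (None for deletes)
    }
    return json.dumps(event, sort_keys=True)


topic = []  # a plain list standing in for a Kafka topic

topic.append(to_json_event(
    "shop_db", "orders", "update",
    before={"orderid": 1, "status": "to_ship"},
    after={"orderid": 1, "status": "to_receive"},
))
print(topic[0])
```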
Event!
TRACKING DB EVENTS
SOURCE, MAPPING & TRANSFORM, DESTINATION
DEC ARCHITECTURE:
1) Reads events from the data source

2) Applies transformations to events using simple transforms or a Lua script

3) Serializes the resulting queries to an internal format using msgpack

4) Writes the resulting binary queries to the configured Kafka topics
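
The four steps above can be sketched as one small function. This is a sketch under stated assumptions: `transform` is a made-up rule, JSON encoded as bytes is used as a stand-in for msgpack so the example has no third-party dependency, and a list stands in for the Kafka topic.

```python
import json


def transform(event):
    """Hypothetical transform: a new order bumps the seller's to_ship counter."""
    if event["table"] != "orders" or event["type"] != "insert":
        return []  # nothing to do for other events
    return [{"op": "increment", "shopid": event["after"]["shopid"],
             "field": "to_ship", "by": 1}]


def serialize(query):
    """Stand-in for msgpack: JSON encoded as bytes."""
    return json.dumps(query, sort_keys=True).encode("utf-8")


def run_stage(source_events, out_topic):
    """Read JSON events, transform them, append binary queries to out_topic."""
    written = 0
    for raw in source_events:                      # 1) read
        for query in transform(json.loads(raw)):   # 2) transform
            out_topic.append(serialize(query))     # 3) serialize, 4) write
            written += 1
    return written


events = [
    json.dumps({"table": "orders", "type": "insert",
                "after": {"orderid": 1, "shopid": 42}}),
    json.dumps({"table": "users", "type": "update",
                "after": {"userid": 9}}),
]
queries_topic = []
written = run_stage(events, queries_topic)
print(written, queries_topic)
```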
1) Reads binary queries

2) Deserializes the queries and sends them to the specified destination

3) Takes care of retry logic and event deduplication
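
A rough sketch of the delivery side, assuming each query carries a unique `id` that can be used for deduplication (the source does not say how DEC deduplicates). `FlakyDestination` is a test double invented for the example, and a real system would normally wait between retries; the sketch retries immediately to stay deterministic.

```python
import json


class FlakyDestination:
    """Test double that fails its first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.received = []

    def send(self, query):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("destination unavailable")
        self.received.append(query)


def deliver(messages, destination, max_attempts=3):
    """Deserialize queries, skip duplicates, retry failed sends."""
    seen = set()
    for raw in messages:
        query = json.loads(raw)
        if query["id"] in seen:      # deduplicate by an assumed unique id
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                destination.send(query)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise            # give up after the last attempt
        seen.add(query["id"])
    return len(seen)


messages = [
    b'{"id": 1, "op": "increment"}',
    b'{"id": 1, "op": "increment"}',   # duplicate delivery of the same query
    b'{"id": 2, "op": "increment"}',
]
destination = FlakyDestination(failures=2)
delivered = deliver(messages, destination)
print(delivered, destination.received)
```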
CONFIGURATION
Step 1
Make sure your data source is configured (we have a DB replication stream from GDS)
Step 2
Make sure the DEC configuration has the correct data source and data destination
Step 3
Implement and deploy the necessary data transformation scripts.
CONSUMER EVENT TRANSFORMATIONS
1) DEC Consumer takes an event from the GDS queue
2) Filters the event by table and event type (insert/update/delete)
3) Processes it with the corresponding Lua script
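
The filter-then-dispatch step can be sketched as a lookup table keyed by table and event type. This is written in Python as a stand-in; in DEC the handlers are Lua scripts, and the `on` decorator, the `orders` table name and the event fields are all hypothetical.

```python
# (table, event type) -> handler. In DEC the handlers are Lua scripts;
# Python functions stand in for them here.
HANDLERS = {}


def on(table, event_type):
    """Register a handler for one table and one event type."""
    def register(func):
        HANDLERS[(table, event_type)] = func
        return func
    return register


@on("orders", "update")
def order_updated(event):
    return f"order {event['after']['orderid']} updated"


def process(event):
    """Filter by table and event type, then run the matching handler."""
    handler = HANDLERS.get((event["table"], event["type"]))
    if handler is None:
        return None  # filtered out: nobody subscribed to this event
    return handler(event)


handled = process({"table": "orders", "type": "update", "after": {"orderid": 5}})
ignored = process({"table": "orders", "type": "delete", "before": {"orderid": 5}})
print(handled, ignored)
```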

2018/12/17 16:41:20.882435 [INFO] [buyer_seller_count.dec_shopee_order_details.order_2_seller]
[4097018849][619930454]

SQL: UPDATE order_cnt_seller_tab_00000002 SET `mtime` = 1545036080, `seller_toreceice` =
`seller_toreceice` + 1 , `seller_toship` = `seller_toship` - 1 WHERE `shopid` = 27045752;
rows affected: 1
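
The query in the log above applies relative changes (`+ 1`, `- 1`) to counter columns. A hedged sketch of how such a statement could be built from a set of deltas follows; `build_update` is a hypothetical helper, the shard table name is passed in because the sharding rule is not shown, and `%s` is the placeholder style used by common Python MySQL drivers.

```python
import time


def _check_identifier(name):
    """Allow only letters, digits and underscores in table/column names."""
    if not name.replace("_", "").isalnum():
        raise ValueError(f"unexpected identifier: {name}")


def build_update(table, shop_id, deltas, now=None):
    """Build a parameterized UPDATE that applies relative counter changes."""
    _check_identifier(table)
    mtime = int(time.time()) if now is None else now
    assignments = ["`mtime` = %s"]
    params = [mtime]
    for column, delta in sorted(deltas.items()):
        _check_identifier(column)
        assignments.append(f"`{column}` = `{column}` + %s")
        params.append(delta)
    params.append(shop_id)
    sql = f"UPDATE {table} SET {', '.join(assignments)} WHERE `shopid` = %s"
    return sql, params


# Values copied from the log line above; column names are kept as logged.
sql, params = build_update(
    "order_cnt_seller_tab_00000002",
    27045752,
    {"seller_toship": -1, "seller_toreceice": 1},
    now=1545036080,
)
print(sql)
print(params)
```

Relative updates like these are not idempotent, which is one reason the deduplication step matters.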
SO, WHERE ARE WE AT?
LIVE FOR ALL 7 COUNTRIES
USED BY 3 TEAMS,
MORE COMING ON BOARD
[Diagram: a core server transaction (create transaction, a series of queries including one SLOW (locking) query, commit transaction); mobile app & web clients, CORE SERVER, DB1 / DB2 / DB3]
BOTTLENECKS!
[Diagram: mobile app & web clients, CORE SERVER, DB1 / DB2 / DB3, DEC]
NO BOTTLENECKS!
[Diagram: mobile app & web clients, CORE SERVER, SERVICE, redis queue A, redis queue B, transformation, DB1 / DB2 / DB3]
CODE / INFRA BLOAT
[Diagram: mobile app & web clients, CORE SERVER, SERVICE, DB1 / DB2 / DB3, DEC]
NO CODE / INFRA BLOAT
[Diagram: mobile app & web clients, CORE SERVER, DB1 / DB2 / DB3, DEC, three SERVICE boxes]
CONCLUDING
All software projects are evolving, and it's always a mess,

but we need to create decent tools to keep the entropy at bay, and DEC is one such attempt in this never-ending battle :)
THANK YOU!
Q&A
Mais conteúdo relacionado

Mais procurados

MQTTとAMQPと.NET
MQTTとAMQPと.NETMQTTとAMQPと.NET
MQTTとAMQPと.NETterurou
 
Ndc14 분산 서버 구축의 ABC
Ndc14 분산 서버 구축의 ABCNdc14 분산 서버 구축의 ABC
Ndc14 분산 서버 구축의 ABCHo Gyu Lee
 
15分で分かる NoOps
15分で分かる NoOps15分で分かる NoOps
15分で分かる NoOpsHiromasa Oka
 
Managing Egress with Istio
Managing Egress with IstioManaging Egress with Istio
Managing Egress with IstioSolo.io
 
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례NAVER LABS
 
송창규, unity build로 빌드타임 반토막내기, NDC2010
송창규, unity build로 빌드타임 반토막내기, NDC2010송창규, unity build로 빌드타임 반토막내기, NDC2010
송창규, unity build로 빌드타임 반토막내기, NDC2010devCAT Studio, NEXON
 
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)Heungsub Lee
 
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019devCAT Studio, NEXON
 
인프콘 2022 - Rust 크로스 플랫폼 프로그래밍
인프콘 2022 - Rust 크로스 플랫폼 프로그래밍인프콘 2022 - Rust 크로스 플랫폼 프로그래밍
인프콘 2022 - Rust 크로스 플랫폼 프로그래밍Chris Ohk
 
NDC15 - 사례로 살펴보는 MSVC 빌드 최적화 팁
NDC15 - 사례로 살펴보는 MSVC 빌드 최적화 팁NDC15 - 사례로 살펴보는 MSVC 빌드 최적화 팁
NDC15 - 사례로 살펴보는 MSVC 빌드 최적화 팁Yi-kwon Hwang
 
NDC12_Lockless게임서버설계와구현
NDC12_Lockless게임서버설계와구현NDC12_Lockless게임서버설계와구현
NDC12_Lockless게임서버설계와구현noerror
 
Extending kubernetes with CustomResourceDefinitions
Extending kubernetes with CustomResourceDefinitionsExtending kubernetes with CustomResourceDefinitions
Extending kubernetes with CustomResourceDefinitionsStefan Schimanski
 
중니어의 고뇌: 1인분 개발자, 다음을 찾아서
중니어의 고뇌: 1인분 개발자, 다음을 찾아서중니어의 고뇌: 1인분 개발자, 다음을 찾아서
중니어의 고뇌: 1인분 개발자, 다음을 찾아서Yurim Jin
 
Quarkus tips, tricks, and techniques
Quarkus tips, tricks, and techniquesQuarkus tips, tricks, and techniques
Quarkus tips, tricks, and techniquesRed Hat Developers
 
Jakarta EE + MicroProfile との付き合い方
Jakarta EE + MicroProfile との付き合い方Jakarta EE + MicroProfile との付き合い方
Jakarta EE + MicroProfile との付き合い方Hirofumi Iwasaki
 
Introduce Google Kubernetes
Introduce Google KubernetesIntroduce Google Kubernetes
Introduce Google KubernetesYongbok Kim
 
FIWAREシステム内の短期履歴の管理
FIWAREシステム内の短期履歴の管理FIWAREシステム内の短期履歴の管理
FIWAREシステム内の短期履歴の管理fisuda
 
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3Heungsub Lee
 
Evolution of containers to kubernetes
Evolution of containers to kubernetesEvolution of containers to kubernetes
Evolution of containers to kubernetesKrishna-Kumar
 
[DO07] マイクロサービスに必要な技術要素はすべて Spring Cloud にある
[DO07] マイクロサービスに必要な技術要素はすべて Spring Cloud にある[DO07] マイクロサービスに必要な技術要素はすべて Spring Cloud にある
[DO07] マイクロサービスに必要な技術要素はすべて Spring Cloud にあるde:code 2017
 

Mais procurados (20)

MQTTとAMQPと.NET
MQTTとAMQPと.NETMQTTとAMQPと.NET
MQTTとAMQPと.NET
 
Ndc14 분산 서버 구축의 ABC
Ndc14 분산 서버 구축의 ABCNdc14 분산 서버 구축의 ABC
Ndc14 분산 서버 구축의 ABC
 
15分で分かる NoOps
15分で分かる NoOps15分で分かる NoOps
15分で分かる NoOps
 
Managing Egress with Istio
Managing Egress with IstioManaging Egress with Istio
Managing Egress with Istio
 
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
 
송창규, unity build로 빌드타임 반토막내기, NDC2010
송창규, unity build로 빌드타임 반토막내기, NDC2010송창규, unity build로 빌드타임 반토막내기, NDC2010
송창규, unity build로 빌드타임 반토막내기, NDC2010
 
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
 
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019
 
인프콘 2022 - Rust 크로스 플랫폼 프로그래밍
인프콘 2022 - Rust 크로스 플랫폼 프로그래밍인프콘 2022 - Rust 크로스 플랫폼 프로그래밍
인프콘 2022 - Rust 크로스 플랫폼 프로그래밍
 
NDC15 - 사례로 살펴보는 MSVC 빌드 최적화 팁
NDC15 - 사례로 살펴보는 MSVC 빌드 최적화 팁NDC15 - 사례로 살펴보는 MSVC 빌드 최적화 팁
NDC15 - 사례로 살펴보는 MSVC 빌드 최적화 팁
 
NDC12_Lockless게임서버설계와구현
NDC12_Lockless게임서버설계와구현NDC12_Lockless게임서버설계와구현
NDC12_Lockless게임서버설계와구현
 
Extending kubernetes with CustomResourceDefinitions
Extending kubernetes with CustomResourceDefinitionsExtending kubernetes with CustomResourceDefinitions
Extending kubernetes with CustomResourceDefinitions
 
중니어의 고뇌: 1인분 개발자, 다음을 찾아서
중니어의 고뇌: 1인분 개발자, 다음을 찾아서중니어의 고뇌: 1인분 개발자, 다음을 찾아서
중니어의 고뇌: 1인분 개발자, 다음을 찾아서
 
Quarkus tips, tricks, and techniques
Quarkus tips, tricks, and techniquesQuarkus tips, tricks, and techniques
Quarkus tips, tricks, and techniques
 
Jakarta EE + MicroProfile との付き合い方
Jakarta EE + MicroProfile との付き合い方Jakarta EE + MicroProfile との付き合い方
Jakarta EE + MicroProfile との付き合い方
 
Introduce Google Kubernetes
Introduce Google KubernetesIntroduce Google Kubernetes
Introduce Google Kubernetes
 
FIWAREシステム内の短期履歴の管理
FIWAREシステム内の短期履歴の管理FIWAREシステム内の短期履歴の管理
FIWAREシステム内の短期履歴の管理
 
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
 
Evolution of containers to kubernetes
Evolution of containers to kubernetesEvolution of containers to kubernetes
Evolution of containers to kubernetes
 
[DO07] マイクロサービスに必要な技術要素はすべて Spring Cloud にある
[DO07] マイクロサービスに必要な技術要素はすべて Spring Cloud にある[DO07] マイクロサービスに必要な技術要素はすべて Spring Cloud にある
[DO07] マイクロサービスに必要な技術要素はすべて Spring Cloud にある
 

Semelhante a Building data pipelines at Shopee with DEC

CQRS recipes or how to cook your architecture
CQRS recipes or how to cook your architectureCQRS recipes or how to cook your architecture
CQRS recipes or how to cook your architectureThomas Jaskula
 
SQL in the Hybrid World
SQL in the Hybrid WorldSQL in the Hybrid World
SQL in the Hybrid WorldTanel Poder
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudIke Ellis
 
Dynamics of Leading Legacy Databases
Dynamics of Leading Legacy DatabasesDynamics of Leading Legacy Databases
Dynamics of Leading Legacy DatabasesCognizant
 
DB2 Performance Tuning Z/OS - email me please for more details
DB2 Performance Tuning Z/OS - email me please for more detailsDB2 Performance Tuning Z/OS - email me please for more details
DB2 Performance Tuning Z/OS - email me please for more detailsManikandan Suresh
 
Assessing technology landscape
Assessing technology landscapeAssessing technology landscape
Assessing technology landscapeDom Mike
 
Sql interview question part 10
Sql interview question part 10Sql interview question part 10
Sql interview question part 10kaashiv1
 
DATABASE AUTOMATION with Thousands of database, monitoring and backup
DATABASE AUTOMATION with Thousands of database, monitoring and backupDATABASE AUTOMATION with Thousands of database, monitoring and backup
DATABASE AUTOMATION with Thousands of database, monitoring and backupSaewoong Lee
 
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...Precisely
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBLa creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBMongoDB
 
Informatica
InformaticaInformatica
Informaticamukharji
 
From ddd to DDD : My journey from data-driven development to Domain-Driven De...
From ddd to DDD : My journey from data-driven development to Domain-Driven De...From ddd to DDD : My journey from data-driven development to Domain-Driven De...
From ddd to DDD : My journey from data-driven development to Domain-Driven De...Thibaud Desodt
 
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...IDERA Software
 
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...Denodo
 
Database Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big DataDatabase Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big Dataexponential-inc
 

Semelhante a Building data pipelines at Shopee with DEC (20)

CQRS recipes or how to cook your architecture
CQRS recipes or how to cook your architectureCQRS recipes or how to cook your architecture
CQRS recipes or how to cook your architecture
 
SQL in the Hybrid World
SQL in the Hybrid WorldSQL in the Hybrid World
SQL in the Hybrid World
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloud
 
Dynamics of Leading Legacy Databases
Dynamics of Leading Legacy DatabasesDynamics of Leading Legacy Databases
Dynamics of Leading Legacy Databases
 
DB2 Performance Tuning Z/OS - email me please for more details
DB2 Performance Tuning Z/OS - email me please for more detailsDB2 Performance Tuning Z/OS - email me please for more details
DB2 Performance Tuning Z/OS - email me please for more details
 
Final project cafe coffe
Final project cafe coffeFinal project cafe coffe
Final project cafe coffe
 
Ibm redbook
Ibm redbookIbm redbook
Ibm redbook
 
Assessing technology landscape
Assessing technology landscapeAssessing technology landscape
Assessing technology landscape
 
Sql interview question part 10
Sql interview question part 10Sql interview question part 10
Sql interview question part 10
 
Ebook10
Ebook10Ebook10
Ebook10
 
DATABASE AUTOMATION with Thousands of database, monitoring and backup
DATABASE AUTOMATION with Thousands of database, monitoring and backupDATABASE AUTOMATION with Thousands of database, monitoring and backup
DATABASE AUTOMATION with Thousands of database, monitoring and backup
 
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBLa creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDB
 
Informatica
InformaticaInformatica
Informatica
 
From ddd to DDD : My journey from data-driven development to Domain-Driven De...
From ddd to DDD : My journey from data-driven development to Domain-Driven De...From ddd to DDD : My journey from data-driven development to Domain-Driven De...
From ddd to DDD : My journey from data-driven development to Domain-Driven De...
 
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
 
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...
 
Database Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big DataDatabase Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big Data
 
Db trends final
Db trends   finalDb trends   final
Db trends final
 

Último

Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxpritamlangde
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessorAshwiniTodkar4
 
Post office management system project ..pdf
Post office management system project ..pdfPost office management system project ..pdf
Post office management system project ..pdfKamal Acharya
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...ronahami
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 
Introduction to Robotics in Mechanical Engineering.pptx
Introduction to Robotics in Mechanical Engineering.pptxIntroduction to Robotics in Mechanical Engineering.pptx
Introduction to Robotics in Mechanical Engineering.pptxhublikarsn
 
Fundamentals of Internet of Things (IoT) Part-2
Fundamentals of Internet of Things (IoT) Part-2Fundamentals of Internet of Things (IoT) Part-2
Fundamentals of Internet of Things (IoT) Part-2ChandrakantDivate1
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)ChandrakantDivate1
 
Introduction to Geographic Information Systems
Introduction to Geographic Information SystemsIntroduction to Geographic Information Systems
Introduction to Geographic Information SystemsAnge Felix NSANZIYERA
 
Unsatisfied Bhabhi ℂall Girls Ahmedabad Book Esha 6378878445 Top Class ℂall G...
Unsatisfied Bhabhi ℂall Girls Ahmedabad Book Esha 6378878445 Top Class ℂall G...Unsatisfied Bhabhi ℂall Girls Ahmedabad Book Esha 6378878445 Top Class ℂall G...
Unsatisfied Bhabhi ℂall Girls Ahmedabad Book Esha 6378878445 Top Class ℂall G...Payal Garg #K09
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxMustafa Ahmed
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxkalpana413121
 

Último (20)

Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptx
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
Post office management system project ..pdf
Post office management system project ..pdfPost office management system project ..pdf
Post office management system project ..pdf
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Introduction to Robotics in Mechanical Engineering.pptx
Introduction to Robotics in Mechanical Engineering.pptxIntroduction to Robotics in Mechanical Engineering.pptx
Introduction to Robotics in Mechanical Engineering.pptx
 
Fundamentals of Internet of Things (IoT) Part-2
Fundamentals of Internet of Things (IoT) Part-2Fundamentals of Internet of Things (IoT) Part-2
Fundamentals of Internet of Things (IoT) Part-2
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)
 
Introduction to Geographic Information Systems
Introduction to Geographic Information SystemsIntroduction to Geographic Information Systems
Introduction to Geographic Information Systems
 
Unsatisfied Bhabhi ℂall Girls Ahmedabad Book Esha 6378878445 Top Class ℂall G...
Unsatisfied Bhabhi ℂall Girls Ahmedabad Book Esha 6378878445 Top Class ℂall G...Unsatisfied Bhabhi ℂall Girls Ahmedabad Book Esha 6378878445 Top Class ℂall G...
Unsatisfied Bhabhi ℂall Girls Ahmedabad Book Esha 6378878445 Top Class ℂall G...
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptx
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 

Building data pipelines at Shopee with DEC

  • 1. Rim Zaydullin (zaydullinr@seagroup.com)
 Platform Engineering Group (PEG), Shopee 2018 BUILDING DATA PIPELINES IN SHOPEE WITH
  • 2. WHY? *LONG INTRO Before diving in we need to understand context and reasoning As some of you are from the outside the company, I need to give a bit more details on how things work So, bear with me
  • 3. Behind the scenes of any internet company Any projects begins with real life And real life shows that every company has a mess of a various scale
 
 Separate parts or subsystems can be very clean and pretty, but we never stop our progress and even clean systems deteriorate with time, due to project evolution and new features, nigher loads that require new architectures, etc. Engineers are the creators and the cleaners of this mess, today we’ll talk about cleaning-up One example would be:
  • 4. Shopee app We’re Shopee, we’re doing e- commerse :D People buy and sell stuff, and when they do, they have this useful info numbers on their “orders” page To ship, etc. Now, we found out that having these numbers can cause some nasty pain during sale events Why?
 

  • 5. CORE SERVER Mobile app & Web clients DB1 DB2 DB3 CORE SERVER* *INSANELY SIMPLIFIEDVIEW Some intro about core server
  • 6. create transaction commit transaction query
 query
 query
 query
 query
 SLOW (locking) query
 query query CORE SERVER Mobile app & Web clients DB1 DB2 DB3 BOTTLENECKS! When people buy stuff, the “to_ship” number changes for the seller. All those numbers (to_ship, to_receive (returns), etc.) are a bunch of values in a single row in a table. When a lot of people buy stuff, this row gets many simultaneous updates, which leads to row locks, which leads to transactions timing out, and we get an avalanche effect: when users can’t make a purchase, they retry the whole big transaction again and again, we can’t serve new users, they accumulate, everyone keeps retrying, and the whole system is brought to a crawl. Shopee users are not happy, our DBAs are not happy, we gotta do something.
  • 7. CORE SERVER Mobile app & Web clients DB1 DB2 DB3 Let’s process slow (locking) queries in background, asynchronously ? ? ?? ? These info numbers are not critical in the big scheme of things; they can be processed in the background. They can even be a bit delayed, no problem. So we need some new system outside of the core server that can handle these requests in the background.
  • 8. DB3 Let’s process slow (locking) queries in background, asynchronously ? ? ?? ? These info numbers are not critical in the big scheme of things; they can be processed in the background. They can even be a bit delayed, no problem. So we need some new system outside of the core server that can handle these requests in the background. In fact, we don’t need the core server to care about this logic at all. An external system could track buyer actions from DB changes and update seller records accordingly.
  • 9. Source DB Destination DB Magic Data Pipeline?? Looks like we need something like this? General solution
  • 10. CORE SERVER Mobile app & Web clients SERVICE redis queue A redis queue B transformation server DB1 DB2 DB3 CODE / INFRA BLOAT! Let’s continue cleaning things up! Another example! Explain what’s going on. It’s already outside the core server, but it requires the core server to have additional code (which needs support and monitoring, and is not a general solution). The external system can be a complicated mess that’s reinvented over and over again by different teams.
  • 11. CORE SERVER Mobile app & Web clients DB1 DB2 DB3 CODE / INFRA BLOAT!
  • 13. CORE SERVER Mobile app & Web clients DB1 DB2 DB3 CODE / INFRA BLOAT This magic piece of infra is a reinvention of the wheel every time. It needs servers and maintenance, and it’s a custom solution every time. SERVICE redis queue A redis queue B transformation server
  • 14. CORE SERVER Mobile app & Web clients SERVICE DB1 DB2 DB3 CODE / INFRA BLOAT ? ? ?? ? Data transformation
  • 15. Source DB Destination Service Magic Data Pipeline?? Again, looks like we need something like this? General solution
  • 16. HOW?
  • 17. EXISTING DB TOOLS? TRIGGERS? FUNCTIONS?
 - Triggers allow modifying only the storage itself, using a set of predefined functions. They react to insert/update/delete queries and execute before or after the query
 - Work only on the DB host itself
 - Limited in data processing capabilities
 - Bound to a specific DB (MySQL, Oracle, etc.)
 - Cannot send requests to outside systems or queues
 - Extending functionality is pretty much impossible
  • 18. “All problems in computer science can be solved by another level of indirection.” © David Wheeler* (*the guy who invented subroutines in software). He knows a lot about indirection!
  • 19. Data Source Data Destination Magic Data Pipeline?? Again, looks like we need something like this? General solution
 - Works as independent service
 - Has flexible data processing capabilities
 - Not bound to specific data sources or destinations
 - Connects completely unrelated systems in generic way
 - Is easily extensible to support new systems
 - It’s like DB functions/triggers taken to another level
  • 20. Data Source Data Destination Magic Data Pipeline?? But, the requirements are tough
 - No additional point of failure
 - Source consistency preservation
 - Zero loss, low latency
 - Highly available, scalable
  • 21. • REPLICATION • SIMPLE TRANSFORMS INITIAL IDEA(S)
  • 22. SIMPLE WEB INTERFACE source transformation destination Table Type Sharding Key Operations Description
  • 23. source Enabled Source column Destination column Source table Destination table Columns mapping Operations mapping
  • 24. OTHERS LinkedIn's Change Data Capture Pipeline, SOCC 2012. We looked at other systems; there are not many in open source. They are mostly internal systems never shared with the outside world. This specific system is closely tied to the Oracle DB that’s used at LinkedIn.
  • 25. DEC DATABASE QUEUE DATABASE QUEUE SOURCE MAPPING & SIMPLE TRANSFORM K /V DESTINATION • HARDCODED FUNCTIONS • SIMPLE JSON CONFIGS INITIAL DESIGN EXPLANATION WE NEED SIMPLE SYSTEM OH WAIT… By nature, the more complex the system is, the more prone it is to breaking. BUT
  • 26. REAL CASES ARE MORE COMPLEX
  • 27. WAY MORE COMPLEX Sometimes we need a transformation function that generates a request to Celery, for example. How are we going to do that?
  • 28. DEC DATABASE QUEUE DATABASE QUEUE SOURCE K /V DESTINATION • HARDCODED FUNCTIONS • SIMPLE JSON CONFIGS • SCRIPTABLE ENGINE MAPPING & TRANSFORM
  • 29. • REPLICATION + SHARDING • MAPPING + SIMPLE TRANSFORMS • SCRIPTABLE ENGINE (LUA!) • HA, LOW LATENCY, ZERO DATA LOSS
  • 31. TRACKING DB EVENTS - GDS connects directly to MySQL instance as slave
 - Receives logical replication log (modifications only)
 - Converts received events to json
 - Pushes those json events onto Kafka topic(s)
 - Highly configurable
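The "converts received events to json" step above could look roughly like the following Python sketch. The slides do not show the GDS event schema, so every field name here (`database`, `table`, `type`, `before`, `after`, `ts`) is an assumption made for illustration, and pushing to Kafka is left out entirely.

```python
import json

# Event types named on the slides (insert/update/delete).
EVENT_TYPES = ("insert", "update", "delete")


def row_event_to_json(database, table, event_type, before, after, ts):
    """Serialize one row change as a JSON string.

    The field names are hypothetical; the real GDS format is not shown
    in the deck.
    """
    if event_type not in EVENT_TYPES:
        raise ValueError("unsupported event type: %r" % (event_type,))
    event = {
        "database": database,
        "table": table,
        "type": event_type,
        "before": before,  # row image before the change (None for insert)
        "after": after,    # row image after the change (None for delete)
        "ts": ts,          # unix timestamp of the change
    }
    # sort_keys keeps the output stable, which helps when diffing events.
    return json.dumps(event, sort_keys=True)
```

A downstream consumer would then only need `json.loads` to get the change back as a dictionary, independent of the MySQL replication format.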
  • 34. DEC ARCHITECTURE:
 1) Reads events from datasource
 2) Applies transformations to events using simple transforms or a LUA script
 3) Serializes resulting queries to internal format using msgpack
 4) Writes resulting binary queries to configured kafka topics
 
 1) Reads binary queries
 2) Deserializes queries and sends them to specified destination
 3) Takes care of retry logic and event deduplication
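The two halves of the architecture slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a `deque` stands in for the Kafka topic, `json` stands in for msgpack so the example needs nothing beyond the standard library, and the function names, the `id` field used for deduplication, and the retry policy are all hypothetical rather than taken from DEC.

```python
import json
from collections import deque


def transform_stage(events, transform, topic):
    """First half: transform each event and write the result to the topic."""
    for event in events:
        query = transform(event)
        if query is not None:  # a transform may drop an event
            # json stands in for msgpack in this sketch.
            topic.append(json.dumps(query).encode("utf-8"))


def delivery_stage(topic, send, seen_ids, max_attempts=3):
    """Second half: deserialize queries, deduplicate, send with retries."""
    delivered = 0
    while topic:
        raw = topic.popleft()
        query = json.loads(raw.decode("utf-8"))
        if query["id"] in seen_ids:
            continue  # duplicate: already delivered once
        for _ in range(max_attempts):
            try:
                send(query)
                break
            except ConnectionError:
                continue  # transient failure: try again
        else:
            # Out of attempts: put the query back so nothing is lost.
            topic.appendleft(raw)
            raise RuntimeError("destination unavailable")
        seen_ids.add(query["id"])
        delivered += 1
    return delivered
```

Keeping a queue between the two stages is what lets the transform side keep reading events while the destination is slow or briefly unavailable.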
  • 35. CONFIGURATION Make sure your data source is configured (we have a DB replication stream from GDS) Step 1
  • 36. Make sure DEC configuration has
 correct data source and data destination Step 2 CONFIGURATION
  • 37. Implement and deploy necessary data transformation scripts. Step 3 CONFIGURATION
  • 38. CONSUMER EVENT TRANSFORMATIONS 1) DEC Consumer takes an event from the GDS queue 2) Filters the event by table/event type (insert/update/delete) 3) Processes it with the corresponding LUA script
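The filter-then-dispatch flow above might be sketched as a lookup keyed by table and event type. In DEC the handlers are Lua scripts; in this Python illustration plain callables stand in for them, and the event field names (`table`, `type`) and the table name in the usage example are assumptions.

```python
def make_dispatcher(handlers):
    """Build a dispatch function from a {(table, event_type): handler} map.

    Each handler is a callable taking the event dict; it stands in for
    the per-table script described on the slide.
    """
    def dispatch(event):
        handler = handlers.get((event["table"], event["type"]))
        if handler is None:
            return None  # no script registered: the event is filtered out
        return handler(event)
    return dispatch


# Usage with a hypothetical table name and handler.
dispatch = make_dispatcher({
    ("order_details_tab", "update"): lambda e: ("seller_counts", e["after"]),
})
```

Events for tables or event types without a registered handler are simply skipped, which matches the filtering step on the slide.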

  • 41. CONSUMER EVENT TRANSFORMATIONS 2018/12/17 16:41:20.882435 [INFO] [buyer_seller_count.dec_shopee_order_details.order_2_seller] [4097018849][619930454]
 SQL: UPDATE order_cnt_seller_tab_00000002 SET `mtime` = 1545036080, `seller_toreceice` = `seller_toreceice` + 1 , `seller_toship` = `seller_toship` - 1 WHERE `shopid` = 27045752; rows affected: 1
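A statement like the one in the log above could be produced by a small builder such as the Python sketch below. The table prefix and column names are copied from the log line; the function name, the way the shard index is supplied, and the use of `%s` placeholders are assumptions, since the slides do not show how DEC chooses the shard or executes the query.

```python
def build_counter_update(shard, shopid, deltas, mtime):
    """Return (sql, params) for a relative counter update on one shard.

    `shard` is passed in by the caller; how it is derived from `shopid`
    is not shown in the deck, so it is not guessed here.
    """
    table = "order_cnt_seller_tab_{:08d}".format(shard)
    assignments = ["`mtime` = %s"]
    params = [mtime]
    for column, delta in deltas.items():
        if not column.isidentifier():
            raise ValueError("bad column name: %r" % (column,))
        # Relative update: the new value is computed by the database, so
        # no prior read of the row is needed.
        assignments.append("`{0}` = `{0}` + %s".format(column))
        params.append(delta)
    sql = "UPDATE {} SET {} WHERE `shopid` = %s".format(
        table, ", ".join(assignments))
    params.append(shopid)
    return sql, tuple(params)
```

Because each counter is adjusted relative to its current value, an update does not depend on a value read earlier, which is one reason such statements are convenient to apply asynchronously.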
  • 42. SO, WHERE ARE WE AT? LIVE FOR ALL 7 COUNTRIES. USED BY 3 TEAMS, MORE COMING ON BOARD
  • 43. create transaction commit transaction query
 query
 query
 query
 query
 SLOW (locking) query
 query query CORE SERVER Mobile app & Web clients DB1 DB2 DB3 BOTTLENECKS!
  • 44. CORE SERVER Mobile app & Web clients DB1 DB2 DB3 DEC NO BOTTLENECKS!
  • 45. CORE SERVER Mobile app & Web clients SERVICE redis queue A redis queue B transformation DB1 DB2 DB3 CODE / INFRA BLOAT
  • 46. CORE SERVER Mobile app & Web clients SERVICE DB1 DB2 DB3 NO CODE / INFRA BLOAT DEC
  • 47. CORE SERVER Mobile app & Web clients SERVICE DB1 DB2 DB3 DEC SERVICE SERVICE
  • 48. CONCLUDING All software projects are evolving and it’s always a mess,
 but we need to create decent tools to keep the entropy at bay, and DEC is one such attempt in this never-ending battle :)