"Stateful app as an efficient way to build dispatching for riders and drivers", Oleksandr Chumak

Fwdays
Uklon in numbers
● 130+ engineers
● 12 product teams
● 16M Android/iOS downloads
● 1.5M+ riders DAU
● 30+ microservices
● 200k+ drivers DAU
● 3 countries
● 30 cities
Uklon: RiderApp and DriverApp
What is the report about?
How to reduce CPU consumption tenfold with stateful processing while ensuring high reliability.

What solutions do our competitors employ?

Agenda
● Workloads that make the stateless approach inefficient
● Basic concepts
● Scaling of stateful services
● Reliability of stateful services
Workloads that make the stateless approach inefficient
Several challenges in ride-hailing:
1. Massive, frequent write operations are needed to track objects' current locations. Drivers can move as fast as 20 meters per second, so their locations must be updated every second.
2. A k-nearest-neighbour (kNN) query is far more challenging than a simple Get query in a key-value data store such as Redis.
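To show why kNN differs from a key lookup, here is a minimal Python sketch of a brute-force kNN over an in-memory map of driver positions. It assumes a plain dict keyed by driver ID and uses the haversine great-circle distance; a real index (geohash, S2, H3) would prune candidates first. The function names and sample coordinates are hypothetical.

```python
import heapq
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearest_drivers(locations, lat, lon, k):
    """Return the k closest (distance_m, driver_id) pairs.

    A Get touches one entry; this query has to consider every candidate driver,
    which is expensive when each location must first be fetched and
    deserialized from a remote key-value store.
    """
    return heapq.nsmallest(
        k,
        ((haversine_m(lat, lon, dlat, dlon), driver_id)
         for driver_id, (dlat, dlon) in locations.items()),
    )

# Hypothetical in-memory state: driver_id -> (latitude, longitude)
locations = {
    12345: (50.30846, 30.53419),
    12346: (50.45010, 30.52340),
    12347: (50.31000, 30.53500),
}
print([d for _, d in nearest_drivers(locations, 50.3085, 30.5342, k=2)])  # [12345, 12347]
```

With the state held locally, this scan is a loop over a dict; no network round trips or serialization are involved.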
Feature #1: Orders Dispatching
Find the best driver for the order.
Feature #2: Orders Broadcasting
Stream your order to many drivers in the DriverApp.
Feature #3: Batch dispatching
Figure: The process of order dispatching with batch windows. Greedy algorithm: 2 min + 9 min, total wait time = 11 min. Batching algorithm: 4 min + 4 min, total wait time = 8 min.
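The difference between greedy and batch dispatching can be reproduced with a tiny example. The ETA matrix below is hypothetical, chosen so the totals match the figure (11 min vs 8 min). Greedy assigns each order, in arrival order, to its nearest free driver; the batch version searches all assignments within the window. Brute force is only viable for a handful of orders; a real batcher would use something like the Hungarian algorithm or a heuristic.

```python
from itertools import permutations

def greedy_assign(eta):
    """eta[o][d] = minutes for driver d to reach order o. Assign orders one by one."""
    free = set(range(len(eta[0])))
    assignment, total = [], 0
    for order_etas in eta:
        driver = min(free, key=lambda d: order_etas[d])  # nearest free driver now
        free.remove(driver)
        assignment.append(driver)
        total += order_etas[driver]
    return assignment, total

def batch_assign(eta):
    """Pick the assignment that minimises total wait time across the whole batch."""
    best = None
    for drivers in permutations(range(len(eta[0])), len(eta)):
        total = sum(eta[o][d] for o, d in enumerate(drivers))
        if best is None or total < best[1]:
            best = (list(drivers), total)
    return best

# Hypothetical ETAs: order 0 is 2 min from driver 0 and 4 min from driver 1;
# order 1 is 4 min from driver 0 and 9 min from driver 1.
eta = [[2, 4],
       [4, 9]]
print(greedy_assign(eta))  # ([0, 1], 11)
print(batch_assign(eta))   # ([1, 0], 8)
```

Greedy gives the first order the closest driver and leaves the second order with a 9-minute wait; waiting briefly to collect a batch lets the dispatcher trade a slightly worse pickup for one rider against a much better one for the other.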
Feature #4: Driver ETA Tracker
Requirements:
1. Active orders: tens of thousands
2. Drivers send their location every 2-5 seconds
Other Workloads
1. Order offers. Find the best driver near you.
2. Order broadcasts. Fan out orders to multiple drivers.
3. Order chaining. Find the next order for a driver while they complete the current one.
4. Order batching (optimization). Reduce the total waiting time for all passengers.
5. Sector queue (airports, train stations).
6. Driver ETA tracking for an accepted order.
7. Matching a driver's GPS location to a map graph node.
Simplified Overview of the Architecture (Stateful)
Stateful architectures: Open Problems
● Load balancing algorithms
● Scalability
  ○ Partitioning
  ○ Replication
● Fault tolerance and cold start
Key concept
1. Local state is stored in in-memory KV structures.
2. Local state is restored from the durable log. In some cases, local state changes may be checkpointed to a remote KV store (or into a separate Kafka topic).
3. Local state updates are single-threaded: no concurrency, monotonic writes.
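A minimal Python sketch of these three ideas, assuming a Python list stands in for one Kafka partition and a `queue.Queue` feeds a single writer thread. The class and function names are hypothetical; the point is that one thread owns all writes, events are applied in offset order, and a restart simply replays the log.

```python
import queue
import threading
from dataclasses import dataclass

@dataclass(frozen=True)
class LocationEvent:
    offset: int      # position in the (simulated) partition log
    driver_id: int
    lat: float
    lon: float

class LocalDriverState:
    """In-memory KV state that is mutated by exactly one writer thread."""

    def __init__(self):
        self.locations = {}    # driver_id -> (lat, lon)
        self.last_offset = -1  # highest offset applied so far

    def apply(self, event):
        # Events are applied strictly in log order (monotonic writes);
        # anything at or below the last applied offset is a duplicate.
        if event.offset <= self.last_offset:
            return
        self.locations[event.driver_id] = (event.lat, event.lon)
        self.last_offset = event.offset

def run_single_writer(state, inbox):
    """Drain the inbox on one thread, so writes to `state` need no locks."""
    while (event := inbox.get()) is not None:
        state.apply(event)

def restore_from_log(log):
    """Rebuild local state after a restart by replaying the durable log."""
    state = LocalDriverState()
    for event in log:
        state.apply(event)
    return state

# A Python list stands in for one Kafka partition.
log = [
    LocationEvent(0, 1, 50.30, 30.53),
    LocationEvent(1, 2, 50.45, 30.52),
    LocationEvent(2, 1, 50.31, 30.54),
]

live = LocalDriverState()
inbox = queue.Queue()
writer = threading.Thread(target=run_single_writer, args=(live, inbox))
writer.start()
for e in log:
    inbox.put(e)
inbox.put(None)  # sentinel: stop the writer
writer.join()

restored = restore_from_log(log)
print(live.locations == restored.locations)  # True
```

This sketch only reads the state after the writer has stopped; a live service would also need a safe way for readers to observe state while the writer runs.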
NFR (Kyiv only)
Writes:
1.1) 5,000-10,000 rps
1.2) 100-500 rps
Reads:
2.1) 500 rps (each request handles 100-500 drivers)
2.2) fetch 50,000-200,000 rows/sec (100-400 MB/sec)
Driver entity size: 2 KB (50th percentile) / 13 KB (99th percentile)
Total size for 100K drivers: 200 MB
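The state-size and read-bandwidth figures follow from the median (2 KB) entity size. Assuming decimal units (1 MB = 1,000 KB):

```latex
100\,000 \times 2\ \text{KB} = 200\,000\ \text{KB} = 200\ \text{MB}

50\,000\ \tfrac{\text{rows}}{\text{s}} \times 2\ \text{KB} = 100\ \tfrac{\text{MB}}{\text{s}},
\qquad
200\,000\ \tfrac{\text{rows}}{\text{s}} \times 2\ \text{KB} = 400\ \tfrac{\text{MB}}{\text{s}}
```

If every entity were at the 99th-percentile size (13 KB), the same arithmetic would give 1.3 GB for 100K drivers, so the 200 MB figure is a median-based estimate.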
Key differences
Stateless (remote KV):
● Provides a GET/PUT/DELETE API
● High CPU cost due to marshalling and serialization
● Additional network latency
● Frequently requires additional local caching
Stateful (in-memory/local KV):
● Domain-specific API, e.g.:
  ○ Find nearest drivers
  ○ Calculate ETA
● Data locality
● Shared-nothing
Access patterns for in-memory KV
1. Key lookup
2. Index seek (offers, broadcast)
3. Full scans / range scans
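The three access patterns map naturally onto standard in-memory structures. This is a hypothetical Python sketch, not the talk's implementation: a dict for key lookups, a secondary index from a spatial cell ID (an opaque string here) to driver IDs for index seeks, and a sorted list of `(last_seen_ts, driver_id)` for range scans via `bisect`.

```python
import bisect
from collections import defaultdict

class DriverStore:
    """Hypothetical in-memory store illustrating three access patterns."""

    def __init__(self):
        self.by_id = {}                  # primary key -> record
        self.by_cell = defaultdict(set)  # secondary index: cell -> driver ids
        self.by_last_seen = []           # sorted (last_seen_ts, driver_id)

    def upsert(self, driver_id, cell, last_seen_ts):
        old = self.by_id.get(driver_id)
        if old is not None:
            # Drop stale index entries first (list.remove is O(n); fine for a sketch).
            self.by_cell[old["cell"]].discard(driver_id)
            self.by_last_seen.remove((old["last_seen_ts"], driver_id))
        self.by_id[driver_id] = {"cell": cell, "last_seen_ts": last_seen_ts}
        self.by_cell[cell].add(driver_id)
        bisect.insort(self.by_last_seen, (last_seen_ts, driver_id))

    def get(self, driver_id):
        """1. Key lookup: a single dict access."""
        return self.by_id.get(driver_id)

    def in_cell(self, cell):
        """2. Index seek: all drivers in one spatial cell (offers, broadcast)."""
        return sorted(self.by_cell.get(cell, ()))

    def seen_between(self, start_ts, end_ts):
        """3. Range scan: drivers whose last update is in [start_ts, end_ts]."""
        lo = bisect.bisect_left(self.by_last_seen, (start_ts, float("-inf")))
        hi = bisect.bisect_right(self.by_last_seen, (end_ts, float("inf")))
        return [driver_id for _, driver_id in self.by_last_seen[lo:hi]]

store = DriverStore()
store.upsert(1, "cell-a", 100)
store.upsert(2, "cell-a", 105)
store.upsert(3, "cell-b", 110)
store.upsert(1, "cell-b", 120)      # driver 1 moved to another cell
print(store.get(1))                 # {'cell': 'cell-b', 'last_seen_ts': 120}
print(store.in_cell("cell-b"))      # [1, 3]
print(store.seen_between(100, 115)) # [2, 3]
```

Because everything lives in process memory, the index seek and range scan return references to local data instead of shipping rows over the network.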
Concept #1: Co-partitioning
Two topics are co-partitioned if:
1. Their keys have the same schema.
2. They are materialized by topics with the same number of partitions.
3. Their producers use the same partitioner.
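A small Python sketch of why all three conditions matter. The partitioner here hashes the key with CRC32 purely as a stand-in (Kafka's default partitioner for keyed records uses murmur2); the topic roles follow the `driver-state` and `driver-locations` topics mentioned later in the talk, and the helper names are hypothetical.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in partitioner: hash the key bytes, then take it modulo the partition count.

    Kafka's default partitioner for keyed records uses murmur2, not CRC32;
    CRC32 is used here only because it ships with Python. Co-partitioning
    only requires that every producer applies the same function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def co_partitioned(keys, partitions_a: int, partitions_b: int) -> bool:
    """True if every key lands on the same partition number in both topics."""
    return all(
        partition_for(k, partitions_a) == partition_for(k, partitions_b) for k in keys
    )

driver_ids = [str(i) for i in range(12340, 12350)]

# driver-state and driver-locations, both with 8 partitions and the same partitioner:
# a consumer assigned partition p of both topics sees every event for its drivers.
print(co_partitioned(driver_ids, 8, 8))  # True

# Different partition counts generally send some keys to different partition numbers.
print(co_partitioned(driver_ids, 8, 12))
```

When the conditions hold, a consumer can join the two streams using only its local state, without any cross-partition lookups.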
Concept #2: Re-keying partitions
● Related events are not co-partitioned
● Well-balanced partitions
● Partitions, and as a result consumers, can become unbalanced
● Achieves data locality for the consumer
Concept #3: Filtering + Enriching
DriverLocation {
  "driver": 12345,
  "latitude": 50.30846,
  "longitude": 30.53419
}
DriverETA {
  "driver": 12345,
  "latitude": 50.30846,
  "longitude": 30.53419,
  "order": 98765,
  "eta": "2 min"
}
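A minimal Python sketch of the filter-and-enrich step between the two messages. It assumes the local state maps a driver to their accepted order and pickup point; the straight-line ETA at a fixed average speed (`AVG_SPEED_MPS`) is a hypothetical placeholder for real routing on the road graph.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class DriverLocation:
    driver: int
    latitude: float
    longitude: float

@dataclass(frozen=True)
class DriverETA:
    driver: int
    latitude: float
    longitude: float
    order: int
    eta: str

AVG_SPEED_MPS = 8.0  # hypothetical average city speed (about 29 km/h)

def rough_eta_minutes(lat1, lon1, lat2, lon2):
    """Straight-line ETA; a real tracker would route on the road graph."""
    # Equirectangular approximation, adequate at city scale.
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    meters = 6_371_000 * math.hypot(x, y)
    return max(1, round(meters / AVG_SPEED_MPS / 60))

def filter_and_enrich(location, active_orders):
    """Drop locations of drivers without an accepted order; enrich the rest.

    `active_orders` is local state: driver_id -> (order_id, pickup_lat, pickup_lon).
    """
    assignment = active_orders.get(location.driver)
    if assignment is None:
        return None  # filtered out: this driver is not heading to a pickup
    order_id, plat, plon = assignment
    minutes = rough_eta_minutes(location.latitude, location.longitude, plat, plon)
    return DriverETA(location.driver, location.latitude, location.longitude,
                     order_id, f"{minutes} min")

active_orders = {12345: (98765, 50.3150, 30.5342)}  # hypothetical pickup point
print(filter_and_enrich(DriverLocation(12345, 50.30846, 30.53419), active_orders))
# ... order=98765, eta='2 min'
print(filter_and_enrich(DriverLocation(55555, 50.45, 30.52), active_orders))  # None
```

Filtering first keeps the expensive enrichment limited to the drivers that actually have an accepted order.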
How to scale?
Figure: several Driver Dispatching instances.
Scalability
Sharding strategies:
1. Geospatial indexing (geohash, S2, H3)
2. city_id (region)
Consider the following points when you design a data partitioning scheme:
1. Minimize cross-partition data access operations.
2. Minimize cross-partition joins.
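For the geospatial option, a geohash prefix can serve as a partition key. Below is a standard geohash encoder in Python (interleaving longitude and latitude bisection bits, five bits per base-32 character); using a 4-character prefix as the shard key is a hypothetical choice. Note that neighbours near a cell edge can fall into different cells, so a kNN query near a boundary still has to touch adjacent cells, which is exactly the cross-partition access the guidelines above warn about.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=5):
    """Standard geohash: alternate longitude/latitude bisection, 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value, even = [], 0, 0, True  # even bit positions encode longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        even = not even
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

def shard_key(lat, lon, precision=4):
    """Hypothetical partition key: the geohash cell containing the driver."""
    return geohash_encode(lat, lon, precision)

print(geohash_encode(57.64911, 10.40744, 11))  # u4pruydqqvj
print(shard_key(50.30846, 30.53419))
```

A shorter prefix gives bigger cells and fewer boundary crossings but coarser load balancing; partitioning by city_id is the coarsest version of the same trade-off.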
Partitioning by Region
Possible challenges:
● Downtime during rebalancing (scale-out, rolling updates)
● Unbalanced load: the load from Kyiv equals the load from all other Ukrainian cities combined
Try to fix: Partitioning by Region + Replication
Replication:
● Standalone consumers
● No partition rebalancing
● No downtime
● Replication overhead is less than 0.1 CPU per pod
● Reduced requirements for cold recovery
Scalability
1. Scalability: add Kafka partitions and deploy separate shard instances for cities/countries.
2. Elasticity: scale out consumers within a shard.
Reliability?
Replica synchronization
● State-based CRDT
● Last write wins (LWW)
● Optimistic replication (replicas can become temporarily inconsistent)
● Strong eventual consistency (SEC)
Problems with Replication Lag?
Depends on your domain:
● Reading your own writes
● Monotonic reads
● Consistent prefix reads
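A state-based LWW register is one of the simplest CRDTs: each replica keeps `(value, timestamp, replica_id)` per key, and merging keeps the entry with the greater `(timestamp, replica_id)` pair. The Python sketch below is illustrative (the names and the per-driver state layout are assumptions). Because the merge is commutative, associative, and idempotent, replicas that have seen the same updates converge regardless of delivery order. LWW relies on timestamps, so clock skew can silently discard a genuinely newer write.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    """State-based last-write-wins register.

    Timestamp ties are broken by replica_id so every replica picks the same winner.
    """
    value: object
    timestamp: int
    replica_id: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        if (other.timestamp, other.replica_id) > (self.timestamp, self.replica_id):
            return other
        return self

def merge_states(a, b):
    """Merge two replica states (driver_id -> LWWRegister) key by key."""
    merged = dict(a)
    for key, reg in b.items():
        merged[key] = merged[key].merge(reg) if key in merged else reg
    return merged

pod_a = {1: LWWRegister((50.30, 30.53), 100, "pod-a"),
         2: LWWRegister((50.45, 30.52), 90, "pod-a")}
pod_b = {1: LWWRegister((50.31, 30.54), 105, "pod-b"),
         3: LWWRegister((50.40, 30.60), 95, "pod-b")}

merged = merge_states(pod_a, pod_b)
print(merged[1].value)                          # (50.31, 30.54)
print(merged == merge_states(pod_b, pod_a))     # True: order does not matter
print(merge_states(merged, merged) == merged)   # True: merging twice changes nothing
```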
Fault tolerance with local state
1. Single infrastructure dependency: Kafka (a battle-tested streaming platform with high throughput, fault tolerance, and scalability).
2. When a task instance restarts, its local state is repopulated by reading its own Kafka log.
3. Yes, reading and repopulating takes some time.
Controlling State Size
1. Key-based retention
   a. Aggressive topic compaction
   b. Tombstones
2. Time-based retention
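A Python simulation of the two retention strategies on a toy log of `(offset, key, value)` records, where a `None` value plays the role of a tombstone. This only models the outcome: Kafka's log cleaner compacts in the background and keeps tombstones for `delete.retention.ms` before removing them, whereas this sketch drops them immediately. The one-hour window mirrors the driver-state retention mentioned in the talk.

```python
def compact(log):
    """Key-based retention: keep only the latest record per key.

    A record whose value is None is a tombstone and removes its key entirely.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)
    survivors = [r for r in latest.values() if r[2] is not None]
    return sorted(survivors, key=lambda r: r[0])  # preserve log order

def apply_time_retention(log_with_ts, now_ts, retention_s):
    """Time-based retention: drop (ts, key, value) records older than retention_s."""
    return [r for r in log_with_ts if now_ts - r[0] <= retention_s]

log = [
    (0, "driver-1", {"status": "online"}),
    (1, "driver-2", {"status": "online"}),
    (2, "driver-1", {"status": "busy"}),
    (3, "driver-2", None),  # tombstone: driver-2 went offline
]
print(compact(log))  # [(2, 'driver-1', {'status': 'busy'})]

timed = [(0, "driver-1", "a"), (3000, "driver-2", "b"), (3700, "driver-3", "c")]
print(apply_time_retention(timed, now_ts=4000, retention_s=3600))
# [(3000, 'driver-2', 'b'), (3700, 'driver-3', 'c')]
```

Both strategies bound how much a restarting instance has to replay: compaction caps it at one record per live key, and time-based retention caps it at the retention window.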
How long does it take to rebuild the state?
1. Driver state retention: 1 hour.
2. Repopulating local state:
   a. Read driver-state from the beginning of the topic: 400k messages (8 partitions).
   b. Read driver-locations starting from 'now - 5 sec'.
3. You need to implement your own "live processing started" event.
Example log line:
Live processing started "dispatching.driver-summary-events [0]" after 00:00:01.7875633 sec (50142 msgs)
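One way to implement a "live processing started" event, sketched in Python: capture the partition's end offset when the instance starts (the Java Kafka client exposes this via `KafkaConsumer.endOffsets`), then report once every record before that offset has been applied. The class name and message format are hypothetical; the format only imitates the log line above.

```python
import time

class CatchUpTracker:
    """Detects when replay of a partition has caught up with the live end of the log.

    `end_offset` is the partition's end offset captured at startup (the offset
    the next new record will receive). Replay may start at any offset, e.g.
    'now - 5 sec' for driver-locations, so `applied` counts records, not offsets.
    """

    def __init__(self, partition, end_offset):
        self.partition = partition
        self.end_offset = end_offset
        self.applied = 0
        self.started_at = time.monotonic()
        self.live = end_offset == 0  # an empty partition is live immediately

    def on_record(self, offset):
        """Call after applying each record; returns a message once catch-up completes."""
        self.applied += 1
        if not self.live and offset + 1 >= self.end_offset:
            self.live = True
            elapsed = time.monotonic() - self.started_at
            return (f'Live processing started "{self.partition}" '
                    f"after {elapsed:.3f} sec ({self.applied} msgs)")
        return None

tracker = CatchUpTracker("dispatching.driver-summary-events [0]", end_offset=3)
for offset in range(5):  # offsets 3 and 4 arrive after startup: already live
    message = tracker.on_record(offset)
    if message:
        print(message)
```

Until the tracker reports catch-up, the instance should not serve dispatching queries from this partition, since its local state may still be stale.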
An SLA of 99.998% uptime/availability allows the following downtime/unavailability:
■ Daily: 1.7 s
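The daily figure is the unavailable fraction multiplied by the 86,400 seconds in a day:

```latex
(1 - 0.99998) \times 86\,400\ \text{s} = 0.00002 \times 86\,400\ \text{s} = 1.728\ \text{s} \approx 1.7\ \text{s per day}
```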
Traffic Jams requirements
1. Reduce the cost of the Google Maps API.
2. High write rate (20k online drivers).
3. Update traffic information every 5 min.
Stateful processing:
● Grouping messages by partition key
● Aggregating messages in a hopping window
● MapReduce
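A Python sketch of hopping-window aggregation for traffic speeds. The 5-minute window comes from the requirements above; the 60-second hop, the road-segment IDs, and the `(timestamp_s, segment_id, speed_kmh)` input shape (for example, produced by map-matching driver GPS points) are assumptions. In MapReduce terms, each sample is mapped to one `(segment, window_start)` key per overlapping window, and the reduce step averages per key.

```python
from collections import defaultdict

WINDOW_S = 300  # 5-minute windows, matching "update traffic information every 5 min"
HOP_S = 60      # hypothetical advance interval; windows overlap because HOP_S < WINDOW_S

def windows_for(ts):
    """Start times of every window [start, start + WINDOW_S) that contains ts."""
    # Smallest multiple of HOP_S strictly greater than ts - WINDOW_S.
    first = (ts - WINDOW_S) // HOP_S * HOP_S + HOP_S
    return range(max(0, first), ts + 1, HOP_S)

def average_speeds(samples):
    """Map: emit (segment, window_start) per sample. Reduce: average per key."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, segment, speed in samples:
        for start in windows_for(ts):
            acc = sums[(segment, start)]
            acc[0] += speed
            acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

samples = [(10, "seg-1", 30.0), (70, "seg-1", 20.0), (70, "seg-2", 50.0)]
print(list(windows_for(130)))  # [0, 60, 120]
print(average_speeds(samples))
# {('seg-1', 0): 25.0, ('seg-2', 0): 50.0, ('seg-1', 60): 20.0, ('seg-2', 60): 50.0}
```

Partitioning the input by segment ID keeps every sample for a segment on one consumer, so each window can be aggregated entirely in local memory.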
Driver ETA Tracker
Resources Usage
Similar workload using Redis (https://aws.amazon.com/blogs/database/optimize-redis-client-performance-for-amazon-elasticache/?utm_source=pocket_saves):
○ Client: c5.4xlarge (16 vCPUs, 32 GiB)
○ Redis: 3 nodes r6g.2xlarge (8 vCPUs, 64 GiB)
Future works
Although the current design is simple, it offers the flexibility to change key aspects:
○ Replication + Sharding
Lessons learned
1. Stateful is not always difficult.
2. A simple and reliable solution.
3. Easy to maintain.
4. Much more resource-efficient: 2 vCPUs for all dispatching instead of a Redis cluster with 16-24 vCPUs.
5. What about MS Orleans?
The Twelve-Factor App: Misleading
Space-based architecture? https://www.amazon.com/_/dp/1492043451?smid=ATVPDKIKX0DER&_encoding=UTF8&tag=oreilly20-20
Contacts
Oleksandr Chumak, Solution Architect
https://www.linkedin.com/in/oleksandr-chumak-45967588/
facebook.com/achumak.dev
1 de 46

Recomendados

Kubernetes @ Squarespace (SRE Portland Meetup October 2017) por
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
237 visualizações51 slides
Stephan Ewen - Experiences running Flink at Very Large Scale por
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
3.5K visualizações76 slides
BWC Supercomputing 2008 Presentation por
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentationlilyco
343 visualizações25 slides
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large... por
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward
1.5K visualizações44 slides
QCON 2015: Gearpump, Realtime Streaming on Akka por
QCON 2015: Gearpump, Realtime Streaming on AkkaQCON 2015: Gearpump, Realtime Streaming on Akka
QCON 2015: Gearpump, Realtime Streaming on AkkaSean Zhong
634 visualizações60 slides
Our Multi-Year Journey to a 10x Faster Confluent Cloud por
Our Multi-Year Journey to a 10x Faster Confluent CloudOur Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudHostedbyConfluent
32 visualizações43 slides

Mais conteúdo relacionado

Similar a "Stateful app as an efficient way to build dispatching for riders and drivers", Oleksandr Chumak

Challenges in Cloud Computing – VM Migration por
Challenges in Cloud Computing – VM MigrationChallenges in Cloud Computing – VM Migration
Challenges in Cloud Computing – VM MigrationSarmad Makhdoom
7.1K visualizações26 slides
Velocity 2018 preetha appan final por
Velocity 2018   preetha appan finalVelocity 2018   preetha appan final
Velocity 2018 preetha appan finalpreethaappan
118 visualizações70 slides
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale por
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at ScaleSean Zhong
900 visualizações56 slides
Practice and challenges from building IaaS por
Practice and challenges from building IaaSPractice and challenges from building IaaS
Practice and challenges from building IaaSShawn Zhu
841 visualizações26 slides
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps) por
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)Art Schanz
85 visualizações31 slides
Unclouding Container Challenges por
 Unclouding  Container Challenges Unclouding  Container Challenges
Unclouding Container ChallengesRakuten Group, Inc.
407 visualizações18 slides

Similar a "Stateful app as an efficient way to build dispatching for riders and drivers", Oleksandr Chumak(20)

Challenges in Cloud Computing – VM Migration por Sarmad Makhdoom
Challenges in Cloud Computing – VM MigrationChallenges in Cloud Computing – VM Migration
Challenges in Cloud Computing – VM Migration
Sarmad Makhdoom7.1K visualizações
Velocity 2018 preetha appan final por preethaappan
Velocity 2018   preetha appan finalVelocity 2018   preetha appan final
Velocity 2018 preetha appan final
preethaappan118 visualizações
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale por Sean Zhong
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Sean Zhong900 visualizações
Practice and challenges from building IaaS por Shawn Zhu
Practice and challenges from building IaaSPractice and challenges from building IaaS
Practice and challenges from building IaaS
Shawn Zhu841 visualizações
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps) por Art Schanz
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)
Art Schanz85 visualizações
Unclouding Container Challenges por Rakuten Group, Inc.
 Unclouding  Container Challenges Unclouding  Container Challenges
Unclouding Container Challenges
Rakuten Group, Inc.407 visualizações
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala... por Martin Zapletal
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Martin Zapletal1.3K visualizações
Oow2007 performance por Ricky Zhu
Oow2007 performanceOow2007 performance
Oow2007 performance
Ricky Zhu494 visualizações
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni... por MLconf
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
MLconf9K visualizações
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w... por Data Con LA
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Data Con LA783 visualizações
Map reduce por 대호 김
Map reduceMap reduce
Map reduce
대호 김89 visualizações
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda... por DataWorks Summit/Hadoop Summit
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
DataWorks Summit/Hadoop Summit3.1K visualizações
z/VM Performance Analysis por Rodrigo Campos
z/VM Performance Analysisz/VM Performance Analysis
z/VM Performance Analysis
Rodrigo Campos5.8K visualizações
Designing Scalable Applications por Fabricio Epaminondas
Designing Scalable ApplicationsDesigning Scalable Applications
Designing Scalable Applications
Fabricio Epaminondas2.7K visualizações
Ingestion and Dimensions Compute and Enrich using Apache Apex por Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
Apache Apex671 visualizações
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas... por areej qasrawi
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
areej qasrawi64 visualizações
Corralling Big Data at TACC por inside-BigData.com
Corralling Big Data at TACCCorralling Big Data at TACC
Corralling Big Data at TACC
inside-BigData.com2K visualizações
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware por Lucidworks
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Lucidworks1.1K visualizações
Leveraging the Power of Solr with Spark por QAware GmbH
Leveraging the Power of Solr with SparkLeveraging the Power of Solr with Spark
Leveraging the Power of Solr with Spark
QAware GmbH959 visualizações
Mobile web performance - MoDev East por Patrick Meenan
Mobile web performance - MoDev EastMobile web performance - MoDev East
Mobile web performance - MoDev East
Patrick Meenan3.4K visualizações

Mais de Fwdays

"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov por
"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov
"Drizzle: What Is It All About?", Alex Blokh, Dan KochetovFwdays
24 visualizações33 slides
"Package management in monorepos", Zoltan Kochan por
"Package management in monorepos", Zoltan Kochan"Package management in monorepos", Zoltan Kochan
"Package management in monorepos", Zoltan KochanFwdays
33 visualizações18 slides
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell por
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell
"Node.js vs workers — A comparison of two JavaScript runtimes", James M SnellFwdays
14 visualizações30 slides
"AI and how to integrate ChatGPT as a customer support agent", Sergey Dyachok por
"AI and how to integrate ChatGPT as a customer support agent",  Sergey Dyachok"AI and how to integrate ChatGPT as a customer support agent",  Sergey Dyachok
"AI and how to integrate ChatGPT as a customer support agent", Sergey DyachokFwdays
38 visualizações17 slides
"Node.js Development in 2024: trends and tools", Nikita Galkin por
"Node.js Development in 2024: trends and tools", Nikita Galkin "Node.js Development in 2024: trends and tools", Nikita Galkin
"Node.js Development in 2024: trends and tools", Nikita Galkin Fwdays
32 visualizações38 slides
"Running students' code in isolation. The hard way", Yurii Holiuk por
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk Fwdays
36 visualizações34 slides

Mais de Fwdays(20)

"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov por Fwdays
"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov
"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov
Fwdays24 visualizações
"Package management in monorepos", Zoltan Kochan por Fwdays
"Package management in monorepos", Zoltan Kochan"Package management in monorepos", Zoltan Kochan
"Package management in monorepos", Zoltan Kochan
Fwdays33 visualizações
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell por Fwdays
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell
Fwdays14 visualizações
"AI and how to integrate ChatGPT as a customer support agent", Sergey Dyachok por Fwdays
"AI and how to integrate ChatGPT as a customer support agent",  Sergey Dyachok"AI and how to integrate ChatGPT as a customer support agent",  Sergey Dyachok
"AI and how to integrate ChatGPT as a customer support agent", Sergey Dyachok
Fwdays38 visualizações
"Node.js Development in 2024: trends and tools", Nikita Galkin por Fwdays
"Node.js Development in 2024: trends and tools", Nikita Galkin "Node.js Development in 2024: trends and tools", Nikita Galkin
"Node.js Development in 2024: trends and tools", Nikita Galkin
Fwdays32 visualizações
"Running students' code in isolation. The hard way", Yurii Holiuk por Fwdays
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk
Fwdays36 visualizações
"Surviving highload with Node.js", Andrii Shumada por Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays56 visualizações
"The role of CTO in a classical early-stage startup", Eugene Gusarov por Fwdays
"The role of CTO in a classical early-stage startup", Eugene Gusarov"The role of CTO in a classical early-stage startup", Eugene Gusarov
"The role of CTO in a classical early-stage startup", Eugene Gusarov
Fwdays33 visualizações
"Cross-functional teams: what to do when a new hire doesn’t solve the busines... por Fwdays
"Cross-functional teams: what to do when a new hire doesn’t solve the busines..."Cross-functional teams: what to do when a new hire doesn’t solve the busines...
"Cross-functional teams: what to do when a new hire doesn’t solve the busines...
Fwdays45 visualizações
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad... por Fwdays
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad..."Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
Fwdays48 visualizações
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur por Fwdays
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
Fwdays50 visualizações
"Fast Start to Building on AWS", Igor Ivaniuk por Fwdays
"Fast Start to Building on AWS", Igor Ivaniuk"Fast Start to Building on AWS", Igor Ivaniuk
"Fast Start to Building on AWS", Igor Ivaniuk
Fwdays53 visualizações
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ... por Fwdays
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ..."Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
Fwdays48 visualizações
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi por Fwdays
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi
Fwdays32 visualizações
"How we switched to Kanban and how it integrates with product planning", Vady... por Fwdays
"How we switched to Kanban and how it integrates with product planning", Vady..."How we switched to Kanban and how it integrates with product planning", Vady...
"How we switched to Kanban and how it integrates with product planning", Vady...
Fwdays76 visualizações
"Bringing Flutter to Tide: a case study of a leading fintech platform in the ... por Fwdays
"Bringing Flutter to Tide: a case study of a leading fintech platform in the ..."Bringing Flutter to Tide: a case study of a leading fintech platform in the ...
"Bringing Flutter to Tide: a case study of a leading fintech platform in the ...
Fwdays25 visualizações
"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov por Fwdays
"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov
"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov
Fwdays69 visualizações
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy por Fwdays
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
Fwdays50 visualizações
From “T” to “E”, Dmytro Gryn por Fwdays
From “T” to “E”, Dmytro GrynFrom “T” to “E”, Dmytro Gryn
From “T” to “E”, Dmytro Gryn
Fwdays37 visualizações
"Why I left React in my TypeScript projects and where ", Illya Klymov por Fwdays
"Why I left React in my TypeScript projects and where ",  Illya Klymov"Why I left React in my TypeScript projects and where ",  Illya Klymov
"Why I left React in my TypeScript projects and where ", Illya Klymov
Fwdays256 visualizações

Último

iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... por
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...Bernd Ruecker
54 visualizações69 slides
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda... por
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...ShapeBlue
161 visualizações13 slides
Digital Personal Data Protection (DPDP) Practical Approach For CISOs por
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOsPriyanka Aash
158 visualizações59 slides
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue por
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlueCloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlueShapeBlue
135 visualizações13 slides
Ransomware is Knocking your Door_Final.pdf por
Ransomware is Knocking your Door_Final.pdfRansomware is Knocking your Door_Final.pdf
Ransomware is Knocking your Door_Final.pdfSecurity Bootcamp
96 visualizações46 slides
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... por
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc
170 visualizações29 slides

Último(20)

iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... por Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker54 visualizações
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda... por ShapeBlue
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
ShapeBlue161 visualizações
Digital Personal Data Protection (DPDP) Practical Approach For CISOs por Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash158 visualizações
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue por ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlueCloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
ShapeBlue135 visualizações
Ransomware is Knocking your Door_Final.pdf por Security Bootcamp
Ransomware is Knocking your Door_Final.pdfRansomware is Knocking your Door_Final.pdf
Ransomware is Knocking your Door_Final.pdf
Security Bootcamp96 visualizações
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... por TrustArc
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc170 visualizações
Initiating and Advancing Your Strategic GIS Governance Strategy por Safe Software
Initiating and Advancing Your Strategic GIS Governance StrategyInitiating and Advancing Your Strategic GIS Governance Strategy
Initiating and Advancing Your Strategic GIS Governance Strategy
Safe Software176 visualizações
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online por ShapeBlue
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineKVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online
ShapeBlue221 visualizações
Business Analyst Series 2023 - Week 4 Session 8 por DianaGray10
Business Analyst Series 2023 -  Week 4 Session 8Business Analyst Series 2023 -  Week 4 Session 8
Business Analyst Series 2023 - Week 4 Session 8
DianaGray10123 visualizações
Future of AR - Facebook Presentation por Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty64 visualizações
Business Analyst Series 2023 - Week 4 Session 7 por DianaGray10
Business Analyst Series 2023 -  Week 4 Session 7Business Analyst Series 2023 -  Week 4 Session 7
Business Analyst Series 2023 - Week 4 Session 7
DianaGray10139 visualizações
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ... por ShapeBlue
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
ShapeBlue126 visualizações
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ... por ShapeBlue
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
ShapeBlue166 visualizações
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading... por The Digital Insurer
Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...
The Digital Insurer90 visualizações
NTGapps NTG LowCode Platform por Mustafa Kuğu
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
Mustafa Kuğu423 visualizações
State of the Union - Rohit Yadav - Apache CloudStack por ShapeBlue
State of the Union - Rohit Yadav - Apache CloudStackState of the Union - Rohit Yadav - Apache CloudStack
State of the Union - Rohit Yadav - Apache CloudStack
ShapeBlue297 visualizações
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava... por ShapeBlue
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
ShapeBlue145 visualizações
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha... por ShapeBlue
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
ShapeBlue180 visualizações
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or... por ShapeBlue
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
ShapeBlue198 visualizações

"Stateful app as an efficient way to build dispatching for riders and drivers", Oleksandr Chumak

  • 2. Uklon in numbers 12 130+ Engineers Product Teams 16 M Android/iOS downloads 1.5M+ Riders DAU 30+ microservices 200k+ Drivers DAU 3 Countries 30 Cities
  • 5. How to reduce CPU consumption by 10 times due to stateful-processing and ensure high reliability What is the report about?
  • 6. 3 What are the solutions employed by our competitors?
  • 7. Agenda: Workloads that make the stateless approach inefficient; Basic concepts; Scaling of stateful services; Reliability of stateful services.
  • 8. Workloads that make the stateless approach inefficient
  • 9. Several challenges within ride-hailing: 1. Massive, frequent write operations are needed to track objects' current locations. Since drivers can move as fast as 20 meters per second, their locations must be updated every second. 2. A k-nearest-neighbour (kNN) query is far more challenging than a simple GET query in a key-value data store such as Redis.
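A minimal Python sketch of the contrast. It assumes drivers are keyed by an integer id and that each driver's last position is held in a local dict. A location write is a single assignment, while the kNN query has to score every candidate. The brute-force scan and the helper names (`update_location`, `k_nearest`) are illustrative assumptions; in practice a spatial index (geohash, S2, H3) would narrow the candidate set first.

```python
import heapq
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical local state: driver_id -> (lat, lon), overwritten on every update.
drivers: dict[int, tuple[float, float]] = {}

def update_location(driver_id: int, lat: float, lon: float) -> None:
    # The "massive write" path: one in-memory assignment, no network round trip.
    drivers[driver_id] = (lat, lon)

def k_nearest(lat: float, lon: float, k: int) -> list[int]:
    # The kNN path: score every driver and keep the k closest (O(n log k)).
    return heapq.nsmallest(k, drivers, key=lambda d: haversine_m(lat, lon, *drivers[d]))
```

A remote KV store answers the write path cheaply. The kNN path, however, either needs server-side geo support or forces the client to pull and deserialize many entries per query.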
  • 10. Feature #1: Order dispatching. Find the best driver for the order.
  • 11. Feature #2: Order broadcasting. Stream an order to many drivers (DriverApp).
  • 12. Feature #3: Batch dispatching. Figure: the process of order dispatching with batch windows. Greedy algorithm: 2 min + 9 min, total wait time = 11 min. Batching algorithm: 4 min + 4 min, total wait time = 8 min.
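The two totals can be reproduced with a toy model. The sketch assumes two riders (A, B) and two free drivers (X, Y), with a hypothetical ETA matrix chosen to match the slide's numbers. The greedy dispatcher serves each order as it arrives. The batch dispatcher waits for the window to close and then picks the assignment with the lowest total wait. The exhaustive search is only viable for tiny batches.

```python
from itertools import permutations

# Hypothetical ETA matrix: eta[rider][driver] = minutes until pickup.
eta = {
    "A": {"X": 2, "Y": 4},
    "B": {"X": 4, "Y": 9},
}

def greedy(eta: dict) -> dict:
    """Assign each order, in arrival order, to the nearest still-free driver."""
    free = set(next(iter(eta.values())))
    plan = {}
    for rider, options in eta.items():
        driver = min(free, key=lambda d: options[d])
        plan[rider] = driver
        free.remove(driver)
    return plan

def batch(eta: dict) -> dict:
    """Collect orders during the batch window, then choose the assignment that
    minimizes the total wait (exhaustive search over driver permutations)."""
    riders = list(eta)
    candidates = list(next(iter(eta.values())))
    best = min(
        permutations(candidates, len(riders)),
        key=lambda ds: sum(eta[r][d] for r, d in zip(riders, ds)),
    )
    return dict(zip(riders, best))

def total_wait(eta: dict, plan: dict) -> int:
    return sum(eta[r][d] for r, d in plan.items())
```

For larger batches, the exhaustive search would typically be replaced by a polynomial-time assignment solver such as the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).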
  • 13. Feature #4: Driver ETA tracker. Requirements: 1. Tens of thousands of active orders. 2. Drivers send their location every 2-5 seconds.
  • 14. Other workloads: 1. Order offers: find the best driver near you. 2. Order broadcasts: fan out orders to multiple drivers. 3. Order chaining: find the next order for the driver while they complete the current one. 4. Order batching (optimization): reduce the total waiting time for all passengers. 5. Sector queue (airports, train stations). 6. Driver ETA tracking for an accepted order. 7. Matching a driver's GPS location to a map graph node.
  • 15. Simplified overview of the architecture (stateful).
  • 16. Stateful architectures: open problems. ● Load balancing algorithms ● Scalability ○ Partitioning ○ Replication ● Fault tolerance and cold start
  • 17. Key concept: 1. Local state is stored in in-memory KV structures. 2. Local state is restored from the durable log; in some cases, local state changes may be checkpointed to a remote KV store (or into a separate Kafka topic). 3. Local state updates happen on a single thread: no concurrency, monotonic writes.
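The three points can be illustrated with a small single-writer object. This is a sketch under several assumptions:

- Events carry a log offset (for example, a Kafka offset within one partition).
- Only one thread mutates the map.
- "Monotonic writes" is read here as "apply events in log order and never apply an offset twice".

`DriverStateOwner` and `Event` are hypothetical names.

```python
import queue
import threading
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    offset: int      # position in the durable log (single partition assumed)
    driver_id: int
    status: str

class DriverStateOwner:
    """Owns an in-memory KV map; only its own worker thread mutates it after start()."""

    def __init__(self) -> None:
        self.state: dict[int, str] = {}
        self.last_offset = -1
        self.inbox: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def apply(self, event: Event) -> None:
        # Skip anything at or below the last applied offset, so replaying an
        # overlapping log segment (or a redelivered event) is harmless.
        if event.offset <= self.last_offset:
            return
        self.state[event.driver_id] = event.status
        self.last_offset = event.offset

    def restore(self, log: list[Event]) -> None:
        # Cold start: rebuild the map by replaying the durable log in order.
        for event in log:
            self.apply(event)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self.inbox.put(None)   # sentinel ends the loop after queued events
        self._thread.join()

    def _run(self) -> None:
        # Single writer: events are applied one at a time, so no locks are needed.
        while (event := self.inbox.get()) is not None:
            self.apply(event)
```

Because every mutation goes through one thread, readers served from the same thread always see a consistent map, and no lock contention appears on the hot write path.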
  • 18. NFRs (Kyiv only). Writes: 1.1) 5,000-10,000 rps; 1.2) 100-500 rps. Reads: 2.1) 500 rps (each request handles 100-500 drivers); 2.2) fetch 50,000-200,000 rows/sec (100-400 MB/sec). Driver entity: 2 KB (p50) / 13 KB (p99); total size for 100K drivers = 200 MB.
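The storage and read-bandwidth figures follow from the median entity size. A quick check, assuming decimal units (1 MB = 1,000 KB) and the p50 size of 2 KB:

```latex
\begin{aligned}
100{,}000 \times 2\ \text{KB} &= 200{,}000\ \text{KB} = 200\ \text{MB} \\
50{,}000\ \tfrac{\text{rows}}{\text{s}} \times 2\ \text{KB} &= 100\ \tfrac{\text{MB}}{\text{s}} \\
200{,}000\ \tfrac{\text{rows}}{\text{s}} \times 2\ \text{KB} &= 400\ \tfrac{\text{MB}}{\text{s}}
\end{aligned}
```

Sizing every entity at the p99 value (13 KB) instead would give an upper bound of 1.3 GB for 100K drivers.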
  • 19. Key differences. Stateless (remote KV): ● Provides a GET/PUT/DELETE API ● High CPU cost due to marshalling and serialization ● Additional network latency ● Frequently requires additional local caching. Stateful (in-memory/local KV): ● Domain-specific API, e.g.: ○ Find nearest drivers ○ Calculate ETA ● Data locality ● Shared-nothing
  • 20. Access patterns for an in-memory KV: 1. Key lookup 2. Index seek (offers, broadcast) 3. Full scans / range scans
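One way to serve all three access patterns from local memory, sketched in Python:

- a primary dict for key lookups;
- a secondary index from a spatial cell to driver ids for index seeks;
- a sorted key list for range scans.

The cell ids (`cell-1`, `cell-2`) and the `DriverStore` class are hypothetical. The point is that every index is updated in the same single-threaded write path.

```python
import bisect
from collections import defaultdict

class DriverStore:
    """Hypothetical in-memory store: one primary map plus two secondary structures."""

    def __init__(self) -> None:
        self.by_id: dict[int, dict] = {}                       # 1. key lookup
        self.by_cell: dict[str, set[int]] = defaultdict(set)   # 2. index seek
        self.ids_sorted: list[int] = []                        # 3. range scans

    def put(self, driver_id: int, cell: str, status: str) -> None:
        old = self.by_id.get(driver_id)
        if old is None:
            bisect.insort(self.ids_sorted, driver_id)
        else:
            self.by_cell[old["cell"]].discard(driver_id)  # keep the index in sync
        self.by_id[driver_id] = {"cell": cell, "status": status}
        self.by_cell[cell].add(driver_id)

    def get(self, driver_id: int):
        return self.by_id.get(driver_id)

    def seek(self, cell: str) -> list[int]:
        # Drivers currently in one cell, e.g. the candidates for an offer or broadcast.
        return sorted(self.by_cell.get(cell, ()))

    def range_scan(self, lo: int, hi: int) -> list[int]:
        # Inclusive key range over the sorted id list.
        i = bisect.bisect_left(self.ids_sorted, lo)
        j = bisect.bisect_right(self.ids_sorted, hi)
        return self.ids_sorted[i:j]

    def scan(self) -> list[tuple[int, dict]]:
        return list(self.by_id.items())
```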
  • 22. Concept #1: Co-partitioning. Two topics (or the tables materialized from them) are co-partitioned if: 1. Their keys have the same schema. 2. They have the same number of partitions. 3. Their producers use the same partitioner.
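Why all three conditions matter can be seen with a stand-in partitioner. The sketch hashes the serialized key with CRC32 and takes it modulo the partition count. Kafka's default partitioner uses murmur2 instead, so this is only illustrative.

With the same key bytes, partition count, and partitioner, a given driver id lands in the same partition number in both topics. That is what lets one consumer instance see both streams for that driver locally. Changing any of the three conditions breaks the guarantee.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in partitioner: CRC32 of the UTF-8 key bytes modulo the partition count.
    # NOT byte-compatible with Kafka's default (murmur2-based) partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def co_partitioned(keys, parts_a: int, parts_b: int) -> bool:
    """True if every key maps to the same partition number in both topics."""
    return all(partition_for(k, parts_a) == partition_for(k, parts_b) for k in keys)
```

The key-schema condition matters for the same reason: if one producer serializes the driver id as `"12345"` and another as a JSON object, the hashed bytes differ and related events can diverge.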
  • 23. Concept #2: Re-keying partitions. ● Used when related events are not co-partitioned ● Source partitions may be well balanced ● Re-keyed partitions, and as a result their consumers, can become unbalanced ● Re-keying achieves data locality for the consumer
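Re-keying can be sketched as reading events keyed one way (here by order id) and writing them to a hypothetical repartition topic keyed another way (here by driver id or by city). The partitioner is a CRC32 stand-in assumed to be shared by all producers. Re-keying by driver keeps all of one driver's events together and in order. Re-keying by city shows the imbalance risk, because every event of a dominant city lands in a single partition.

```python
import zlib
from collections import defaultdict

def partition_for(key: str, n: int) -> int:
    # Stand-in partitioner (CRC32 mod n), assumed identical for every producer.
    return zlib.crc32(key.encode("utf-8")) % n

def rekey(events, new_key, num_partitions: int) -> dict:
    """Group events into the partitions of a hypothetical repartition topic,
    keyed by new_key(event); per-key order is preserved within a partition."""
    partitions = defaultdict(list)
    for event in events:
        key = str(new_key(event))
        partitions[partition_for(key, num_partitions)].append((key, event))
    return partitions

def skew(partitions: dict) -> float:
    """Largest non-empty partition divided by the mean (1.0 = perfectly balanced)."""
    sizes = [len(p) for p in partitions.values()]
    return max(sizes) / (sum(sizes) / len(sizes))
```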
  • 24. Concept #3: Filtering + enriching. DriverLocation { "driver": 12345, "latitude": 50.30846, "longitude": 30.53419 } is enriched into DriverETA { "driver": 12345, "latitude": 50.30846, "longitude": 30.53419, "order": 98765, "eta": "2 min" }
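A sketch of the filter-then-enrich step, assuming the node keeps a local map of accepted orders keyed by driver id (co-partitioned with the location stream).

- Locations from drivers without an accepted order are dropped.
- The rest are joined with the order and given an ETA.

The ETA here is a deliberately crude straight-line estimate at an assumed 30 km/h average speed; a real tracker would route over the road graph. `active_orders` and `enrich` are hypothetical names.

```python
import math
from typing import Optional

AVG_SPEED_MPS = 30 / 3.6  # assumption: 30 km/h average urban speed

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))

# Hypothetical local state: driver_id -> (order_id, pickup_lat, pickup_lon)
active_orders: dict[int, tuple[int, float, float]] = {}

def enrich(location: dict) -> Optional[dict]:
    assignment = active_orders.get(location["driver"])
    if assignment is None:
        return None  # filter: this driver has no accepted order
    order_id, p_lat, p_lon = assignment
    dist = haversine_m(location["latitude"], location["longitude"], p_lat, p_lon)
    minutes = max(1, math.ceil(dist / AVG_SPEED_MPS / 60))
    return {**location, "order": order_id, "eta": f"{minutes} min"}
```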
  • 25. How to scale? (Diagram: several Driver Dispatching instances.)
  • 27. Some sharding strategies: 1. Geospatial indexing (geohash, S2, H3) 2. city_id (region). Consider the following points when you design a data partitioning scheme: 1. Minimize cross-partition data access operations. 2. Minimize cross-partition joins.
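Of the geospatial options, geohash is the simplest to show. Longitude and latitude ranges are bisected alternately, starting with longitude, and every 5 bits become one base32 character. Nearby points therefore share a prefix.

The `shard_for` rule below is a hypothetical illustration of using a coarse cell as the sharding key. It guarantees that drivers in the same cell share a shard. Neighbouring cells may still land on different shards, which is where cross-partition access comes back.

```python
import zlib

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Standard geohash: interleave lon/lat bisection bits, 5 bits per character."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars: list[str] = []
    bits, bit_count, use_lon = 0, 0, True  # the first bit encodes longitude
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

def shard_for(lat: float, lon: float, num_shards: int, precision: int = 4) -> int:
    # Hypothetical rule: hash the coarse cell id to pick a shard.
    cell = geohash_encode(lat, lon, precision)
    return zlib.crc32(cell.encode("utf-8")) % num_shards
```

Raising `precision` shrinks the cells (finer spatial selectivity). Lowering it makes more neighbouring drivers share a shard.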
  • 28. Partitioning by region. Possible challenges: ● Downtime during rebalancing (scale-out, rolling update) ● Unbalanced load: the load from Kyiv equals the load from all other cities of Ukraine combined
  • 29. Attempted fix: partitioning by region + replication. Replication: ● Standalone consumers ● No partition rebalancing ● No downtime ● Replication overhead is less than 0.1 CPU per pod ● Reduced requirements for cold recovery
  • 30. Scalability: 1. Scalability: add Kafka partitions and deploy separate shard instances for cities/countries. 2. Elasticity: scale out consumers within a shard.
  • 32. Replica synchronization: ● State-based CRDT ● Last write wins (LWW) ● Optimistic replication (can become temporarily inconsistent) ● Strong eventual consistency (SEC)
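A minimal state-based LWW map, as one concrete instance of the bullets above. Each replica stores a timestamp and a replica id alongside each value. Merging keeps, per key, the entry with the highest (timestamp, replica) pair. That merge is commutative, associative, and idempotent, so replicas that have seen the same updates converge regardless of delivery order, which is the strong eventual consistency property.

The assumption that timestamps are comparable across replicas (e.g. event time from the driver app) is ours, and `Stamped`/`merge` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stamped:
    ts: int        # assumed comparable across replicas (e.g. event time in ms)
    replica: str   # deterministic tie-breaker for equal timestamps
    value: object

def merge(a: dict, b: dict) -> dict:
    """State-based LWW-map merge: per key, keep the entry with the highest
    (ts, replica) pair. Commutative, associative and idempotent."""
    out = dict(a)
    for key, entry in b.items():
        cur = out.get(key)
        if cur is None or (entry.ts, entry.replica) > (cur.ts, cur.replica):
            out[key] = entry
    return out
```

The cost of LWW is that a concurrent write with a lower stamp is silently discarded. That is acceptable for fast-changing data such as locations, but not for every field.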
  • 33. Problems with replication lag? Depends on your domain: ● Reading your own writes ● Monotonic reads ● Consistent prefix reads
  • 34. Fault tolerance with local state: 1. A single infrastructure dependency: Kafka (a battle-tested streaming platform with high throughput, fault tolerance, and scalability). 2. When a task instance restarts, its local state is repopulated by reading its own Kafka log. 3. Yes, reading and repopulating takes some time.
  • 35. Controlling state size (how long does it take to rebuild the state?): 1. Key-based retention: a. aggressive topic compaction; b. tombstones. 2. Time-based retention.
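Key-based retention can be simulated to see why it bounds rebuild time. After compaction only the latest record per key survives, and a tombstone (a record with a null value) removes the key entirely.

The sketch is a simplification of real Kafka compaction (`cleanup.policy=compact`), which:
- runs asynchronously;
- preserves the original offsets;
- never compacts the active segment;
- keeps tombstones for `delete.retention.ms` before dropping them.

```python
def compact(log: list[tuple[str, object]], drop_tombstones: bool = True) -> list[tuple[str, object]]:
    """Simulated key-based retention: keep only the latest record per key,
    in log order. A value of None is a tombstone (the key was deleted)."""
    latest: dict[str, tuple[int, object]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)
    survivors = sorted(latest.items(), key=lambda kv: kv[1][0])
    return [(k, v) for k, (_, v) in survivors if not (drop_tombstones and v is None)]
```

Rebuilding local state is then a replay of the compacted log, e.g. `dict(compact(log))`, so its cost grows with the number of live keys rather than the full write history.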
  • 36. How long does it take to rebuild the state? 1. Driver state retention: 1 hour. 2. Repopulating local state: a. read driver-state from the beginning of the topic: 400k messages (8 partitions); b. read driver-locations starting from 'now - 5 sec'. 3. You need to implement your own "live processing started" event. Example log line: Live processing started "dispatching.driver-summary-events [0]" after 00:00:01.7875633 sec (50142 msgs). An SLA of 99.998% uptime/availability allows the following downtime/unavailability: ■ Daily: 1.7 s
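For scale, a 99.998% availability target leaves 86,400 s × 0.00002 = 1.728 s of downtime per day, the 1.7 s on the slide.

The sketch below models the two-source rebuild without a real Kafka client:
- driver state is replayed from the beginning of its topic;
- locations are replayed from a timestamp lookup ("now - 5 s");
- each source signals "live processing started" once it reaches the log end captured at startup.

`Record`, `start_offset_for_time`, and `rebuild` are hypothetical. Offsets are assumed to be contiguous from 0, which a compacted topic would not guarantee.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Record:
    offset: int       # assumed contiguous from 0, so offset == list index
    timestamp: float  # seconds since epoch
    key: str
    value: object

def start_offset_for_time(records: list[Record], ts: float) -> int:
    """Mimic a broker time-index lookup: first offset with timestamp >= ts."""
    for r in records:
        if r.timestamp >= ts:
            return r.offset
    return len(records)

def rebuild(name: str, records: list[Record], start: int, state: dict,
            on_live: Callable[[str, float, int], None]) -> int:
    """Replay [start, end) into state, where end is the log end captured when
    the rebuild begins, then emit 'live processing started' exactly once."""
    end = len(records)
    began = time.monotonic()
    for r in records[start:end]:
        state[r.key] = r.value  # later records overwrite earlier ones per key
    on_live(name, time.monotonic() - began, end - start)
    return end
```

Capturing the end offset up front is what makes the notification meaningful. Anything after it is live traffic, so the service can report itself ready at that point.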
  • 37. Traffic jams: requirements. 1. Reduce the cost of Google Maps API usage. 2. High write rate (20k online drivers). 3. Update traffic information every 5 min.
  • 38. Stateful processing: ● Grouping messages by partition key ● Aggregating messages in a hopping window ● MapReduce
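A sketch of a hopping-window aggregation for traffic. It assumes speed samples arrive as (road segment id, timestamp in seconds, speed in km/h). The window advances every 5 minutes, matching the update interval on the previous slide; the 10-minute window size is our assumption. Grouping by segment id plays the role of the partition key, and the per-key sum/count fold is the map/reduce step. Segment ids are hypothetical.

```python
from collections import defaultdict

def window_starts(ts: int, size: int, hop: int) -> list[int]:
    """Starts of all hopping windows [start, start + size) that contain ts.
    Window starts are multiples of hop; size must be a multiple of hop."""
    if size % hop:
        raise ValueError("size must be a multiple of hop")
    last = ts - ts % hop
    return list(range(last - size + hop, last + 1, hop))

def aggregate(samples, size: int = 600, hop: int = 300) -> dict:
    """samples: iterable of (segment_id, ts_seconds, speed_kmh).
    Returns {(segment_id, window_start): mean speed}."""
    acc = defaultdict(lambda: [0.0, 0])  # map phase: (sum, count) per key+window
    for segment, ts, speed in samples:
        for start in window_starts(ts, size, hop):
            bucket = acc[(segment, start)]
            bucket[0] += speed
            bucket[1] += 1
    # reduce phase: turn each (sum, count) into a mean
    return {key: total / n for key, (total, n) in acc.items()}
```

Because each sample falls into `size / hop` overlapping windows, every 5-minute refresh is smoothed over the last 10 minutes of data without recomputing from scratch.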
  • 40. Similar workload using Redis (https://aws.amazon.com/blogs/database/optimize-redis-client-performance-for-amazon-elasticache/?utm_source=pocket_saves): ○ Client: c5.4xlarge (16 vCPUs, 32 GiB) ○ Redis: 3 r6g.2xlarge nodes (8 vCPUs, 64 GiB each)
  • 42. Future work. Although the current design is simple, it leaves flexibility to change key aspects: ○ Replication + sharding
  • 43. Lessons learned: 1. Stateful is not always difficult. 2. A simple and reliable solution. 3. Easy to maintain. 4. Much more resource-efficient: 2 vCPUs for all dispatching instead of a Redis cluster with 16-24 vCPUs. 5. What about MS Orleans?