"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan

Fwdays
https://nbomber.com
https://github.com/stereodb
AGENDA
- intro
- why stateless is slow and less reliable
- tools for building stateful services
PART I
intro to the sportsbook domain and how we came to stateful
Dynamo Kyiv vs Chelsea, 2 : 1
Events: red card, score changed, odds changed
Delivery: PUSH and PULL
- fairly large payloads: 30 KB compressed (1.5 MB uncompressed)
- update rate: 2K RPS (per tenant)
- user query rate: 3-4K RPS (per tenant)
- live data is very dynamic: caching it makes little sense
- data should be queryable: simple KV is not enough
- we need secondary indexes
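To make "queryable, not just KV" concrete, here is a minimal sketch of an in-process store with a primary-key dictionary and one secondary index. The `Market` record, its fields, and the `MarketStore` class are hypothetical names chosen for illustration; they do not come from the talk. The sketch assumes C# 9 or later and is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record; the fields are illustrative, not taken from the talk.
public sealed record Market(int Id, int EventId, decimal Odds);

// In-process store: lookup by primary key plus a secondary index on EventId.
public sealed class MarketStore
{
    private readonly Dictionary<int, Market> _byId = new();
    private readonly Dictionary<int, HashSet<int>> _idsByEvent = new();

    public void Upsert(Market market)
    {
        // If the indexed field changed, drop the stale index entry first.
        if (_byId.TryGetValue(market.Id, out var old) && old.EventId != market.EventId)
        {
            if (_idsByEvent.TryGetValue(old.EventId, out var oldIds))
            {
                oldIds.Remove(old.Id);
                if (oldIds.Count == 0) _idsByEvent.Remove(old.EventId);
            }
        }

        _byId[market.Id] = market;

        if (!_idsByEvent.TryGetValue(market.EventId, out var ids))
        {
            ids = new HashSet<int>();
            _idsByEvent[market.EventId] = ids;
        }
        ids.Add(market.Id);
    }

    // Query by a non-key field: answered from the index, without a full scan.
    public List<Market> FindByEvent(int eventId)
    {
        if (!_idsByEvent.TryGetValue(eventId, out var ids)) return new List<Market>();
        return ids.Select(id => _byId[id]).OrderBy(m => m.Id).ToList();
    }
}
```

A plain key-value cache would need either a full scan or a second, manually maintained key to answer `FindByEvent`; here the index is updated in the same call as the record.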
Benchmark: 20 KB payload, concurrent reads and writes
Redis, single node: 4 vCPU, 8 GB RAM
redis_write: 4K RPS, p75 = 543, p95 = 688, p99 = 842
redis_read: 7K RPS, p75 = 970, p95 = 1278, p99 = 1597
Stateless: API, API, API -> DB
With a cache: API -> Cache -> DB, but Cache is not queryable
Stateful Service: API + DB, each instance holding its own state
CDC (Debezium)
DB
How do you handle the case when your data is larger than RAM?
10 GB 30 GB
Solution 1: use an in-memory DB that supports data larger than RAM
10 GB
20 GB
Solution 2: partition by tenant (UA, PL, FR)
Solution 3: use range-based sharding
shard A: users (1-500)
shard B: users (501-1000)
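Range-based sharding comes down to a lookup from a key to the shard that owns its range. A minimal sketch follows, assuming integer user ids and the two ranges from the slide. `RangeShardRouter` and its constructor shape are hypothetical, not an API from the talk or from any library.

```csharp
using System;

// Hypothetical router: maps a user id to the shard that owns its range.
public sealed class RangeShardRouter
{
    private readonly int _lowerBound;

    // Inclusive upper bound of each range, sorted ascending, with its shard name.
    private readonly (int UpperBound, string Shard)[] _ranges;

    public RangeShardRouter(int lowerBound, params (int UpperBound, string Shard)[] ranges)
    {
        _lowerBound = lowerBound;
        _ranges = ranges;
    }

    public string Route(int userId)
    {
        if (userId >= _lowerBound)
        {
            // A linear scan is enough for a handful of shards.
            foreach (var (upperBound, shard) in _ranges)
            {
                if (userId <= upperBound) return shard;
            }
        }
        throw new ArgumentOutOfRangeException(nameof(userId), userId, "No shard owns this user id.");
    }
}
```

With `new RangeShardRouter(1, (500, "shard A"), (1000, "shard B"))`, `Route(501)` returns `"shard B"`. With many shards, a binary search over the sorted bounds would be the natural replacement for the loop.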
PART II
why stateless is slow
Stateful Service: API + DB
Stateless: API -> Cache -> DB, with network latency on each hop
Latency Numbers

Operation                              2010     2020
Compress 1 KB with Zippy               2 μs     2 μs
Read 1 MB sequentially from RAM        30 μs    3 μs
Read 1 MB sequentially from SSD        494 μs   49 μs
Read 1 MB sequentially from disk       3 ms     825 μs
Round trip within same datacenter      500 μs   500 μs
Send packet CA -> Netherlands -> CA    150 ms   150 ms

Source: https://colin-scott.github.io/personal_website/research/interactive_latency.html
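As a rough back-of-the-envelope comparison using only the 2020 figures above, one round trip within the datacenter can be set against a sequential 1 MB read from local RAM. It ignores serialization and queueing, so it is an order-of-magnitude illustration rather than a measurement:

```latex
\frac{t_{\text{round trip}}}{t_{\text{RAM, 1 MB}}}
  = \frac{500\,\mu\text{s}}{3\,\mu\text{s}} \approx 167
```

On these numbers, a single network hop costs about as much as reading 1 MB from local memory more than a hundred times.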
API -> Cache -> DB (CPU costs are paid on both sides of each hop):
- CPU: serialize/deserialize
- CPU: async request handling
- CPU: managing sockets
- overreads
Stateful Service (API + DB):
- CPU: serialize (we don’t need to deserialize)
- CPU: managing sockets (client sockets only)
- CPU: handling the query (very cheap compared to serialization)
Object hit rate / Transactional hit rate
To fulfill our transactional flow, the API needs to fetch records A, B, and C.
Suppose A and B are served from cache and C is not: records A and B will not impact our latency.
Overall latency = latency of record C
Most existing cache eviction algorithms focus on maximizing
object hit rate, or the fraction of single object requests served
from cache. However, this approach fails to capture the
inter-object dependencies within transactions.
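The gap between the two hit rates can be sketched with a small worked example. It assumes the records of a transaction are fetched in parallel, that each record is a cache hit independently with the same probability p, and that a transaction counts as a hit only if all n of its records are hits. The values p = 0.9 and n = 3 are illustrative, not figures from the talk:

```latex
\begin{aligned}
L_{\text{overall}} &= \max(L_A, L_B, L_C) \\
P(\text{transactional hit}) &= \prod_{i=1}^{n} p_i = p^{n} \\
p = 0.9,\ n = 3 \;&\Rightarrow\; 0.9^{3} = 0.729
\end{aligned}
```

Under these assumptions, a 90% object hit rate gives only about a 73% transactional hit rate, and every transaction with at least one miss runs at the latency of its slowest record.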
async / await
Imagine that we run Redis on localhost. Even with such a setup, we usually use async request handling.
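// Baseline: a plain synchronous call in a loop.
// Requires: using System.Runtime.CompilerServices;
public void SimpleMethod()
{
    var k = 0;
    for (int i = 0; i < Iterations; i++)
    {
        k = Add(i, i);
    }
}

// NoInlining keeps the JIT from inlining Add, so the call itself is measured.
[MethodImpl(MethodImplOptions.NoInlining)]
private int Add(int a, int b) => a + b;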
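// Async signature, but the returned task is already completed,
// so each await continues synchronously.
// Requires: using System.Threading.Tasks;
public async Task SimpleMethodAsync()
{
    var k = 0;
    for (int i = 0; i < Iterations; i++)
    {
        k = await AddAsync(i, i);
    }
}

private Task<int> AddAsync(int a, int b)
{
    return Task.FromResult(a + b);
}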
// Task.Yield() forces a real asynchronous continuation on every call.
public async Task SimpleMethodAsyncYield()
{
    var k = 0;
    for (int i = 0; i < Iterations; i++)
    {
        k = await AddAsync(i, i);
    }
}

private async Task<int> AddAsync(int a, int b)
{
    await Task.Yield();
    return a + b;
}
// Task.Yield() plus Task.Run(): each call also queues the addition to the thread pool.
public async Task SimpleMethodAsyncYield()
{
    var k = 0;
    for (int i = 0; i < Iterations; i++)
    {
        k = await AddAsync(i, i);
    }
}

private async Task<int> AddAsync(int a, int b)
{
    await Task.Yield();
    return await Task.Run(() => a + b);
}
PART III
why stateless is less reliable
Stateless: API -> Cache -> DB
Stateful Service: API + DB
With separate API, Cache, and DB tiers, we have a higher probability of failure.
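One way to see why more tiers raise the probability of failure is the series-availability formula. The sketch below assumes that a request needs every tier to be up, that tiers fail independently, and that each tier has 99.9% availability. That figure is an illustrative assumption, not a number from the talk:

```latex
A_{\text{API} \to \text{Cache} \to \text{DB}}
  = A_{\text{API}} \cdot A_{\text{Cache}} \cdot A_{\text{DB}}
  = 0.999^{3} \approx 0.997
```

Under the same assumption, a single-process service stays at 0.999, so the three-tier chain roughly triples the expected downtime.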
API -> Cache -> DB, needed on every hop:
- circuit breaker
- retry
- fallback
- timeout
- bulkhead isolation
Stateful Service: API + DB
API -> Cache -> DB: what about cache invalidation and data consistency?
Stateful Service: API + DB
API -> Cache -> DB: what about predictable scale-out? Will your RPS increase if you add an additional API or Cache node?
Stateful Service: API + DB
- Metastable failures occur in open systems with an uncontrolled source of
load where a trigger causes the system to enter a bad state that persists
even when the trigger is removed.
- Paradoxically, the root cause of these failures is often features that
improve the efficiency or reliability of the system.
- The characteristic of a metastable failure is that the sustaining effect keeps
the system in the metastable failure state even after the trigger is
removed.
At least 4 out of 15 major outages in the last decade at Amazon Web Services were caused by metastable failures.
PART IV
tools for building stateful services
- distributed log with sync replication
- in-process memory DB
- SQL OLAP
- fairly large payloads: 30 KB compressed (1.5 MB uncompressed)
- update rate: 2K RPS (per tenant)
- user query rate: 3-4K RPS (per tenant)
- live data is very dynamic: caching it makes little sense
- data should be queryable: simple KV is not enough
- we need secondary indexes
At peak, to handle a big load for 1 tenant, we have:
5-10 nodes, 0.5-2 CPU, 6 GB RAM
THANKS
always benchmark
https://twitter.com/antyadev
  • 4. AGENDA - intro - why stateless is slow and less reliable - tools for building stateful services
  • 5. PART I intro to sportsbook domain and how we come to stateful
  • 6. Dynamo Kyiv vs Chelsea 2 : 1 Red card Score changed Odds changed PUSH PULL
  • 7. Dynamo Kyiv vs Chelsea 2 : 1 Red card Score changed Odds changed PUSH PULL - quite big payloads: 30 KB compressed data (1.5 MB uncompressed) - update rate: 2K RPS (per tenant) - user query rate: 3-4K RPS (per tenant) - live data is very dynamic: no much sense to cache it - data should be queryable: simple KV is not enough - we need secondary indexes
  • 8. 20 KB payload for concurrent read and write Redis, single node: 4vcpu - 8gb redis_write: 4K RPS, p75 = 543, p95 = 688, p99 = 842 redis read: 7K RPS, p75 = 970, p95 = 1278, p99 = 1597
  • 10. API Cache DB but Cache is not queryable
  • 11. API + DB Stateful Service
  • 15. How to handle a case when your data is larger than RAM? 10 GB 30 GB
  • 16. Solution 1: use memory DB that supports data larger than RAM 10 GB 20 GB
  • 17. UA PL FR Solution 2: use partition by tenant
  • 18. Solution 3: use range-based sharding users (1-500) users (501-1000) shard A shard B
  • 20. API + DB Stateful Service API Cache DB network latency network latency
  • 21. Latency Numbers Latency 2010 2020 Compress 1KB with Zippy 2μs 2μs Read 1 MB sequentially from RAM 30μs 3μs Read 1 MB sequentially from SSD 494μs 49μs Read 1 MB sequentially from disk 3ms 825μs Round trip within same datacenter 500μs 500μs Send packet CA -> Netherlands -> CA 150ms 150ms https://colin-scott.github.io/personal_website/research/interactive_latency.html
  • 22-26. CPU costs, built up step by step over five slides.
      API, Cache, DB (the cost list appears twice on the diagram):
        - CPU for serialize/deserialize
        - CPU for ASYNC request handling
        - CPU for managing sockets
        - overreads
      Stateful Service (API + DB):
        - CPU for serialize (we don’t need to deserialize)
        - CPU for managing sockets (only client sockets)
        - CPU for handling the query (very cheap compared to serialization)
  • 32. Object hit rate / Transactional hit rate
  • 33. Diagram: API fetching records A, B, C. In order to fulfill our transactional flow we need to fetch records A, B, and C. Records A and B will not impact our latency: overall latency = latency of record C.
  • 36. Most existing cache eviction algorithms focus on maximizing object hit rate, or the fraction of single object requests served from cache. However, this approach fails to capture the inter-object dependencies within transactions.
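The difference between the two metrics can be sketched in a few lines. The `HitRates` class below is a hypothetical helper, and it assumes a simplified definition of transactional hit rate: a transaction counts as a hit only when every record it needs is in the cache. The source quoted on the slide may define the metric more precisely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class HitRates
{
    // Fraction of individual record lookups served from cache.
    public static double ObjectHitRate(IEnumerable<string[]> transactions, ISet<string> cached)
    {
        var lookups = transactions.SelectMany(t => t).ToList();
        if (lookups.Count == 0) return 0.0;
        return (double)lookups.Count(r => cached.Contains(r)) / lookups.Count;
    }

    // Fraction of transactions for which ALL required records are cached
    // (simplified definition, an assumption of this sketch).
    public static double TransactionalHitRate(IEnumerable<string[]> transactions, ISet<string> cached)
    {
        var list = transactions.ToList();
        if (list.Count == 0) return 0.0;
        return (double)list.Count(t => t.All(r => cached.Contains(r))) / list.Count;
    }
}
```

For the A, B, C example, a single transaction needing all three records with only A and B cached gives an object hit rate of 2/3 but a transactional hit rate of 0: the request still waits for C.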
  • 38. async / await: Imagine that we run Redis on localhost. Even with such a setup, we usually use async request handling.
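  • 39.
      using System.Runtime.CompilerServices;

      // Baseline: a plain synchronous call in a loop.
      // Iterations is a benchmark parameter defined elsewhere (not shown on the slide).
      public void SimpleMethod()
      {
          var k = 0;
          for (int i = 0; i < Iterations; i++)
          {
              k = Add(i, i);
          }
      }

      // NoInlining prevents the JIT from inlining Add, so the call itself is measured.
      [MethodImpl(MethodImplOptions.NoInlining)]
      private int Add(int a, int b) => a + b;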
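  • 40.
      using System.Threading.Tasks;

      // Same loop, but every call goes through await.
      public async Task SimpleMethodAsync()
      {
          var k = 0;
          for (int i = 0; i < Iterations; i++)
          {
              k = await AddAsync(i, i);
          }
      }

      // Returns an already completed task, so the await continues synchronously.
      private Task<int> AddAsync(int a, int b)
      {
          return Task.FromResult(a + b);
      }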
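  • 41.
      using System.Threading.Tasks;

      public async Task SimpleMethodAsyncYield()
      {
          var k = 0;
          for (int i = 0; i < Iterations; i++)
          {
              k = await AddAsync(i, i);
          }
      }

      // Task.Yield() forces the method to actually suspend and resume via a scheduled continuation.
      private async Task<int> AddAsync(int a, int b)
      {
          await Task.Yield();
          return a + b;
      }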
  • 42.
      using System.Threading.Tasks;

      public async Task SimpleMethodAsyncYield()
      {
          var k = 0;
          for (int i = 0; i < Iterations; i++)
          {
              k = await AddAsync(i, i);
          }
      }

      // Yields first, then also queues the addition to the thread pool via Task.Run.
      private async Task<int> AddAsync(int a, int b)
      {
          await Task.Yield();
          return await Task.Run(() => a + b);
      }
  • 45. PART III: why stateless is less reliable
  • 46. API, Cache, DB versus Stateful Service (API + DB): with more components involved, we have a higher probability of failure
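A small worked example of this point, under assumptions that are not from the talk: each component is available 99.9% of the time, failures are independent, and a request in the stateless setup needs all three components (API, Cache, DB) to be up.

```latex
\[
A_{\text{stateless}} = 0.999^{3} \approx 0.9970,
\qquad
A_{\text{stateful}} = 0.999
\]
```

Under these assumptions the chance that a request hits an unavailable component is about 0.30% for the three-component chain versus 0.10% for the single service.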
  • 47. API, Cache, DB: circuit breaker, retry, fallback, timeout, bulkhead isolation (the list is repeated on the diagram) versus Stateful Service (API + DB)
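To give a feel for the extra machinery a remote dependency brings, here is a minimal sketch of one pattern from the list, a circuit breaker combined with a fallback. `CircuitBreaker` is a hypothetical class written for this illustration. It is not thread-safe and is not how any particular library implements the pattern.

```csharp
using System;

// Hypothetical, single-threaded circuit breaker sketch.
public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTime> _clock;
    private int _consecutiveFailures;
    private DateTime _openedAt;
    private bool _isOpen;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration)
        : this(failureThreshold, openDuration, () => DateTime.UtcNow) { }

    // The clock is injectable so that the behavior can be tested deterministically.
    public CircuitBreaker(int failureThreshold, TimeSpan openDuration, Func<DateTime> clock)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _clock = clock;
    }

    public T Execute<T>(Func<T> action, Func<T> fallback)
    {
        // While the breaker is open, fail fast without calling the dependency.
        if (_isOpen && _clock() - _openedAt < _openDuration)
            return fallback();

        try
        {
            // Closed state, or a trial call after the open period has elapsed.
            var result = action();
            _consecutiveFailures = 0;
            _isOpen = false;
            return result;
        }
        catch (Exception)
        {
            _consecutiveFailures++;
            // Open on reaching the threshold, or re-open after a failed trial call.
            if (_isOpen || _consecutiveFailures >= _failureThreshold)
            {
                _isOpen = true;
                _openedAt = _clock();
            }
            return fallback();
        }
    }
}
```

Even this simplified version needs state, a clock, and a policy for trial calls. In the stateless setup that logic is needed for the remote calls to the cache and the DB, while an in-process read has no such failure mode to guard.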
  • 48. API, Cache, DB versus Stateful Service (API + DB): what about cache invalidation and data consistency?
  • 49. API, Cache, DB versus Stateful Service (API + DB): what about predictable scale-out? Will your RPS increase if you add an additional API or Cache node?
  • 51. Metastable failures
      - Metastable failures occur in open systems with an uncontrolled source of load where a trigger causes the system to enter a bad state that persists even when the trigger is removed.
      - Paradoxically, the root cause of these failures is often features that improve the efficiency or reliability of the system.
      - The characteristic of a metastable failure is that the sustaining effect keeps the system in the metastable failure state even after the trigger is removed.
  • 52. At least 4 out of 15 major outages in the last decade at Amazon Web Services were caused by metastable failures.
  • 54. PART IV: tools for building stateful services
  • 55. distributed log with sync replication
  • 59. Dynamo Kyiv vs Chelsea, 2 : 1. Events: red card, score changed, odds changed. PUSH / PULL. Requirements:
      - quite big payloads: 30 KB compressed data (1.5 MB uncompressed)
      - update rate: 2K RPS (per tenant)
      - user query rate: 3-4K RPS (per tenant)
      - live data is very dynamic: there is not much sense in caching it
      - data should be queryable: simple KV is not enough
      - we need secondary indexes
    At peak, to handle a big load for 1 tenant we have: 5-10 nodes, 0.5-2 CPU, 6 GB RAM