Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
Alexys Jacob, CTO, Numberly
About ScyllaDB
+ For distributed, data-intensive apps that require high performance and low latency
+ 400+ users worldwide
+ Results
  + Comcast: Reduced P99 latencies by 95%
  + FireEye: 1500% improvement in throughput
  + Discord: Reduced C* nodes from ~140 to 6
  + iFood: 9X cost reduction vs. DynamoDB
+ Open Source, Enterprise and Cloud options
+ Fully compatible with Apache Cassandra and Amazon DynamoDB
[Figure: ScyllaDB Universe of 400+ Users]
400+ Companies Use ScyllaDB
+ Seamless experiences across content + devices
+ Make marketing more relevant, effective and measurable
+ Corporate fleet management
+ Real-time analytics
+ 2,000,000 SKU e-commerce management
+ Real-time location tracking for friends/family
+ Video recommendation management
+ IoT for industrial machines
+ Synchronize browser properties for millions
+ Threat intelligence service using JanusGraph
+ Real-time fraud detection across 6M transactions/day
+ Uber scale, mission critical chat & messaging app
+ Network security threat detection
+ Power ~50M X1 DVRs with billions of reqs/day
+ Precision healthcare via Edison AI
+ Inventory hub for retail operations
+ Property listings and updates
+ Unified ML feature store across the business
+ Cryptocurrency exchange app
+ Geography-based recommendations
+ Distributed storage for distributed ledger tech
+ Global operations: Avon, Body Shop + more
+ Predictable performance for on-sale surges
+ GPS-based exercise tracking
Alexys Jacob
@ultrabug
+ CTO, Numberly
+ ScyllaDB awarded Open Source & University contributor
+ Open Source author & contributor
  + Apache Avro, Apache Airflow, MongoDB, MkDocs…
+ Tech speaker & writer
+ Gentoo Linux developer
+ Python Software Foundation contributing member
Agenda
+ The thought process to move from Python to Rust
  + Context, promises, arguments and decision
+ Learning Rust the hard way
  + All the stack components I had to work with in Rust
  + Tips, Open Source contributions and code samples
+ Was it worth it?
  + Graphs, production numbers
  + Personal notes
Choosing Rust over Python
Project context at Numberly
At Numberly, we move and process (a lot of) data using Kafka streams and pipelines that are enriched using ScyllaDB.
[Diagram: raw data enters three processor apps backed by Scylla; enriched data goes to the client app, the partner API and the business app]
Pipeline reliability = latency + resilience
If a processor or ScyllaDB is slow or fails, our business, partners & clients are at risk.
The (rusted) opportunity
A major change in our pipeline processors had to be undertaken, giving us the opportunity to redesign them entirely.
[Diagram: raw data enters a single processor app backed by Scylla; enriched data goes to the client app, the partner API and the business app]
“Hey, why not rewrite
those 3 Python processor apps
into 1 Rust app?”
The (never tried before) Rust promises
A language empowering everyone to build reliable and efficient software.
+ Secure
  + Memory and thread safety as first class citizens
  + No runtime or garbage collector
+ Easy to deploy
  + Compiled binaries are self-sufficient
+ No compromises
  + Strongly and statically typed
  + Exhaustivity is mandatory
  + Built-in error management syntax and primitives
+ Plays well with Python
  + PyO3 can be used to run Rust from Python (or the reverse)
Efficient software != Faster software
+ What “fast” means varies with your objectives.
  + Fast to develop? Python is way faster, and I have done that for 15 years
  + Fast to maintain? Very few people at Numberly know Rust
  + Fast to prototype? No, code must be complete to compile and run
  + Fast to process data? Sure: to prove it, measure it
  + Fast to cover all failure cases? Definitely: mandatory exhaustivity + error handling primitives
“Selecting a programming language can be a form of premature optimization.”
“I did not choose Rust to be ‘faster’. Our Python code was fast enough to deliver our pipeline processing.”
Innovation cannot exist if you don’t accept losing time.
The question is knowing when and on which project.
The Reliable software paradigms
+ What makes me slow will make me stronger.
  + Low level paradigms (ownership, borrowing, lifetimes): if it compiles, it’s safe
  + Strong type safety: predictable, readable, maintainable
  + Compilation (debug, release): the compiler is very helpful vs a random Python exception
  + Dependency management: finally something looking sane vs the Python mess
  + Exhaustive pattern matching: confidence that you’re not forgetting something
  + Error management primitives (Result): handle failure right from the language syntax
  + Explicit return values (Option): clear separation between Some(value) and None
“I chose Rust because it provided me with the programming paradigms at the right abstraction level that I needed to finally understand and better explain the reliability and performance of my application.”
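As an illustration of the last three paradigms, here is a minimal sketch that is not taken from the talk. The `Event` and `ParseError` types and all function names are hypothetical; only the standard library is used.

```rust
use std::collections::HashMap;

/// Hypothetical event kinds flowing through a pipeline (illustration only).
#[derive(Debug, PartialEq)]
enum Event {
    Click,
    View,
    Purchase { amount_cents: u64 },
}

/// Hypothetical error type: every failure mode is spelled out.
#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    UnknownKind(String),
    BadAmount(String),
}

/// Result: the caller cannot ignore that parsing may fail.
fn parse_event(raw: &str) -> Result<Event, ParseError> {
    let mut parts = raw.trim().split(':');
    match parts.next() {
        None | Some("") => Err(ParseError::Empty),
        Some("click") => Ok(Event::Click),
        Some("view") => Ok(Event::View),
        Some("purchase") => {
            let amount = parts.next().unwrap_or("");
            amount
                .parse::<u64>()
                .map(|amount_cents| Event::Purchase { amount_cents })
                .map_err(|_| ParseError::BadAmount(amount.to_string()))
        }
        Some(other) => Err(ParseError::UnknownKind(other.to_string())),
    }
}

/// Exhaustive match: adding a new `Event` variant stops this function from
/// compiling until the new case is handled.
fn revenue_cents(event: &Event) -> u64 {
    match event {
        Event::Click | Event::View => 0,
        Event::Purchase { amount_cents } => *amount_cents,
    }
}

/// Option: absence is an explicit value (`None`), not an exception.
fn lookup_segment<'a>(segments: &'a HashMap<u32, String>, user_id: u32) -> Option<&'a str> {
    segments.get(&user_id).map(String::as_str)
}

/// `?` returns the first error to the caller instead of hiding it.
fn total_revenue(lines: &[&str]) -> Result<u64, ParseError> {
    let mut total = 0;
    for line in lines {
        total += revenue_cents(&parse_event(line)?);
    }
    Ok(total)
}
```

Calling `total_revenue(&["click", "purchase:1250"])` yields `Ok(1250)`, while any malformed line turns the whole call into an `Err` that the caller has to handle.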
Learning Rust the hard way
Production is not a Hello World
+ Learning the syntax and handling errors everywhere
+ Confluent Kafka + Schema Registry + Avro
+ Asynchronous latency-optimized design
+ ScyllaDB multi-datacenter
+ MongoDB
+ Kubernetes deployment
+ Prometheus exporter
+ Grafana dashboarding
+ Sentry
[Diagram: Confluent Kafka, processor app, Scylla]
Confluent Kafka Schema Registry
+ Confluent Schema Registry breaks vanilla Apache Avro deserialization.
+ Gerard Klijs’ schema_registry_converter crate helps
+ I discovered performance problems, which we worked on and which have since been addressed!
+ Latency-overhead-free manual approach:
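The code shown on the original slide is not part of this transcript. As a hedged sketch of what a manual approach can look like: the Confluent wire format prefixes each Avro body with one magic byte (0) and a 4-byte big-endian schema ID, so the header can be split off without any registry call on the hot path. The function and type names below are hypothetical, and decoding the body with an Avro library is not shown.

```rust
/// Errors for the hypothetical frame parser below.
#[derive(Debug, PartialEq)]
enum FrameError {
    TooShort(usize),
    BadMagic(u8),
}

/// Confluent wire format: 1 magic byte (0), a 4-byte big-endian schema ID,
/// then the Avro-encoded body.
const MAGIC_BYTE: u8 = 0;
const HEADER_LEN: usize = 5;

/// Splits a Kafka message payload into (schema ID, Avro body) without copying.
fn split_confluent_frame(payload: &[u8]) -> Result<(u32, &[u8]), FrameError> {
    if payload.len() < HEADER_LEN {
        return Err(FrameError::TooShort(payload.len()));
    }
    if payload[0] != MAGIC_BYTE {
        return Err(FrameError::BadMagic(payload[0]));
    }
    let schema_id = u32::from_be_bytes([payload[1], payload[2], payload[3], payload[4]]);
    Ok((schema_id, &payload[HEADER_LEN..]))
}
```

With the schema ID in hand, an application can look up a schema it fetched once and cached, then pass the remaining bytes to its Avro decoder.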
Apache Avro Rust was broken!
+ The avro-rs crate was given to Apache Avro without an appointed committer.
+ Deserialization of complex schemas was broken...
+ I contributed fixes to Apache Avro (AVRO-3232 and AVRO-3240)
+ Now merged thanks to Martin Grigorov!
+ Rust compiler optimizations give a hell of a boost (once Avro is fixed)
+ Deserializing Avro is faster than JSON!
Asynchronous patterns to optimize latency
+ Tricks to make your Kafka consumer strategy more efficient.
  + Deserialize your consumer messages on the consumer loop, not on green threads
  + Spawning a green thread has a performance cost
  + Control your green-thread parallelism
  + Defer to green threads only once I/O is required
[Diagram: raw data → Kafka consumer + Avro deserializer → one green thread per message → Scylla → enriched data]
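The pattern above can be sketched with the standard library alone. This is an assumption-laden illustration, not the talk's code: OS threads stand in for green threads so that the example has no dependencies, and `Record`, `decode`, `enrich` and `run_pipeline` are hypothetical names. Decoding happens inline on the consumer loop, and only the I/O-like step is handed to a capped number of workers through a bounded channel.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical decoded record (stands in for a deserialized Avro message).
#[derive(Debug)]
struct Record {
    user_id: u32,
}

/// Cheap, CPU-only step: done inline on the consumer loop.
fn decode(raw: &[u8]) -> Option<Record> {
    if raw.len() != 4 {
        return None;
    }
    Some(Record { user_id: u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]) })
}

/// Stand-in for the I/O-bound enrichment (for example a database query).
fn enrich(record: &Record) -> String {
    format!("user-{}:enriched", record.user_id)
}

/// Decodes on the calling thread, then hands records to at most `parallelism`
/// workers. The bounded channel applies backpressure when workers fall behind.
fn run_pipeline(messages: Vec<Vec<u8>>, parallelism: usize) -> Vec<String> {
    let parallelism = parallelism.max(1);
    let (tx, rx) = sync_channel::<Record>(parallelism);
    let rx = Arc::new(Mutex::new(rx));
    let (out_tx, out_rx) = channel::<String>();

    let workers: Vec<_> = (0..parallelism)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let out_tx = out_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while enriching.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(record) => out_tx.send(enrich(&record)).unwrap(),
                    Err(_) => break, // channel closed: no more work
                }
            })
        })
        .collect();
    drop(out_tx);

    for raw in &messages {
        // Undecodable messages are skipped here; real code would report them.
        if let Some(record) = decode(raw) {
            tx.send(record).unwrap();
        }
    }
    drop(tx); // closing the channel lets the workers exit

    for worker in workers {
        worker.join().unwrap();
    }
    let mut results: Vec<String> = out_rx.iter().collect();
    results.sort();
    results
}
```

In an async runtime the same shape would typically use tasks in place of threads, with a semaphore or bounded channel playing the role of the parallelism cap.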
Absorbing tail latency spikes with parallelism
[Graph: labels “parallelism” and “load”, annotations x16 and x2]
Scylla Rust (shard-aware) driver
+ The scylla-rust-driver crate is mature enough for production
+ Use a CachingSession to automatically cache your prepared statements
+ Beware: prepared queries are NOT paged, use paged queries with execute_iter() instead!
+ Use at least version 0.4.2 if you run a multi-DC cluster!
Exporting metrics properly for Prometheus
+ Effectively measuring latencies down to microseconds.
+ Fine tune your histogram buckets to match your expected latencies!
...
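To see why bucket tuning matters at this scale, here is a small dependency-free sketch. `exponential_buckets` and `bucket_index` are hypothetical helpers written for this illustration, and `DEFAULT_BUCKETS` lists the default boundaries commonly shipped by Prometheus client libraries, which start at 5 ms.

```rust
/// Default boundaries (in seconds) commonly shipped by Prometheus client
/// libraries. The smallest one is 5 ms.
const DEFAULT_BUCKETS: [f64; 11] =
    [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0];

/// Returns `count` upper bounds (in seconds), starting at `start` and
/// multiplying by `factor` each time. Hypothetical helper for illustration.
fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    assert!(start > 0.0 && factor > 1.0 && count > 0);
    let mut bounds = Vec::with_capacity(count);
    let mut next = start;
    for _ in 0..count {
        bounds.push(next);
        next *= factor;
    }
    bounds
}

/// Index of the first bucket whose upper bound is >= `value`;
/// `bounds.len()` means the implicit +Inf bucket.
fn bucket_index(bounds: &[f64], value: f64) -> usize {
    bounds
        .iter()
        .position(|&le| value <= le)
        .unwrap_or(bounds.len())
}
```

With the defaults, a 75µs observation and a 660µs observation both land in the first bucket and become indistinguishable. With `exponential_buckets(0.000_05, 2.0, 10)`, which spans 50µs to about 25.6 ms, they fall into separate buckets.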
Grafana dashboarding
+ Graph your precious metrics right!
+ ScyllaDB prepared statement cache size
+ Query and throughput rates
+ Kafka commits occurrence
+ Errors by type
+ Kubernetes pod memory
+ ...
+ Visualizing Prom Histograms
max by (environment)(histogram_quantile(0.50, processing_latency_seconds_bucket{...}))
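For intuition about what such a query computes, here is a simplified sketch of the linear interpolation that `histogram_quantile` performs on classic cumulative buckets. It is an illustration under stated assumptions, not Prometheus code: the function below is hypothetical, returns `None` for inputs it does not handle, and ignores the finer edge cases of the real implementation.

```rust
/// Each bucket is (upper bound `le` in seconds, cumulative count of
/// observations <= le). The last bucket is expected to be +Inf.
fn histogram_quantile(q: f64, buckets: &[(f64, f64)]) -> Option<f64> {
    let total = buckets.last()?.1;
    if !(0.0..=1.0).contains(&q) || total == 0.0 || buckets.len() < 2 {
        return None;
    }
    let rank = q * total;
    // First bucket whose cumulative count reaches the requested rank.
    let idx = buckets.iter().position(|&(_, count)| count >= rank)?;
    if idx == buckets.len() - 1 {
        // The rank falls in +Inf: report the highest finite bound.
        return Some(buckets[idx - 1].0);
    }
    let (upper, count) = buckets[idx];
    // The lowest bucket is assumed to start at 0.
    let (lower, below) = if idx == 0 { (0.0, 0.0) } else { buckets[idx - 1] };
    if count == below {
        return Some(upper);
    }
    // Observations are assumed to be spread evenly inside the bucket.
    Some(lower + (upper - lower) * (rank - below) / (count - below))
}
```

If every observation sits in a single 0 to 5 ms bucket, the estimated P50 is 2.5 ms whatever the real latencies were, which is why bucket boundaries need to match the expected latency range.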
Was it worth it?
Did I really lose time because of Rust?
+ I spent more time analyzing the latency impacts of code patterns and drivers’ options than struggling with Rust syntax.
+ Key figures for this application:
  + Kafka consumer max throughput with processing? 200K msg/s on 20 partitions
  + Avro deserialization P50 latency? 75µs
  + Scylla SELECT P50 latency on 1.5B+ rows tables? 250µs
  + Scylla INSERT P50 latency on 1.5B+ rows tables? 660µs
It went better than expected
+ The Rust crates ecosystem is mature, comparable to the Python Package Index.
+ 3 Python apps totalling 54 pods replaced by 1 Rust app totalling 20 pods
+ We helped & worked on making the scylla-rust-driver even better
  + Token aware policy can fall back to non-replicas for higher availability
  + Optimized partition key calculations for prepared statements
  + More to come!
+ This feels like the most reliable and efficient software I have ever written!
Questions?
Brought to you by
FREE VIRTUAL EVENT | OCTOBER 19-20, 2022
The event for developers who care about
high-performance, low-latency applications.
Register at p99conf.io
Follow us on Twitter: @p99conf #p99conf
Thank you
for joining us today.
@scylladb scylladb/
slack.scylladb.com
@scylladb company/scylladb/
scylladb/

Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline

  • 1. Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline Alexys Jacob, CTO, Numberly
  • 2. 2 + For distributed, data-intensive apps that require high performance and low latency + 400+ users worldwide + Results + Comcast: Reduced P99 latencies by 95% + FireEye: 1500% improvement in throughput + Discord: Reduced C* nodes from ~140 to 6 + iFood: 9X cost reduction vs. DynamoDB + Open Source, Enterprise and Cloud options + Fully compatible with Apache Cassandra and Amazon DynamoDB About ScyllaDB 1ms <1ms 10ms 1M 10M ScyllaDB Universe of 400+ Users
  • 3. 400+ Companies Use ScyllaDB Seamless experiences across content + devices Make marketing more relevant, effective and measurable Corporate fleet management Real-time analytics 2,000,000 SKU -commerce management Real-time location tracking for friends/family Video recommendation management IoT for industrial machines Synchronize browser properties for millions Threat intelligence service using JanusGraph Real time fraud detection across 6M transactions/day Uber scale, mission critical chat & messaging app 3 Network security threat detection Power ~50M X1 DVRs with billions of reqs/day Precision healthcare via Edison AI Inventory hub for retail operations Property listings and updates Unified ML feature store across the business Cryptocurrency exchange app Geography-based recommendations Distributed storage for distributed ledger tech Global operations- Avon, Body Shop + more Predictable performance for on sale surges GPS-based exercise tracking
  • 4. Alexys Jacob 4 @ultrabug + CTO, Numberly + ScyllaDB awarded Open Source & University contributor + Open Source author & contributor + Apache Avro, Apache Airflow, MongoDB, MkDocs… + Tech speaker & writer + Gentoo Linux developer + Python Software Foundation contributing member Speaker Photo
  • 5. Agenda + The thought process to move from Python to Rust + Context, promises, arguments and decision + Learning Rust the hard way + All the stack components I had to work with in Rust + Tips, Open Source contributions and code samples + What is worth it? + Graphs, production numbers + Personal notes 5
  • 7. At Numberly, we move and process (a lot of) data using Kafka streams and pipelines that are enriched using ScyllaDB. processor app processor app Project context at Numberly Scylla processor app raw data enriched data enriched data enriched data client app partner API business app 7
  • 8. processor app processor app Pipeline reliability = latency + resilience Scylla processor app raw data enriched data enriched data enriched data client app partner API business app If a processor or ScyllaDB is slow or fails, our business, partners & clients are at risk. 8
  • 9. A major change in our pipeline processors had to be undertaken, giving us the opportunity to redesign them entirely. The (rusted) opportunity Scylla processor app raw data enriched data enriched data enriched data client app partner API business app 9
  • 10. “Hey, why not rewrite those 3 Python processor apps into 1 Rust app?” 10
  • 11. The (never tried before) Rust promises 11 A language empowering everyone to build reliable and efficient software. + Secure + Memory and thread safety as first class citizens + No runtime or garbage collector + Easy to deploy + Compiled binaries are self-sufficient + No compromises + Strongly and statically typed + Exhaustivity is mandatory + Built-in error management syntax and primitives + Plays well with Python + PyO3 can be used to run Rust from Python (or the contrary)
  • 12. Efficient software != Faster software + “Fast” meanings vary depending on your objectives. + Fast to develop? + Fast to maintain? + Fast to prototype? + Fast to process data? + Fast to cover all failure cases? “Selecting a programming language can be a form of premature optimization 12
  • 13. Efficient software != Faster software + “Fast” meanings vary depending on your objectives. + Fast to develop? Python is way faster + did that for 15 years + Fast to maintain? Very few people at Numberly do know Rust + Fast to prototype? No, code must be complete to compile and run + Fast to process data? Sure: to prove it, measure it + Fast to cover all failure cases? Definitely: mandatory exhaustivity + error handling primitives “I did not choose Rust to be “faster”. Our Python code was fast enough to deliver their pipeline processing. 13
  • 14. Innovation cannot exist if you don’t accept losing time. The question is knowing when and on what project.
  • 15. The Reliable software paradigms
      + What makes me slow will make me stronger.
          + Low level paradigms (ownership, borrowing, lifetimes).
          + Strong type safety.
          + Compilation (debug, release).
          + Dependency management.
          + Exhaustive pattern matching.
          + Error management primitives (Result).
          + Explicit return values (Option).
  • 16. The Reliable software paradigms
      + What makes me slow will make me stronger.
          + Low level paradigms (ownership, borrowing, lifetimes): if it compiles, it’s safe
          + Strong type safety: predictable, readable, maintainable
          + Compilation (debug, release): the compiler is very helpful vs a random Python exception
          + Dependency management: finally something looking sane vs the Python mess
          + Exhaustive pattern matching: confidence that you’re not forgetting something
          + Error management primitives (Result): handle failure right from the language syntax
          + Explicit return values (Option): clear separation between Some(value) and None
      “I chose Rust because it provided me with the programming paradigms at the right abstraction level that I needed to finally understand and better explain the reliability and performance of my application.”
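To make the last three paradigms concrete, here is a minimal sketch using only the standard library. All names (`ProcessError`, `Outcome`, `parse_user_id`, `lookup_segment`, `process`, `describe`) are hypothetical and not taken from the talk; the lookup rule is an arbitrary stand-in for a database read.

```rust
/// Hypothetical failure cases for one pipeline step.
#[derive(Debug, PartialEq)]
pub enum ProcessError {
    Empty,
    Invalid(String),
}

/// Hypothetical successful outcomes.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Enriched(u64),
    Skipped,
}

/// `Result` makes the failure path part of the function signature.
pub fn parse_user_id(raw: &str) -> Result<u64, ProcessError> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Err(ProcessError::Empty);
    }
    trimmed
        .parse::<u64>()
        .map_err(|e| ProcessError::Invalid(e.to_string()))
}

/// `Option` separates "found" from "not found" without null values.
/// Arbitrary rule for illustration: only even ids have a segment.
pub fn lookup_segment(user_id: u64) -> Option<u64> {
    if user_id % 2 == 0 {
        Some(user_id / 2)
    } else {
        None
    }
}

/// The `?` operator propagates the error; the `match` must cover
/// both `Some` and `None` or the code does not compile.
pub fn process(raw: &str) -> Result<Outcome, ProcessError> {
    let user_id = parse_user_id(raw)?;
    match lookup_segment(user_id) {
        Some(segment) => Ok(Outcome::Enriched(segment)),
        None => Ok(Outcome::Skipped),
    }
}

/// Exhaustive match: adding a variant to either enum turns this
/// function into a compile error until the new case is handled.
pub fn describe(result: &Result<Outcome, ProcessError>) -> &'static str {
    match result {
        Ok(Outcome::Enriched(_)) => "enriched",
        Ok(Outcome::Skipped) => "skipped",
        Err(ProcessError::Empty) => "empty input",
        Err(ProcessError::Invalid(_)) => "invalid input",
    }
}
```

Calling `process("42")` yields an enriched outcome, `process("7")` a skipped one, and blank or non-numeric input yields an error value that the caller has to handle explicitly.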
  • 17. Learning Rust the hard way
  • 18. Production is not a Hello World
      + Learning the syntax and handling errors everywhere
      + Confluent Kafka + Schema Registry + Avro
      + Asynchronous latency-optimized design
      + ScyllaDB multi-datacenter
      + MongoDB
      + Kubernetes deployment
      + Prometheus exporter
      + Grafana dashboarding
      + Sentry
      [Diagram labels: Scylla, processor app, Confluent Kafka]
  • 19. Confluent Kafka Schema Registry
      + Confluent Schema Registry breaks vanilla Apache Avro deserialization.
      + Gerard Klijs’ schema_registry_converter crate helps
      + I discovered performance problems, which we worked on and which have since been addressed!
      + Latency-overhead-free manual approach:
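One way a manual approach could look, sketched under the assumption that the messages use the Confluent wire format: a magic byte `0`, a 4-byte big-endian schema ID, then the Avro-encoded body. That 5-byte prefix is what a plain Avro reader does not expect. The names `WireError` and `split_confluent_frame` are hypothetical, and this is not the code from the slide.

```rust
/// Hypothetical error type for malformed frames.
#[derive(Debug, PartialEq)]
pub enum WireError {
    TooShort,
    BadMagic(u8),
}

/// Split a Confluent-framed Kafka payload into (schema id, Avro body).
/// Assumed layout: [0x00][schema id: u32 big-endian][Avro binary data].
/// No allocation or copy is made: the body is a sub-slice of the input.
pub fn split_confluent_frame(payload: &[u8]) -> Result<(u32, &[u8]), WireError> {
    if payload.len() < 5 {
        return Err(WireError::TooShort);
    }
    if payload[0] != 0 {
        return Err(WireError::BadMagic(payload[0]));
    }
    let schema_id = u32::from_be_bytes([payload[1], payload[2], payload[3], payload[4]]);
    Ok((schema_id, &payload[5..]))
}
```

The returned slice can then be handed to an Avro decoder together with a schema resolved once per schema ID and kept in memory, so the registry is not queried on the hot path.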
  • 20. Apache Avro Rust was broken!
      + avro-rs crate given to Apache Avro without an appointed committer.
      + Deserialization of complex schemas was broken...
      + I contributed fixes to Apache Avro (AVRO-3232+3240)
      + Now merged thanks to Martin Grigorov!
      + Rust compiler optimizations give a hell of a boost (once Avro is fixed)
      + Deserializing Avro is faster than JSON!
  • 21. Asynchronous patterns to optimize latency
      + Tricks to make your Kafka consumer strategy more efficient.
          + Deserialize your consumer messages on the consumer loop, not on green-threads
          + Spawning a green-thread has a performance cost
          + Control your green-thread parallelism
          + Defer to green-threads when I/O starts to be required
      [Diagram labels: raw data, Kafka consumer + avro deserializer, green thread / msg (x5), Scylla, enriched data]
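As a rough illustration of the consumer strategy above, the following sketch deserializes on the consumer loop and hands only decoded events to a bounded set of workers. It deliberately uses OS threads and a bounded standard-library channel as a stand-in for green threads so that it needs no external crate; the real application presumably relies on an async runtime and real Kafka, Avro and Scylla clients. `Event`, `deserialize`, `enrich` and `run_pipeline` are hypothetical names.

```rust
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical decoded message.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub id: u32,
}

/// Stand-in for Avro deserialization: CPU-bound, so it runs inline
/// on the consumer loop instead of paying for a task hand-off.
fn deserialize(raw: &[u8]) -> Option<Event> {
    if raw.len() != 4 {
        return None;
    }
    Some(Event {
        id: u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]),
    })
}

/// Stand-in for the I/O-bound enrichment step (e.g. a database query).
fn enrich(event: Event) -> String {
    format!("enriched-{}", event.id)
}

/// Consume `raw_messages`, decode them on the calling thread and defer
/// only the I/O step to at most `parallelism` workers.
pub fn run_pipeline(raw_messages: Vec<Vec<u8>>, parallelism: usize) -> Vec<String> {
    let parallelism = parallelism.max(1);
    // The bounded channel applies back-pressure to the consumer loop.
    let (tx, rx) = sync_channel::<Event>(parallelism);
    let rx = Arc::new(Mutex::new(rx));
    let results = Arc::new(Mutex::new(Vec::new()));

    let workers: Vec<_> = (0..parallelism)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // The lock is held only while waiting for the next event.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(event) => results.lock().unwrap().push(enrich(event)),
                    Err(_) => break, // sender dropped: no more work
                }
            })
        })
        .collect();

    // Consumer loop: decode inline, skip undecodable payloads.
    for raw in &raw_messages {
        if let Some(event) = deserialize(raw) {
            tx.send(event).unwrap();
        }
    }
    drop(tx);

    for worker in workers {
        worker.join().unwrap();
    }
    let mut out = Arc::try_unwrap(results).unwrap().into_inner().unwrap();
    out.sort(); // completion order is not deterministic
    out
}
```

The `parallelism` argument is the knob the slide refers to: a larger value lets slow I/O calls overlap, at the price of more concurrent work in flight.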
  • 22. Absorbing tail latency spikes with parallelism [Chart labels: parallelism, load; annotations: x16, x2]
  • 23. Scylla Rust (shard-aware) driver
      + The scylla-rust-driver crate is mature enough for production
      + Use a CachingSession to automatically cache your prepared statements
      + Beware: prepared queries are NOT paged, use paged queries with execute_iter() instead!
      + Use at least version 0.4.2 if you run a multi-DC cluster!
  • 24. Exporting metrics properly for Prometheus
      + Effectively measuring latencies down to microseconds.
      + Fine tune your histogram buckets to match your expected latencies!
      ...
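To show why bucket boundaries matter at microsecond scale, here is a small standard-library sketch of a cumulative histogram with exponentially spaced buckets. It is not the exporter code from the talk; `exponential_buckets`, `Histogram` and the chosen boundaries (50µs doubling up to 800µs) are assumptions for illustration only.

```rust
/// Build `count` bucket upper bounds in seconds, starting at `start`
/// and multiplying by `factor` at each step.
pub fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    assert!(start > 0.0 && factor > 1.0 && count > 0);
    let mut bounds = Vec::with_capacity(count);
    let mut next = start;
    for _ in 0..count {
        bounds.push(next);
        next *= factor;
    }
    bounds
}

/// Cumulative histogram in the Prometheus style: each bucket counts
/// every observation less than or equal to its upper bound.
pub struct Histogram {
    bounds: Vec<f64>,
    counts: Vec<u64>,
    total: u64,
}

impl Histogram {
    pub fn new(bounds: Vec<f64>) -> Self {
        let counts = vec![0; bounds.len()];
        Self { bounds, counts, total: 0 }
    }

    /// Record one latency measurement, in seconds.
    pub fn observe(&mut self, seconds: f64) {
        self.total += 1;
        for (bound, count) in self.bounds.iter().zip(self.counts.iter_mut()) {
            if seconds <= *bound {
                *count += 1;
            }
        }
    }

    pub fn cumulative_counts(&self) -> &[u64] {
        &self.counts
    }

    pub fn total(&self) -> u64 {
        self.total
    }
}
```

With default buckets that start in the millisecond range, latencies of 75µs and 660µs would fall into the same first bucket and be indistinguishable; buckets sized around the expected latencies keep them apart.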
  • 25. Grafana dashboarding
      + Graph your precious metrics right!
          + ScyllaDB prepared statement cache size
          + Query and throughput rates
          + Kafka commits occurrence
          + Errors by type
          + Kubernetes pod memory
          + ...
      + Visualizing Prom Histograms
          max by (environment)(histogram_quantile(0.50, processing_latency_seconds_bucket{...}))
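For intuition about what a quantile computed from buckets means, the sketch below estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. It is a simplified approximation of the idea behind `histogram_quantile`, not Prometheus's implementation: it assumes every observation fits in the last finite bucket, and `estimate_quantile` is a hypothetical name.

```rust
/// Estimate quantile `q` (between 0.0 and 1.0) from cumulative counts.
/// `bounds` are bucket upper bounds in ascending order and
/// `cumulative[i]` is the number of observations <= `bounds[i]`.
/// Simplifying assumption: the last bucket contains all observations.
pub fn estimate_quantile(q: f64, bounds: &[f64], cumulative: &[u64]) -> Option<f64> {
    if bounds.is_empty() || bounds.len() != cumulative.len() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let total = *cumulative.last()? as f64;
    if total == 0.0 {
        return None;
    }
    let rank = q * total;
    // First bucket whose cumulative count reaches the target rank.
    let idx = cumulative.iter().position(|&c| c as f64 >= rank)?;
    // The first bucket is assumed to start at zero.
    let (start, below) = if idx == 0 {
        (0.0, 0.0)
    } else {
        (bounds[idx - 1], cumulative[idx - 1] as f64)
    };
    let in_bucket = cumulative[idx] as f64 - below;
    if in_bucket == 0.0 {
        return Some(bounds[idx]);
    }
    // Linear interpolation between the bucket's lower and upper bound.
    Some(start + (bounds[idx] - start) * (rank - below) / in_bucket)
}
```

Because the result is interpolated, its precision is limited by the width of the bucket it lands in, which is another reason to size buckets around the latencies you expect.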
  • 26. Was it worth it?
  • 27. Did I really lose time because of Rust?
      + I spent more time analyzing the latency impacts of code patterns and drivers’ options than struggling with Rust syntax.
      + Key figures for this application:
          + Kafka consumer max throughput with processing: 200K msg/s on 20 partitions
          + Avro deserialization P50 latency: 75µs
          + Scylla SELECT P50 latency on tables with 1.5B+ rows: 250µs
          + Scylla INSERT P50 latency on tables with 1.5B+ rows: 660µs
  • 28. It went better than expected
      + The Rust crates ecosystem is mature, similar to the Python Package Index.
      + 3 Python apps totalling 54 pods replaced by 1 Rust app totalling 20 pods
      + We helped & worked on making the scylla-rust-driver even better
          + Token aware policy can fall back to non-replicas for higher availability
          + Optimized partition key calculations for prepared statements
          + More to come!
      + This feels like the most reliable and efficient software I have ever written!
  • 30. Brought to you by FREE VIRTUAL EVENT | OCTOBER 19-20, 2022 The event for developers who care about high-performance, low-latency applications. Register at p99conf.io Follow us on Twitter: @p99conf #p99conf
  • 31. Thank you for joining us today. @scylladb, slack.scylladb.com