@mattklein123
Lyft's Envoy:
Embracing a Service Mesh
Matt Klein / @mattklein123, Software Engineer @Lyft
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/lyft-service-mesh
Presented at QCon New York
www.qconnewyork.com
Lyft ~5 years ago
[Diagram: Clients, Internet, AWS ELB, PHP / Apache monolith, MongoDB]
Simple! No microservices! (but still not that simple)
Lyft ~3 years ago
[Diagram: Clients, Internet, AWS external ELB, PHP / Apache monolith (+haproxy/nsq), AWS internal ELBs, Python services, MongoDB, DynamoDB]
Not simple! Microservices! With monolith!
(and some haproxy/nsq)
Lyft’s microservice architecture problems 3 years ago
● Multiple languages and frameworks.
● Many protocols (HTTP/1, HTTP/2, gRPC, databases, caching, etc.).
● Black-box load balancers (AWS ELB).
● Lack of consistent observability (stats, tracing, and logging).
● Partial or no implementations of retry, circuit breaking, rate limiting,
timeouts, and other distributed systems best practices.
● Minimal authentication and authorization.
● Per-language libraries for service calls.
● Extremely difficult to debug latency and failures.
● Developers did not trust the microservice architecture.
Lyft’s architecture problems 3 years ago
A really big and confusing mess...
What is Envoy and the service mesh?
The network should be transparent to applications. When
network and application problems do occur it should be easy to
determine the source of the problem.
Service mesh refresher
[Diagram: multiple instances of Service A, plus Services B, C, and D, each paired with its own sidecar proxy]
Envoy
● Out-of-process architecture
● High performance / low latency code base
● L3/L4 filter architecture
● HTTP L7 filter architecture
● HTTP/2 first
● Service discovery and active/passive health checking
● Advanced load balancing
● Best-in-class observability (stats, logging, and tracing)
● Authentication and authorization
● Edge proxy
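To make the "passive health checking" and "load balancing" bullets concrete, here is a minimal sketch in Python. It is not Envoy's implementation: the class names, the `eject_after` threshold, and the round-robin policy are all assumptions chosen for illustration. It only shows the general idea of ejecting a host from rotation after repeated 5xx responses observed on live traffic.

```python
from dataclasses import dataclass


@dataclass
class Host:
    address: str
    consecutive_5xx: int = 0
    ejected: bool = False


class PassiveHealthBalancer:
    """Round-robin balancer that ejects hosts after repeated 5xx responses.

    Hypothetical illustration only; names and thresholds are assumptions.
    """

    def __init__(self, addresses, eject_after=5):
        self.hosts = [Host(a) for a in addresses]
        self.eject_after = eject_after  # assumed threshold, not an Envoy default
        self._next = 0

    def pick(self):
        """Return the next non-ejected host, or None if every host is ejected."""
        for _ in range(len(self.hosts)):
            host = self.hosts[self._next % len(self.hosts)]
            self._next += 1
            if not host.ejected:
                return host
        return None

    def record(self, host, status_code):
        """Update passive health state from an observed response status."""
        if 500 <= status_code <= 599:
            host.consecutive_5xx += 1
            if host.consecutive_5xx >= self.eject_after:
                host.ejected = True
        else:
            # Any non-5xx response resets the failure streak.
            host.consecutive_5xx = 0
```

A real proxy would also return ejected hosts to rotation after a cool-down period and combine this with active health checks; both are omitted here to keep the sketch short.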
Observability
● Observability is by far the most important thing that Envoy and the service
mesh provide.
● Having all traffic transit through Envoy provides a single place to:
○ Produce consistent statistics for every hop.
○ Create and propagate a stable request ID / tracing context.
○ Emit consistent logs.
○ Enable distributed tracing.
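The "stable request ID" bullet can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the talk: the helper names are hypothetical, and the only detail taken from Envoy itself is the `x-request-id` header name. The idea is that an ID is created once at the first hop, carried on every outbound call, and stamped on every log line so that logs and traces can be joined later.

```python
import uuid

REQUEST_ID_HEADER = "x-request-id"


def ensure_request_id(incoming_headers):
    """Return a lower-cased copy of the headers that always carries a request ID."""
    headers = {k.lower(): v for k, v in incoming_headers.items()}
    if not headers.get(REQUEST_ID_HEADER):
        # First hop: no ID yet, so mint one.
        headers[REQUEST_ID_HEADER] = str(uuid.uuid4())
    return headers


def outbound_headers(inbound_headers, extra=None):
    """Build headers for a downstream call, carrying the request ID forward."""
    out = dict(extra or {})
    out[REQUEST_ID_HEADER] = inbound_headers[REQUEST_ID_HEADER]
    return out


def log_line(headers, message):
    """Prefix a log message with the request ID so hops can be correlated."""
    return f"request_id={headers[REQUEST_ID_HEADER]} {message}"
```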
Lyft today
[Diagram: Clients, Internet, front / edge, legacy monolith, Python services, Go services, MongoDB, DynamoDB, Redis, external partners, stats / tracing / logging, Envoy manager (xDS server); caption: "Obs, obs, obs, obs, obs, obs..."]
Per-service auto-generated panel
Links to interesting data
Clickable traces from top-level panel
Per-caller information
Distributed tracing
Logging
Service-to-service template dashboard
Template with a drop-down for every service
Edge proxy
Per-upstream cluster RPS
Per-upstream cluster 5xx
Per-upstream cluster timings
Global health dashboard
Envoy thin clients @Lyft
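# lyft.api_client is Lyft-internal and not publicly available.
from lyft.api_client import EnvoyClient

# The client only needs the logical name of the destination service.
switchboard_client = EnvoyClient(
    service='switchboard'
)

msg = {'template': 'breaksignout'}
headers = {'x-lyft-user-id': 12345647363394}
# The request goes out through the local Envoy sidecar.
switchboard_client.post("/v2/messages", data=msg, headers=headers)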
● Abstract away egress port
● Request ID/tracing propagation
● Guide devs into good timeout, retry, etc. policies
● Similar thin clients for Go and PHP
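Since `lyft.api_client` is internal to Lyft, the following is a hypothetical standard-library sketch of what such a thin client might do. It is not Lyft's implementation. The port number, default timeout, class name, and the use of the `Host` header to select the destination service are all assumptions. It covers three of the bullets above: hiding the egress port, attaching a request ID, and applying a default timeout.

```python
import json
import urllib.request
import uuid

EGRESS_PORT = 9001        # assumed port of the local sidecar's egress listener
DEFAULT_TIMEOUT_S = 1.0   # assumed default; callers should not need to pick one


class ThinClient:
    """Hypothetical thin client that sends every call through a local sidecar."""

    def __init__(self, service, timeout_s=DEFAULT_TIMEOUT_S, request_id=None):
        self.service = service
        self.timeout_s = timeout_s
        self.request_id = request_id

    def build_request(self, method, path, data=None, headers=None):
        # Header values are coerced to strings so integer IDs are accepted.
        merged = {k: str(v) for k, v in (headers or {}).items()}
        # Assumption: the sidecar routes on the Host header.
        merged["Host"] = self.service
        merged["x-request-id"] = self.request_id or str(uuid.uuid4())
        body = None
        if data is not None:
            body = json.dumps(data).encode("utf-8")
            merged["Content-Type"] = "application/json"
        url = f"http://127.0.0.1:{EGRESS_PORT}{path}"
        return urllib.request.Request(url, data=body, headers=merged, method=method)

    def post(self, path, data=None, headers=None):
        """Send a POST via the sidecar; requires a proxy listening locally."""
        req = self.build_request("POST", path, data, headers)
        with urllib.request.urlopen(req, timeout=self.timeout_s) as resp:
            return resp.status, resp.read()
```

Keeping `build_request` separate from `post` lets the request construction be inspected without a running proxy. Retries are left out on purpose, on the assumption that retry policy would live in the proxy configuration rather than in each client.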
Envoy config management via xDS APIs
● Envoy is a universal data plane
● xDS == * Discovery Service (various configuration APIs). For example:
○ LDS == Listener Discovery Service
○ CDS == Cluster Discovery Service
● Both gRPC streaming and JSON/YAML REST via proto3!
● Central management system can control a fleet of Envoys avoiding per-proxy
config file hell
● Global bootstrap config for every Envoy, rest taken care of by the
management server
● Envoys + xDS + management system == fleet-wide traffic management
distributed system
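The relationship between a management server and a fleet of proxies can be sketched as a toy simulation. This is not the xDS protocol (which uses gRPC streaming or REST with protobuf messages). The class names, the integer version counter, and the polling model are assumptions made only to show the idea: each proxy starts with almost no configuration, fetches the rest from a central source, and the fleet converges over time.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementServer:
    """Toy central config source; stands in for a real xDS server."""
    version: int = 0
    resources: dict = field(
        default_factory=lambda: {"listeners": [], "clusters": []}
    )

    def update(self, kind, items):
        """Replace one resource type and bump the config version."""
        self.resources[kind] = list(items)
        self.version += 1

    def fetch(self, known_version):
        """Return (version, resources) if newer than known_version, else None."""
        if known_version == self.version:
            return None
        return self.version, {k: list(v) for k, v in self.resources.items()}


@dataclass
class Proxy:
    """Toy proxy whose only bootstrap knowledge is how to reach the server."""
    name: str
    version: int = 0
    config: dict = field(default_factory=dict)

    def sync(self, server):
        """Pull config if the server has a newer version; report whether it changed."""
        update = server.fetch(self.version)
        if update is None:
            return False
        self.version, self.config = update
        return True
```

Proxies that have not yet synced keep serving with their previous configuration, which is the eventually consistent behavior this kind of design implies.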
Envoy config management via xDS APIs @lyft
[Diagram: cluster manager, listener manager, route manager, legacy discovery service, Envoy manager service, Envoy static config repo, service manifests, S3, registration cron jobs; API labels: SDS, CDS, RDS, LDS]
Only need a very tiny bootstrap config for each Envoy...
Lyft’s Envoy deployment
● 100s of services
● 10Ks of hosts
● 5-10M mesh RPS
● Majority HTTP/2 (h2)
● All edge, service-to-service (StS), and the vast majority of external partner traffic
● MongoDB, DynamoDB, Spanner, Redis
● Evolving configuration management system as we move to K8s
Envoy adoption
And lots more not listed...
Why Envoy + Q&A
● Quality + velocity
● Extensibility
● Eventually consistent configuration API
● No “open core” / paid premium version. It’s all there
● Community, community, community
Critical mass has nearly been achieved. Becoming too costly to not use?