Confidential, Dynatrace, LLC
Architectural commandments for building & running
microservices at scale
Brian Wilson, Product Specialist, Dynatrace
@emperorwilson
Join our Podcast Series bit.ly/pureperf
1. Avoid Anti-Patterns
2. Continuous Deployment
3. Infrastructure Utilization
Microservices Techniques
1. N+1 call
2. N+1 query
3. Payload flood
4. Granularity/Tight Coupling
5. Inefficient Service Flow
6. Dependencies
Common Anti-Patterns
Our Setup
Service Instance A
Service Client
Request
Service Instance B
Service Instance N
Router/Service
Registry
Register
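The setup above (instances register with a registry, a router forwards client requests) can be sketched in a few lines. This is a minimal in-memory illustration, not the demo's actual code: the class name `ServiceRegistry`, the method names, and the round-robin routing choice are all assumptions, and the sketch is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory registry; names and routing strategy are assumptions.
class ServiceRegistry {
    private final Map<String, List<String>> instances = new HashMap<>();
    private final Map<String, Integer> requestCount = new HashMap<>();

    // Called by a service instance when it starts up.
    void register(String service, String address) {
        instances.computeIfAbsent(service, k -> new ArrayList<>()).add(address);
    }

    // Called by the router for each client request; rotates through the instances.
    String route(String service) {
        List<String> addresses = instances.get(service);
        if (addresses == null || addresses.isEmpty()) {
            throw new IllegalStateException("no instance registered for " + service);
        }
        // merge returns the updated count, so subtract 1 to get a zero-based index.
        int index = requestCount.merge(service, 1, Integer::sum) - 1;
        return addresses.get(index % addresses.size());
    }
}
```

Registering instances A and B for one service and routing three requests would return A, B, A in turn.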
Anti-Patterns
N + 1 Call Pattern
Monolithic Code
// Sums the value of every product; cheap while products are in-process objects.
public double getQuote(String type) {
    double quote = 0;
    for (Product product : products) {
        quote += product.getValue();
    }
    return quote;
}
N+1 Call Pattern
Works well within a single process
N+1 Call Pattern
Product Service
Quote Service
1 call to Quote Service
= 44 calls to product
service
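To make the call count concrete, here is a small sketch of the same loop once the products live behind a remote service. `ProductServiceClient` is a hypothetical stand-in that only counts round trips, and the batched `getValues` method is an assumed remedy, not something shown on the slides.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a remote Product Service; it only counts round trips.
class ProductServiceClient {
    private final Map<String, Double> values = new LinkedHashMap<>();
    int remoteCalls = 0;

    ProductServiceClient(Map<String, Double> values) {
        this.values.putAll(values);
    }

    // One network round trip per product id.
    double getValue(String productId) {
        remoteCalls++;
        return values.get(productId);
    }

    // One network round trip for the whole batch (assumed API).
    Map<String, Double> getValues(List<String> productIds) {
        remoteCalls++;
        Map<String, Double> result = new LinkedHashMap<>();
        for (String id : productIds) {
            result.put(id, values.get(id));
        }
        return result;
    }
}

class QuoteService {
    // N+1 style: the monolith's loop, now paying one remote call per product.
    static double quotePerProduct(ProductServiceClient client, List<String> ids) {
        double quote = 0;
        for (String id : ids) {
            quote += client.getValue(id);
        }
        return quote;
    }

    // Batched style: fetch all values in one call, then sum locally.
    static double quoteBatched(ProductServiceClient client, List<String> ids) {
        double quote = 0;
        for (double value : client.getValues(ids).values()) {
            quote += value;
        }
        return quote;
    }
}
```

With 44 products, `quotePerProduct` makes 44 remote calls while `quoteBatched` makes one, and both return the same quote.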
N + 1 Query Pattern
N+1 Query Pattern
1 call to Quote Service
= 87 calls to DB
Product Service
Quote Service
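One common way to collapse a per-item query loop is a single statement with an `IN` clause. The sketch below only builds the SQL text; the `product` table and its `id` and `value` columns are hypothetical, and the deck does not say how the real service queries its database.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical table and column names; only the statement text is built here.
class ProductQueries {
    // N+1 style: this statement is executed once per product id, in a loop.
    static final String PER_PRODUCT = "SELECT value FROM product WHERE id = ?";

    // Batched style: one statement with one placeholder per id.
    static String batched(List<Integer> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("at least one id is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(ids.size(), "?"));
        return "SELECT id, value FROM product WHERE id IN (" + placeholders + ")";
    }
}
```

The placeholders would then be bound through a prepared statement, so the database sees one query per quote instead of one per product.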
Payload Flood
Document creation is split across multiple services.
The payload grows with each call between services.
Payload could be significantly reduced by calling individual services from the document service.
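A toy model shows why the chained flow floods the network. It assumes each service adds a part of a given size and that, in the chained style, the whole accumulated document is forwarded on every hop; request overhead is ignored and the sizes are made up for illustration.

```java
// Toy model with assumed part sizes; it ignores request and protocol overhead.
class PayloadModel {
    // Chained style: each service forwards everything produced so far plus its own part.
    static long chainedBytes(long[] partSizes) {
        long total = 0;
        long accumulated = 0;
        for (long part : partSizes) {
            accumulated += part;   // the payload grows at every hop
            total += accumulated;  // the whole payload crosses the wire again
        }
        return total;
    }

    // Hub style: the document service calls each service and receives only that part.
    static long hubBytes(long[] partSizes) {
        long total = 0;
        for (long part : partSizes) {
            total += part;
        }
        return total;
    }
}
```

For four parts of 10 kB each, the chained flow moves 10 + 20 + 30 + 40 = 100 kB, while the hub flow moves 40 kB.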
Granularity/Tight Coupling
Granularity
Doc Processor Doc Transformer Doc Signer
Doc Encryption
Doc Shipment
Document Encryption is carved out as a separate service.
Running it as a separate service may not be the best option.
Documents
Tightly coupled. Really Distributed?
Inefficient Service Flow
(drawing parallels to Web Performance Optimization)
WPO (Web Performance Optimization) taught us to optimize resource dependencies when loading a web page by analyzing Resource Waterfalls.
Especially useful when page loads get very complex and overloaded: 3rd-party dependencies, non-optimized resources, wrong cache settings, loading too much data too early, …
SFPO (Service Flow Performance Optimization) has to teach us how to optimize (micro)service dependencies through Service Flows.
Especially useful to identify: inefficient 3rd-party services, recursive call chains, N+1 query patterns, loading too much data, no data caching, … -> sounds very similar to WPO
Understanding where time within a service and between service calls is spent
Classical cascading effect of recursive service calls!
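As a rough sketch of what a service flow analysis computes, the class below models one traced request as a tree of calls. `Span` is a hypothetical type, not a Dynatrace API, and the self-time calculation assumes child calls run one after another rather than in parallel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace node: one service call with its total duration and child calls.
class Span {
    final String service;
    final long totalMillis;
    final List<Span> children = new ArrayList<>();

    Span(String service, long totalMillis) {
        this.service = service;
        this.totalMillis = totalMillis;
    }

    // Adds a downstream call and returns this span so calls can be chained.
    Span child(Span c) {
        children.add(c);
        return this;
    }

    // Time spent in this service itself, assuming children were called sequentially.
    long selfMillis() {
        long inChildren = 0;
        for (Span c : children) {
            inChildren += c.totalMillis;
        }
        return totalMillis - inChildren;
    }

    // Number of service calls in the flow, including this one.
    int callCount() {
        int count = 1;
        for (Span c : children) {
            count += c.callCount();
        }
        return count;
    }
}
```

A high `callCount` with small self-times points at chatty or recursive call chains; a large self-time points at the service's own code.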
Dependencies
Look beyond the “Tip of the Iceberg”:
Understanding Dependencies is critical!
Who is depending on me? What is the risk of change?
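The question "who is depending on me?" is a reverse lookup over the call graph. The sketch below is one way to answer it from a list of observed calls; the class and service names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical call graph built from observed consumer -> provider calls.
class DependencyGraph {
    private final Map<String, Set<String>> calls = new HashMap<>();

    void addCall(String consumer, String provider) {
        calls.computeIfAbsent(consumer, k -> new TreeSet<>()).add(provider);
    }

    // Every service that directly or transitively depends on the given service.
    Set<String> dependentsOf(String service) {
        Set<String> result = new TreeSet<>();
        collect(service, result);
        return result;
    }

    private void collect(String service, Set<String> result) {
        for (Map.Entry<String, Set<String>> e : calls.entrySet()) {
            // result.add returns false for services already seen, which stops cycles.
            if (e.getValue().contains(service) && result.add(e.getKey())) {
                collect(e.getKey(), result);
            }
        }
    }
}
```

The size of the returned set is a crude measure of the risk of changing a service: the more dependents, the more consumers a breaking change can reach.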
Continuous Deployment
Continuous Deployment
Consumer 1 (v1), Consumer 2 (v1), Microservice (v1)
Consumer 1 (v1), Consumer 2 (v2), Microservice (v2)
Consumer 1, Consumer 2, Microservice v1, Microservice v2
Continuous Deployment
Service identifier = unique Id + version
Semantic versioning
(major.minor.patch)
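A service identifier made of a unique id plus a semantic version could look like the sketch below. The `name:major.minor.patch` text format and the `canServe` rule (same major version, provider not older than what the consumer was built against) are assumptions for illustration.

```java
// Hypothetical identifier of the form "name:major.minor.patch".
class ServiceId {
    final String name;
    final int major;
    final int minor;
    final int patch;

    ServiceId(String name, int major, int minor, int patch) {
        this.name = name;
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    static ServiceId parse(String text) {
        String[] nameAndVersion = text.split(":");
        if (nameAndVersion.length != 2) {
            throw new IllegalArgumentException("expected name:major.minor.patch, got " + text);
        }
        String[] parts = nameAndVersion[1].split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch, got " + nameAndVersion[1]);
        }
        return new ServiceId(nameAndVersion[0], Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
    }

    // True when this provider can serve a consumer built against `required`:
    // same service, same major version, and an equal or newer minor.patch.
    boolean canServe(ServiceId required) {
        if (!name.equals(required.name) || major != required.major) {
            return false;
        }
        if (minor != required.minor) {
            return minor > required.minor;
        }
        return patch >= required.patch;
    }
}
```

A router that knows these identifiers can keep sending consumers built against v1 to a v1 instance while newer consumers use v2.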
Continuous Deployment
Gatekeeper
Gatekeeper != O/R mapping service
Beware of N+1 patterns
Best Practice: Proper Tagging of Services
Real Use Case
[Chart: Response Time and Users over time (20xx, 2014, 2015, 2016+)]
1) 2-Person Project
2) Limited Success
3) Start Expansion
4) Performance Slows Growth
5) Potential Decline?
Scaling a Search Service for Online Sports Club
Early 2015: Monolith Under Pressure
Can't scale vertically endlessly!
April: 0.52s
May: 2.68s, 94.09% CPU bound
From Monolith to Services in a Hybrid-Cloud
Move Front End to Cloud
Scale Backend in Containers!
Go live – 7:00 a.m.
Go live – 12:00 p.m.
26.7s Load Time
5kB Payload
33! Service Calls
99kB - 3kB for each call!
171! Total SQL Count
Architecture Violation
Direct access to DB from frontend service
Single search query end-to-end
The fixed end-to-end use case
2.5s (vs 26.7s)
5kB Payload
1! (vs 33!) Service Call
5kB (vs 99kB) Payload!
3! (vs 177) Total SQL Count
Infrastructure Utilization
Infrastructure Utilization
Is the load balanced equally across microservice instances?
When do you scale up/down?
• CPU
• Memory
• Load
Use an automated process to scale up/down
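An automated scaling rule driven by those signals might look like the sketch below. The 80% and 30% thresholds, the class name, and the imbalance metric are illustrative assumptions, not recommendations from the talk.

```java
// Illustrative policy; the thresholds are assumptions, not values from the talk.
class ScalingPolicy {
    enum Action { SCALE_UP, SCALE_DOWN, HOLD }

    static final double HIGH = 0.80;
    static final double LOW = 0.30;

    // cpu and memory are utilization fractions in [0, 1], averaged across instances.
    static Action decide(double cpu, double memory, int instances, int minInstances) {
        double pressure = Math.max(cpu, memory);
        if (pressure > HIGH) {
            return Action.SCALE_UP;
        }
        if (pressure < LOW && instances > minInstances) {
            return Action.SCALE_DOWN;
        }
        return Action.HOLD;
    }

    // Ratio of the busiest instance to the average; values well above 1.0
    // suggest the load is not balanced equally.
    static double imbalance(double[] requestsPerInstance) {
        if (requestsPerInstance.length == 0) {
            throw new IllegalArgumentException("no instances");
        }
        double sum = 0;
        double max = 0;
        for (double r : requestsPerInstance) {
            sum += r;
            max = Math.max(max, r);
        }
        double mean = sum / requestsPerInstance.length;
        return mean == 0 ? 1.0 : max / mean;
    }
}
```

Checking `imbalance` before scaling up is worthwhile: adding instances does little if one instance is still receiving most of the traffic.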
Basic Infrastructure Utilization Monitoring
Utilization over time correlated with deployments!
Deployment and Infrastructure Dependencies
Keep an eye on active Docker containers
Treat containers just as hosts: CPU, Memory, Traffic, …
N+1 Pattern
Product Service
Quote Service
Watch out for N+1 query
Payload Flood
Payload between services can impact performance
Granularity
Doc Processor Doc Transformer Doc Signer
Doc Encryption
Doc Shipment
Not every functionality needs to be its own microservice
Tightly Coupled
Overly specialized services? No caching?
Service Flow
Optimize your End-to-End Service Flows!
Dependencies
Reduce risk when changing services that many others depend on!
Continuous Deployment
Consumer 1 (v1), Consumer 2 (v1), Microservice (v1)
Consumer 1 (v1), Consumer 2 (v2), Microservice (v2)
Ensure multiple versions of services are supported
Infrastructure Utilization
Ensure equal load across all microservice instances
Source Code:
https://github.com/Dynatrace-Reinhard-Pilz/dt-micro
Free Trials:
Dynatrace FullStack: http://bit.ly/dtsaastrial
AppMon: http://bit.ly/dtpersonal
Join our Podcast Series bit.ly/pureperf

Editor's Notes

  1. Original recording can be found here - https://info.dynatrace.com/apm_dtm_all_17q2_wc_microservices_en_registration.html
  2. PurePerformance Podcast: http://bit.ly/pureperf or http://www.spreaker.com/user/pureperformance
  3. Today, we are going to look at three important areas to focus on when moving from a monolith to microservices. Most of the data we’re going to look at today comes from experiences shared with us by our customers. These are not just stories our customers related to us but, especially in the anti-patterns area, events we see occur over and over again in the data they share with us in our free trial. First, we’ll look at some common anti-patterns and how to avoid them. Some of these anti-patterns are a bit newer, but many of them are the same old common problems that everybody insists on migrating to their microservices environments. Next, we’ll look at some important considerations for Continuous Deployment and take a look at a real use case. The last area we’ll look at is Infrastructure Utilization of your microservices environment.
  4. Let’s start with the Anti-Patterns. We’ll cover 6 of them today.
  5. Download the GitHub repo with this microservice app: https://github.com/Dynatrace-Reinhard-Pilz/dt-micro Dynatrace Free Trial: http://bit.ly/dtsaastrial AppMon Free Trial: http://bit.ly/dtpersonal In order to get screenshots of some of the problem patterns presented here, we set up our own simple microservices environment to re-create them. You can recreate the environment and try these out yourself: we have it in a GitHub repository and I’ll share the link at the end. This environment runs on a single host, and what you’ll see is that with the right tools and the right frame of mind, you can detect these problems very early on. The application is a controller that spins up multiple processes. To make this actually microservices, we have the registry/router service. Each service registers itself with the registry on startup. The service client, whether a web request or a request from another service, hits the router, and the router sends the request to the proper service. We created this with Spring Boot to make it easy. Spring Boot offers a rich set of technologies with which we can easily integrate. We can deploy the same binaries to each instance and control what they’re doing through configuration.
  6. Example of the code, if you want to display it. It is probably hard to read in most situations.
  7. While we’ve set a lot of this up with Spring Boot, the anti-patterns I’m about to discuss have nothing specifically to do with Spring Boot. These anti-patterns hold regardless of what technology you are using. It could be Java, .NET, Node, or anything else. They are not technology specific, but rather architectural. In fact, microservices quite often span multiple languages and technologies. That’s part of what makes them great. Anti-patterns, however, are not great.
  8. Let’s start with the N+1 Call pattern. For both this and the next one, I wish they had been called the 1+N pattern, as it more accurately describes what’s going on. However, N+1 is already ingrained.
  9. Let’s start with this getQuote function, left over from the monolithic code. In this monolithic example, you’re making a call to an API called getQuote. The main function of getQuote is to go through the list of products and sum up their prices. This works fine when running in a single, monolithic-type process: you can iterate through the products and prices because all the info is in cache, you’re just accessing local memory, and it’s all fast. Overhead is very minimal when something like this is set up properly in a single process. So, what typically happens when everybody gets all excited and moves this to microservices?
  10. You end up with something like this. This is a screenshot of a transaction flow. We see a web request on the left side making a call to the quote service. The quote service makes a call to the product service. The product service retrieves the product price from the database and passes it back to the quote service. The quote service sums up all the prices and sends the response back to the client. Everybody is happy because we have microservices and we can scale. We can even automate. There are a few problems with this, though. One call to the quote service results in 44 calls to the product service. The product service has very minimal business logic built into it, so it only handles the price for one product at a time. This quote has 44 items in it, so there are 44 calls to the product service. That adds a lot of overhead: to the network, and to the product service, because there’s no telling what kind of requests the quote service, or any other new service, will introduce. Also, a separate query is being made for each product (more on that next). The quote service is waiting for all the data and then has to process it, which could tie it up from servicing more calls from clients. A better way: take some of the business logic from the quote service and move it to the product service. The product service should be intelligent enough to take the full list of products, sum the prices and return the total to the quote service. This also reduces the load on the quote service, making it more responsive to the end clients. The reason this might have been designed like this: the quote service and product service typically belong to different teams, maybe under different management. The product service team is not talking to its customers, so they’re just writing the most basic of functions. The quote service team is not talking to the product service team to let them know how they’re going to use the service, and this leads to unintended abuse of the product service because it’s too simple. Communicate to build better services and avoid the N+1 call problem.
Even with these improvements, we still see a lot going on with the database…
  11. The N+1 query problem is very similar to the N+1 Call pattern, but this one involves the database. A very simple example: if your application has to get the employment start date for all of your employees, the application first makes a call for all employee IDs, then, for each employee ID, makes a query to get the start date.
  12. N+1 query is very similar to N+1 call, but they are separate problems. Fixing one doesn’t fix the other. Being aware of the pattern, though, can help you avoid introducing it anywhere. In this call trace screenshot (one transaction instance), we see recursive calls from the quote service to the product service, which makes a DB connection acquisition call and a single product query for each individual item in the quote. It’s easy to spot the N+1 problem visually. Though the query itself is fast, you’re adding network load, running into connection constraints and loading the DB. Even if there’s no problem right now, chances are conditions will arise where this will blow up in your face.
  13. Going back to our transaction flow, we see that one single call into the quote service results in 87 calls to the database. So, in this one spectacularly horrible example, we have the quote service making 44 calls to the product service, and the product service making 87 calls to the database. There are a few things to consider. If you eliminate the N+1 call pattern on the product service, there’s a good chance you’ll eliminate the N+1 query pattern to the DB, but not necessarily. Though the product service may handle the quote service’s request intelligently, you can still end up executing a single price query for each individual item. Write a better query, or think about leveraging an in-memory cache like memcache. When multiple product service instances use a cache to get the data, it is much faster and it eliminates network calls to the DB. Consider that when you move from a monolithic to a microservices architecture, where you have all of this scaling capability, one of the immediate impacts is that you, at least initially, blow away all of your caching strategies. Microservices doesn’t mean that you should stop using a caching strategy.
  14. Payload Flood: the architecture follows a hierarchy model where the top-level service does its part of the work, then fires and forgets the entire payload to the next service, which does its tiny bit of work, and so on down the line. In the end, you have a big data stream and, unlike the waterfall in this picture, you don’t have gravity to get the payload from microservice to microservice. Also, you have the client back at the top, to whom you have to get the data back.
  15. Our example app: we created a small set of services to create a big report. The document service gets the initial dataset from the DB, then fires and forgets to the doc processor. The doc processor runs its tiny bit of code, sends the entire payload to the transformer, and so on down the line. On the surface, it doesn’t look bad. You can scale any tier. Since the document service, and each service below it, fires and forgets, it’s free to work on the next request. But, if we look into the detail…
  16. We see the trade-off: a huge amount of data is being transferred among the services. Some might push back and say they have a very robust network and network payloads are not a problem. This defines an environmental condition which must be true in order for the services to work. What happens if, due to unforeseen circumstances, there’s a temporary restriction on the network? What if your microservice gets deployed to a different environment? What if your company expands to multiple public, private or hybrid data centers which are not all created equal? You can’t control the network. It’s like the movie Speed: the bus had to drive around the streets of LA at over 50 mph, otherwise it blows up. It’s a condition for their survival. The same thing goes for your services: if the network slows down, your process blows up. Another problem when dealing with a lot of this kind of data is that the data is the result of serialized objects, and serialized objects eventually need to get deserialized. This consumes CPU and, if you’re really unlucky, you block resources and create synchronization issues. And I want to stress, this is all one request. You’re switching from a monolithic application to microservices to scale well. You have to think about the maximum number of transactions you want to support per second and estimate, based on these numbers, what you can actually support.
  17. To fix this in our example app, we got rid of the hierarchical model and replaced it with a parent/child model where more intelligence was built into the document service, which in turn orchestrates the data between the different tiers. A single large payload is no longer moved between tiers. Instead, the document service sends only the data each service needs in order to do its bit of work. Also, we can run the job in parallel because the doc processor, transformer and signer don’t require each other’s work in order to do their job. This may look like a disadvantage because now the document service has to be aware the entire time, orchestrating, and it will not be free to handle the next request. However, this can be overcome by still leveraging a fire-and-forget type call, where it monitors a queue that the other services send a message to when they’re done. This allows not only parallel processing, but also asynchronous processing. With this change, we gain the following improvements: we are no longer tied to an environmental network condition, the network payload is reduced, parts of the job run in parallel, and parts of the job run asynchronously.
  18. The next anti-pattern we constantly come across involves granularity and overly tight coupling.
  19. A great candidate for a microservice is one that both has a very well-defined API and, as a component itself, doesn’t require calls to any other services. Looking at a transaction flow of our document service, in order to illustrate the concept, we created a step called encryption. In this example, every step of the document creation workflow makes at least one call to Doc Encryption. Since it’s a well-defined API, doesn’t require calls to other services, and can be scaled, it looks like a good idea. Also, you don’t have to maintain encryption code and keys on each service. Looks like a good plan. Let’s look at why breaking an API call like this into a microservice is not a good idea. In this setup, there are a lot of REST calls to the encryption service. The service does not consume a lot of CPU, but there are a lot of calls over the network to it. A colleague saw something like this in an engagement. There were a lot of calls to a fast service, but there was a lot of network overhead. When he asked why they needed to separate out the service, they said, “so we can scale and spin up multiple instances when we need to.” The discussion ended when my colleague asked how many instances they were running, to which they answered, “we’ve only ever needed one.” So, that raises the question: if you have a service that is only running one instance and doesn’t have to scale, why are you breaking it out and adding the cost of running a service external to the other services that use it? Additionally, architects should look at the ideas and try to figure out if there’s a better way. In the specific example of encryption, we can just make the inter-tier calls with SSL instead of having to make a call to encryption, simplifying everything. Keep a lookout for ways to simplify.
  20. Too Tight Coupling: 99% of the calls to the Journey service make a call to Check Destination. This means, basically, that for every call to the Journey service, the Journey service has to make a network call to the Check Destination service, and the Check Destination service has to be running and responsive. If 90, 99, or 100% of calls are going on to another service, you are making things too complicated. You are creating a nano-service. Can anybody make a good case for nano-services? You are introducing complexity and adding two more points of failure: network problems and CheckDestination availability. If there’s this tight a coupling between services, you’ve split too much. Either join them back together or see if you can leverage caching.
  21. We have roughly 10 years of WPO to learn from. Steve Souders wrote the book High Performance Web Sites in 2007. Thanks to people like Steve, Pat Meenan, Paul Irish, Nicole Sullivan, Tammy Everts and many more, and all the people in the trenches toiling away at WPO, we have a wealth of knowledge that we can apply to service flows. This knowledge includes both problem patterns, some of which translate from web/browser patterns to services, and the concept of visually analyzing performance in flows and waterfalls to identify problem patterns.
  22. Browser waterfalls help us highlight the problems we have. They make problems very easy to spot, especially in very complex web pages. Visualization is key. WPO helps with minifying and combining JS and CSS files to reduce round trips, optimizing images, ensuring proper use of browser caching, loading critical elements first instead of one large bulk request, etc.
  23. We like to call this Service Flow Performance Optimization, or SFPO. We can apply a lot of these learnings, such as caching and bulking, to optimize our service flow. They teach us how to optimize microservices dependencies: visualize it.
  24. As with WPO, when we get into especially complex service flows, visualizing them is key. We can use these flows to identify all the things in the box, like a WPO waterfall. There are similar patterns and parallels to WPO.
  25. Another way to view flows is in a waterfall/PurePath type view, just like browser waterfalls. This allows us to see visually which services are called by an initiating call. We can easily see how much time, what kind of time and how much network time was spent where. This makes it very easy to spot patterns…
  26. Without even looking at the details, this picture should raise a concern in anybody. A recursive call chain is easy to detect when you can see it, just like in WPO.
  27. Don’t just focus on your own service and its immediate neighbors. Somebody has to look at the whole thing, because it can get huge and out of control. If you just look at your part of the front, it looks great. If you look at the big picture, you’ll find that there is a lot more complexity involved. Do you know who is dependent on your service? Do you know what the services you depend on are themselves dependent on? Is there a service 5 layers down that is critical to your existence?
  28. Understand who your customers are and who is dependent on you. In monolithic code, this is easy. Microservices complicate the picture exponentially. You have no good way to know unless you are monitoring who is making calls into you. Service back traces clearly display all of the services that depend on your service. Armed with this info, you know who’s at risk if you make changes, and you know who to collaborate with when coming up with changes (think back to the reason the N+1 Call pattern happens). By collaborating with your dependents, you’ll write a much better, more performant, and more useful service.
  29. So, now that we’ve covered a bunch of anti-patterns, let’s continue on to the topic of Continuous Deployment.
  30. So, now that you’ve taken all the necessary steps to ensure your services are as performant and well tuned as possible, what you always need to consider is that you are not deploying a big “GA” version. The next version might already be in the pipeline, a few days away from production, and another version is being conceptualized right behind that. The version you are pushing out right now is impermanent. When you first deploy, everything is great. You have multiple consumers using your microservice, and since this is the first time it’s being used, they’re all using the same version. Everybody is happy. However, your service is so popular, and everybody wants to use it so much, that you’re forced to update it and add new functionality. When you do that, you run the risk of this. And since you don’t want to have any downtime when you deploy, you do this all live. Consumer 2 has already changed in tandem to take advantage of the new functionality, but consumer 1, not needing the new functionality, didn’t change, and now they’re broken. 50% of your users are now failing.
  31. So, to avoid this issue, you have to be prepared to run multiple versions of your service. This could be for a short time or for a long time. It depends on how long you want to support the old version and how long other consumers need to make their changes. If consumer 1 is a mobile app, for example, you’ll need longer support than if these consumers were other internal services. There are a few ways you can do this. If you’re clever enough, you can introduce compatibility layers where either of the consumers can talk to the new service and that new service has a backward-compatible protocol layer. However, you don’t have to worry about this if you deploy multiple versions of the service. To do this, you need more than just a unique identifier for the service like “document service”. You need to add a version to it. So, the consumer should always be looking for, let’s say, the Document Service with a specific version. Ideally, we’d suggest that each service has a minimum and maximum supported version definition, and when talking versions, semantic versioning should be used: major, minor, patch at minimum. They should be meaningful. Most people would expect a change in the major version to indicate a change in the API, and therefore a likely incompatibility with consumers running an older version, whereas a minor or patch change should still be compatible with the older versions.
  32. This concept extends to the database. Version 2 of the service may introduce changes that require changes in the database schema, and this in turn may break calls made from microservice v1. So, again, 50% of your users are getting an error page. We can’t maintain 2 databases, so what do we do about this?
  33. You basically have to inject some type of mediator, or gatekeeper, between the services and the database. The gatekeeper is the only one who can talk to the database. Whoever wants to talk to the DB has to talk to the gatekeeper. The gatekeeper runs on a specific version and has the compatibility layer built into it. Each version of the service can talk to the gatekeeper, and the gatekeeper, in turn, will create the queries compatible with the database. This works nicely. However, let’s keep in mind the N+1 problems. You have to make your gatekeeper clever. Don’t just create O/R mapping services. A gatekeeper that only offers an object-oriented API to create statements to execute on the DB is a bad idea. Sooner or later you’ll run into N+1 again. Instead, options include singling out functionality from the services and moving it to the gatekeeper to make the gatekeeper more intelligent. Or perhaps your services don’t even interact with the database. Instead, you leverage a third-party caching mechanism that takes care of interacting with the database and knows how to distribute the cache among the multiple instances.
  34. Most platforms support tags. It is very important to use them, and to monitor the entities while being aware of the tags. Monitoring has to be able to see these tags. By seeing tags in monitoring, you’ll know which version of your service has an issue.
  35. Let’s pivot to a real use case from one of our customers.
  36. This was a search service for an online sports club site in Europe. Users could go on, search for local soccer clubs and go there. It started as a 2-person project, it was used a little bit and had a little bit of success. In 2014, they decided to expand the service to different cities in Europe. As they did this, they saw an increase in users to the site. And, as you can imagine, as the users increased, they saw a significant increase in response time. Predictably, they also started seeing a drop-off in users as response time got worse.
  37. They had a monolithic .NET app that connected to a SQL server in the back end. In April 2015, the response time was decent. The next month, when they expanded, the response time increased. Not terrible, but not great. But what they saw was that the application was CPU bound and they could not scale it vertically.
  38. So, they thought, “hey, microservices and cloud will save the day.” Don’t we all? So, they moved the frontend logic into the public cloud and the backend search service into containers. The idea was to be able to host these containers in the public cloud, deploying the front end where they need it globally, with the ability to scale the back end as needed. So, they quickly modified the app to break it out into microservices. They could now scale and their problems were solved.
  39. On the go-live date with the new architecture, everything looked good at 7 AM, when not many folks were online yet. Response time was acceptable, users were mostly satisfied and the bounce rate was OK.
  40. By noon, when the real traffic started to come in, the picture was completely different. User experience across the globe was bad. Response time jumped from 2.5 s to 25 s and the bounce rate tripled from 20% to 60%.
  41. The backend service itself was well tested. The problem was that they never looked at what happens under load end-to-end. It turned out that the frontend had direct access to the database to execute the initial query when somebody executed a search. The returned list of search result IDs was then iterated over in a loop. For every element, a “micro” service call was made to the backend, which resulted in 33 service invocations for this particular use case, where the search returned 33 items. That is a lot of wasted traffic and resources, as these key architectural metrics show us: 33 service calls (N+1 call problem), a 99 KB payload, and 171 queries (N+1 query problem).
  42. So, they went back to the drawing board. They made the front end more intelligent. They re-architected the backend and got rid of the N+1 problems. Payload went down as a result, eliminating the payload flood. They fixed the problem by understanding the end-to-end use cases and then defining backend service APIs that provided the data really needed by the frontend. This reduced round trips, eliminated the architectural regression and improved performance and scalability.
  43. So, now that you’ve taken care of your performance issues and have made sure that you can deploy safely and compatibly, you are faced with another very important question: how do you utilize your infrastructure optimally?
  44. Balancing the load of your microservices is one of the keys to properly utilizing your infrastructure. With so many instances running, keeping an eye on balancing is much more difficult. And if your services aren’t balanced, you run the risk of the overloaded instances encountering performance problems, and you waste money by running too many instances. If you can spread your load evenly, you can likely reduce the number of instances running. Somebody has to pay for all of the compute you are using. Balancing is more than just balancing throughput. Make sure CPU and memory consumption are balanced as well. If you can get a handle on your balancing, you can more easily set parameters for scaling. Take time to identify and establish criteria along the dimensions of CPU, memory and load for when you are to scale up. More importantly, identify the same thresholds for when you can comfortably scale down. Not scaling down defeats the purpose of microservices and containers and costs a lot of money. Once you scale up or down, make sure the load balances across the new configuration. Automate all this. Why do this manually?
  45. One thing a lot of people don’t realize is that monitoring your infrastructure for a monolithic app is very different from monitoring in a microservices architecture. You might be monitoring all of the hosts in your environment, but even if all of the hosts are green, it doesn’t mean your system is running well. It just means that none of your hosts or processes have a problem, and that’s at least good. Your red instances might indicate a non-critical full disk partition, in which case, who really cares except the infrastructure team? With the green hosts, your host and process may be healthy, but your code might be terrible. We need to see all the way through from infrastructure to service/code performance. Also, most of this type of monitoring only shows how you are doing right now. You have to look at it over time to know what has an impact.
  46. Historical and trend data is very important. You need to know in your monitoring when deployments happen, so that if you see strange behavior, you can tie it to the change. If you can’t see that, you’re stuck. You also need to be able to compare performance against the same hour last week to know if something is out of the ordinary. Time-based monitoring is a huge improvement, but it still leaves a lot out. We don’t know what impacts what, and what dependencies there are.
  47. Understand your dependencies: know which services are running on which processes on which hosts in which data centers. Know how services talk to each other and how processes interact with each other. In a monolith, if a single server is having a problem, you’ll know which services are impacted. You know which dependents might be impacted. In a microservices environment, your hosts can number well over 100, even into the thousands, and processes and service instances can run into the tens of thousands. There’s no way you’re going to know what is dependent on what. And if you figure it out today, tomorrow it will change. Dependency monitoring/mapping is very important. This means you have to be able to map this in your monitoring. If a single server goes down in a microservices environment, who are the immediate neighbors that are impacted? Which services further down the line might be impacted? Think of that tip-of-the-iceberg slide we looked at earlier. If one of those nodes out on the far left had a problem, are we really going to know that it was the one to impact the node on the far right? If you have 5 machines with high CPU, you’re likely going to look for the root cause on those machines. If you know the dependencies, you’ll be able to see that the problem is a network issue on a downstream component. Green and red lights are good. Timeline metrics, especially with deployment markers, are good, but in the microservices world, you have to monitor the dependencies as well.
  48. [For Docker crowds] It’s important to monitor your entire Docker stack. Monitor the containers, the hosts they run on, the instance counts, throughput, etc. Utilize dependency mapping to be able to look at your Docker components as both an ecosystem as well as individual components.
  49. It is important to monitor each container as if it were a host: traffic, CPU, memory. It is also important to monitor the hosts the containers are running on. Here is also where you can see the load distribution. If I’m running 10 containers, but my load is not balanced, like in the picture here, there’s a good chance I may be able to spin down to 6 or 7 containers if I balance the load. That’ll save us money.
  50. And last but not least, in order to know what to do here, you have to monitor your instances. Chart the load distribution, chart the number of instances, chart the resource utilization across your services. If you don’t take the time to monitor your system, you won’t know what’s going on and you will end up paying much more for the compute resources you are using. The point of moving to, say, Docker and the cloud is to improve performance and save money. You can only do this if you are monitoring performance and resource utilization. Also, it’s important not to set-and-forget this. Review your monitoring strategies and collected data often to see if you can optimize better. A final thought here: when you move to a microservices architecture, the components that keep your business logic up and running are as important as the business logic itself.
  51. Lessons Learned!
  52. N+1 patterns are like cockroaches: they’ll outlive us all. Approach N+1 patterns like crack: don’t do it.
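
To make the fix from note 10 concrete, here is a minimal Java sketch that contrasts the chatty per-product call with a single batched call where the product service sums the prices itself. Everything in it is an assumption for illustration: `ProductService`, `getPrice`, `getTotal` and the `calls` counter are hypothetical stand-ins for a remote service and its network round trips, not code from the dt-micro repository.

```java
import java.util.List;
import java.util.Map;

public class QuoteBatching {
    // Hypothetical in-process stand-in for the remote product service.
    static class ProductService {
        private final Map<String, Double> prices;
        int calls = 0; // counts simulated network round trips

        ProductService(Map<String, Double> prices) {
            this.prices = prices;
        }

        // Chatty API: one round trip per product (the N+1 call shape).
        double getPrice(String productId) {
            calls++;
            return prices.getOrDefault(productId, 0.0);
        }

        // Batch API: one round trip for the whole list, summed on the service side.
        double getTotal(List<String> productIds) {
            calls++;
            double total = 0;
            for (String id : productIds) {
                total += prices.getOrDefault(id, 0.0);
            }
            return total;
        }
    }

    // Quote service logic as carried over from the monolith: loops over products.
    static double quoteChatty(ProductService svc, List<String> ids) {
        double quote = 0;
        for (String id : ids) {
            quote += svc.getPrice(id);
        }
        return quote;
    }

    // Quote service logic after moving the summing into the product service.
    static double quoteBatched(ProductService svc, List<String> ids) {
        return svc.getTotal(ids);
    }

    public static void main(String[] args) {
        Map<String, Double> prices = Map.of("a", 10.0, "b", 20.0, "c", 30.0);
        List<String> ids = List.of("a", "b", "c");
        ProductService chatty = new ProductService(prices);
        ProductService batched = new ProductService(prices);
        System.out.println(quoteChatty(chatty, ids) + " in " + chatty.calls + " calls");
        System.out.println(quoteBatched(batched, ids) + " in " + batched.calls + " calls");
    }
}
```

Both variants return the same total; only the number of round trips differs, and that count grows with the number of items in the quote for the chatty version. The same shape applies to the N+1 query case from notes 11 to 13 if `getTotal` is backed by one query over the whole ID list instead of one query per ID.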
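
Note 17 describes the document service sending each downstream service only the data it needs and running the independent steps in parallel. The following Java sketch shows one way that orchestration could look using `CompletableFuture` from the standard library. The `process`, `transform` and `sign` methods are hypothetical local stand-ins for remote calls to the doc processor, transformer and signer, and the string inputs stand in for the per-service slices of the dataset.

```java
import java.util.concurrent.CompletableFuture;

public class DocumentOrchestrator {
    // Hypothetical stand-ins for calls to the three downstream services.
    static String process(String part) {
        return "processed(" + part + ")";
    }

    static String transform(String part) {
        return "transformed(" + part + ")";
    }

    static String sign(String part) {
        return "signed(" + part + ")";
    }

    // The orchestrator hands each service only its own slice of the data
    // and starts all three steps without waiting for one another.
    static String build(String forProcessor, String forTransformer, String forSigner) {
        CompletableFuture<String> processed =
                CompletableFuture.supplyAsync(() -> process(forProcessor));
        CompletableFuture<String> transformed =
                CompletableFuture.supplyAsync(() -> transform(forTransformer));
        CompletableFuture<String> signed =
                CompletableFuture.supplyAsync(() -> sign(forSigner));

        // Join only when assembling the final document.
        return processed.join() + "|" + transformed.join() + "|" + signed.join();
    }

    public static void main(String[] args) {
        System.out.println(build("body", "layout", "hash"));
    }
}
```

This sketch still blocks in `join()` while the steps run. The queue-based variant mentioned in note 17, where the services post a completion message and the document service picks it up later, would remove that wait but needs messaging infrastructure that is out of scope here.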
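
Note 31 suggests that each service declare a minimum and maximum supported version and that versions follow the major.minor.patch scheme. As a rough sketch of what such a check could look like, the Java below compares versions numerically per component and tests whether an offered version falls inside a consumer's declared range. The class and method names (`VersionRange`, `accepts`) are made up for illustration, and pre-release or build suffixes from the full semantic versioning scheme are deliberately not handled.

```java
public class VersionRange {
    // Parses "major.minor.patch" into three integers.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch, got: " + version);
        }
        int[] out = new int[3];
        for (int i = 0; i < 3; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Numeric comparison per component, so 1.10.0 is newer than 1.9.0.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    // True if the offered service version lies within [min, max], inclusive.
    static boolean accepts(String min, String max, String offered) {
        return compare(offered, min) >= 0 && compare(offered, max) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(accepts("1.2.0", "1.9.9", "1.4.3"));
        System.out.println(accepts("1.2.0", "1.9.9", "2.0.0"));
    }
}
```

A registry or router could apply a check like `accepts` when resolving "Document Service" for a consumer, so that a consumer built against major version 1 is never routed to an instance running major version 2.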