SlideShare uma empresa Scribd logo
1 de 45
Patterns for building resilient and scalable
microservices platform on AWS
Boyan Dimitrov,
Platform Automation Lead @ Hailo
@nathariel
Back in 2011 we started simple
We quickly found out that supporting monoliths is hard:
• Hard to maintain the codebase
• Hard to build new features
• Hard to scale the dev teams
Failure to deliver business value
Frontend Backend
MySQL
So in 2013 we ended up
doing…
At present we have
• Microservices ecosystem (99.9% written in Go)
• Designed specifically for the cloud – different building blocks and
components will constantly be in flux, broken or unavailable
• 1000+ AWS instances spanning multiple regions
• 200+ services in production
The Platform under the hood
TeVPC
Auto Scaling
S3
OrchestrationEnv DNS
Release AutoScaling
Discovery
Monitoring
CFEC2
Route 53
Redshift
ComputeEIP
Routing
Core
Platform
Provisioning
Login
Services
Cloud Provider
Whisper
Config
• Lowest level building blocks
• We mostly use basic PaaS components and services as they cover most of our
needs
• We expect every underlying component to fail and we designed for this
eu-west-1
Message Bus+
Go Services
Proxy Layer
C*
us-east-1
Proxy Layer
C*
Go Services
Message Bus+
eu-west-1
Proxy Layer
Message Bus
eu-west-1a
Services
eu-west-1b eu-west-1c
Shared Infra
RabbitMQ RabbitMQ RabbitMQ
API API API
Go Go Go
x many
C*
NSQ
ZK
C*
NSQ
ZK
C*
NSQ
ZK
x many x many
• We use auto scaling groups for everything
 Guarantees each component can be rebuilt automatically
 Including our database clusters that run on ephemeral storage ( we do keep
6 copies of each piece of data in 2 regions )
• Minimum of 3 AZs in every region
• Every workflow is automated
• Every component has to be self healing and scalable
Basic principles
• Our “cloud provider abstraction” layer
• Main purpose is infrastructure and workflow automation and discovery
• Has a global view of everything happening across our infrastructure
• Provides additional capabilities on top of AWS
• Runs in a dedicated VPCs across two regions
OrchestrationEnv DNS
Release AutoScalingComputeEIP
Whisper
It all started by a small challenge we had to overcome:
Payment providers whitelist sources
EIP Service
Elastic IP Provisioning Service
NAT
LIVE
NAT
FOO
51.x.x.1 nat live
51.x.x.2 nat live
51.x.x.3 nat live
50.x.x.5
1
nat foo
Maintains elastic IP pools across all
our accounts and matches them against
auto scaling groups and environments
auto scaling group auto scaling group
We do a lot of server discovery
• Both external and internal orchestration tools rely on AWS APIs for server
discovery
• Puppet has AWS integration for clustering infra
• Exponential back-off mitigates the issue but does not solve it if you have
many clients
“RequestLimitExceeded”.
Compute service to the rescue
• A distributed cache of all compute instances and their meta data
• Powerful query API ( Very Fast!)
• Main interface for creating new compute instances
• Reconciles any changes in any AWS account within seconds
Compute Service
Other
providers
Internal
tools
External
toolsServices
Everything in our platform emits events
So naturally we want to capture all external events as well!
Whisper Service
It’s all about event driven compute – think Lambda but within our platform
Events
Events
Hundreds of publishers & subscribe
NSQ Topics
Events
External
sources
Actions
To subscribe to any new event source
we have to only change a single service
What about AWS resource access?
temporary
security
credentials
AWS Account X AWS Account Y
service
temporary
security
credentials
role role
• Each external orchestration
service instance has a
“global” view of our
infrastructure
• Relies heavily on STS to
operate across different
accounts and regions
• Each service has a
designated role for every
account and region
AWS Auth under the hood
Shared environments create contention. We decided to boost our
developers productivity and give them on demand environments
ENV ENV
ENV
Environment Service
SIE MIE
Infrastructure
Core Platform
Single server on
AWS
Hundreds of servers
/ single AWS region
CloudFormation
Orchestration layer
On demand environments
Single Instance Environment Multi instance environment
Environment Service
SIE MIE
Infrastructure
Core Platform
Single server on
AWS
Hundreds of servers
/ single AWS region
Release Service
ANY ENV
(PROD)
Services
Config
*Data
clone
ETA: ~12 min ETA: ~40 min
CloudFormation
Orchestration layer
On demand environments
Single Instance Environment Multi instance environment
Environment Service
SIE MIE
Infrastructure
Core Platform
Single server on
AWS
Vagrant support
Hundreds of servers
/ single AWS region
Multi-region
environments
Release Service
ANY ENV
(PROD)
Services
Config
*Data
clone
ETA: ~12 min ETA: ~40 min
CloudFormation
Orchestration layer
On demand environments
Single Instance Environment Multi instance environment
SIE
Pre Prod
All of this so we can do
SIE
MIE
MIE
SIE
MIE
SIE Live
Orchestration
SIE
Preparing for…
SIE
MIE
MIE
SIE
MIE
SIE Live
Orchestration
• The only services directly aware of our cloud provider specifics – gives us a lot of
flexibility and let us introduce changes quickly
• Each of them fulfills a very specific task and together create powerful workflows
• Nothing else in our platform is aware of the underlying cloud layer
• We did not envision being “cloud agnostic” – it just happened
Provides the most essential platform functions for every service:
• Service Discovery
• Service Provisioning
• Routing & Load Balancing
• Authentication/Authorization
• Monitoring
• Configuration
Service Provisioning
Provisioning Service
Build Pipeline
Amazon S3
Provisioning Manager
Provisioning Service
Docker Registry
Provisioning overview
Instance Instance
Process Container
Auto Scaling GroupAuto Scaling Group
Service deployment specifics
• Each service is decoupled from the rest and deployed individually
• We run multiple services on the same instance but each service is
deployed in at least 3 AZs
• We rely on auto scaling groups for organizing and scaling our
workload
• We use static partitioning to match a service to an auto scaling group
and this results in non optimal resource utilisation (25% - 50%)
Deploying a service
service name version
auto scaling group
Coming soon: Elastic resource pools and QoS scheduling
Elastic Resource Pool
ECS
Agent
ECS
Agent
ECS
Agent
ECS
Agent
ECS
Agent
ECS
Agent
QoS Scheduler
eu-west-1a eu-west-1b eu-west-1c
AWS
Cloud Provider
ECS
Cluster Manager
instance instance instance instance instance instance
So what does this mean?
Elastic resource pool
75-80%
Utilization
eu-west-1a eu-west-1b eu-west-1c
One word – such difference!
instance instance instance instance instance instance
Why building our own scheduler?
• Service Priority
• Service specific runtime metrics
• Interference
• Cloud awareness ( availability zones, pool elasticity…)
Running services in a pay as you go fashion will soon be a reality as much as
todays on demand compute
We want a cloud-native scheduler that is aware of the cloud specifics and our
microservices ecosystem:
• Self-contained units of execution
• Built around business capabilities or domain objects
• Small enough to be rewritten in a few days
• They are all about adding business value
Service interactions – not as scary as it looks!
A microservice under the hood
Logic
Storage
Library for abstracting service-
to-service comms
service-layer
Handler platform-layer
Self-configuring external
service adapters
Service
• Service to service
communication libs
• Discovery
• Configuration
• A/B testing capabilities
• Monitoring & Instrumentation
• … and much more
Any service gets for free:
Microservices are all about tooling
Live request tracing
You need to identify your main KPIs
Thanks!
Get a taxi home on us:
@nathariel
boyan@hailocab.com
@HailoTech

Mais conteúdo relacionado

Mais procurados

Scaling micro-services Architecture on AWS
Scaling micro-services Architecture on AWSScaling micro-services Architecture on AWS
Scaling micro-services Architecture on AWSBoyan Dimitrov
 
Micro-Service Architectures in E-Commerce environments with SPHERE.IO / comme...
Micro-Service Architectures in E-Commerce environments with SPHERE.IO / comme...Micro-Service Architectures in E-Commerce environments with SPHERE.IO / comme...
Micro-Service Architectures in E-Commerce environments with SPHERE.IO / comme...Dirk Hoerig
 
AWS ELB Tips & Best Practices
AWS ELB Tips & Best PracticesAWS ELB Tips & Best Practices
AWS ELB Tips & Best PracticesChinaNetCloud
 
Greetings from AWS User Group Taiwan
Greetings from AWS User Group TaiwanGreetings from AWS User Group Taiwan
Greetings from AWS User Group TaiwanCliff Chao-kuan Lu
 
How IT at Getty Images Brokers Cloud Services
How IT at Getty Images Brokers Cloud ServicesHow IT at Getty Images Brokers Cloud Services
How IT at Getty Images Brokers Cloud ServicesRightScale
 
Building a Modern Microservices Architecture at Gilt: The Essentials
Building a Modern Microservices Architecture at Gilt: The EssentialsBuilding a Modern Microservices Architecture at Gilt: The Essentials
Building a Modern Microservices Architecture at Gilt: The EssentialsC4Media
 
The Application Server Platform of the Future - Container & Cloud Native and ...
The Application Server Platform of the Future - Container & Cloud Native and ...The Application Server Platform of the Future - Container & Cloud Native and ...
The Application Server Platform of the Future - Container & Cloud Native and ...Lucas Jellema
 
Partner Solutions: Rackspace - Rethinking Your Migration Strategy to Maximize...
Partner Solutions: Rackspace - Rethinking Your Migration Strategy to Maximize...Partner Solutions: Rackspace - Rethinking Your Migration Strategy to Maximize...
Partner Solutions: Rackspace - Rethinking Your Migration Strategy to Maximize...Amazon Web Services
 
The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...
The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...
The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...confluent
 
Nurturing a large GST ecosystem on AWS - Anil Sharma, Chicago
Nurturing a large GST ecosystem on AWS - Anil Sharma, ChicagoNurturing a large GST ecosystem on AWS - Anil Sharma, Chicago
Nurturing a large GST ecosystem on AWS - Anil Sharma, ChicagoAWS Chicago
 
Kubernetes for Sales Engineers & Solutions Engineers–You Too Can Leverage Kub...
Kubernetes for Sales Engineers & Solutions Engineers–You Too Can Leverage Kub...Kubernetes for Sales Engineers & Solutions Engineers–You Too Can Leverage Kub...
Kubernetes for Sales Engineers & Solutions Engineers–You Too Can Leverage Kub...AWS Chicago
 
How to Say Yes to Self-Service in the Cloud and Become an IT Hero (ENT217) | ...
How to Say Yes to Self-Service in the Cloud and Become an IT Hero (ENT217) | ...How to Say Yes to Self-Service in the Cloud and Become an IT Hero (ENT217) | ...
How to Say Yes to Self-Service in the Cloud and Become an IT Hero (ENT217) | ...Amazon Web Services
 
Simplified migration with CloudEndure
Simplified migration with CloudEndureSimplified migration with CloudEndure
Simplified migration with CloudEndureTushar Gupta
 
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScale
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScaleManaging Container-as-a-Service and Docker Clusters in the Cloud with RightScale
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScaleRightScale
 
Building A Diverse Geo-Architecture For Cloud Native Applications In One Day
Building A Diverse Geo-Architecture For Cloud Native Applications In One DayBuilding A Diverse Geo-Architecture For Cloud Native Applications In One Day
Building A Diverse Geo-Architecture For Cloud Native Applications In One DayVMware Tanzu
 
Meetup #4: AWS ELB Deep dive & Best practices
Meetup #4: AWS ELB Deep dive & Best practicesMeetup #4: AWS ELB Deep dive & Best practices
Meetup #4: AWS ELB Deep dive & Best practicesAWS Vietnam Community
 

Mais procurados (20)

Scaling micro-services Architecture on AWS
Scaling micro-services Architecture on AWSScaling micro-services Architecture on AWS
Scaling micro-services Architecture on AWS
 
Micro-Service Architectures in E-Commerce environments with SPHERE.IO / comme...
Micro-Service Architectures in E-Commerce environments with SPHERE.IO / comme...Micro-Service Architectures in E-Commerce environments with SPHERE.IO / comme...
Micro-Service Architectures in E-Commerce environments with SPHERE.IO / comme...
 
MicroServices sur AWS
MicroServices sur AWSMicroServices sur AWS
MicroServices sur AWS
 
AWS ELB Tips & Best Practices
AWS ELB Tips & Best PracticesAWS ELB Tips & Best Practices
AWS ELB Tips & Best Practices
 
Greetings from AWS User Group Taiwan
Greetings from AWS User Group TaiwanGreetings from AWS User Group Taiwan
Greetings from AWS User Group Taiwan
 
How IT at Getty Images Brokers Cloud Services
How IT at Getty Images Brokers Cloud ServicesHow IT at Getty Images Brokers Cloud Services
How IT at Getty Images Brokers Cloud Services
 
Building a Modern Microservices Architecture at Gilt: The Essentials
Building a Modern Microservices Architecture at Gilt: The EssentialsBuilding a Modern Microservices Architecture at Gilt: The Essentials
Building a Modern Microservices Architecture at Gilt: The Essentials
 
Intro to Serverless
Intro to ServerlessIntro to Serverless
Intro to Serverless
 
The Application Server Platform of the Future - Container & Cloud Native and ...
The Application Server Platform of the Future - Container & Cloud Native and ...The Application Server Platform of the Future - Container & Cloud Native and ...
The Application Server Platform of the Future - Container & Cloud Native and ...
 
104 meets cloud
104 meets cloud104 meets cloud
104 meets cloud
 
Partner Solutions: Rackspace - Rethinking Your Migration Strategy to Maximize...
Partner Solutions: Rackspace - Rethinking Your Migration Strategy to Maximize...Partner Solutions: Rackspace - Rethinking Your Migration Strategy to Maximize...
Partner Solutions: Rackspace - Rethinking Your Migration Strategy to Maximize...
 
The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...
The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...
The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...
 
Nurturing a large GST ecosystem on AWS - Anil Sharma, Chicago
Nurturing a large GST ecosystem on AWS - Anil Sharma, ChicagoNurturing a large GST ecosystem on AWS - Anil Sharma, Chicago
Nurturing a large GST ecosystem on AWS - Anil Sharma, Chicago
 
Kubernetes for Sales Engineers & Solutions Engineers–You Too Can Leverage Kub...
Kubernetes for Sales Engineers & Solutions Engineers–You Too Can Leverage Kub...Kubernetes for Sales Engineers & Solutions Engineers–You Too Can Leverage Kub...
Kubernetes for Sales Engineers & Solutions Engineers–You Too Can Leverage Kub...
 
How to Say Yes to Self-Service in the Cloud and Become an IT Hero (ENT217) | ...
How to Say Yes to Self-Service in the Cloud and Become an IT Hero (ENT217) | ...How to Say Yes to Self-Service in the Cloud and Become an IT Hero (ENT217) | ...
How to Say Yes to Self-Service in the Cloud and Become an IT Hero (ENT217) | ...
 
Simplified migration with CloudEndure
Simplified migration with CloudEndureSimplified migration with CloudEndure
Simplified migration with CloudEndure
 
Sundog Media Toolkit
Sundog Media Toolkit Sundog Media Toolkit
Sundog Media Toolkit
 
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScale
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScaleManaging Container-as-a-Service and Docker Clusters in the Cloud with RightScale
Managing Container-as-a-Service and Docker Clusters in the Cloud with RightScale
 
Building A Diverse Geo-Architecture For Cloud Native Applications In One Day
Building A Diverse Geo-Architecture For Cloud Native Applications In One DayBuilding A Diverse Geo-Architecture For Cloud Native Applications In One Day
Building A Diverse Geo-Architecture For Cloud Native Applications In One Day
 
Meetup #4: AWS ELB Deep dive & Best practices
Meetup #4: AWS ELB Deep dive & Best practicesMeetup #4: AWS ELB Deep dive & Best practices
Meetup #4: AWS ELB Deep dive & Best practices
 

Semelhante a Patterns for building resilient and scalable microservices platform on AWS

Satrtup Bootcamp - Scale on AWS
Satrtup Bootcamp - Scale on AWSSatrtup Bootcamp - Scale on AWS
Satrtup Bootcamp - Scale on AWSIdan Tohami
 
Managed Cloud Services for Siebel CRM on Amazon AWS
Managed Cloud Services for Siebel CRM on Amazon AWSManaged Cloud Services for Siebel CRM on Amazon AWS
Managed Cloud Services for Siebel CRM on Amazon AWSMilind Waikul
 
Global Azure Bootcamp: Azure service fabric
Global Azure Bootcamp: Azure service fabric Global Azure Bootcamp: Azure service fabric
Global Azure Bootcamp: Azure service fabric Luis Valencia
 
AWS re:Invent 2016: Running Microservices on Amazon ECS (CON309)
AWS re:Invent 2016: Running Microservices on Amazon ECS (CON309)AWS re:Invent 2016: Running Microservices on Amazon ECS (CON309)
AWS re:Invent 2016: Running Microservices on Amazon ECS (CON309)Amazon Web Services
 
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...Amazon Web Services
 
analytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsanalytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsScott Miao
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAmazon Web Services
 
Pace of Innovation at AWS - London Summit Enteprise Track RePlay
Pace of Innovation at AWS - London Summit Enteprise Track RePlayPace of Innovation at AWS - London Summit Enteprise Track RePlay
Pace of Innovation at AWS - London Summit Enteprise Track RePlayAmazon Web Services
 
Infrastructure Security: Your Minimum Security Baseline
Infrastructure Security: Your Minimum Security BaselineInfrastructure Security: Your Minimum Security Baseline
Infrastructure Security: Your Minimum Security BaselineAmazon Web Services
 
AWS Cloud Experience CA: ¿Porqué Correr WorkLoads Microsoft & Oracle en AWS?
AWS Cloud Experience CA: ¿Porqué Correr WorkLoads Microsoft & Oracle en AWS?AWS Cloud Experience CA: ¿Porqué Correr WorkLoads Microsoft & Oracle en AWS?
AWS Cloud Experience CA: ¿Porqué Correr WorkLoads Microsoft & Oracle en AWS?Amazon Web Services LATAM
 
SAP on Amazon web services
SAP on Amazon web servicesSAP on Amazon web services
SAP on Amazon web servicescloudnonstop
 
GAM307_Ubisoft How For Honor Runs Using Amazon ECS
GAM307_Ubisoft How For Honor Runs Using Amazon ECSGAM307_Ubisoft How For Honor Runs Using Amazon ECS
GAM307_Ubisoft How For Honor Runs Using Amazon ECSAmazon Web Services
 
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)Amazon Web Services
 
AWS Summit Singapore - More Containers, Less Operations
AWS Summit Singapore - More Containers, Less OperationsAWS Summit Singapore - More Containers, Less Operations
AWS Summit Singapore - More Containers, Less OperationsAmazon Web Services
 
Serverless Architectures on AWS Lambda
Serverless Architectures on AWS LambdaServerless Architectures on AWS Lambda
Serverless Architectures on AWS LambdaSerhat Can
 
Intro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesIntro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesAmazon Web Services
 

Semelhante a Patterns for building resilient and scalable microservices platform on AWS (20)

Managing Your Cloud Assets
Managing Your Cloud AssetsManaging Your Cloud Assets
Managing Your Cloud Assets
 
Satrtup Bootcamp - Scale on AWS
Satrtup Bootcamp - Scale on AWSSatrtup Bootcamp - Scale on AWS
Satrtup Bootcamp - Scale on AWS
 
Managed Cloud Services for Siebel CRM on Amazon AWS
Managed Cloud Services for Siebel CRM on Amazon AWSManaged Cloud Services for Siebel CRM on Amazon AWS
Managed Cloud Services for Siebel CRM on Amazon AWS
 
Global Azure Bootcamp: Azure service fabric
Global Azure Bootcamp: Azure service fabric Global Azure Bootcamp: Azure service fabric
Global Azure Bootcamp: Azure service fabric
 
Amazon ECS
Amazon ECSAmazon ECS
Amazon ECS
 
AWS Migration Day - Windows Workloads
AWS Migration Day - Windows WorkloadsAWS Migration Day - Windows Workloads
AWS Migration Day - Windows Workloads
 
AWS re:Invent 2016: Running Microservices on Amazon ECS (CON309)
AWS re:Invent 2016: Running Microservices on Amazon ECS (CON309)AWS re:Invent 2016: Running Microservices on Amazon ECS (CON309)
AWS re:Invent 2016: Running Microservices on Amazon ECS (CON309)
 
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
 
analytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsanalytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the aws
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWS
 
Pace of Innovation at AWS - London Summit Enteprise Track RePlay
Pace of Innovation at AWS - London Summit Enteprise Track RePlayPace of Innovation at AWS - London Summit Enteprise Track RePlay
Pace of Innovation at AWS - London Summit Enteprise Track RePlay
 
Infrastructure Security: Your Minimum Security Baseline
Infrastructure Security: Your Minimum Security BaselineInfrastructure Security: Your Minimum Security Baseline
Infrastructure Security: Your Minimum Security Baseline
 
AWS Cloud Experience CA: ¿Porqué Correr WorkLoads Microsoft & Oracle en AWS?
AWS Cloud Experience CA: ¿Porqué Correr WorkLoads Microsoft & Oracle en AWS?AWS Cloud Experience CA: ¿Porqué Correr WorkLoads Microsoft & Oracle en AWS?
AWS Cloud Experience CA: ¿Porqué Correr WorkLoads Microsoft & Oracle en AWS?
 
Enterprise Workloads on AWS
Enterprise Workloads on AWSEnterprise Workloads on AWS
Enterprise Workloads on AWS
 
SAP on Amazon web services
SAP on Amazon web servicesSAP on Amazon web services
SAP on Amazon web services
 
GAM307_Ubisoft How For Honor Runs Using Amazon ECS
GAM307_Ubisoft How For Honor Runs Using Amazon ECSGAM307_Ubisoft How For Honor Runs Using Amazon ECS
GAM307_Ubisoft How For Honor Runs Using Amazon ECS
 
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
 
AWS Summit Singapore - More Containers, Less Operations
AWS Summit Singapore - More Containers, Less OperationsAWS Summit Singapore - More Containers, Less Operations
AWS Summit Singapore - More Containers, Less Operations
 
Serverless Architectures on AWS Lambda
Serverless Architectures on AWS LambdaServerless Architectures on AWS Lambda
Serverless Architectures on AWS Lambda
 
Intro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesIntro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute Services
 

Último

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Último (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Patterns for building resilient and scalable microservices platform on AWS

  • 1. Patterns for building resilient and scalable microservices platform on AWS Boyan Dimitrov, Platform Automation Lead @ Hailo @nathariel
  • 2.
  • 3. Back in 2011 we started simple We quickly found out that supporting monoliths is hard: • Hard to maintain the codebase • Hard to build new features • Hard to scale the dev teams Failure to deliver business value Frontend Backend MySQL
  • 4. So in 2013 we ended up doing…
  • 5. At present we have • Microservices ecosystem (99.9% written in Go) • Designed specifically for the cloud – different building blocks and components will constantly be in flux, broken or unavailable • 1000+ AWS instances spanning multiple regions • 200+ services in production
  • 7. TeVPC Auto Scaling S3 OrchestrationEnv DNS Release AutoScaling Discovery Monitoring CFEC2 Route 53 Redshift ComputeEIP Routing Core Platform Provisioning Login Services Cloud Provider Whisper Config
  • 8. • Lowest level building blocks • We mostly use basic PaaS components and services as they cover most of our needs • We expect every underlying component to fail and we designed for this
  • 9. eu-west-1 Message Bus+ Go Services Proxy Layer C* us-east-1 Proxy Layer C* Go Services Message Bus+
  • 10. eu-west-1 Proxy Layer Message Bus eu-west-1a Services eu-west-1b eu-west-1c Shared Infra RabbitMQ RabbitMQ RabbitMQ API API API Go Go Go x many C* NSQ ZK C* NSQ ZK C* NSQ ZK x many x many
  • 11. • We use auto scaling groups for everything  Guarantees each component can be rebuilt automatically  Including our database clusters that run on ephemeral storage ( we do keep 6 copies of each piece of data in 2 regions ) • Minimum of 3 AZs in every region • Every workflow is automated • Every component has to be self healing and scalable Basic principles
  • 12. • Our “cloud provider abstraction” layer • Main purpose is infrastructure and workflow automation and discovery • Has a global view of everything happening across our infrastructure • Provides additional capabilities on top of AWS • Runs in a dedicated VPCs across two regions OrchestrationEnv DNS Release AutoScalingComputeEIP Whisper
  • 13. It all started by a small challenge we had to overcome: Payment providers whitelist sources
  • 14. EIP Service Elastic IP Provisioning Service NAT LIVE NAT FOO 51.x.x.1 nat live 51.x.x.2 nat live 51.x.x.3 nat live 50.x.x.5 1 nat foo Maintains elastic IP pools across all our accounts and matches them against auto scaling groups and environments auto scaling group auto scaling group
  • 15. We do a lot of server discovery • Both external and internal orchestration tools rely on AWS APIs for server discovery • Puppet has AWS integration for clustering infra • Exponential back-off mitigates the issue but does not solve it if you have many clients “RequestLimitExceeded”.
  • 16. Compute service to the rescue • A distributed cache of all compute instances and their meta data • Powerful query API ( Very Fast!) • Main interface for creating new compute instances • Reconciles any changes in any AWS account within seconds Compute Service Other providers Internal tools External toolsServices
  • 17. Everything in our platform emits events So naturally we want to capture all external events as well!
  • 18. Whisper Service It’s all about event driven compute – think Lambda but within our platform Events Events Hundreds of publishers & subscribe NSQ Topics Events External sources Actions To subscribe to any new event source we have to only change a single service
  • 19. What about AWS resource access?
  • 20. temporary security credentials AWS Account X AWS Account Y service temporary security credentials role role • Each external orchestration service instance has a “global” view of our infrastructure • Relies heavily on STS to operate across different accounts and regions • Each service has a designated role for every account and region AWS Auth under the hood
  • 21. Shared environments create contention. We decided to boost our developers productivity and give them on demand environments ENV ENV ENV
  • 22. Environment Service SIE MIE Infrastructure Core Platform Single server on AWS Hundreds of servers / single AWS region CloudFormation Orchestration layer On demand environments Single Instance Environment Multi instance environment
  • 23. Environment Service SIE MIE Infrastructure Core Platform Single server on AWS Hundreds of servers / single AWS region Release Service ANY ENV (PROD) Services Config *Data clone ETA: ~12 min ETA: ~40 min CloudFormation Orchestration layer On demand environments Single Instance Environment Multi instance environment
  • 24. Environment Service SIE MIE Infrastructure Core Platform Single server on AWS Vagrant support Hundreds of servers / single AWS region Multi-region environments Release Service ANY ENV (PROD) Services Config *Data clone ETA: ~12 min ETA: ~40 min CloudFormation Orchestration layer On demand environments Single Instance Environment Multi instance environment
  • 25.
  • 26. SIE Pre Prod All of this so we can do SIE MIE MIE SIE MIE SIE Live Orchestration
  • 28. • The only services directly aware of our cloud provider specifics – gives us a lot of flexibility and let us introduce changes quickly • Each of them fulfills a very specific task and together create powerful workflows • Nothing else in our platform is aware of the underlying cloud layer • We did not envision being “cloud agnostic” – it just happened
  • 29. Provides the most essential platform functions for every service: • Service Discovery • Service Provisioning • Routing & Load Balancing • Authentication/Authorization • Monitoring • Configuration
  • 31. Provisioning Service Build Pipeline Amazon S3 Provisioning Manager Provisioning Service Docker Registry Provisioning overview Instance Instance Process Container Auto Scaling GroupAuto Scaling Group
  • 32. Service deployment specifics • Each service is decoupled from the rest and deployed individually • We run multiple services on the same instance but each service is deployed in at least 3 AZs • We rely on auto scaling groups for organizing and scaling our workload • We use static partitioning to match a service to an auto scaling group and this results in non optimal resource utilisation (25% - 50%)
  • 33. Deploying a service service name version auto scaling group
  • 34. Coming soon: Elastic resource pools and QoS scheduling Elastic Resource Pool ECS Agent ECS Agent ECS Agent ECS Agent ECS Agent ECS Agent QoS Scheduler eu-west-1a eu-west-1b eu-west-1c AWS Cloud Provider ECS Cluster Manager instance instance instance instance instance instance
  • 35. So what does this mean? Elastic resource pool 75-80% Utilization eu-west-1a eu-west-1b eu-west-1c One word – such difference! instance instance instance instance instance instance
  • 36. Why building our own scheduler? • Service Priority • Service specific runtime metrics • Interference • Cloud awareness ( availability zones, pool elasticity…) Running services in a pay as you go fashion will soon be a reality as much as todays on demand compute We want a cloud-native scheduler that is aware of the cloud specifics and our microservices ecosystem:
  • 37. • Self-contained units of execution • Built around business capabilities or domain objects • Small enough to be rewritten in a few days • They are all about adding business value
  • 38. Service interactions – not as scary as it looks!
  • 39. A microservice under the hood Logic Storage Library for abstracting service- to-service comms service-layer Handler platform-layer Self-configuring external service adapters Service • Service to service communication libs • Discovery • Configuration • A/B testing capabilities • Monitoring & Instrumentation • … and much more Any service gets for free:
  • 40. Microservices are all about tooling
  • 41.
  • 43. You need to identify your main KPIs
  • 44.
  • 45. Thanks! Get a taxi home on us: @nathariel boyan@hailocab.com @HailoTech

Notas do Editor

  1. Seamless user experience
  2. Cloud
  3. We built our custom provisioning system and we started by running a number of services on a single instance Initially we were running services as normal processes on the instance but this started causing noisy neighbour problems Several months ago we gradually started moving to containers aiming for isolation and resource control capabilities.
  4. We want an elastic resource pool where services are scheduled on a need to basis We don’t want to manage services manually and leave that to a smart scheduler