
Kubernetes and lastminute.com: our course towards better scalability and processes (Codemotion Rome 2017)



Kubernetes adoption is straightforward when starting from scratch or in a public cloud, but what does the journey look like when your starting point is a high-traffic legacy infrastructure?

In this talk we present our experience, which began almost a year ago and challenged everything inside our organisation. Developer teams changed the way they work, product owners benefit from the new speed, and new roles emerged in the IT department.

We will explain the lessons we learnt and how to get the best out of this solution.

Published in: Technology


  1. Kubernetes and lastminute.com group: our course towards better scalability and processes
     michele.orsi@lastminute.com, @micheleorsi
     Rome, 24-25 March 2017
  2. An inspiring travel company
  3. A tech company to the core
     Tech department: 300+ people
     Applications: ~100
     Database: 4 TB of data
     Servers: 1400 VMs, 300 physical machines
     Locations: Chiasso, Milan, Madrid, London, Bengaluru
  4. Business: "technology is slow"
     (photo: https://www.pexels.com/photo/turtle-walking-on-sand-132936/)
  5. Technology: "the monolith is the problem"
     (photo: https://www.flickr.com/photos/southtopia/5702790189)
  6. "... let’s break into microservices"
     (photo: https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/)
  7. A lot of issues
     ● LONG provisioning time
     ● LACK OF alignment across environments
     ● LACK OF alignment across applications
     ● LACK OF awareness about ops (monitoring, alerting)
  8. A year-long endeavour
     ● build a new, modern infrastructure
     ● migrate the search (flight/hotel) product there
     ... without:
     ● impacting the business
     ● throwing away our whole datacenter
  9. Our plan
     ● same architecture across environments
     ● a common framework to align software
     ● centralized monitoring/logging, with alerts
     ● zero downtime deployment
     ● automation everywhere
  10. How? Teams and people: new teams
      (photo: https://www.pexels.com/photo/blue-lego-toy-beside-orange-and-white-lego-toy-standing-during-daytime-105822/)
  11. Our infrastructure and technology
      (photo: https://www.pexels.com/photo/colorful-toothed-wheels-171198/)
  12. Docker containers
      Image registry.intra/application:v2-090025032017, built from the layers BASE OS, JAVA JRE, START/STOP SCRIPTS and JAR APPLICATION
      ● build once, run everywhere
      ● externalised configuration
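Externalised configuration is what makes "build once, run everywhere" possible: the same image is promoted unchanged and each environment injects its own settings. Below is a minimal Python sketch, assuming configuration arrives as environment variables; the variable names `APP_ENV`, `DB_URL` and `HTTP_PORT` are hypothetical, not taken from the talk.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    environment: str
    db_url: str
    http_port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    """Read settings from the environment instead of baking them into the image.

    The variable names are hypothetical; in Kubernetes they could be filled
    from a configmap or a secret.
    """
    env = os.environ if env is None else env
    return AppConfig(
        environment=env.get("APP_ENV", "development"),
        # Required setting: raises KeyError (fail fast) if it is missing.
        db_url=env["DB_URL"],
        http_port=int(env.get("HTTP_PORT", "8080")),
    )


if __name__ == "__main__":
    # Same code (and same image), two environments, two configurations.
    print(load_config({"APP_ENV": "qa", "DB_URL": "jdbc:qa-db"}))
    print(load_config({"APP_ENV": "production", "DB_URL": "jdbc:prod-db", "HTTP_PORT": "9090"}))
```

Passing the mapping explicitly keeps the function testable; in a container it would simply read `os.environ`.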
  13. Kubernetes
      ● independent from OS/hosts
      ● isolated environments, managed at scale
      ● self-healing
      ● externalised configuration
      Omega paper: http://research.google.com/pubs/pub41684.html
  14. "Your infrastructure on wheels"
      (photo: https://www.pexels.com/photo/red-toy-truck-24619/)
  15. Kubernetes: physical representation
      Diagram: a cluster of 70 nodes (NODE 1, NODE 2, ..., NODE 70), each running Ubuntu with K8S, DOCKER, FLANNELD and ETCD.
  16. Kubernetes: logical representation
      Diagram: one cluster split into namespaces, each with its own CPU and memory allocation:
      ● NAMESPACE1: CPU 10, MEM 40GB
      ● NAMESPACE2: CPU 20, MEM 80GB
      ● NAMESPACE3: CPU 80, MEM 90GB
      ● NAMESPACE4: CPU 100, MEM 10GB
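A namespace's CPU and memory budget acts as a ceiling that its pods must fit under. The sketch below is a simplified, hypothetical admission check in the spirit of a Kubernetes ResourceQuota on resource requests; it is not how Kubernetes implements quotas, and units are simplified to plain CPUs and GB.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PodRequest:
    name: str
    cpu: float      # CPUs requested
    mem_gb: float   # memory requested, in GB


@dataclass
class Namespace:
    name: str
    cpu_quota: float
    mem_quota_gb: float
    pods: List[PodRequest] = field(default_factory=list)

    def used(self) -> Tuple[float, float]:
        """Total CPU and memory already requested by admitted pods."""
        return (sum(p.cpu for p in self.pods), sum(p.mem_gb for p in self.pods))

    def admit(self, pod: PodRequest) -> bool:
        """Admit the pod only if its requests fit in what is left of the quota."""
        cpu_used, mem_used = self.used()
        if cpu_used + pod.cpu > self.cpu_quota or mem_used + pod.mem_gb > self.mem_quota_gb:
            return False
        self.pods.append(pod)
        return True


# NAMESPACE1 from the slide: CPU 10, MEM 40GB.
ns1 = Namespace("namespace1", cpu_quota=10, mem_quota_gb=40)
print(ns1.admit(PodRequest("app-1", cpu=4, mem_gb=16)))  # True
print(ns1.admit(PodRequest("app-2", cpu=4, mem_gb=16)))  # True
print(ns1.admit(PodRequest("app-3", cpu=4, mem_gb=16)))  # False: 12 CPUs > 10
```

The pod names are made up for illustration; the point is that one namespace exhausting its budget cannot starve the others.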
  17. Kubernetes: our architecture
      Diagram: namespaces named APPn-ENVIRONMENT (APP1-PRODUCTION, APP2-PRODUCTION, APP3-PRODUCTION, APP1-PREVIEW, APP1-DEVELOPMENT, APP1-QA), grouped into production and nonproduction.
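Kubernetes namespace names must be valid DNS-1123 labels (lowercase alphanumerics and '-', starting and ending with an alphanumeric, at most 63 characters), so the uppercase labels on the slide presumably stand for lowercase names. A small Python sketch that derives and validates one namespace per application and environment; the helper names are hypothetical, and treating every environment except production as "nonproduction" is an assumption.

```python
import re
from typing import Dict, List

# DNS-1123 label: lowercase alphanumerics or '-', starting and ending with an alphanumeric.
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LENGTH = 63

# Environments seen on the slide; the production/nonproduction split is assumed.
ENVIRONMENTS = ["development", "qa", "preview", "production"]


def namespace_name(app: str, environment: str) -> str:
    """Derive a name like 'app1-production' and check that it is a valid label."""
    name = f"{app}-{environment}".lower()
    if len(name) > MAX_LABEL_LENGTH or not DNS1123_LABEL.fullmatch(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return name


def namespaces_for(apps: List[str]) -> Dict[str, List[str]]:
    """One namespace per application and environment, grouped into two sets."""
    groups: Dict[str, List[str]] = {"production": [], "nonproduction": []}
    for app in apps:
        for env in ENVIRONMENTS:
            side = "production" if env == "production" else "nonproduction"
            groups[side].append(namespace_name(app, env))
    return groups


print(namespaces_for(["app1", "app2", "app3"]))
```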
  18. Kubernetes: our architecture and choices
      Diagram: in the APP1-PRODUCTION namespace (production), a POD runs the application together with collectd, fluentd and carbon.
  19. Monitoring and alerting: grafana + graphite
      Diagram: the application POD in APP1-PRODUCTION with collectd and carbon; graphite in the cluster; Grafana 4 on top.
      (icons from http://www.flaticon.com)
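Graphite's carbon daemon accepts data over a simple plaintext protocol: one line per data point, `<metric path> <value> <timestamp>`, by default on TCP port 2003, and collectd's write_graphite plugin speaks this protocol. A minimal Python sketch of formatting and sending such lines; the host name `graphite.intra` and the metric paths are hypothetical.

```python
import socket
import time
from typing import Iterable, Optional, Tuple


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one line of the Graphite plaintext protocol."""
    if " " in path:
        raise ValueError("metric paths must not contain spaces")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(points: Iterable[Tuple[str, float]],
                 host: str = "graphite.intra", port: int = 2003) -> None:
    """Send (path, value) pairs to a carbon listener over TCP (host is hypothetical)."""
    payload = "".join(format_metric(path, value) for path, value in points)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


print(format_metric("app1.production.pod1.requests", 42, 1490349600), end="")
```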
  20. Kubernetes: our architecture and choices
      Diagram: in APP1-PRODUCTION (production), a deployment and its replica-set run POD 1, POD 2 and POD 3, reachable at app1.lastminute.intra and configured through a secret and a configmap.
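A deployment manages a replica-set, and the replica-set keeps the requested number of pods running through reconciliation: it repeatedly compares the desired replica count with the pods it observes and creates or deletes pods to close the gap. The toy Python model below only illustrates that idea; it is not how the Kubernetes controller is written, and the pod naming is made up.

```python
import itertools
from typing import List


class ToyReplicaSet:
    """Toy model of a replica-set controller's reconcile step."""

    def __init__(self, name: str, replicas: int) -> None:
        self.name = name
        self.replicas = replicas        # desired state
        self.pods: List[str] = []       # observed state
        self._ids = itertools.count(1)  # source of unique pod suffixes

    def reconcile(self) -> List[str]:
        """Create or delete pods until the observed state matches the desired one."""
        actions: List[str] = []
        while len(self.pods) < self.replicas:
            pod = f"{self.name}-pod-{next(self._ids)}"
            self.pods.append(pod)
            actions.append(f"create {pod}")
        while len(self.pods) > self.replicas:
            pod = self.pods.pop()
            actions.append(f"delete {pod}")
        return actions


rs = ToyReplicaSet("app1", replicas=3)
print(rs.reconcile())         # creates app1-pod-1, app1-pod-2, app1-pod-3
rs.pods.remove("app1-pod-2")  # a pod dies
print(rs.reconcile())         # creates app1-pod-4 to get back to 3 replicas
```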
  21. Kubernetes: what’s left outside?
      ● datastores
        ○ DBs
        ○ logs
        ○ metrics
      ● distributed caches
      ● distributed locking
      ● pub-sub
  22. 1st try (with a test app): it seemed to work
      (photo: https://www.flickr.com/photos/26516072@N00/2194001232)
  23. Self-healing
      Diagram, a conversation with the application: "Hey, how are you?" "I am fine, thanks." ... "Hey, how are you?" "I have problems."
      ref: https://technologyconversations.com/2016/01/26/self-healing-systems
  24. Kubernetes contract
      "When a container is dead I will restart it"
      "When a container is ready I will forward traffic to it"
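The two promises map onto the two probes: a failing liveness probe gets the container restarted, while a failing readiness probe only takes it out of load balancing. In Kubernetes a probe must fail `failureThreshold` consecutive times (3 by default) before the failure counts. A simplified Python model of that decision, assuming readiness flips back after a single success (the default `successThreshold` of 1); it mirrors the documented behaviour but is not kubelet code.

```python
from dataclasses import dataclass


@dataclass
class ProbeTracker:
    failure_threshold: int = 3  # Kubernetes default for failureThreshold
    consecutive_failures: int = 0

    def record(self, success: bool) -> bool:
        """Record one probe result; return True once the failure threshold is reached."""
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1
        return self.consecutive_failures >= self.failure_threshold


@dataclass
class ContainerState:
    restarts: int = 0
    ready: bool = False  # not ready until a readiness probe succeeds


def on_probe_results(state: ContainerState, liveness: ProbeTracker,
                     readiness: ProbeTracker, live_ok: bool, ready_ok: bool) -> None:
    if liveness.record(live_ok):
        # "When a container is dead I will restart it"
        state.restarts += 1
        state.ready = False
        liveness.consecutive_failures = 0
        readiness.consecutive_failures = 0
        return
    # "When a container is ready I will forward traffic to it"
    failed = readiness.record(ready_ok)
    if ready_ok:
        state.ready = True
    elif failed:
        state.ready = False
```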
  25. Kubernetes probes: liveness & readiness
      Two questions:
      ● when can I consider my container alive?
      ● when can I consider my container ready to receive traffic?
      deployment.yaml (excerpt):
        spec:
          containers:
          - livenessProbe:
              httpGet:
                path: /liveness
            readinessProbe:
              httpGet:
                path: /readiness
  26. Our choices: framework - k8s
      /liveness:
      ● when the tomcat container is up
      ● when the ratio of active/max threads < threshold
      /readiness:
      ● all the startup jobs have run
      .. ongoing, never-ending research ..
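The two checks can be expressed as two small functions behind two HTTP endpoints. A Python sketch using only the standard library's `http.server`; the thread counters, the 0.9 threshold, the port and the startup-job names are placeholders standing in for the Tomcat metrics and startup jobs the slide refers to. Kubernetes treats any HTTP status from 200 to 399 as a probe success, so the handler answers 503 when a check fails.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

THREAD_RATIO_THRESHOLD = 0.9  # hypothetical threshold


def is_alive(server_up: bool, active_threads: int, max_threads: int,
             threshold: float = THREAD_RATIO_THRESHOLD) -> bool:
    """Liveness: the server is up and the thread pool is not saturated."""
    return server_up and max_threads > 0 and active_threads / max_threads < threshold


def is_ready(startup_jobs: Dict[str, bool]) -> bool:
    """Readiness: every startup job has run."""
    return all(startup_jobs.values())


# Placeholder state; a real application would read these values from its runtime.
STATE = {"server_up": True, "active_threads": 12, "max_threads": 200,
         "startup_jobs": {"warm_cache": True, "load_routes": False}}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/liveness":
            ok = is_alive(STATE["server_up"], STATE["active_threads"], STATE["max_threads"])
        elif self.path == "/readiness":
            ok = is_ready(STATE["startup_jobs"])
        else:
            self.send_error(404)
            return
        body = json.dumps({"ok": ok}).encode()
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ProbeHandler).serve_forever()
```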
  27. 2nd try (with production traffic)
      ● zero downtime during rollout
      ● improved resilience
      ● legacy infrastructure to the rescue in case of problems
  28. ... failure ... the big one!
      (photo: https://www.flickr.com/photos/ghost_of_kuji/2763674926)
  29. Problems
      ● configuration
      ● infrastructure
      ● tools
      ● manual mistakes
      ● (external) scalability
  30. Another improvement step
      ● a temporary team focused on the objective
      ● automation
      ● going deeper into docker/kubernetes
  31. Pipeline: a huge step forward
      pipeline:
        microservice = factory.newDeployRequest()
                              .withArtifact("com.lastminute.application1",2)
        lmn_deployCanaryStrategy(microservice,"qa")
        lmn_deployCanaryStrategy(microservice,"preview")
        lmn_deployCanaryStrategy(microservice,"production")
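The pipeline DSL reads as a fluent builder followed by one canary rollout per environment. The talk does not show how `factory.newDeployRequest`, `withArtifact` or `lmn_deployCanaryStrategy` are implemented, so the Python mirror below is hypothetical: `DeployRequest()` stands in for the factory call, reading the second `withArtifact` argument as a version is an assumption, and the strategy only records what a canary rollout might do (release a canary, then the stable set).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeployRequest:
    """Hypothetical stand-in for the object returned by factory.newDeployRequest()."""
    group_id: Optional[str] = None
    version: Optional[int] = None
    log: List[str] = field(default_factory=list)

    def with_artifact(self, group_id: str, version: int) -> "DeployRequest":
        # Fluent setter: returns self so calls can be chained.
        self.group_id = group_id
        self.version = version
        return self


def deploy_canary_strategy(request: DeployRequest, environment: str) -> None:
    """Hypothetical canary strategy: release a canary first, then the stable set."""
    artifact = f"{request.group_id}:{request.version}"
    request.log.append(f"{environment}: canary {artifact}")
    request.log.append(f"{environment}: stable {artifact}")


microservice = DeployRequest().with_artifact("com.lastminute.application1", 2)
for env in ("qa", "preview", "production"):
    deploy_canary_strategy(microservice, env)
print("\n".join(microservice.log))
```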
  32. Pipeline: a huge step forward
      ● git push
        ○ continuous integration
        ○ continuous delivery
      Flow: pull, jar, build, docker, (gate) QA canary, (gate) QA stable, (gate) PREV canary, (gate) PREV stable, (gate) PROD canary, (gate) PROD stable
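Because every deployment stage sits behind a gate, a failed check stops the rollout before it reaches the next environment. A minimal Python sketch of that control flow; the stage names come from the slide, while the gate is a hypothetical callable (in practice it could be automated tests, metrics checks or a manual approval).

```python
from typing import Callable, List, Sequence, Tuple

STAGES = [
    "QA canary", "QA stable",
    "PREV canary", "PREV stable",
    "PROD canary", "PROD stable",
]


def run_pipeline(stages: Sequence[str], gate: Callable[[str], bool]) -> Tuple[List[str], str]:
    """Walk the gated stages in order and stop at the first closed gate.

    Returns the stages that were deployed and a status string.
    """
    deployed: List[str] = []
    for stage in stages:
        if not gate(stage):  # gate closed: do not promote any further
            return deployed, f"stopped before {stage}"
        deployed.append(stage)
    return deployed, "released"


# Example: a gate that blocks the production canary.
print(run_pipeline(STAGES, lambda stage: stage != "PROD canary"))
```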
  33. "Go" deep .. whatever language it takes
      (photo: https://www.pexels.com/photo/sea-man-person-ocean-2859/)
  34. nginx ingress controller problem
      Diagram: an LB in front of the nodes 10.0.0.1 to 10.0.0.6, with NGINX instances running on the nodes.
  35. There’s a light ... at the end
      (photo: https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/)
  36. ... benefits
      ● lead and migration time
      ● resilience
      ● root cause analysis
      ● speed of deployment
      ● instant and easy scaling
  37. Give me the numbers!
      ● 70 physical nodes, 1300 pods, 5200 containers
      ● 20k req/sec in the new cluster
      ● 35 microservices migrated in 6 months
      ● 10 minutes to create a new environment
      ● the whole pipeline runs in 16 minutes
        ○ 4 minutes to release 100 instances of a new version
      ● 2M metrics/minute flowing
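A few ratios follow directly from these figures. They are simple averages and hide how pods and traffic are really distributed; the average of four containers per pod would be consistent with the application, collectd, fluentd and carbon pod shown earlier, but that is an inference, not a figure from the talk.

```python
nodes, pods, containers = 70, 1300, 5200
requests_per_sec = 20_000
metrics_per_minute = 2_000_000
instances, release_minutes = 100, 4
services_migrated, months = 35, 6

containers_per_pod = containers / pods              # 4.0
pods_per_node = pods / nodes                        # about 18.6
metrics_per_second = metrics_per_minute / 60        # about 33,333
requests_per_pod = requests_per_sec / pods          # about 15.4, only if traffic were spread evenly
instances_per_minute = instances / release_minutes  # 25.0
services_per_month = services_migrated / months     # about 5.8

print(containers_per_pod, round(pods_per_node, 1), round(metrics_per_second),
      round(requests_per_pod, 1), instances_per_minute, round(services_per_month, 1))
```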
  38. Yes, we’re hiring! THANKS
      careers.lastminutegroup.com
