Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge.
In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando Technology department. We will highlight in the context of Kubernetes: AWS service integrations, our IAM/OAuth infrastructure, cluster autoscaling, continuous delivery and general developer experience. The talk will cover our most important learnings and we will openly share failure stories.
Presented on 2017-09-28 at AWS Tech Community Days in Cologne.
Gen AI in Business - Global Trends Report 2024.pdf
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - AWS Tech Community Days Cologne
1. AWS TECH COMMUNITY DAYS
2017-09-28
HENNING JACOBS
@try_except_
Kubernetes on AWS
@ZalandoTech
2. 2
ZALANDO
15 markets
6 fulfillment centers
21 million active customers
3.6 billion € net sales 2016
200 million visits per month
13,000 employees in Europe
5. 5
FOUR ERAS AT ZALANDO TECH
ZOMCATPHP STUPS KUBERNETES
2010 2015 2016
Data center
WAR
AWS
Docker
Cloud Formation
Low level (AWS API)
AWS
Docker
Kubernetes manifest
High abstraction level
Data center
PHP files
20. 20
ASSIGNING AWS IAM ROLE TO POD
kind: Deployment
spec:
template:
metadata:
annotations:
# annotation for kube2iam
iam.amazonaws.com/role: "app-myapp-role"
spec:
containers:
- name: ...
...
https://github.com/jtblin/kube2iam
⇒ AWS SDKs just work as expected
32. 32
GETTING STARTED
Goal: use Kubernetes API as primary interface for AWS
• Mate, External DNS
• Kubernetes Ingress Controller for AWS
• kube2iam
⇒ we wrote new components
to achieve our goal
35. 35
GETTING STARTED
Other questions we asked ourselves..
• Single AZ vs. Multi AZ? ⇒ Multi AZ
• Federation? ⇒ No, not ready yet
• Overlay network? ⇒ Flannel, “rock solid”
• Authnz? ⇒ OAuth, webhook
38. 38
STABILITY: AWS RATE LIMITS
• Ran into the same trap twice (Mate & Ingress Ctrl)
• Kubernetes core causes many calls (e.g. EBS)
• Monitoring (ZMON) needs to poll AWS
⇒ One of our biggest pain points with AWS
(and all workarounds are hard and/or ugly)
39. 39
STABILITY: LIMIT RANGE
kubectl describe limitrange
Name: limits
Namespace: default
Type Resource Min Max Default Req Default Limit Max Limit/Request Ratio
---- -------- --- --- ----------- ------------- -----------------------
Container memory - 64Gi 100Mi 1Gi -
Container cpu - 16 100m 3 -
http://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/kubernetes-in-production.html#resources
⇒ Mitigate errors on OSI layer 8 ;-)
44. 44
ONBOARDING
• Many new concepts to grasp vs. 200 teams
• Kubernetes Training (2h)
• Documentation
• Recorded Friday Demos
• Support Channels (chat, mail)
50. 50
LINKS
Running Kubernetes in Production on AWS
http://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/kubernetes-in-production.html
Kube AWS Ingress Controller
https://github.com/zalando-incubator/kube-ingress-aws-controller
External DNS
https://github.com/kubernetes-incubator/external-dns
PostgreSQL Operator
https://github.com/zalando-incubator/postgres-operator
Zalando Cluster Configuration
https://github.com/zalando-incubator/kubernetes-on-aws
List of Organizations using Kubernetes on AWS
https://github.com/hjacobs/kubernetes-on-aws-users