More information and detailed schemas: http://bit.ly/2KjOSi7
One of the most structural choices we made while building the OVH Managed Kubernetes service was to deploy our customers’ clusters on top of our own. Kubinception indeed…
In this post we share our experience of running Kubernetes over Kubernetes, with hundreds of customers’ clusters. Why did we choose this architecture? What are the main stakes of such a design? What problems did we encounter? How did we deal with them? And, even more importantly, if we had to make the decision today, would we choose Kubinception again?
19. CNCF Bordeaux Kubinception
Kubinception client side networking
Incoming connections: Node port
● Reachable through the node IP (or with round-robin DNS)
● Routed by port number
● Sensitive to node failures
21. Kubinception client side networking
API Server connections
External → API Server
● OVH IPLB
● Node Port
● Ingress nginx + SNI routing
● Full TCP connection, no SSL termination before API Server
[Diagram labels: kubectl commands, OVH IPLB, Node Port, Ingress Nginx, Admin node]
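To illustrate how SNI routing can work without terminating SSL, here is a minimal sketch that reads the server name from the first TLS record of a connection and picks a backend from it. It is not the ingress nginx implementation; the host names, backend addresses, and the helper `build_client_hello` (used only to produce a sample record) are hypothetical.

```python
import struct
from typing import Dict, Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a raw TLS ClientHello record, or None."""
    # TLS record header: content type (1 byte), version (2), length (2).
    if len(record) < 9 or record[0] != 0x16:  # 0x16 = handshake record
        return None
    if record[5] != 0x01:  # 0x01 = ClientHello handshake message
        return None
    pos = 5 + 4        # skip record header and handshake header
    pos += 2 + 32      # skip client version and random
    try:
        pos += 1 + record[pos]                        # session id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len                         # cipher suites
        pos += 1 + record[pos]                        # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(len(record), pos + ext_total)
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:  # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                if record[pos + 2] != 0:  # 0 = host_name
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def pick_backend(record: bytes, backends: Dict[str, str]) -> Optional[str]:
    """Choose a backend from the SNI name; the TLS stream itself is left untouched."""
    name = extract_sni(record)
    return backends.get(name) if name else None


def build_client_hello(host: str) -> bytes:
    """Hypothetical helper: build a minimal ClientHello carrying only an SNI extension."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni_body = struct.pack("!H", len(entry)) + entry
    extensions = struct.pack("!HH", 0, len(sni_body)) + sni_body
    body = (
        b"\x03\x03"                              # client version
        + bytes(32)                              # random
        + b"\x00"                                # empty session id
        + struct.pack("!H", 2) + b"\x13\x01"     # one cipher suite
        + b"\x01\x00"                            # one compression method (null)
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake
```

Because only the unencrypted ClientHello is inspected, the proxy can forward the raw TCP stream to the selected API server, which then performs the TLS handshake itself.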
22. Kubinception client side networking
API Server connections
The node initiates a TCP tunnel
● called WormHole, from the Kubernikus project
Connections from Kube components are routed through the tunnel
[Diagram annotation: Not accessible from client nodes due to private network]
26. ETCD as a pod?
● Easy to bootstrap
● Hard to maintain thousands of them
● Use local, non persistent storage as default
● Too much risk of total quorum loss using persistent volumes
28. ETCD with an operator?
● Built by the community
● Handle etcd lifecycle:
○ creation
○ destruction
○ resizing
○ failover
○ rolling upgrades, backups…
● Use local, non persistent storage as default
● Too much risk of total quorum loss using persistent volumes
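The "total quorum loss" risk mentioned above comes from the majority rule: an etcd cluster only accepts writes while more than half of its members are available. A minimal sketch of that arithmetic (the function names are illustrative, not part of etcd):

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


def has_quorum(members: int, healthy: int) -> bool:
    """True if the healthy members still form a majority."""
    return healthy >= quorum(members)
```

For a three-member cluster, `quorum(3)` is 2 and `tolerated_failures(3)` is 1, so losing two pods at the same time leaves the cluster without quorum.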
64-69. Network - Proxy ARP
[Diagram, six animation frames. Labels: CBR0, BREX, Node, ETCD, Pod]
Interfaces shown:
● 00:00:00:00:21:42 - 10.10.0.1/32
● 00:00:00:00:25:45 - 10.100.0.1/24
● 00:00:00:00:00:02 - 10.100.0.2/32
● 00:00:00:00:00:01 - 10.0.0.1/32
Address pairs shown on the packet: 10.0.0.1 / 00:00:00:00:00:01 and 10.100.0.2 / 00:00:00:00:21:42
Sequence:
● Frames 65-66: ARP request "who has 10.100.0.2 ?"
● Frames 67-68: ARP reply "10.100.0.2 is-at 00:00:00:00:21:42"
● Frame 69: SYN/ACK 10.100.0.2
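The ARP exchange on the slides above can be modelled in a few lines. This is a simplified sketch, not kernel behaviour: the class name and the assumption that the interface with MAC 00:00:00:00:21:42 answers for the whole 10.100.0.0/24 range are illustrative; the addresses themselves come from the slides.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProxyArpInterface:
    """An interface that answers ARP requests on behalf of other hosts."""
    mac: str
    proxied: List[ipaddress.IPv4Network] = field(default_factory=list)

    def answer(self, target_ip: str) -> Optional[str]:
        """Return our own MAC if the target is in a proxied range, else None."""
        ip = ipaddress.ip_address(target_ip)
        if any(ip in net for net in self.proxied):
            return self.mac
        return None


# Assumption: this interface proxies the 10.100.0.0/24 range.
bridge = ProxyArpInterface(
    mac="00:00:00:00:21:42",
    proxied=[ipaddress.ip_network("10.100.0.0/24")],
)

# The requester (10.0.0.1) asks "who has 10.100.0.2 ?" and caches the reply.
arp_cache: Dict[str, str] = {}
reply = bridge.answer("10.100.0.2")
if reply is not None:
    arp_cache["10.100.0.2"] = reply
```

After the exchange, the requester's cache maps 10.100.0.2 to 00:00:00:00:21:42 rather than to the MAC listed next to 10.100.0.2 (00:00:00:00:00:02), so its frames are sent to the proxying interface, which forwards them.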
73. KubeProxy - IPTables
● Chained process / not incremental
● Locked during update
● Time needed to add one rule grows as the service count increases
● For 20k services (160k rules): 5 hours!
74. KubeProxy - IPTables
Routing performances
                  1 service   1k services   10k services   50k services
First service     575μs       614μs         1023μs         1821μs
Middle service    575μs       602μs         1048μs         4174μs
Last service      575μs       631μs         1050μs         7077μs
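To make the trend in these measurements easier to read, the short sketch below expresses each latency as a multiple of the single-service baseline. The figures are copied from the slide; the variable and function names are illustrative.

```python
from typing import Dict, List

# Latencies in microseconds for 1, 1k, 10k and 50k services (from the slide).
LATENCY_US: Dict[str, List[int]] = {
    "first": [575, 614, 1023, 1821],
    "middle": [575, 602, 1048, 4174],
    "last": [575, 631, 1050, 7077],
}


def slowdown(position: str) -> List[float]:
    """Latency relative to the 1-service baseline, rounded to one decimal."""
    values = LATENCY_US[position]
    baseline = values[0]
    return [round(v / baseline, 1) for v in values]
```

With 50k services, the last service is reached about 12.3 times slower than with a single service, against about 3.2 times for the first one.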
75. KubeProxy - IPVS
● Hashing vs chains
● Better load-balancing algorithms
● Weighted / RR / LeastConn / src&dst hashing / …
● Health checks / connection retries…
● IPTables is a Swiss Army knife, whereas IPVS is a purpose-built tool
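The "hashing vs chains" point can be sketched with a toy model: a chain is scanned rule by rule until one matches, while a hash table finds the entry directly. This is a simplification for illustration only, not the kernel's implementation; the service addresses and backend names are made up.

```python
from typing import Dict, List, Optional, Tuple

ServiceKey = Tuple[str, int]  # (virtual IP, port)


def chain_lookup(
    rules: List[Tuple[ServiceKey, str]], key: ServiceKey
) -> Tuple[Optional[str], int]:
    """Scan rules in order; return the backend and the number of comparisons made."""
    for comparisons, (rule_key, backend) in enumerate(rules, start=1):
        if rule_key == key:
            return backend, comparisons
    return None, len(rules)


def hash_lookup(table: Dict[ServiceKey, str], key: ServiceKey) -> Optional[str]:
    """Single dictionary lookup, independent of the entry's position."""
    return table.get(key)


# Hypothetical data set: 20,000 services with made-up virtual IPs.
rules = [((f"10.3.{i // 256}.{i % 256}", 443), f"backend-{i}") for i in range(20_000)]
table = dict(rules)

first_key = rules[0][0]
last_key = rules[-1][0]
```

In this model, `chain_lookup(rules, last_key)` needs 20,000 comparisons where `chain_lookup(rules, first_key)` needs one, while `hash_lookup` costs the same for both.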
77. KubeProxy - IPVS vs IPTables
Bandwidth performances (MB/s)
#services   1      1k            5k            10k           25k           50k
            First  First  Last   First  Last   First  Last   First  Last   First  Last
IPTables    67     64     56     50     39     15     6      0      0      0      0
IPVS        65     62     54     54     54     43     43     30     29     24     24