Toshiaki Hatano presented on integrating VXLAN support natively in Linux to address the VLAN ID limit in CloudStack. VXLAN allows for more isolated guest networks by using 16 million VXLAN network identifiers instead of the 4096 VLAN IDs. The implementation strategy is to initially target KVM hypervisors with Linux bridging, and add a VXLAN isolation method and VXLAN guest network driver while keeping most of the existing VLAN logic. This would allow CloudStack to provide larger virtual private cloud deployments with network isolation comparable to VLANs but without being restricted by the VLAN ID limit.
2. 6/24/2013 2
• Toshiaki Hatano
• Network Engineer, and Technical Account Manager at Verio
• Employee of NTT Communications
o a leading telecommunication company in Japan
About me
3. 6/24/2013 3
• We’re using CloudStack
• As a core component of our public cloud service, Cloudn
• We’re providing both Basic and Advanced zones.
• Planning to provide VPC.
CloudStack and Us
4. 6/24/2013 4
• Advanced Zone
o Has more functionality
• NAT, FW, LB, VPN
• VPC
o Isolation required
• For each guest network
• For each VPC tier
• Isolation Method: VLAN
o VLAN IDs are limited
• Only 4096
• Must be unique within a zone
o # of domains is limited by VLAN
• A domain requires at least one VID
Problem: VLAN ID limit
[Diagram: Advanced Zone networking. Components shown: the public network, virtual routers, a VPC with two tiers of VMs, and an isolated guest network with VMs.]
5. 6/24/2013 5
• VXLAN
• VLAN-like Layer 2 encapsulation over UDP
• Being standardized in the IETF
• 16M isolated networks
• Why?
• An open source implementation exists in the Linux kernel
• Works in a distributed manner, just like VLAN
• Learning bridge
• 1:N tunneling
• UDP encapsulation
• No need for expensive network devices that support it
VXLAN and Why?
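To make the 4096 versus 16M comparison concrete: the VLAN ID is a 12-bit field, while the VXLAN network identifier (VNI) is a 24-bit field. The following is a minimal Python sketch, not code from the talk, that computes both ID spaces and packs the 8-byte VXLAN header in the layout later published as RFC 7348 (8 flag bits, 24 reserved bits, 24-bit VNI, 8 reserved bits). The function names are hypothetical.

```python
import struct

VLAN_ID_BITS = 12  # width of the 802.1Q VLAN ID field
VNI_BITS = 24      # width of the VXLAN network identifier field


def id_space(bits: int) -> int:
    """Number of distinct values an identifier of this width can take."""
    return 1 << bits


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header that precedes the inner Ethernet frame."""
    if not 0 <= vni < id_space(VNI_BITS):
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # "I" flag: the VNI field is valid
    # Word 1: flags in the top byte, 24 reserved bits.
    # Word 2: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", flags << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header."""
    _, word2 = struct.unpack("!II", header)
    return word2 >> 8


print(id_space(VLAN_ID_BITS))         # 4096
print(id_space(VNI_BITS))             # 16777216
print(pack_vxlan_header(5000).hex())  # 0800000000138800
```

The whole header travels as the UDP payload prefix, which is why the underlying network only needs to forward ordinary UDP packets.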
6. 6/24/2013 6
VXLAN 1:N tunnel
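[Diagram: hosts connected by the underlying network. On each participating host, a VM attaches via vnet to bridge brethX-Y, which connects to vxlanY on ethX. One host is not associated with VXLAN Y. Arrows ① and ② mark the two cases below; the diagram also carries the footnote marker *1.]
① If the frame is multicast or broadcast, or unicast but Host (Src) doesn’t know the mapping:
VXLAN uses multicast
Host (Dst) learns the mapping between the VM and Host (Src)
② If the frame is unicast and Host (Src) has learned the mapping between the VM and Host (Dst):
VXLAN uses unicast
*1: If the underlying network supports IGMP/MLD snooping and/or multicast routing.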
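The two cases above amount to a forwarding decision driven by a learned table. The following Python sketch is a simplified model of that behavior, not the Linux kernel implementation: the class name, IP addresses, multicast group, and MAC addresses are all made up for illustration.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_multicast(mac: str) -> bool:
    """True for multicast/broadcast MACs (lowest bit of the first octet set)."""
    return int(mac.split(":")[0], 16) & 1 == 1


class VxlanEndpoint:
    """Toy model of one host's VXLAN interface for a single VNI."""

    def __init__(self, host_ip: str, group_ip: str):
        self.host_ip = host_ip
        self.group_ip = group_ip
        self.fdb = {}  # inner source MAC -> IP of the host it was seen behind

    def outer_destination(self, dst_mac: str) -> str:
        """Case 1: flood to the multicast group. Case 2: unicast to a learned host."""
        if is_multicast(dst_mac) or dst_mac not in self.fdb:
            return self.group_ip
        return self.fdb[dst_mac]

    def learn(self, src_mac: str, src_host_ip: str) -> None:
        """Record which host a VM's MAC lives behind when a packet arrives."""
        self.fdb[src_mac] = src_host_ip


host_a = VxlanEndpoint("10.0.0.1", "239.0.0.1")
host_b = VxlanEndpoint("10.0.0.2", "239.0.0.1")
vm1, vm2 = "52:54:00:00:00:01", "52:54:00:00:00:02"

print(host_a.outer_destination(vm2))  # 239.0.0.1 (unknown destination, case 1)
host_b.learn(vm1, host_a.host_ip)     # host B receives the flooded packet
print(host_b.outer_destination(vm1))  # 10.0.0.1 (learned, case 2)
host_a.learn(vm2, host_b.host_ip)     # host A receives the reply
print(host_a.outer_destination(vm2))  # 10.0.0.2 (learned, case 2)
```

Because each host fills its own table from the traffic it receives, nothing has to be pushed to existing hosts when a new VM or host joins the network.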
7. 6/24/2013 7
• Initial target
• KVM hypervisor with “Bridge” (not Open vSwitch)
• Only for Guest Network
• Share logic/UI flow with VLAN as much as possible
1. Assign a VNI range to the zone at zone creation
2. Allocate a VNI to the network at network creation
3. Automatically create the VXLAN interface and connect it to the bridge
when the first VM in the network is created
• To handle the differences
• Add isolation method “VXLAN”
• Add Guru “VxlanGuestNetworkGuru”
• Add code like “if( isolationmethod == “VXLAN” ) …”
to all code outside the Guru that assumes VLAN
Implementation strategy
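Steps 1 and 2 of the shared flow can be pictured as a per-zone pool of VNIs handed out at network creation. The Python sketch below is only an illustration of that idea under assumed names (ZoneVniPool, allocate, release); it is not CloudStack's actual allocation code, which is written in Java and backed by its database.

```python
class ZoneVniPool:
    """Hypothetical per-zone pool of VXLAN network identifiers."""

    def __init__(self, first: int, last: int):
        # Step 1: the operator assigns a VNI range when the zone is created.
        if not 0 <= first <= last < 2 ** 24:
            raise ValueError("VNI range must lie within 0..16777215")
        self.first = first
        self.last = last
        self.allocated = {}  # network name -> VNI

    def allocate(self, network: str) -> int:
        # Step 2: each new guest network receives one unused VNI.
        if network in self.allocated:
            return self.allocated[network]
        used = set(self.allocated.values())
        for vni in range(self.first, self.last + 1):
            if vni not in used:
                self.allocated[network] = vni
                return vni
        raise RuntimeError("VNI range exhausted for this zone")

    def release(self, network: str) -> None:
        # Return the VNI to the pool when the network is removed.
        self.allocated.pop(network, None)


pool = ZoneVniPool(5000, 5002)
print(pool.allocate("guest-net-a"))  # 5000
print(pool.allocate("guest-net-b"))  # 5001
pool.release("guest-net-a")
print(pool.allocate("guest-net-c"))  # 5000
```

The same shape works for VLAN IDs with a 12-bit upper bound, which is why most of the existing VLAN logic can be shared.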
8. 6/24/2013 8
CloudStack KVM VLAN – bridging Overview
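[Diagram: two KVM hosts. Components shown: the VR and VMs on vnetX interfaces, guest bridges brethX-Y, VLAN interfaces ethX.Y on physical interfaces ethX carrying the guest network (VLAN encap), and cloudbrX on ethX toward the public network and the Internet.]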
9. 6/24/2013 9
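CloudStack KVM VXLAN – bridging Overview
[Diagram: two KVM hosts. Components shown: the VR and VMs on vnetX interfaces, guest bridges brethX-Y, VXLAN interfaces vxlanY on bridges cloudbrX over physical interfaces ethX carrying VXLAN-encapsulated guest traffic, and cloudbrX on ethX toward the public network and the Internet.]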
10. 6/24/2013
10
Requirement:
KVM/Bridge (not Open vSwitch)
Linux kernel 3.7 or later
VXLAN kernel module and iproute2 with VXLAN support
Recent Linux distributions satisfy this.
Fedora 17
Ubuntu 13
Etc.
User flow – (1) Setup KVM
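The kernel requirement can be checked before a host is added. The Python sketch below compares a kernel release string against the 3.7 minimum stated on the slide; the function names are hypothetical, and the check covers only the version number, not whether the vxlan module is built or whether the installed iproute2 understands VXLAN.

```python
import re

MIN_KERNEL = (3, 7)  # minimum kernel version stated in the requirements


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor) from a string such as '3.10.0-957.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: " + release)
    return int(match.group(1)), int(match.group(2))


def kernel_supports_vxlan(release: str) -> bool:
    """Compare numerically, so that 3.10 correctly counts as newer than 3.7."""
    return parse_kernel_release(release) >= MIN_KERNEL


print(kernel_supports_vxlan("3.6.11"))                 # False
print(kernel_supports_vxlan("3.7.0"))                  # True
print(kernel_supports_vxlan("3.10.0-957.el7.x86_64"))  # True
```

On a running host, the release string could be taken from platform.release() in the Python standard library, or from the output of uname -r.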
14. 6/24/2013
14
Packet capture
[Diagram: three KVM hosts (KVM 1, KVM 2, KVM 3), each with a vxlanX interface over eth; VM 1, the VR, VM 2, and VM 3 run on these hosts.]
1) Ping from VM1 to VM2
(captured from vxlanX on KVM1)
2) Ping from VM1 to broadcast address
(captured from vxlanX on KVM1)
17. 6/24/2013
17
• We’re adding a new network isolation method: “VXLAN”
• The goal is to provide a bigger substitute for VLAN
• And to make as little change to the UI/UX as possible
Summary
Special Thanks:
Jamie Gritton: Verio Inc.
Junji Arakawa: NTT Communications Corp.
Hello. Good morning, everyone. My name is Toshiaki Hatano. I currently work at Verio, a hosting company, as a Network Engineer and Technical Account Manager. I’m also an employee of NTT Communications, a leading telecom company in Japan. I don’t have much to write here, because I’m very new to this industry. I was a university student specializing in networking in Japan before I joined the company last year.
Before I start talking about VXLAN: as Mr. Childer mentioned in the keynote, we are operators, and we’d like to be developers now.
Let me share some background for this talk. I believe this is a common problem for people who provide IaaS. The diagram on the right illustrates networking in an advanced zone. It’s obvious that the Advanced Zone has more functionality. You can have a private network dedicated to your domain. You can do NAT, firewalling, load balancing, and VPN within CloudStack. And, most important of all, you can set up your Virtual Private Cloud in an Advanced Zone. I hear some customers are really demanding the VPC function in IaaS. But the Advanced Zone requires isolation between guest networks. Additionally, in a VPC, tiers should be isolated from each other. We’re using VLAN to isolate networks, and I believe VLAN is the most typical method for isolating networks. A VLAN ID must be unique within a zone. A domain consumes at least one VLAN ID. There are only 4096 VLAN IDs, and the usable number may be even lower depending on switch specifications. In our case, we allocate one domain per customer account. So, we cannot put more than 4096 customers in a zone.
MAC address table
There are already many solutions for CloudStack that could solve the 4k VLAN ID limitation, like Private VLAN, Security Group isolation for advanced zones, Q-in-Q, Open vSwitch tunnels using GRE, and many proprietary SDN solutions. Open source solutions are good, but currently all of these introduce some limitation, for example, prohibiting broadcast entirely. For VXLAN, there is an open source implementation that we’re ready to use… at least for Linux-based hypervisors. It’s UDP encapsulation, so all the underlying network devices have to do is pass UDP packets. We can use common network devices.
Open source
No full mesh required
I’d like to explain 1:N tunneling in a little more detail. When we create a vxlan interface, it is bound to an actual interface. Let’s call the actual interface “ethX”; you could also use a bridge interface or a VLAN interface as the actual interface. A frame passed to the vxlan interface is encapsulated and goes out from the actual interface. It’s the same behavior as a tun/tap device. When a VM starts communicating with another VM or the VR, the host doesn’t know which host the destination VM is on. So, when the host doesn’t know where the destination VM is, it sends the encapsulated packet to the multicast address. As a result, the destination host and all other hosts in that vxlan network find out where the source host and VM are: they learn the mapping between the source host and the source VM. So we don’t need to set up per-flow tables ourselves, nor with a complicated centralized controller. We don’t need to set up a mesh of tunnels. Only the host that newly connects to an isolated vxlan network has to be changed; there is no need to reconfigure existing hosts every time we add a new VM. That makes the management code very simple.
For example, the vNet range check code in the NetworkManagerImpl class.
cloud-agent creates a VLAN interface (ethY.X) on the physical interface (ethY) that is associated with the guest traffic label (cloudbrZ); the created VLAN interface is attached to cloudVirBrX. Frames sent via ethY.X are encapsulated with a VLAN header and go out from the physical interface (ethY).
cloud-agent creates a VXLAN interface (vxlanX) on the bridge interface (cloudbrZ) specified by the guest traffic label; the created VXLAN interface is attached to cloudVirBrX. Frames sent via vxlanX are encapsulated with a VXLAN header and go out from the physical interface (ethY). It just replaces the tunnel interface, from VLAN to VXLAN. There is basically no change to other components. Thanks to the similarity between VLAN and VXLAN, other functions like NAT, firewall, load balancing, and security groups should work with VXLAN. So, we don’t need to reinvent the wheel for VXLAN.
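The interface setup described in these notes can be approximated with standard iproute2 commands. The Python sketch below only builds the command lines and does not run them; it is not the script cloud-agent actually uses. The function name, the bridge name, and the multicast group are assumptions made for illustration.

```python
def vxlan_setup_commands(vni: int, guest_bridge: str, underlay_dev: str, group: str) -> list:
    """Return iproute2 command lines that create vxlan<VNI> and attach it to a bridge.

    underlay_dev is the interface the encapsulated UDP packets leave from,
    for example the bridge named by the guest traffic label.
    group is the multicast address used for flooding (an assumed value here).
    """
    if not 0 < vni < 2 ** 24:
        raise ValueError("VNI must be between 1 and 16777215")
    vxlan_if = "vxlan%d" % vni
    return [
        # Create the VXLAN tunnel interface on top of the underlay device.
        ["ip", "link", "add", vxlan_if, "type", "vxlan",
         "id", str(vni), "group", group, "dev", underlay_dev],
        # Create the per-network guest bridge.
        ["ip", "link", "add", "name", guest_bridge, "type", "bridge"],
        # Attach the VXLAN interface to the guest bridge.
        ["ip", "link", "set", vxlan_if, "master", guest_bridge],
        # Bring both interfaces up.
        ["ip", "link", "set", vxlan_if, "up"],
        ["ip", "link", "set", guest_bridge, "up"],
    ]


for command in vxlan_setup_commands(100, "brvx-100", "cloudbr0", "239.0.0.100"):
    print(" ".join(command))
```

Running the printed commands requires root privileges; each list could be passed to subprocess.run on a host that meets the kernel and iproute2 requirements.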
Put the guest network on a separate physical network.