Software Defined Networking (SDN) is seeing a lot of momentum these days. With server virtualization having solved the virtual machine problem, and large-scale object storage solving the distributed storage challenge, SDN is seen as the key to virtual networking.
In this talk we do not try to define SDN, but rather dive straight into what is, in our opinion, the core enabler of SDN: the virtual switch, OVS.
OVS can manage VLANs for guest network isolation, and it can re-route traffic at L2-L4 by keeping forwarding tables controlled by a remote (OpenFlow) controller. We show a few of these OVS capabilities and highlight how they are used in CloudStack and Xen.
Xen Summit presentation on CloudStack and Software Defined Networking. Open vSwitch is the default bridge in XenServer and Xen Cloud Platform, and is supported in Xen.
1. Xen*, SDN and Apache CloudStack
Sebastien Goasguen, Apache CloudStack, Citrix EMEA
August 28th 2012, Xen Summit
2. Outline
• A bit about CloudStack
• A bit about SDN
• A bit about Open vSwitch
• Some bits about “SDN” in CloudStack
• Slides are on slideshare for download:
http://www.slideshare.net/sebastiengoasguen/cloudstack-and-sdn
3. Apache CloudStack
• IaaS solution to build a private/public cloud
• Hypervisor agnostic:
– Xen, XS, XCP
– KVM
– VMware
• Object store support
– Swift
– Upcoming support from Caringo
• EC2/S3 compatibility
• Java application
• Ant build but moving to maven (via a new contributor)
• First Apache release 4.0 coming Sept 26th
4. Participating in CloudStack
• Apache incubator project
• http://www.cloudstack.org
• #cloudstack on irc.freenode.net
• @CloudStack on Twitter
• http://cloudstack.org/discuss/mailing-lists.html
Welcoming contributions and feedback. Join the fun!
6. NaaS ?
• “Cloud Servers”
– On-demand, Elastic, Measured server provisioning
• Cloud Storage
– Scalable/fault tolerant storage with object stores
• Cloud Networks
– How to do on-demand, elastic, measured network provisioning?
– How to program the network?
7. A very extensive API
• CloudStack orchestrates your network:
– Provisioning
– Configuration
– Updates
• For multi-tenant isolation
• Using hardware and software devices
• At scale: O(10^4) hypervisors, O(10^5) VMs…
8. Software Defined Networking
• Enable innovation, experimentation,
optimization and customization of networks
• Move control of the network to software, i.e. a programmable network
• Virtualize the network
• Vendor-agnostic, standard protocol for
control: OpenFlow
9. OpenFlow
• Leading SDN protocol
• Decouples control and data plane by giving a controller the ability to install flow rules on switches
• Hardware or software switches can use OpenFlow
• Google achieved 95% utilization of its WAN backbone by using SDN
• Spec driven by the ONF
10. OpenFlow
• OpenFlow rules can drop, rewrite, or forward packets
• A flow table entry has three parts: Rule, Action, Stats
– Rule (match fields): Switch Port, MAC src, MAC dst, Eth type, VLAN ID, IP Src, IP Dst, IP Prot, TCP sport, TCP dport
– Actions: 1. Forward packet to port(s); 2. Encapsulate and forward to controller; 3. Drop packet; 4. Send to normal processing pipeline
– Stats: packet + byte counters
Diagram Src: http://www.openflow.org/wp/documents/
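As a concrete illustration of rules of this shape (a sketch, assuming an OVS bridge named br0 already exists), flows can be installed and inspected with ovs-ofctl:

```shell
# Forward TCP port-80 traffic arriving on switch port 1 out of port 2;
# fields not listed are wildcarded, as in the flow table above.
ovs-ofctl add-flow br0 "in_port=1,dl_type=0x0800,nw_proto=6,tp_dst=80,actions=output:2"

# Drop everything from a given source MAC.
ovs-ofctl add-flow br0 "dl_src=00:11:22:33:44:55,actions=drop"

# Fall back to the normal L2 processing pipeline for everything else.
ovs-ofctl add-flow br0 "priority=0,actions=normal"

# Dump the flow table with its packet/byte counters (the Stats column).
ovs-ofctl dump-flows br0
```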
11. OF Controllers
• Several controllers out there (NOX, POX, Trema…)
• Floodlight from Big Switch, Apache licensed
12. Open vSwitch
• “Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension…”
13. Open vSwitch
• Default bridge in XenServer and XCP
• Supported in Xen but not integrated in the toolstack
• Enables:
– VLAN tagging
– Rate limiting
– GRE tunnels
– OpenFlow controller
– …
• High performance (http://networkheresy.com/category/ …)
14. e.g. OVS rate limiting
• Can enforce QoS with rate limiting controls:
• ovs-vsctl set Interface tap0 ingress_policing_rate=1000
• ovs-vsctl set Interface tap0 ingress_policing_burst=100
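A minimal sketch of the above in context (assuming a guest interface named tap0; ingress_policing_rate is in kbps and ingress_policing_burst in kb):

```shell
# Police ingress traffic on tap0 to ~1 Mbps with a 100 kb burst allowance.
ovs-vsctl set Interface tap0 ingress_policing_rate=1000
ovs-vsctl set Interface tap0 ingress_policing_burst=100

# Check what was applied.
ovs-vsctl list Interface tap0 | grep ingress_policing

# Setting the rate back to 0 removes the limit.
ovs-vsctl set Interface tap0 ingress_policing_rate=0
```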
16. e.g. OVS and GRE tunnels
• No cookbook on the OVS page
• ovs-vsctl add-port br1 gre1 -- set interface gre1 type=gre options:remote_ip=192.168.1.152
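A GRE tunnel needs a matching endpoint on the far side. A sketch, assuming two hypervisors at 192.168.1.151 and 192.168.1.152, each with a bridge br1:

```shell
# On 192.168.1.151: GRE port pointing at the remote hypervisor.
ovs-vsctl add-port br1 gre1 -- set interface gre1 \
    type=gre options:remote_ip=192.168.1.152

# On 192.168.1.152: the mirror-image port pointing back.
ovs-vsctl add-port br1 gre1 -- set interface gre1 \
    type=gre options:remote_ip=192.168.1.151
```

VMs attached to br1 on either host then share one L2 segment carried over the L3 network between the hypervisors.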
17. OVS and OpenFlow
• Point OVS switches to an OF controller:
$ ovs-vsctl set-controller br0 tcp:192.168.1.33:6633
• Install rules on the switch:
– Proactively (before any packet flows)
– Reactively (unknown packets are forwarded to the controller, which pushes a flow mod to the switch; the switch then operates at line rate)
• Can do SDN with OpenFlow, but also with straight-up OVS, managing the mappings/rules in the CloudStack db
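A sketch of wiring a bridge to a controller and checking the result (using the example controller address from above):

```shell
# Attach br0 to a remote OpenFlow controller.
ovs-vsctl set-controller br0 tcp:192.168.1.33:6633

# Optional: with "secure" fail mode the switch keeps its current flow
# table if the controller becomes unreachable, instead of reverting to
# normal MAC-learning behavior.
ovs-vsctl set-fail-mode br0 secure

# Confirm the controller target.
ovs-vsctl get-controller br0
```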
18. OpenNebula
• Supports VLAN tagging and rate limiting through “hooks” that call ovs-vsctl
• Scripts executed on a hypervisor before a VM is launched
• Potentially also executed after VM shutdown for cleanup
• Also supports OpenFlow
19. CloudStack Nicira Support
• https://cwiki.apache.org/confluence/display/CLOUDSTACK/Feature+Nicira+NVP+integration
• By Hugo Trippaers, Schuberg Philis
20. API key to customization of the network
• You dream it, CloudStack orchestrates it
21. Terminology
Zone: Availability zone, aka Region. Could be worldwide; different data centers.
Pods: Racks or aisles in a data center.
Clusters: Group of machines with a common type of hypervisor.
Host: A single server.
Primary Storage: Shared storage across a cluster.
Secondary Storage: Shared storage in a single Zone.
22. Physical Network
[Diagram: users and admins reach a load-balanced cluster of CloudStack Mgmt Servers (backed by MySQL) through the Cloud API. An Availability Zone contains an L3 core switch, access-layer switches, secondary storage servers, and Pods 1 through N.]
Slide from Chiradeep Vittal, http://www.slideshare.net/cloudstack/cloudstack-networking
23. Layer-2 Guest Virtual Network
• 1 VLAN per guest network
• Network services provided either by the CS Virtual Router or by external devices
• Network hardware exposing an API can be controlled
[Diagram: two guest virtual networks, 10.1.1.1/8 on VLAN 100. Left: a CS Virtual Router (public IP 65.37.141.11) provides DHCP, DNS, NAT, load balancing, and VPN for guest VMs 10.1.1.1 to 10.1.1.5. Right: a Juniper SRX firewall and a NetScaler load balancer provide the same services, with guest VMs holding private IPs (10.1.1.111, 10.1.1.112) mapped to public IPs (65.37.141.11, 65.37.141.112).]
Slide from Chiradeep Vittal, http://www.slideshare.net/cloudstack/cloudstack-networking
24. Opportunity for Xen
• Opportunity to create highly specialized
networking services appliances using
– OpenMirage VMs
– HalVM
• See talks in Monday’s session
25. Networking challenges in a private
Cloud
• Multiple tenants on hypervisors => isolation between guest networks needed
• VLANs in the datacenter are hard to manage and limited to 4096 IDs
• Hardware switches may not handle them very well, or may have a lower limit
26. Networking trend
• Move to software switches
• Move to L3 isolation
• Use tunnels between OVS instances (GRE tech preview)
• Program the network through APIs
• Encapsulation virtualizes the network, but you end up with overlays on overlays on overlays…
• L3 on L2 on GRE on L3 on L2…
• Then you bring in the WAN and you have:
• L3 on L2 on GRE on L3 on L2 on GRE on L3 on L2… Euhhhh!!!
27. Back of the envelope
• ~10,000 hypervisors in your data center
• ~100,000 VMs
– x10 or x100 if you use HalVM or Openmirage.org
• (10,000 * 9,999)/2 tunnels for a full mesh
– ~50x10^6 tunnels to keep track of?
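The full-mesh figure above is just n(n-1)/2 point-to-point tunnels; a quick sanity check in shell:

```shell
# Tunnels needed for a full mesh of n hypervisors: n * (n - 1) / 2
n=10000
echo $(( n * (n - 1) / 2 ))   # prints 49995000, i.e. ~50 million
```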
28. Layer 3 cloud networking
[Diagram: Web VMs and DB VMs spread across the cloud, grouped into a Web Security Group and a DB Security Group.]
Ingress Rule: Allow VMs in Web Security Group access to VMs in DB Security Group on Port 3306
Slide from Chiradeep Vittal
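In CloudStack, that ingress rule corresponds to the authorizeSecurityGroupIngress API call. A hedged sketch of the query (the endpoint hostname, account name, group names, and the API key/signature values are placeholders):

```shell
# Allow members of the "web" security group to reach the "db" group on TCP 3306.
curl "http://mgmt.example.com:8080/client/api?command=authorizeSecurityGroupIngress\
&securitygroupname=db\
&protocol=TCP&startport=3306&endport=3306\
&usersecuritygrouplist[0].account=myaccount\
&usersecuritygrouplist[0].group=web\
&apiKey=...&signature=..."
```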
29. L3 isolation with distributed firewalls
[Diagram: Internet traffic arrives on public IP addresses (65.37.141.11, .24, .36, .80) at an L3 core switch with a load balancer, feeding Pod 1/2/3 L2 switches (pod subnets starting at 10.1.0.1, 10.1.8.1, 10.1.16.1). Pod 1 hosts Tenant 1 VM 1 (10.1.0.2), Tenant 2 VM 1 (10.1.0.3), and Tenant 1 VM 2 (10.1.0.4).]
Slide from Chiradeep Vittal
30. L3 isolation with distributed firewalls
[Diagram: same layout; Tenant 1's additional VMs land in Pod 3: Tenant 1 VM 3 (10.1.16.47) and Tenant 1 VM 4 (10.1.16.85).]
Slide from Chiradeep Vittal
31. L3 isolation with distributed firewalls
[Diagram: same layout; Tenant 2 also lands in Pod 3 with Tenant 2 VM 2 (10.1.16.12) and Tenant 2 VM 3 (10.1.16.21), sharing the L2 subnet with Tenant 1's VMs.]
Slide from Chiradeep Vittal
32. A Million Firewalls?
[Diagram: a cloud of thousands of VMs, each protected by its own host-based firewall.]
Slide from Chiradeep Vittal
33. Problem:
Manage the state of 100s of thousands of firewalls
Solution:
Well-known software scaling techniques:
• Message queues
• Consistency tradeoffs
• Idempotent configuration & retries
CloudStack uses:
• special-purpose queues
• optimized for large security groups
• eventual consistency for rule updates
Slide from Chiradeep Vittal
34. Problem:
Firewall (iptables) rules explosion on the host firewall
Solution:
Use ipsets:
ipset -N web_sg iptreemap
ipset -A web_sg 10.1.16.31
ipset -A web_sg 10.1.16.112
ipset -A web_sg 10.1.189.5
…
ipset -A web_sg 10.21.9.77
iptables -A FORWARD -p tcp -m tcp --dport 3306 -m set --match-set web_sg src -j ACCEPT
Slide from Chiradeep Vittal
35. Conclusions
• Programmable networking is here
• Software switches are a key enabler of network virtualization
• Opens the door for scalable, on-demand,
ephemeral networks
• OVS is the default switch in XenServer and XCP, and is supported in Xen
• CloudStack implements highly scalable
network structures and leverages OVS
capabilities
36. Participating in CloudStack
• Apache incubator project
• http://www.cloudstack.org
• #cloudstack on irc.freenode.net
• @CloudStack on Twitter
• http://cloudstack.org/discuss/mailing-lists.html
Welcoming contributions and feedback. Join the fun!
Editor's Notes
Related VMs are placed into security groups: for example, web VMs are placed in the web security group and the DB VMs in the DB security group. By default all ingress traffic to a VM is dropped. To allow web VMs to communicate with DB VMs, the cloud user calls an API to allow access on the database's TCP port.
Each pod has a different subnet. When a VM is started in a pod, it acquires a free IP in that pod's subnet. Different tenants can end up in the same pod and hence share the same L2 subnet. Because security groups deny all by default, each VM needs a host-based firewall (embedded in the hypervisor dom0) to enforce this. The firewall also prevents attacks such as DHCP and ARP spoofing, and blocks multicast and broadcast.
As a tenant starts more VMs, the VMs can land in different pods. The cloud user cannot make any assumptions about L2 connectivity between their VMs.
As VMs get created and destroyed, CloudStack has to ensure the configuration of the host-based firewalls (iptables) stays consistent with the security group rules programmed by the cloud user.
40,000 hypervisors in a data center x 25 VMs per hypervisor = 1 million firewalls to be orchestrated by CloudStack.
An ipset is a kernel data structure that can match an IP very efficiently against a large set of IPs. For example, using a tree structure, an IP address can be quickly tested for containment. The ipset is supplied to the iptables rule, leading to a single iptables rule.