SlideShare uma empresa Scribd logo
1 de 33
Managing Open vSwitch 
Across a large heterogeneous fleet 
Andy Hill @andyhky 
Systems Engineer, Rackspace 
Joel Preas @joelintheory 
Systems Engineer, Rackspace
Some Definitions 
Large Fleet Heterogenous 
• Several different hardware 
manufacturers 
• Several XenServer major versions 
(sometimes on varying kernels) 
• Five hardware profiles 
• Six production public clouds 
• Six internal private clouds 
• Various non production environments 
• Tens of thousands of hosts 
• Hundreds of thousands of instances
Quick OVS Introduction
History 
• Rackspace used Open vSwitch since the pre 1.0 days 
• Behind most of First Generation Cloud Servers (Slicehost) 
• Powers 100% of Next Generation Cloud Servers 
• Upgraded OVS on Next Gen hypervisors 9 times over 2 
years
Upgrade Open vSwitch 
If you get nothing else from this talk, upgrade OVS!
Why upgrade? 
Reasons we upgraded: 
• Performance 
• Less impacting upgrades 
• NSX Controller version requirements 
• Nasty regression in 2.1 [96be8de] 
http://bit.do/OVS21Regression 
• Performance
Performance 
• Broadcast domain sizing 
• Special care in ingress broadcast flows 
• Craft flows to explicitly allow destined broadcast traffic
Performance 
• The Dark Ages (< 1.11) 
• Megaflows (>= 1.11) 
• Ludicrous Speed (>= 2.1)
The Dark Ages (< 1.11) 
• Flow-eviction-threshold = 2000 
• Single threaded 
• 12 point match for datapath flow 
• 8 upcall paths for datapath misses 
• Userspace hit per bridge (2x the lookups)
Megaflows (1.11+) 
• Wildcard matching on datapath 
• Less likely to hit flow-eviction-threshold 
• Some workloads still had issues 
• Most cases datapath flows cut in half or better
Ludicrous speed (2.1+) 
• RIP flow-eviction-threshold 
• 200000 datapath flows (configurable) 
• In the wild, we have seen over 72K datapath flows / 
260K pps
OVS 1.4 -> OVS 2.1 
Broadcast flows
Mission Accomplished! 
We moved the bottleneck! 
New bottlenecks: 
● Guest OS kernel configuration 
● Xen Netback/Netfront Driver
Upgrade, Upgrade, Upgrade 
If you package Open vSwitch, don’t leave 
your customers in The Dark Ages 
Open vSwitch 2.3 is LTS
Upgrade process 
• Ansible Driven (async - watch your SSH timeouts) 
• /etc/init.d/openvswitch force-reload-kmod 
• bonded <= 30 sec of data plane impact 
• non-bonded <=5 sec of data plane impact 
http://bit.do/ovsupgrade
Bridge Fail Modes 
Secure vs. Normal bridge fail mode 
Learning L2 switch, overriding default 
• Critical XenServer bug with Windows causing full host 
reboots (CTX140814) 
• Bridge fail mode change is a datapath impacting event 
• Fail modes do not persist across reboots in XenServer 
unless in bridge other-config
Patch Ports and Bridge Fail Modes 
• Misconfigured patch ports + ‘Normal’ Bridge Fail mode 
• Patches do not persist across reboots, cron.reboot to 
set up- no hypervisor hook available
Bridge Migration 
OVS Upgrades required all bridges to be secured 
1. Create new bridge 
2. Move VIFs from old bridge to new bridge (loss of a 
couple of packets) 
3. Upgrade OVS 
4. Ensure bridge fail mode change persists across reboot 
5. Clean up 
Entire process orchestrated with Ansible
Kernel Modules 
Running Kernel OVS Kernel Module Staged Kernel Reboot Outcome 
vABC OVSvABC None Everything’s Fine 
vABC OVSvABC vDEF No Networking 
vABC OVSvABC, OVSvDEF vDEF Everything’s Fine
Kernel Modules 
• Ensure proper OVS kernel modules are in place 
• Kernel Upgrade = OVS kernel module upgrade 
• More packaging work to do for heterogenous 
environment 
• Failure to do so can force a trip to a Java console
Other Challenges with OVS 
• Tied to old version because $REASON 
• VLAN Splinters/ovs-vlan-bug-workaround 
• Hypervisor Integration 
• Platforms: LXC, KVM, XenServer 5.5 and beyond
Measuring OVS 
PavlOVS sends these metrics to StatsD/graphite: 
• Per bridge byte_count, packet_count, flow_count 
• Instance count 
• ovs CPU utilization 
• Aggregate datapath flow_count, missed, hit, lost rates 
These are aggregated per region->cell->host 
Useful for DDoS detection (Graphite highestCurrent()) 
Scaling issues with Graphite/StatsD
OVS in Compute Host Lifecycle 
Ovsulate - Ansible Module that checks host into NVP/NSX controllers. Can 
fail if routes bad or the host certificate changes, i.e. a host is re-kicked. First 
made sure it failed explicitly, later added logic to delete existing on 
provisioning.
Monitoring OVS 
Connectivity to SDN controller 
• ovs-vsctl find manager is_connected=false 
• ovs-vsctl find controller is_connected=false 
SDN integration process (ovs-xapi-sync) 
• pgrep -f ovs-xapi-sync 
Routes
Monitoring OVS
Reboots 
Will the host networking survive a reboot? (kernel modules) 
http://bit.do/iwillsurvive
Monitoring OVS
Monitoring OVS 
XSA-108 - AKA Rebootpocalypse 2014 
• Incorrect kmods on reboot may require OOB access to fix! 
• Had monitoring in place 
• Pre-flight check for KMods just in case
Questions? 
THANK YOU 
RACKSPACE® | 1 FANATICAL PLACE, CITY OF WINDCREST | SAN ANTONIO, TX 78218 
US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM 
© RACKSPACE LTD. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM

Mais conteúdo relacionado

Mais procurados

OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep diveTrinath Somanchi
 
The Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchThe Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchTe-Yen Liu
 
LF_OVS_17_LXC Linux Containers over Open vSwitch
LF_OVS_17_LXC Linux Containers over Open vSwitchLF_OVS_17_LXC Linux Containers over Open vSwitch
LF_OVS_17_LXC Linux Containers over Open vSwitchLF_OpenvSwitch
 
Docker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined NetworksDocker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined NetworksAdrien Blind
 
LF_OVS_17_State of the OVN
LF_OVS_17_State of the OVNLF_OVS_17_State of the OVN
LF_OVS_17_State of the OVNLF_OpenvSwitch
 
Open stack networking_101_part-2_tech_deep_dive
Open stack networking_101_part-2_tech_deep_diveOpen stack networking_101_part-2_tech_deep_dive
Open stack networking_101_part-2_tech_deep_diveyfauser
 
Automating linux network performance testing
Automating linux network performance testingAutomating linux network performance testing
Automating linux network performance testingAntonio Ojea Garcia
 
Virtualized network with openvswitch
Virtualized network with openvswitchVirtualized network with openvswitch
Virtualized network with openvswitchSim Janghoon
 
OpenStack networking
OpenStack networkingOpenStack networking
OpenStack networkingSim Janghoon
 
Osdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauserOsdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauseryfauser
 
Open stack networking vlan, gre
Open stack networking   vlan, greOpen stack networking   vlan, gre
Open stack networking vlan, greSim Janghoon
 
Troubleshooting Tracebacks
Troubleshooting TracebacksTroubleshooting Tracebacks
Troubleshooting TracebacksJames Denton
 
Open vSwitch Introduction
Open vSwitch IntroductionOpen vSwitch Introduction
Open vSwitch IntroductionHungWei Chiu
 
Open Networking for Your OpenStack
Open Networking for Your OpenStackOpen Networking for Your OpenStack
Open Networking for Your OpenStackCumulus Networks
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersSadique Puthen
 
Pipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and DockerPipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and DockerJérôme Petazzoni
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 

Mais procurados (20)

OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep dive
 
The Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchThe Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitch
 
LF_OVS_17_LXC Linux Containers over Open vSwitch
LF_OVS_17_LXC Linux Containers over Open vSwitchLF_OVS_17_LXC Linux Containers over Open vSwitch
LF_OVS_17_LXC Linux Containers over Open vSwitch
 
OVS-NFV Tutorial
OVS-NFV TutorialOVS-NFV Tutorial
OVS-NFV Tutorial
 
Docker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined NetworksDocker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined Networks
 
LF_OVS_17_State of the OVN
LF_OVS_17_State of the OVNLF_OVS_17_State of the OVN
LF_OVS_17_State of the OVN
 
Open stack networking_101_part-2_tech_deep_dive
Open stack networking_101_part-2_tech_deep_diveOpen stack networking_101_part-2_tech_deep_dive
Open stack networking_101_part-2_tech_deep_dive
 
Automating linux network performance testing
Automating linux network performance testingAutomating linux network performance testing
Automating linux network performance testing
 
Virtualized network with openvswitch
Virtualized network with openvswitchVirtualized network with openvswitch
Virtualized network with openvswitch
 
OpenStack networking
OpenStack networkingOpenStack networking
OpenStack networking
 
Geneve
GeneveGeneve
Geneve
 
Osdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauserOsdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauser
 
Open stack networking vlan, gre
Open stack networking   vlan, greOpen stack networking   vlan, gre
Open stack networking vlan, gre
 
Troubleshooting Tracebacks
Troubleshooting TracebacksTroubleshooting Tracebacks
Troubleshooting Tracebacks
 
Open vSwitch Introduction
Open vSwitch IntroductionOpen vSwitch Introduction
Open vSwitch Introduction
 
Open Networking for Your OpenStack
Open Networking for Your OpenStackOpen Networking for Your OpenStack
Open Networking for Your OpenStack
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoorters
 
Kubernetes Intro
Kubernetes IntroKubernetes Intro
Kubernetes Intro
 
Pipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and DockerPipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and Docker
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 

Destaque

DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...Jim St. Leger
 
opendayight loadBalancer
opendayight loadBalancer opendayight loadBalancer
opendayight loadBalancer Khubaib Mahar
 
Upcoming internet challenges
Upcoming internet challengesUpcoming internet challenges
Upcoming internet challengesIvan Pepelnjak
 
OpenStack Paris Meetup on Nfv 2014/10/07
OpenStack Paris Meetup on Nfv 2014/10/07OpenStack Paris Meetup on Nfv 2014/10/07
OpenStack Paris Meetup on Nfv 2014/10/07Nicolas (Nick) Barcet
 
Module 2: Why NETCONF and YANG
Module 2: Why NETCONF and YANGModule 2: Why NETCONF and YANG
Module 2: Why NETCONF and YANGTail-f Systems
 
Open daylight and Openstack
Open daylight and OpenstackOpen daylight and Openstack
Open daylight and OpenstackDave Neary
 
Module 1: ConfD Technical Introduction
Module 1: ConfD Technical IntroductionModule 1: ConfD Technical Introduction
Module 1: ConfD Technical IntroductionTail-f Systems
 
SDN Training - Open daylight installation + example with mininet
SDN Training - Open daylight installation + example with mininetSDN Training - Open daylight installation + example with mininet
SDN Training - Open daylight installation + example with mininetSAMeh Zaghloul
 
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServer
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServerUnder the Hood: Open vSwitch & OpenFlow in XCP & XenServer
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServerThe Linux Foundation
 
Openstack Neutron, interconnections with BGP/MPLS VPNs
Openstack Neutron, interconnections with BGP/MPLS VPNsOpenstack Neutron, interconnections with BGP/MPLS VPNs
Openstack Neutron, interconnections with BGP/MPLS VPNsThomas Morin
 
NAT64 and DNS64 in 30 minutes
NAT64 and DNS64 in 30 minutesNAT64 and DNS64 in 30 minutes
NAT64 and DNS64 in 30 minutesIvan Pepelnjak
 
DEVNET-1006 Getting Started with OpenDayLight
DEVNET-1006	Getting Started with OpenDayLightDEVNET-1006	Getting Started with OpenDayLight
DEVNET-1006 Getting Started with OpenDayLightCisco DevNet
 
Understanding Open vSwitch
Understanding Open vSwitch Understanding Open vSwitch
Understanding Open vSwitch YongKi Kim
 

Destaque (14)

DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
 
opendayight loadBalancer
opendayight loadBalancer opendayight loadBalancer
opendayight loadBalancer
 
Upcoming internet challenges
Upcoming internet challengesUpcoming internet challenges
Upcoming internet challenges
 
OpenStack Paris Meetup on Nfv 2014/10/07
OpenStack Paris Meetup on Nfv 2014/10/07OpenStack Paris Meetup on Nfv 2014/10/07
OpenStack Paris Meetup on Nfv 2014/10/07
 
Module 2: Why NETCONF and YANG
Module 2: Why NETCONF and YANGModule 2: Why NETCONF and YANG
Module 2: Why NETCONF and YANG
 
Open daylight and Openstack
Open daylight and OpenstackOpen daylight and Openstack
Open daylight and Openstack
 
Module 1: ConfD Technical Introduction
Module 1: ConfD Technical IntroductionModule 1: ConfD Technical Introduction
Module 1: ConfD Technical Introduction
 
SDN Training - Open daylight installation + example with mininet
SDN Training - Open daylight installation + example with mininetSDN Training - Open daylight installation + example with mininet
SDN Training - Open daylight installation + example with mininet
 
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServer
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServerUnder the Hood: Open vSwitch & OpenFlow in XCP & XenServer
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServer
 
Openstack Neutron, interconnections with BGP/MPLS VPNs
Openstack Neutron, interconnections with BGP/MPLS VPNsOpenstack Neutron, interconnections with BGP/MPLS VPNs
Openstack Neutron, interconnections with BGP/MPLS VPNs
 
NAT64 and DNS64 in 30 minutes
NAT64 and DNS64 in 30 minutesNAT64 and DNS64 in 30 minutes
NAT64 and DNS64 in 30 minutes
 
DEVNET-1006 Getting Started with OpenDayLight
DEVNET-1006	Getting Started with OpenDayLightDEVNET-1006	Getting Started with OpenDayLight
DEVNET-1006 Getting Started with OpenDayLight
 
Understanding Open vSwitch
Understanding Open vSwitch Understanding Open vSwitch
Understanding Open vSwitch
 
NETCONF YANG tutorial
NETCONF YANG tutorialNETCONF YANG tutorial
NETCONF YANG tutorial
 

Semelhante a Managing Open vSwitch Across a Large Heterogenous Fleet

Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)andyhky
 
Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013Jun Park
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentOpenStack Foundation
 
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Community
 
VMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentOpenStack Foundation
 
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...VMworld
 
OVN DBs HA with scale test
OVN DBs HA with scale testOVN DBs HA with scale test
OVN DBs HA with scale testAliasgar Ginwala
 
VMworld 2013: Successfully Virtualize Microsoft Exchange Server
VMworld 2013: Successfully Virtualize Microsoft Exchange Server VMworld 2013: Successfully Virtualize Microsoft Exchange Server
VMworld 2013: Successfully Virtualize Microsoft Exchange Server VMworld
 
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIO
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIOLF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIO
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIOLF_OpenvSwitch
 
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge MigrationJames Denton
 
VIO on Cisco UCS and Network
VIO on Cisco UCS and NetworkVIO on Cisco UCS and Network
VIO on Cisco UCS and NetworkYousef Morcos
 
VMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
VMworld 2013: Architecting VMware Horizon Workspace for Scale and PerformanceVMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
VMworld 2013: Architecting VMware Horizon Workspace for Scale and PerformanceVMworld
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutSander Temme
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Vietnam Open Infrastructure User Group
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kiloSteven Li
 
Presentation oracle rac on vsphere 5
Presentation   oracle rac on vsphere 5Presentation   oracle rac on vsphere 5
Presentation oracle rac on vsphere 5solarisyourep
 
VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead VMworld
 

Semelhante a Managing Open vSwitch Across a Large Heterogenous Fleet (20)

Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)
 
Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environment
 
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
 
VMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed Switch
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting Environment
 
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
 
Neutron scaling
Neutron scalingNeutron scaling
Neutron scaling
 
OVN DBs HA with scale test
OVN DBs HA with scale testOVN DBs HA with scale test
OVN DBs HA with scale test
 
VMworld 2013: Successfully Virtualize Microsoft Exchange Server
VMworld 2013: Successfully Virtualize Microsoft Exchange Server VMworld 2013: Successfully Virtualize Microsoft Exchange Server
VMworld 2013: Successfully Virtualize Microsoft Exchange Server
 
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIO
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIOLF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIO
LF_OVS_17_Enabling Hardware Offload of OVS Control & Data plane using LiquidIO
 
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
 
VIO on Cisco UCS and Network
VIO on Cisco UCS and NetworkVIO on Cisco UCS and Network
VIO on Cisco UCS and Network
 
VMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
VMworld 2013: Architecting VMware Horizon Workspace for Scale and PerformanceVMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
VMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling Out
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kilo
 
Presentation oracle rac on vsphere 5
Presentation   oracle rac on vsphere 5Presentation   oracle rac on vsphere 5
Presentation oracle rac on vsphere 5
 
OpenStack and Windows
OpenStack and WindowsOpenStack and Windows
OpenStack and Windows
 
VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead VMworld 2013: Extreme Performance Series: Network Speed Ahead
VMworld 2013: Extreme Performance Series: Network Speed Ahead
 

Último

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Último (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Managing Open vSwitch Across a Large Heterogenous Fleet

  • 1. Managing Open vSwitch Across a large heterogeneous fleet Andy Hill @andyhky Systems Engineer, Rackspace Joel Preas @joelintheory Systems Engineer, Rackspace
  • 2. Some Definitions Large Fleet Heterogenous • Several different hardware manufacturers • Several XenServer major versions (sometimes on varying kernels) • Five hardware profiles • Six production public clouds • Six internal private clouds • Various non production environments • Tens of thousands of hosts • Hundreds of thousands of instances
  • 4. History • Rackspace used Open vSwitch since the pre 1.0 days • Behind most of First Generation Cloud Servers (Slicehost) • Powers 100% of Next Generation Cloud Servers • Upgraded OVS on Next Gen hypervisors 9 times over 2 years
  • 5. Upgrade Open vSwitch If you get nothing else from this talk, upgrade OVS!
  • 6. Why upgrade? Reasons we upgraded: • Performance • Less impacting upgrades • NSX Controller version requirements • Nasty regression in 2.1 [96be8de] http://bit.do/OVS21Regression • Performance
  • 7. Performance • Broadcast domain sizing • Special care in ingress broadcast flows • Craft flows to explicitly allow destined broadcast traffic
  • 8.
  • 9.
  • 10. Performance • The Dark Ages (< 1.11) • Megaflows (>= 1.11) • Ludicrous Speed (>= 2.1)
  • 11. The Dark Ages (< 1.11) • Flow-eviction-threshold = 2000 • Single threaded • 12 point match for datapath flow • 8 upcall paths for datapath misses • Userspace hit per bridge (2x the lookups)
  • 12. Megaflows (1.11+) • Wildcard matching on datapath • Less likely to hit flow-eviction-threshold • Some workloads still had issues • Most cases datapath flows cut in half or better
  • 13.
  • 14. Ludicrous speed (2.1+) • RIP flow-eviction-threshold • 200000 datapath flows (configurable) • In the wild, we have seen over 72K datapath flows / 260K pps
  • 15.
  • 16. OVS 1.4 -> OVS 2.1 Broadcast flows
  • 17. Mission Accomplished! We moved the bottleneck! New bottlenecks: ● Guest OS kernel configuration ● Xen Netback/Netfront Driver
  • 18. Upgrade, Upgrade, Upgrade If you package Open vSwitch, don’t leave your customers in The Dark Ages Open vSwitch 2.3 is LTS
  • 19. Upgrade process • Ansible Driven (async - watch your SSH timeouts) • /etc/init.d/openvswitch force-reload-kmod • bonded <= 30 sec of data plane impact • non-bonded <=5 sec of data plane impact http://bit.do/ovsupgrade
  • 20. Bridge Fail Modes Secure vs. Normal bridge fail mode Learning L2 switch, overriding default • Critical XenServer bug with Windows causing full host reboots (CTX140814) • Bridge fail mode change is a datapath impacting event • Fail modes do not persist across reboots in XenServer unless in bridge other-config
  • 21. Patch Ports and Bridge Fail Modes • Misconfigured patch ports + ‘Normal’ Bridge Fail mode • Patches do not persist across reboots, cron.reboot to set up- no hypervisor hook available
  • 22. Bridge Migration OVS Upgrades required all bridges to be secured 1. Create new bridge 2. Move VIFs from old bridge to new bridge (loss of a couple of packets) 3. Upgrade OVS 4. Ensure bridge fail mode change persists across reboot 5. Clean up Entire process orchestrated with Ansible
  • 23. Kernel Modules Running Kernel OVS Kernel Module Staged Kernel Reboot Outcome vABC OVSvABC None Everything’s Fine vABC OVSvABC vDEF No Networking vABC OVSvABC, OVSvDEF vDEF Everything’s Fine
  • 24. Kernel Modules • Ensure proper OVS kernel modules are in place • Kernel Upgrade = OVS kernel module upgrade • More packaging work to do for heterogenous environment • Failure to do so can force a trip to a Java console
  • 25. Other Challenges with OVS • Tied to old version because $REASON • VLAN Splinters/ovs-vlan-bug-workaround • Hypervisor Integration • Platforms: LXC, KVM, XenServer 5.5 and beyond
  • 26. Measuring OVS PavlOVS sends these metrics to StatsD/graphite: • Per bridge byte_count, packet_count, flow_count • Instance count • ovs CPU utilization • Aggregate datapath flow_count, missed, hit, lost rates These are aggregated per region->cell->host Useful for DDoS detection (Graphite highestCurrent()) Scaling issues with Graphite/StatsD
  • 27. OVS in Compute Host Lifecycle Ovsulate - Ansible Module that checks host into NVP/NSX controllers. Can fail if routes bad or the host certificate changes, i.e. a host is re-kicked. First made sure it failed explicitly, later added logic to delete existing on provisioning.
  • 28. Monitoring OVS Connectivity to SDN controller • ovs-vsctl find manager is_connected=false • ovs-vsctl find controller is_connected=false SDN integration process (ovs-xapi-sync) • pgrep -f ovs-xapi-sync Routes
  • 30. Reboots Will the host networking survive a reboot? (kernel modules) http://bit.do/iwillsurvive
  • 32. Monitoring OVS XSA-108 - AKA Rebootpocalypse 2014 • Incorrect kmods on reboot may require OOB access to fix! • Had monitoring in place • Pre-flight check for KMods just in case
  • 33. Questions? THANK YOU RACKSPACE® | 1 FANATICAL PLACE, CITY OF WINDCREST | SAN ANTONIO, TX 78218 US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM © RACKSPACE LTD. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM

Notas do Editor

  1. Worth mentioning the # of kernel versions?