
Deploying CloudStack and Ceph with flexible VXLAN and BGP networking

Wido den Hollander: Deploying CloudStack and Ceph with flexible VXLAN and BGP networking


  1. Ceph and CloudStack 24-10-2019, London
  2. WiFi UoL Conferences
  3. Social Media #cephcloudstack
  4. Thanks to today’s sponsors
  5. Schedule
     - Online at https://ceph.io/cephdays
     - Highlights:
       - Short break at 10:45
       - Lunch at 12:00
       - Short break at 15:15
       - Pub (the Gallery, 1st floor) at 17:00
  6. VXLAN+BGP+EVPN CloudStack & Ceph - Wido den Hollander
  7. Who am I?
     - Wido den Hollander (1986)
     - Dutch, live in the Netherlands
     - Ceph, CloudStack and IPv6 guru
     - Open Source lover!
     - Ceph Trainer and Consultant
     - CTO at 42on
  8. 42on
     - Founded the company in 2012
     - Company specialized in Ceph
       - Consultancy
       - Training
       - Emergency Assistance
     - https://42on.com/
  9. PCextreme
     - Co-founded this Dutch hosting company in 2004
     - Traditionally a domain + webhosting provider
     - Transitioned into providing infrastructure for many other hosting companies
     - Running a cloud deployment with Ceph and CloudStack
       - Using the KVM hypervisor
     - Numbers:
       - 80k customers
       - >100k domain names
       - ~20k Virtual Machines
       - ~3PB of Ceph storage
  10. Layer 2 networks
     - When trying to scale a network, Layer 2 becomes difficult
       - Redundancy is a problem
       - Reliability is a challenge
     - L2 works great in smaller-scale networks
     - We are trying to eliminate them in our cloud deployment
     - Usually a 4k limit on the total number of VLANs
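The 4k ceiling mentioned above comes from the 12-bit VLAN ID field in the 802.1Q header; the VXLAN VNI discussed later in the deck widens this to 24 bits. A quick sketch of the arithmetic:

```shell
# 802.1Q carries the VLAN ID in a 12-bit field -> 4096 possible IDs
vlan_ids=$((1 << 12))

# VXLAN carries a 24-bit VNI -> roughly 16.7M possible segments
vni_ids=$((1 << 24))

echo "VLANs: $vlan_ids, VNIs: $vni_ids"
# → VLANs: 4096, VNIs: 16777216
```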
  11. Layer 3 networks are better!
     - More flexible
     - More reliable
     - More scalable
     - They are just better! (imho)
  12. Compute 2.0
     - Initiated at the end of 2018
     - Existing Cloud(Stack) and Ceph clusters were based on Layer 2 networking
     - Goals:
       - Increase flexibility
       - Better scalability (more virtual networks)
         - Less (or no) Layer 2 networking
       - As Open Source as it can be
       - Underlying network should be IPv6-only!
  13. IPv6 - Why use technologies of the future with the IP protocol from the past?
  14. VXLAN - "Virtual Extensible LAN (VXLAN) is a network virtualization technology that attempts to address the scalability problems associated with large cloud computing deployments. It uses a VLAN-like encapsulation technique to encapsulate OSI layer 2 Ethernet frames within layer 4 UDP datagrams, using 4789 as the default IANA-assigned destination UDP port number."
  15. VXLAN - Ethernet packets encapsulated inside UDP
     - This means you can route them through an existing Layer 3 network
     - MTU of the underlying network needs to be >1580
       - We use 9216
     - VXLAN segments are called VNIs
       - You can create 16M VNIs!
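With plain iproute2 this looks roughly as follows; a minimal sketch, assuming a made-up VNI, interface name and underlay address (run as root):

```shell
# Create VNI 100 on top of an existing Layer 3 (IPv6) underlay.
# "nolearning" disables flood-and-learn, as EVPN will supply the MACs.
ip link add vxlan100 type vxlan id 100 dstport 4789 \
    local 2001:db8::1 nolearning

# Leave headroom for the outer Ethernet+IPv6+UDP+VXLAN headers,
# which is why the underlay MTU must exceed the inner MTU.
ip link set vxlan100 mtu 8950 up
```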
  16. VXLAN & Multicast
     - By default, VXLAN uses Multicast to exchange VNI, IP and MAC information with other hosts
     - Multicast has scalability problems
     - It is difficult to route Multicast to different Layer 2 segments
  17. The solution: VXLAN+BGP+EVPN
     - Use the Border Gateway Protocol (BGP) for exchanging VNI, IP and MAC information between hosts
     - Allows for scaling VXLAN over multiple Layer 3 segments and further
       - Multi-DC if required
     - Good blog: https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn
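In FRRouting, enabling the EVPN address-family for a BGP peer looks roughly like this; AS numbers and the neighbor address are illustrative, not taken from the deck:

```
router bgp 65001
 neighbor 2001:db8::ff remote-as 65000
 !
 address-family l2vpn evpn
  neighbor 2001:db8::ff activate
  advertise-all-vni
 exit-address-family
```

`advertise-all-vni` makes FRR announce every locally configured VNI via EVPN routes, so peers learn MAC/IP reachability over BGP instead of Multicast.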
  18. VXLAN & EVPN - Ethernet VPNs (EVPNs) enable you to connect groups of dispersed customer sites using Layer 2 virtual bridges, and Virtual Extensible LANs (VXLANs) allow you to stretch Layer 2 connectivity over an intervening Layer 3 network, while providing network segmentation like a VLAN, but without the scaling limitation of traditional VLANs. (Source: Juniper.net)
  19. VXLAN+BGP+EVPN
  20. VXLAN & CloudStack
     - Properly supported since the 4.12 release
       - We (PCextreme) pushed multiple commits to polish and improve support
     - BGP+EVPN only supported with a customized script
       - https://github.com/PCextreme/cloudstack/blob/vxlan-bgp-evpn/scripts/vm/network/vnet/modifyvxlan.sh
     - Configuring FRRouting (BGP) is not done by CloudStack
       - FRR learns new VNIs automatically when CloudStack spawns a new VM
     - Security Grouping (firewalling) works
       - With IPv6 support!
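The plumbing such a script has to do per VNI can be sketched as below; variable names and addresses are illustrative and not taken from the actual modifyvxlan.sh (run as root):

```shell
# Per-VNI setup on the hypervisor: a VXLAN device enslaved to a bridge.
VNI=1000
DEV=vxlan${VNI}
BR=brvx${VNI}

ip link add "${DEV}" type vxlan id "${VNI}" dstport 4789 \
    local 2001:db8::1 nolearning
ip link add "${BR}" type bridge
ip link set "${DEV}" master "${BR}"
ip link set "${DEV}" up
ip link set "${BR}" up
# FRR notices the new VNI and starts advertising it via EVPN routes
```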
  21. Cumulus Linux
     - Debian-based Linux Operating System for ONIE switches
       - "Open networking software for the modern data center"
     - Support for VXLAN, BGP and EVPN
     - You can run your own (Linux) scripting/tooling on the switch!
       - We run the Salt Minion for provisioning on these switches
  22. Dell S5232F-ON
     - 32x 100Gbit switch with ONIE support
       - Atom CPU
       - 4GB Memory
     - Modern Broadcom Trident3 chipset
       - VXLAN offloading in the chipset
       - This does all the actual heavy lifting; Cumulus just programs the chipset
     - Affordable
       - ~EUR 3.500,00 (ex VAT) for a single switch
     - Supported by Cumulus Linux
     - We use them for two purposes
       - Core routers for this cloud deployment
       - Top-of-Rack for hypervisors
  23. Dell S5232F-ON
  24. Hypervisors
     - 1U SuperMicro with Dual AMD Epyc
       - AS-1123US-TR4
     - 1TB DDR4 Memory
     - 10x Samsung 2TB NVMe for local storage
     - 2x 100Gbit Mellanox ConnectX-5
       - VXLAN offloading support
     - Affordable!
       - ~EUR 17.500 per server
  25. Hypervisors
  26. Ceph OSD
     - 1U SuperMicro with a single AMD Epyc 7351P
       - 1113S-WN10RT
     - 128GB DDR4 Memory
     - 10x Samsung 4TB NVMe
       - PM983 SSDs
     - 2x 10Gbit to Top-of-Rack
     - Affordable
       - ~EUR 8.500 for a single system
  27. Ceph OSD
  28. Networking on each host
     - Hypervisors and Ceph servers run BGP on the host
       - We are using FRRouting
     - No LACP, bonding or teaming to servers
       - This means no Layer 2 to hosts!
     - All hosts announce their IPv6 loopback address through BGP
       - This is a /128 route
     - Jumbo Frames enabled on all hosts
       - MTU is set to 9000
       - IPv6 Path MTU Discovery makes sure this also works towards the internet
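The host-side plumbing for this can be sketched as follows; the address and interface names are examples, not from a real host (run as root):

```shell
# The /128 loopback address that FRR announces upstream via BGP
ip -6 addr add 2001:db8:601:2::9/128 dev lo

# Jumbo Frames on both uplinks to the Top-of-Rack switches
ip link set enp81s0f0 mtu 9000
ip link set enp81s0f1 mtu 9000
```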
  29. BGP configuration

     hostname hv-138-a05-23.xxx.yyy.eu
     !
     interface lo
      ipv6 address 2a05:1500:601:2::9/128
     !
     router bgp 4200100123
      neighbor enp81s0f0 interface peer-group uplinks
      neighbor enp81s0f1 interface peer-group uplinks
      !
      address-family ipv6 unicast
       network 2001:db8:601:2::9/128
       neighbor uplinks activate
      exit-address-family
  30. BGP Unnumbered
     - Each host creates a BGP session with both Top-of-Rack switches
     - Using BGP Unnumbered, the BGP sessions are created over IPv6 Link-Local
       - No need to manually assign IP(v6) addresses to hosts and switches
       - VXLAN and IPv4 traffic is routed over IPv6
       - Super easy to scale!

     BGP neighbor on enp81s0f0: fe80::225:90ff:feb2:bcdf,
     remote AS 4200100002, local AS 4200100123, external link
     Hostname: tor-138-a05-46
     ..
     ..
     Local host: fe80::ba59:9fff:fe20:6e22, Local port: 39020
     Foreign host: fe80::225:90ff:feb2:bcdf, Foreign port: 179
  31. Scaling
     - We scale by expanding and building multiple Ceph clusters
       - We choose to NOT grow a Ceph cluster larger than a single 19" rack
       - This way we prevent our whole cloud environment from going down due to an issue with a single Ceph cluster
       - CloudStack supports multiple Ceph clusters
     - Layer 2 (VLAN) scaling is no longer a problem!
       - We can add as many hypervisors as we think CloudStack (safely) supports
       - No need to stretch VLANs over multiple racks
     - Hypervisors can be spread out over multiple racks
       - They consume about 0.5kW each; the maximum per rack is ~5kW
  32. Scaling
     - With VLANs we had to be conservative
     - Now we can provide isolated networks for customers on request
       - We can route their own IP-space
       - Additional traffic policies can be set
  33. Ceph clusters
     - One cluster per rack
     - 3 MONs
     - As many OSD machines as we can safely fit
       - Power consumption restrictions (5~6kW) apply
     - ~650TB of raw Ceph storage per rack
       - 100% NVMe storage
       - We are considering using 8TB NVMe drives instead of 4TB
     - If needed we could deploy a HDD-only Ceph cluster for slow bulk storage
  34. Performance
     - Excellent!
     - We can saturate the 100Gbit links
     - We achieve ~0.8ms write latency
       - 4k write size
       - 3x replication
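One way to reproduce a measurement like this is Ceph's built-in `rados bench`; the pool name below is an example, and the deck does not say which tool was used:

```shell
# 30 seconds of 4k-object writes against an example pool named "rbd";
# per-op latency is reported in the benchmark summary
rados bench -p rbd 30 write -b 4096 -t 1
```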
  35. Ceph and BGP
     - Just works
       - Ceph only wants a working Layer 3 route to other hosts
     - Need to define public_addr in ceph.conf
       - public_addr = 2001:db8:601:2::7
     - No further configuration needed for Ceph
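As a ceph.conf fragment that could look as follows; the `ms_bind_ipv6` line is my assumption for an IPv6-only deployment, the slide itself only mentions `public_addr`:

```
[global]
# Bind the daemon to the routed (loopback) address announced via BGP
public_addr = 2001:db8:601:2::7
# Have the messenger bind to IPv6 (assumption for an IPv6-only underlay)
ms_bind_ipv6 = true
```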
  36. Conclusion
     - Scalable networking solution without the use of VLANs
     - We can easily create Virtual Networks for customers on our CloudStack deployment(s)
       - Announce their (public) IP-space (v4/v6) and route it to their VMs
     - High-performance Ceph environment providing reliable and fast storage
       - Risks mitigated by creating multiple Ceph clusters
  37. Thanks! Wido den Hollander - wido@denhollander.io - @widodh - https://blog.widodh.nl/