Network operators are slowly but surely embracing L3-based leaf-spine designs. However, either due to legacy applications or certain multi-tenancy requirements, the need for L2 across racks is still present. How do you solve the problem of providing L2 across multiple racks? EVPN is quickly emerging as the best answer to this question.
In this episode of our two-part series on EVPN, we discuss the use cases, review the technologies EVPN competes with, and evaluate the pros and cons of each.
For a recording of the live event, go to http://go.cumulusnetworks.com/l/32472/2017-09-22/95t27t
Demystifying EVPN in the data center: Part 1 in 2 episode series
1. 1
Oct 12, 2017
Dinesh G Dutt | Cumulus Networks
Part 1: Technology, Use Cases, Bridging
Operationalizing EVPN in the DC
2. 2Cumulus Networks
What is EVPN ?
Why should you care ?
Use cases and requirements
BGP models for EVPN in FRR
EVPN for bridging
Configuring EVPN in FRR
Agenda
3. 3Cumulus Networks
What is EVPN
• Ethernet VPN i.e. another form of L2 VPN
▪ Different from VPLS
• Original EVPN RFC: RFC 7432
▪ BGP MPLS-based Ethernet VPN
▪ Requirements defined in RFC 7209
4. 4Cumulus Networks
Primary Goals of EVPN
• Overcome the limitations of VPLS
▪ Support for multihoming and redundancy
▪ No data plane learning => no flooding
▪ Multicast optimization
▪ Allows supporting multiple encapsulation types (signaled via
control protocol)
▪ Less configuration
5. 5
Wait! This all sounds like service provider
stuff. Why should I care ?
6. 6Cumulus Networks
The Story In the Data Center So Far
[Diagram: leaf-spine topology with SPINE and LEAF layers]
• CLOS is the new network
architecture
• IP-based fabrics are in,
VLAN/L2-based fabrics are
out
• Scale out wins over scale in
• Fixed form factor boxes
largely win over modular
chassis solutions
• Cloud-native apps rule!
7. 7Cumulus Networks
Except...
• Many enterprise DCs still have plenty of legacy applications,
designed with old-world network assumptions
▪ See Ivan Pepelnjak’s blog post for these assumptions:
http://blog.ipspace.net/2017/10/solving-problem-in-right-place.html
▪ Solutions such as VM mobility are still steeped in the
assumptions of an L2 segment, even though an IP address can
be maintained without requiring L2
8. 8Cumulus Networks
VxLAN To The Rescue
• VxLAN has become quite popular as the model for running
L2 over a pure L3 network
▪ Primarily introduced as a multi-tenant, private cloud story
• Original script was for a controller-based play
• But controller-based play has had a limited run
10. 10Cumulus Networks
Meet The New EVPN
• A new set of IETF drafts defining the adaptation of EVPN in
the data center
• Base draft is: draft-ietf-bess-evpn-overlay-08
▪ A Network Virtualization Overlay Solution Using EVPN
▪ VNI (virtual network identifier) replaces VPN in terminology
• Replaces MPLS-based fabrics with IP-based fabrics:
▪ VxLAN, NVGRE, and MPLS over GRE
• Controller-less VxLAN
11. 11Cumulus Networks
EVPN in the DC: Summary
• Supports extending L2 segments over an IP fabric
• Supports routing between L2 segments
• L3 multicast in the overlay is a work in progress
• BGP is the control plane
• Multi-vendor support
• Mainstream introduction of VxLAN routing in merchant
silicon
13. 13Cumulus Networks
Three Primary Use Cases
• Replace VLAN-based access-agg-core enterprise
architecture with EVPN-CLOS based architecture
• Multi-tenant hosting
• Data Center Interconnect (DCI)
14. 14Cumulus Networks
Replacing L2 Core With L3 Core in Traditional Enterprises
• Don’t require > 4K VLANs
▪ Typically tens to hundreds, maybe a couple of thousand
• No other orchestrator usually available
▪ Orchestrating across compute and network
• Routing between L2 VNIs mandatory
• L3 multicast between L2 VNIs may be required
15. 15Cumulus Networks
Multi-Tenant DC a.k.a Private Cloud
• Require > 4K VNIs in the fabric
• Routing across VNIs in well-defined points in the network
only
▪ Routing will be VRF-aware
• An orchestrator may be present to simplify deployment
▪ Example: OpenStack
• L3 multicast across tenants not common
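The "4K" threshold in the two use cases above comes from the width of the ID fields: a 12-bit 802.1Q VLAN ID versus a 24-bit VxLAN VNI. A minimal sketch of that arithmetic follows; the helper name needs_vni_space is hypothetical and only illustrates the comparison.

```python
# ID-space arithmetic for 802.1Q VLANs versus VxLAN VNIs.
VLAN_ID_BITS = 12   # width of the 802.1Q VLAN ID field
VNI_BITS = 24       # width of the VxLAN Network Identifier field

vlan_ids = 2 ** VLAN_ID_BITS   # 4096 values; IDs 0 and 4095 are reserved
vni_ids = 2 ** VNI_BITS        # 16,777,216 values


def needs_vni_space(segment_count: int) -> bool:
    """Hypothetical helper: True when the number of L2 segments
    cannot fit in the usable VLAN ID range (1..4094)."""
    usable_vlans = vlan_ids - 2
    return segment_count > usable_vlans
```

A traditional enterprise with a few hundred segments fits in the VLAN range, while a multi-tenant fabric with, say, 10,000 segments does not.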
16. 16Cumulus Networks
Datacenter Interconnect (DCI)
• Stretch L2 segments across DCs
• Support for isolating control plane chatter across DCs
• Support for some form of aggregation/summary of MACs to
scale out
• Optimize replication to avoid replicating from local VTEP to
every remote VTEP
• Support multi-homing and redundancy of border routers
• Translating VNIs
17. 17Cumulus Networks
Why Focus on Use Cases ?
• Modern DC networks are built on the KISS principle
▪ Keep it simple, stupid
• Immutable infrastructure is the growing mantra
▪ Network doesn’t change dynamically in tune with app
• EVPN has the potential to re-introduce all the complexity of
old networks back into the modern DC network
• Focusing on use cases and deployment models can put a
check on complexity
▪ More as we go through the webinar
19. 19Cumulus Networks
What’s iBGP Got To Do With It ?
• eBGP is the deployment
model in the modern DC
• EVPN is typically deployed
as an iBGP model with
peering between VTEPs
▪ Holdover from SP world
▪ Assumes a separate IGP
to set up fabric
connectivity
▪ Spines become iBGP
route reflectors (RR) to
avoid iBGP full mesh
[Diagram: leaf-spine topology with SPINE and LEAF layers]
20. 20Cumulus Networks
Simplify BGP Deployment Model
• Make EVPN BGP peering work
over eBGP
• Leaves peer with spines as
usual
• Spines transport EVPN
AFI/SAFI without pushing state
into the data plane (similar to
iBGP RR)
• Modification: For EVPN
AFI/SAFI, don’t automatically
do next-hop-self
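The next-hop modification above can be pictured with a small model. This is only a sketch of the behavior the slide describes, not FRR code: the Route class, the afi_safi strings, and readvertise_over_ebgp are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    afi_safi: str   # illustrative label, e.g. "ipv4-unicast" or "l2vpn-evpn"
    prefix: str
    next_hop: str


def readvertise_over_ebgp(route: Route, own_ip: str) -> Route:
    """Model a spine passing a route on to an eBGP peer.

    For the EVPN address family the next hop (the originating VTEP)
    is left untouched; for other families the spine sets itself as
    next hop, as eBGP normally does.
    """
    if route.afi_safi == "l2vpn-evpn":
        return route
    return replace(route, next_hop=own_ip)


# Hypothetical addresses: 10.0.0.11 is a leaf VTEP, 10.0.0.21 is the spine.
evpn_route = Route("l2vpn-evpn", "mac-of-X/brown", "10.0.0.11")
underlay_route = Route("ipv4-unicast", "10.0.0.11/32", "10.0.0.11")
```

Keeping the VTEP as next hop is what lets remote leaves build VxLAN tunnels directly to the originating leaf rather than to the spine.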
[Diagram: leaf-spine topology with SPINE and LEAF layers]
22. 22Cumulus Networks
BGP and EVPN Basics
• EVPN uses l2vpn AFI and evpn SAFI
• Multiple different pieces of information to exchange:
▪ MAC and MAC/IP along with associated VNI and remote
VTEP (VxLAN Tunnel Endpoint) binding
▪ List of VNIs each VTEP is interested in
▪ Route prefixes (subnet routes)
▪ Multicast routes
▪ etc.
• Encoding these different types of information is done by
defining route types
▪ There are ~12 route types defined today
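As a rough illustration of how the information listed above maps onto route types, the sketch below enumerates the types this talk touches on. The AFI/SAFI numbers are the IANA-assigned values for L2VPN and EVPN; the enum member names and the INFO_TO_ROUTE_TYPE table are illustrative, not an API of any BGP implementation.

```python
from enum import IntEnum

AFI_L2VPN = 25   # IANA address family identifier: L2VPN
SAFI_EVPN = 70   # IANA subsequent address family identifier: EVPN


class EvpnRouteType(IntEnum):
    ETHERNET_AUTO_DISCOVERY = 1   # multihoming
    MAC_IP_ADVERTISEMENT = 2      # MAC and MAC/IP bindings with VNI and VTEP
    INCLUSIVE_MULTICAST = 3       # which VNIs a VTEP is interested in
    ETHERNET_SEGMENT = 4          # multihoming
    IP_PREFIX = 5                 # route prefixes (subnet routes)
    SELECTIVE_MULTICAST = 6       # multicast listener interest (IGMP/MLD proxy)


# Illustrative lookup from "kind of information" to the carrying route type.
INFO_TO_ROUTE_TYPE = {
    "mac": EvpnRouteType.MAC_IP_ADVERTISEMENT,
    "mac_ip": EvpnRouteType.MAC_IP_ADVERTISEMENT,
    "vni_membership": EvpnRouteType.INCLUSIVE_MULTICAST,
    "prefix": EvpnRouteType.IP_PREFIX,
    "multicast_listener": EvpnRouteType.SELECTIVE_MULTICAST,
}
```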
23. 23Cumulus Networks
Basic Bridging in EVPN
• Forward packets based on MAC address lookup
▪ Learn where destination MAC is: exchanged via BGP (Type 2 routes)
▪ Learn the source-MAC to port binding: traditional learning
• Handle BUM (broadcast, unknown unicast, multicast)
▪ Send BUM traffic only where desired: exchanged via BGP (Type 3 routes), delivered by ingress replication or L3 multicast
• Optimize L2 multicast
▪ Send multicast packets where there are interested listeners: IGMP/MLD proxy to BGP Type 6 route
24. 24Cumulus Networks
Type 3 Routes Illustrated
[Diagram: spines S1 and S2; leaves L1, L2, L3, L4; hosts A, X, B, C, Y, Z, W; addresses 10.1.1.1 through 10.1.1.6]
When the EVPN address family is
activated, L1 sends a Type 3
route advertisement to its BGP peers
indicating it's interested in
Brown and Blue VNIs
S1 and S2 send this
information to L2, L3 and L4
L2, L3 and L4 learn of L1’s VNI list
• Similarly L2, L3 and L4 send their own Type 3 routes
• At the end, each VTEP has a list of other VTEPs and the list of VNIs they’re interested
in
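The end state described above, where each VTEP knows the other VTEPs and their VNIs, can be sketched as a per-VNI flood list. This is a simplified model, not FRR's data structure. The function name and the VNI membership in the sample data are assumptions: the slide states that L1 has the brown and blue VNIs; L3 and L4 having brown and L2 having only blue is assumed for illustration.

```python
from collections import defaultdict


def build_flood_lists(type3_routes, local_vtep, local_vnis):
    """Build {vni: [remote VTEPs]} from received Type 3 routes.

    type3_routes: iterable of (vtep, vni) pairs, one per advertised VNI.
    Only VNIs that are configured locally get a flood list.
    """
    flood = defaultdict(list)
    for vtep, vni in type3_routes:
        if vtep != local_vtep and vni in local_vnis:
            flood[vni].append(vtep)
    return {vni: sorted(vteps) for vni, vteps in flood.items()}


# Assumed VNI membership (see the note above).
routes = [
    ("L1", "brown"), ("L1", "blue"),
    ("L2", "blue"),
    ("L3", "brown"),
    ("L4", "brown"),
]
l1_flood = build_flood_lists(routes, "L1", {"brown", "blue"})
l2_flood = build_flood_lists(routes, "L2", {"blue"})
```

With this data, L1's brown flood list contains only L3 and L4, which is why ingress replication for the brown VNI never reaches L2.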
25. 25Cumulus Networks
Illustrating Unknown Unicast Data Plane
[Diagram: spines S1 and S2; leaves L1, L2, L3, L4; hosts A, X, B, C, Y, Z; addresses 10.1.1.1 through 10.1.1.6]
X sends
packet to Z
L1 associates X’s MAC/VNI
with the ingress port. Since Z is
unknown, L1 does ingress
replication to L3 and L4
L3 and L4 decapsulate the packet and
flood it out all known brown VNI
ports, since they don’t know Z’s
location either
• Ingress replication is done only to L3/L4 which have brown VNI
• Some switching chips support ECMP after ingress replication; where the chip
does not, traffic is spread in a static, predefined way
• No egress VTEP learns off of VxLAN packets (implicitly disabled with EVPN)
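The forwarding decision at the ingress VTEP can be sketched as follows. This is a simplified model under stated assumptions: the function name, the dictionary shapes, and the sample MAC labels are hypothetical, and a real VTEP makes this decision in hardware.

```python
def select_remote_vteps(mac_table, flood_lists, vni, dst_mac):
    """Return the remote VTEPs that should receive a copy of the frame.

    mac_table:   {(vni, mac): vtep}, populated by the control plane
    flood_lists: {vni: [vteps]}, VTEPs that advertised interest in the VNI
    A known destination gets one unicast copy; an unknown destination
    is ingress-replicated to every VTEP on the VNI's flood list.
    """
    vtep = mac_table.get((vni, dst_mac))
    if vtep is not None:
        return [vtep]
    return list(flood_lists.get(vni, []))


# L1's view while Z is still unknown (labels are illustrative).
l1_mac_table = {("brown", "mac-of-Y"): "L3"}
l1_flood_lists = {"brown": ["L3", "L4"], "blue": ["L2"]}

unknown_copy_targets = select_remote_vteps(
    l1_mac_table, l1_flood_lists, "brown", "mac-of-Z")
known_copy_targets = select_remote_vteps(
    l1_mac_table, l1_flood_lists, "brown", "mac-of-Y")
```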
26. 26Cumulus Networks
Illustrating The Control Plane
[Diagram: spines S1 and S2; leaves L1, L2, L3, L4; hosts A, X, B, C, Y, Z; addresses 10.1.1.1 through 10.1.1.6]
X sends
packet to Z
L1 learns X’s ingress port and
sends a Type 2 route with the
MAC of X, the VNI, and the VTEP of X
to its BGP peers, S1 and
S2
The spines send the received
Type 2 route to their peers,
L2-L4. Nothing is installed
on the spines themselves
L3 and L4 install a MAC table entry with the MAC of X
pointing to the VTEP of L1. L2 merely stores this info in
the BGP VNI RIB since it has no brown VNI
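The difference between L3/L4 and L2 in the walkthrough above is whether the VNI is configured locally. A minimal sketch of that rule, with hypothetical class and attribute names (this is not how FRR structures its tables):

```python
class Vtep:
    """Toy model of a leaf receiving Type 2 routes."""

    def __init__(self, name, local_vnis):
        self.name = name
        self.local_vnis = set(local_vnis)
        self.bgp_rib = []      # every received Type 2 route is kept here
        self.mac_table = {}    # only routes for local VNIs are installed

    def receive_type2(self, mac, vni, remote_vtep):
        self.bgp_rib.append((mac, vni, remote_vtep))
        if vni in self.local_vnis:
            self.mac_table[(vni, mac)] = remote_vtep


# L3 has the brown VNI; L2 is assumed to have only blue.
l3 = Vtep("L3", {"brown"})
l2 = Vtep("L2", {"blue"})
for leaf in (l3, l2):
    leaf.receive_type2("mac-of-X", "brown", "L1")
```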
28. 28Cumulus Networks
Three Choices For Handling BUMs
Head end or ingress replication
L3 multicast i.e. underlay uses multicast
Drop unknown unicast and unknown multicast silently
29. 29Cumulus Networks
Ingress Replication
• Keeps the underlay simple
▪ No need to setup/debug L3 multicast
• The default model on Cumulus Linux
• The most popular when I speak to customers (potential or
otherwise)
▪ May be biased info, since Cumulus only supports this today
30. 30Cumulus Networks
L3 Multicast in Underlay
• Map each VNI’s traffic to a L3 multicast group
• Ideally, each VNI is mapped to a separate L3 multicast group
• Control and data plane efficiency limits make that ideal hard to reach
• More complex configuration due to additional steps:
▪ Configuring PIM
▪ Mapping VNI to L3 multicast group
▪ Additional checking if VNI received in group is of interest
• Only benefit is ability to handle lots of BUM traffic (or even L2
multicast)
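When there are fewer multicast groups than VNIs, several VNIs share a group, and a receiving VTEP has to check whether the VNI in a packet is one it cares about. The sketch below shows one possible mapping (simple modulo) purely as an illustration; the base group 239.1.1.0, the group count, and both function names are assumptions, not a scheme prescribed by the slides.

```python
import ipaddress


def group_for_vni(vni, base_group="239.1.1.0", group_count=16):
    """Map a VNI onto one of group_count consecutive multicast groups."""
    base = ipaddress.IPv4Address(base_group)
    return base + (vni % group_count)


def accept_from_group(vni, local_vnis):
    """Additional check on receive: is the VNI in the packet of interest?"""
    return vni in local_vnis


g1 = group_for_vni(10100)
g2 = group_for_vni(10116)   # shares a group with 10100 under modulo 16
```

Here VNIs 10100 and 10116 land in the same group, so a VTEP that only has VNI 10100 receives traffic for 10116 as well and must discard it.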
31. 31Cumulus Networks
Drop BUM Traffic
• Many network admins consider BUM traffic a potential
DDoS attack vector
• A primary goal of EVPN was to eliminate BUM via
control plane support
• Useful mostly if used in conjunction with ARP suppression
• Primary drawbacks:
▪ Inability to handle silent servers (speak only when spoken to).
Do these even exist anymore ?
▪ Slower convergence due to control plane distributing
information rather than learning via data plane
33. 33Cumulus Networks
Dual-Attached Hosts Deployment Model
• The two switches a dual-attached host connects to behave
no differently w.r.t. BGP for EVPN than regular BGP
▪ Each of the two switches has its own ASN
• MLAG typically used to provide a single logical bonded
interface to the host
• Peer-link/MLAG is sometimes debated
▪ Alternate proposal is to use the L3 core and BGP to exchange
relevant information between the switches
▪ Type 1 and type 4 route types defined for this purpose
▪ Not commonly deployed or popular
▪ May be of interest for data center interconnect switches
34. 34Cumulus Networks
VxLAN Configuration for Dual-Attached Hosts
• Many switching ASICs do not support multiple VTEP IP
addresses associated with a MAC/VNI in the MAC table
• So both switches that a dual-attached host
connects to MUST use an anycast IP address as the VTEP
IP address
▪ Ensure that this anycast VTEP IP is advertised in the BGP
underlay
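The reason for the anycast address can be seen in a toy MAC table that, like the ASICs mentioned above, holds a single VTEP IP per MAC/VNI. This is only a sketch: the install function and all addresses are hypothetical.

```python
def install(mac_table, vni, mac, vtep_ip):
    """Install a MAC/VNI entry; return True if an existing entry
    was overwritten with a different VTEP IP."""
    key = (vni, mac)
    changed = key in mac_table and mac_table[key] != vtep_ip
    mac_table[key] = vtep_ip
    return changed


# Case 1: the two switches advertise the host with their own VTEP IPs.
distinct = {}
install(distinct, "brown", "mac-of-A", "10.0.0.11")
flapped = install(distinct, "brown", "mac-of-A", "10.0.0.12")

# Case 2: both switches advertise the same (assumed) anycast VTEP IP.
anycast = {}
install(anycast, "brown", "mac-of-A", "10.0.0.100")
stable = not install(anycast, "brown", "mac-of-A", "10.0.0.100")
```

With distinct addresses the single-valued entry keeps being rewritten; with a shared anycast address both advertisements agree and the underlay decides which switch receives the packet.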
36. 36Cumulus Networks
ARP Suppression
• Eliminate or reduce ARP broadcasts by providing local ARP
proxy
▪ Not a traditional L3 ARP proxy, just an L2 ARP local response
• Announce MAC/IP binding along with MAC/VNI to VTEP
association
▪ This is also a Type-2 route
• Can be enabled on a per-VNI basis
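The local-response behavior can be sketched as a lookup in the MAC/IP bindings learned from Type 2 routes. The function name, the table shapes, and the sample addresses below are hypothetical; the point is only the decision between answering locally and falling back to normal BUM handling.

```python
def arp_suppress(bindings, suppressed_vnis, vni, target_ip):
    """Return the MAC to answer an ARP request with, or None.

    bindings:        {(vni, ip): mac}, learned from Type 2 MAC/IP routes
    suppressed_vnis: VNIs on which ARP suppression is enabled
    None means the VTEP cannot answer locally and the request is
    handled like any other broadcast on that VNI.
    """
    if vni not in suppressed_vnis:
        return None
    return bindings.get((vni, target_ip))


# Illustrative state on a VTEP with suppression enabled only for brown.
bindings = {
    ("brown", "192.0.2.10"): "00:00:5e:00:53:0a",
    ("blue", "192.0.2.20"): "00:00:5e:00:53:14",
}
suppressed = {"brown"}
```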
37. 37Cumulus Networks
ARP Suppression: Vendor Notes
• ARP Suppression can be enabled on Cumulus, independent
of the VTEP being the gateway for that VNI
• Some of the other vendors enable this feature only if VTEP
is also the gateway for that VNI
• Cumulus supports only ARP suppression today, ND support
coming soon
39. 39Cumulus Networks
Three Primary Modifications to Support EVPN
• The Linux kernel had three primary modifications:
▪ Support for ARP suppression
▪ Adding a flag to indicate a MAC table entry was learnt via an
external source
▪ Adding a flag to indicate an IP/IPv6 neighbor entry was learnt
via an external source
• The first has been upstreamed and accepted into
the mainline Linux kernel
• The two flags are being upstreamed
47. 47Cumulus Networks
FRR’s Simplified Configuration
• Assume sane defaults
• Simplify the common case
• Take out all the stuff that’s inconsequential
• Those who want all the knobs and warts still have them
GOAL: Simplify configuration to reduce human error
48. 48Cumulus Networks
Summary
• EVPN is a standards-based technology that allows
enterprise networks to run traditional applications over an L3
core
• EVPN uses VxLAN as its base data plane encapsulation
• EVPN uses BGP as the control plane
• FRR/Cumulus Linux use sane defaults to simplify the EVPN
configuration and operations