RIFT A New Approach to Building DC Fabrics
- 1. © 2018 Juniper Networks
RIFT
A new approach to building DC fabrics
Nitin Vig
Chief Architect, Juniper Networks
- 2.
AGENDA
Datacenter Fabric Trends
Introduction to RIFT
RIFT key features
Industry status
Summary
- 3.
DATACENTER FABRIC - TRENDS
Hybrid Clouds are here to stay
• Hybrid cloud is adopted for many reasons, one being to avoid ceding all real estate to the hyperscalers
• Customers host their own content and critical business processes, so they need to build their own fabrics
• Proprietary, OPEX-intensive operations are impossible to sustain
Fabrics are becoming Uniform, Local & Regular
• Vast amounts of bandwidth are needed close to the producer and consumer
• Fabric architectures are being adopted outside the conventional DC (Metro, PoP)
• WAN-style traffic engineering and protection are being replaced by wide fan-out and distributed-systems redundancy
Fabric is the new “RAM chip”
• No one configures RAM banks manually in every laptop
• IP fabric hardware is already largely commodity
• IP fabrics will also be “OPEX-commoditized” (consumed simply as bandwidth)
- 4.
DATACENTER FABRIC – TECHNOLOGY EVOLUTION
Tree to CLOS topology
• Tree: core/aggregation/access layers
• Folded CLOS or Fat Trees: Spine & Leaf
Layer 2 switching to Layer 3 routing
• Layer 3 routing underlay with Layer 2/3 overlay
Layer 3 underlay routing: from IGP to eBGP
• Driven by scaling, convergence and OPEX considerations
[Figure: Original Fat Tree (based on CLOS) and Folded Fat Tree]
- 5.
DATACENTER FABRIC: ROUTING PROTOCOL CHALLENGES
• Routing protocols are complex (because they must handle irregular topologies)
• Routing protocols are:
• EITHER: Fast, but not scalable to 100k nodes (link-state)
• OR: Scalable to 100k nodes, but slow (distance-vector)
CURRENT ROUTING PROTOCOLS vs. DATACENTER FABRICS
• Current routing protocols: built for irregular network topologies; low degree of connectivity
• Datacenter fabrics: uniform topology (CLOS, folded Fat-Tree); high degree of connectivity (hyper-scale DCs)
NOT A PERFECT MATCH
- 6.
REQUIREMENT | BGP (modified for DC) | ISIS (modified for DC)
01 Close to Zero Touch Provisioning
02 Link discovery/Automatic forming of trees/preventing cabling violations ⚠ ⚠
03 Minimal amount of routes/information on ToRs (cost-optimized)
04 High degree of ECMP (BGP needs lots of knobs, memory, own-AS-path violations) ⚠
05 Traffic engineering by Next-hops, Prefix modification
06 See all links in topology to support PCE/SR ⚠
07 Carry opaque configuration data (key-value) efficiently ⚠
08 Take a node out of production quickly and without disruption (overload)
09 Automatic disaggregation on failures to prevent black-holing
10 Minimal blast radius on failures
11 Fastest possible convergence on failures
DATACENTER FABRIC: KEY REQUIREMENTS
- 7.
LET’S TAKE A FRESH LOOK
Routing protocols in our network
• Distance Vector (RIP): vectors of destination and distance; “Tell your neighbors the rest of the network”
• Link State (ISIS, OSPF): routers announce their links into the LSDB, paths computed with Dijkstra; “Tell the rest of the network your neighbors”
• Path Vector (BGP): full paths announced in BGP; “Paths described by sequence of ASs”
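To make the distance-vector vs. link-state contrast concrete, here is a minimal Python sketch on a hypothetical four-node topology (node names and link costs are invented for illustration). The link-state function assumes the node already holds the full LSDB and runs Dijkstra locally. The distance-vector function only lets nodes exchange distance vectors with their neighbors, round by round (Bellman-Ford). Path vector (BGP) is not modeled.

```python
import heapq

# Hypothetical topology: undirected links with integer costs.
LINKS = {
    ("A", "B"): 1,
    ("B", "C"): 1,
    ("A", "C"): 3,
    ("C", "D"): 1,
}


def build_adjacency(links):
    """Turn the link list into {node: {neighbor: cost}}."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj


ADJ = build_adjacency(LINKS)


def link_state(adj, source):
    """Link state: the node knows the whole topology (its LSDB) and runs Dijkstra."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for nbr, cost in adj[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(heap, (d + cost, nbr))
    return dist


def distance_vector(adj):
    """Distance vector: each node only learns its neighbors' vectors.
    Synchronous rounds are repeated until no vector changes (Bellman-Ford)."""
    vectors = {node: {node: 0} for node in adj}
    changed = True
    while changed:
        changed = False
        previous = {node: dict(vec) for node, vec in vectors.items()}
        for node in adj:
            for nbr, cost in adj[node].items():
                for dest, d in previous[nbr].items():
                    if cost + d < vectors[node].get(dest, float("inf")):
                        vectors[node][dest] = cost + d
                        changed = True
    return vectors


print(link_state(ADJ, "A"))
print(distance_vector(ADJ)["A"])
```

Both approaches arrive at the same distances. The difference lies in what each node must know (the whole topology vs. only its neighbors' vectors) and in how many exchange rounds the distance-vector side needs before it settles.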
- 8.
LINK STATE vs. DISTANCE/PATH VECTOR
Link State
• Topology view → TE enabler
• Fast propagation
Distance/Path Vector
• Granular policy control & traffic engineering
[Figure: Link State Convergence vs. Distance/Path Vector Convergence for Nodes 0–5 over time, showing update transmission and computation phases]
Both protocol types (LS and Distance/Path Vector) are frequently used in today’s networks
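The convergence figure can be approximated with a deliberately crude timing model. It assumes a chain of nodes, a fixed per-hop transmission delay `tx` and a fixed per-node computation delay `comp`; all of these are hypothetical parameters, and real protocols add timers, batching and pacing. In link state, an update is flooded onward before any computation happens. In distance/path vector, each node computes its best route before passing the update on.

```python
def link_state_convergence(hops, tx, comp):
    """Toy model: the update is flooded hop by hop without waiting for any
    route computation, then every node computes in parallel."""
    return hops * tx + comp


def path_vector_convergence(hops, tx, comp):
    """Toy model: each node computes its best route before passing the
    update on, so the computation delay is paid once per hop."""
    return hops * (tx + comp)


# Nodes 0-5 in a line (5 hops), 1 time unit per transmission, 10 per computation.
print(link_state_convergence(5, 1, 10))   # 15
print(path_vector_convergence(5, 1, 10))  # 55
```

Under these assumptions the link-state delay grows with the transmission delay alone, while the distance/path-vector delay grows with transmission plus computation at every hop.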
- 9.
RIFT: ROUTING IN FAT TREES
Clean slate approach to building DC Fabrics
Market Requirements
• CLOS-optimized routing protocol
• Full BW utilization
• Built-in fabric provisioning
• Fast convergence
Juniper Invention
• Link-State (North) + Distance-Vector (South)
• Simplest leaf implementation
• Failure domain containment
• Support for all DC applications
- 10.
RIFT AT A GLANCE
1. Topological sort
• Uses the concept of directionality
2. Link-State flood Up (North)
• Full topology and all prefixes at the top spine only
3. Distance Vector Down (South)
• 0/0 is sufficient to send traffic Up
• More-specific prefixes advertised in specific scenarios (link failures, traffic engineering)
4. Bounce
• Flood reduction
• Automatic disaggregation
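Steps 1 to 3 can be sketched in a few lines of Python. This is a simplification, not RIFT's wire behavior. Link direction follows from comparing levels. Northbound, a node passes on everything it knows from below plus its own prefixes (standing in for link-state flooding). Southbound, it sends only a default route. East-west links and disaggregation are deliberately left out. The prefixes are hypothetical, and `advertise` is an illustrative helper.

```python
from enum import Enum


class Direction(Enum):
    NORTH = "north"          # neighbor sits at a higher level (toward the top spine)
    SOUTH = "south"          # neighbor sits at a lower level (toward the leaves)
    EAST_WEST = "east-west"  # neighbor sits at the same level


def direction(my_level, neighbor_level):
    """Topological sort by level: the direction of a link follows from the two levels."""
    if neighbor_level > my_level:
        return Direction.NORTH
    if neighbor_level < my_level:
        return Direction.SOUTH
    return Direction.EAST_WEST


def advertise(my_level, neighbor_level, own_prefixes, learned_from_south):
    """What a node sends over a link in steady state (simplified):
    north: everything it knows from below plus its own prefixes;
    south: only the default route.
    East-west links and disaggregation are not modeled here."""
    d = direction(my_level, neighbor_level)
    if d is Direction.NORTH:
        return sorted(own_prefixes | learned_from_south)
    if d is Direction.SOUTH:
        return ["0.0.0.0/0"]
    return []


print(advertise(1, 2, {"10.0.1.0/24"}, {"10.0.0.0/24"}))  # ['10.0.0.0/24', '10.0.1.0/24']
print(advertise(1, 0, {"10.0.1.0/24"}, {"10.0.0.0/24"}))  # ['0.0.0.0/0']
```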
- 11.
RIFT IN STEADY STATE – BASICS
• Spine (Level 2): learns Pfx A, B, C, D from Spine (Level 1)
• Spine (Level 1): learns 0/0 from Spine (Level 2) and Pfx A, B, C, D from Leaf (Level 0)
• Leaf (Level 0): learns 0/0 from Spine (Level 1)
[Figure labels: Aggregation, Localization; Pfx 0/0; Pfx W, X, Y, Z]
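The steady state above can be checked with a small Python sketch that builds each level's routing table and then does a longest-prefix match with the standard `ipaddress` module. The concrete addresses for Pfx A to D and the three-level depth are assumptions made for illustration. `steady_state_routes` and `lookup` are hypothetical helpers.

```python
import ipaddress

# Hypothetical addresses for the leaf prefixes Pfx A, B, C, D on the slide.
LEAF_PREFIXES = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24", "10.0.4.0/24"]
TOP_LEVEL = 2


def steady_state_routes(level):
    """Routes held at a given level: {network: level the route was learned from}."""
    routes = {}
    if level < TOP_LEVEL:
        # Every node below the top spine learns 0/0 from the level above.
        routes[ipaddress.ip_network("0.0.0.0/0")] = level + 1
    if level > 0:
        # Spines learn the leaf prefixes from the level below.
        for pfx in LEAF_PREFIXES:
            routes[ipaddress.ip_network(pfx)] = level - 1
    return routes


def lookup(routes, address):
    """Longest-prefix match: returns (matching network, learned-from level) or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return best, routes[best]


for level in (0, 1, 2):
    routes = steady_state_routes(level)
    print(level, len(routes), lookup(routes, "10.0.3.7"))
```

The leaf holds a single default route. The level-1 spine holds the default plus the four leaf prefixes. The top spine holds only the four leaf prefixes and forwards 10.0.3.7 on the specific /24.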
- 12.
[Figure labels: POD 1; Pfx A, B, C, D; Spine (Level 2); Spine (Level 1); Leaf (Level 0)]
RIFT FEATURES
DETECTING CABLING MIS-CONFIGURATION
Problem statement: The fabric should automatically detect and block wrong cabling.
Automatic rejection of adjacencies based on minimal configuration
• A1 to B1: Forbidden due to POD mismatch
• A0 to B1: Forbidden due to POD mismatch (A0 has already formed the A0-A1 adjacency, even though no POD is configured on A0)
• B0 to C0: Forbidden due to level mismatch
[Figure labels: POD 0; nodes A0, A1, B0, B1, C0]
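The following Python sketch shows how such a rejection check might look. It is not the rule set from the RIFT draft, whose adjacency rules are more detailed. Here, nodes whose levels differ by more than one are rejected, and so are two nodes with known, different PODs. As an additional assumption, a lower-level node with no configured POD adopts its higher-level neighbor's POD once the adjacency forms. The slide does not give the levels of B0 and C0, so that case is not reproduced. The POD assignments below (A1 in POD 0, B1 in POD 1) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    name: str
    level: int
    pod: Optional[int] = None  # None means "no POD configured"


def try_adjacency(a: Node, b: Node) -> Tuple[bool, str]:
    """Decide whether two nodes may form an adjacency (simplified rules).

    - Levels more than one apart cannot be neighbors.
    - Two known, different PODs cannot be neighbors.
    - On success, a lower-level node without a POD takes the POD of its
      higher-level neighbor, so later mis-cabling is caught (assumption).
    """
    if abs(a.level - b.level) > 1:
        return False, "level mismatch"
    if a.pod is not None and b.pod is not None and a.pod != b.pod:
        return False, "POD mismatch"
    if a.level != b.level:
        lower, upper = (a, b) if a.level < b.level else (b, a)
        if lower.pod is None:
            lower.pod = upper.pod
    return True, "ok"


# Hypothetical assignments: A1 in POD 0, B1 in POD 1, A0 with no POD configured,
# T a top-level spine that belongs to no POD.
a0, a1 = Node("A0", 0), Node("A1", 1, pod=0)
b1, top = Node("B1", 1, pod=1), Node("T", 2)
print(try_adjacency(a0, a1), a0.pod)  # (True, 'ok') 0
print(try_adjacency(a0, b1))          # (False, 'POD mismatch')
print(try_adjacency(a1, b1))          # (False, 'POD mismatch')
print(try_adjacency(a0, top))         # (False, 'level mismatch')
```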
- 13.
RIFT FEATURES
(NEAR) ZERO TOUCH PROVISIONING
Problem statement: The fabric should auto-configure with close to zero touch
Automatic SystemID derivation
• RIFT SystemID (64 bits) is automatically derived from the node’s EUI-64
Top-level (superspine) switches must be manually configured
• Either: with flag=SUPERSPINE (default level 16)
• Or: with an explicit level (e.g., level 3 in the example)
A node with a non-configured level derives its level from its neighbors’ levels (highest neighbor level - 1)
• E, F -> derived level 2
• I, J -> derived level 1
A node with flag=LEAF_ONLY always derives level 0
[Figure: Level 3: A (level=3, manual); Level 2: E, F; Level 1: I, J; Level 0: M, N (Flag = LEAF_ONLY)]
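Both auto-configuration steps can be sketched in Python under stated assumptions.

For the SystemID, the sketch builds an EUI-64 by inserting FF-FE between the two halves of a 48-bit MAC. Unlike the IPv6 "modified EUI-64", it does not flip the universal/local bit; whether RIFT does is left to the draft.

For levels, the sketch applies the slide's rules and repeats them over the fabric until nothing changes: LEAF_ONLY gives 0, an explicit level wins otherwise, SUPERSPINE gives 16, and any other node takes the highest neighbor level minus 1. The cabling between A, E, F, I, J, M and N is hypothetical, because the figure does not show it. As an extra assumption, the sketch never derives a level below 0.

```python
LEAF_ONLY = "LEAF_ONLY"
SUPERSPINE = "SUPERSPINE"
SUPERSPINE_DEFAULT_LEVEL = 16  # default quoted on the slide


def system_id_from_mac(mac):
    """64-bit ID in EUI-64 style: FF-FE inserted between the two halves of a MAC-48."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    return int.from_bytes(octets[:3] + b"\xff\xfe" + octets[3:], "big")


def derive_level(configured_level, flag, neighbor_levels):
    """Level of one node given its configuration and its neighbors' known levels."""
    if flag == LEAF_ONLY:
        return 0
    if configured_level is not None:
        return configured_level
    if flag == SUPERSPINE:
        return SUPERSPINE_DEFAULT_LEVEL
    if not neighbor_levels or max(neighbor_levels) == 0:
        return None  # still undefined (this sketch never derives a level below 0)
    return max(neighbor_levels) - 1


def derive_fabric(config, links):
    """Repeat derive_level on every node until no level changes.
    config: {node: (configured_level, flag)}; links: iterable of (node, node)."""
    adj = {node: set() for node in config}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    levels = {node: None for node in config}
    changed = True
    while changed:
        changed = False
        for node, (configured, flag) in config.items():
            known = [levels[n] for n in adj[node] if levels[n] is not None]
            new_level = derive_level(configured, flag, known)
            if new_level != levels[node]:
                levels[node] = new_level
                changed = True
    return levels


# The slide's example: A is manually set to level 3, M and N are LEAF_ONLY.
FABRIC_CONFIG = {
    "A": (3, None),
    "E": (None, None), "F": (None, None),
    "I": (None, None), "J": (None, None),
    "M": (None, LEAF_ONLY), "N": (None, LEAF_ONLY),
}
# Hypothetical cabling: full mesh between adjacent levels.
FABRIC_LINKS = [("A", "E"), ("A", "F"), ("E", "I"), ("E", "J"), ("F", "I"),
                ("F", "J"), ("I", "M"), ("I", "N"), ("J", "M"), ("J", "N")]
print(hex(system_id_from_mac("00:11:22:33:44:55")))  # 0x1122fffe334455
print(derive_fabric(FABRIC_CONFIG, FABRIC_LINKS))
```

With this cabling, the derived levels match the slide: E and F end at 2, I and J at 1, and M and N at 0.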
- 14.
RIFT FEATURES
ROUTING IN FAILURE: AUTOMATIC DISAGGREGATION
Problem statement: Avoid any traffic black-holing due to link failures
1) Link C2 – B1 breaks. C2 loses reachability to Pfx Y & Z
2) C2 sends updates listing only one Nbr (A1)
3) D2 receives the update from C2:
• Our neighbors don’t match (B1 is missing)
• C2 has no reachability to Pfx Y & Z
• Lower-level nodes use 0/0, so there is a risk of a traffic black hole
4) D2 originates a new update with disaggregated prefixes (Y, Z)
Note:
• Nodes on the lower level (A1, B1) get the more-specific routes.
• Nodes further down [Level 0] can still use only 0/0
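Step 3 is essentially a set comparison. The following Python sketch assumes that each top-level node knows its same-level peers' southbound neighbor sets, from the node information described on the slide. It also assumes a map of which prefixes sit behind each level-1 node; behind A1 are W and X, and behind B1 are Y and Z, a split the slide implies but does not state. A node disaggregates every prefix that it can reach but some peer cannot. `reachable` and `prefixes_to_disaggregate` are hypothetical helpers.

```python
def reachable(node, south_adjacencies, prefixes_via):
    """Prefixes a top-level node can reach through its southbound neighbors."""
    result = set()
    for nbr in south_adjacencies[node]:
        result |= prefixes_via.get(nbr, set())
    return result


def prefixes_to_disaggregate(me, south_adjacencies, prefixes_via):
    """Prefixes `me` should advertise south as more-specifics: those it can
    still reach but some same-level peer cannot, since lower levels would
    otherwise keep sending that traffic to the peer via 0/0."""
    mine = reachable(me, south_adjacencies, prefixes_via)
    missing = set()
    for peer in south_adjacencies:
        if peer != me:
            missing |= mine - reachable(peer, south_adjacencies, prefixes_via)
    return missing


# The slide's failure: the C2-B1 link is down, so C2 only lists A1.
SOUTH = {"C2": {"A1"}, "D2": {"A1", "B1"}}
VIA = {"A1": {"W", "X"}, "B1": {"Y", "Z"}}
print(sorted(prefixes_to_disaggregate("D2", SOUTH, VIA)))  # ['Y', 'Z']
print(sorted(prefixes_to_disaggregate("C2", SOUTH, VIA)))  # []
```

Comparing reachable prefixes, rather than just neighbor names, means a prefix that C2 can still reach through A1 (for example, a multi-homed one) is not disaggregated needlessly.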
[Figure: (1) C2 – B1 link fails; (2) C2 sends only Nbr A1 in update; (3) D2 learns C2 has lost Nbr B1; (4) D2 advertises specific route to Pfx Y & Z. Routes shown: Pfx 0/0 → C2, D2; Pfx Y, Z → D2; Pfx 0/0 → A1, A2. Leaf prefixes: Pfx W, X, Y, Z]
- 15.
RIFT FEATURES
FLOODING REDUCTION: FOR HIGHLY MESHED DC TOPOLOGIES
Problem statement: Avoid redundant information in highly meshed topologies
N-port spine switch
Level 2 spine – all N ports are southbound
Level 1 spine
• N/2 ports are Southbound
• N/2 ports are Northbound
Link-state flooding becomes overkill (a known problem in link-state protocols)
- 16.
RIFT FEATURES
FLOODING REDUCTION: HAPPENS IN THE NORTH DIRECTION
Each ‘L’ node knows which ‘L+2’ nodes are reachable via which particular ‘L+1’ nodes
A single ‘L+1’ node can flood updates from an ‘L’ node to a given set of ‘L+2’ nodes -> Flood Repeater (FR) node
For redundancy, in RIFT an ‘L’ node selects at least two ‘L+1’ nodes as FRs (using a selection algorithm)
Updates sent to non-FRs are marked with a ‘do-not-reflect’ flag
A similar algorithm is executed at each level.
[Figure: flooding between levels L, L+1 and L+2]
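The slide does not spell out RIFT's selection algorithm. The following Python sketch uses a plain greedy set cover as a hypothetical stand-in. An ‘L’ node repeatedly picks the ‘L+1’ node that helps the most under-covered ‘L+2’ nodes. It stops once every ‘L+2’ node is reachable through two FRs, or through all ‘L+1’ nodes that reach it if fewer exist. It then marks the remaining ‘L+1’ nodes with ‘do-not-reflect’. Node names are invented.

```python
def select_flood_repeaters(reach, redundancy=2):
    """Greedy choice of flood repeaters (FRs) for one level-L node.

    reach: {L+1 node: set of L+2 nodes reachable through it}.
    Every L+2 node should be covered by `redundancy` FRs, or by all L+1 nodes
    that reach it if there are fewer than that.
    """
    targets = set().union(*reach.values())
    need = {t: min(redundancy, sum(t in r for r in reach.values())) for t in targets}
    coverage = {t: 0 for t in targets}
    chosen = []
    while any(coverage[t] < need[t] for t in targets):
        remaining = [c for c in sorted(reach) if c not in chosen]
        # Pick the candidate that helps the most under-covered L+2 nodes
        # (ties go to the alphabetically first name, for determinism).
        best = max(remaining, key=lambda c: sum(coverage[t] < need[t] for t in reach[c]))
        chosen.append(best)
        for t in reach[best]:
            coverage[t] += 1
    return chosen


def do_not_reflect_flags(reach, repeaters):
    """Updates sent to non-FRs carry the 'do-not-reflect' flag (True)."""
    return {node: node not in repeaters for node in sorted(reach)}


# Four L+1 spines, each reaching both L+2 nodes (full mesh).
REACH = {s: {"T1", "T2"} for s in ("S1", "S2", "S3", "S4")}
frs = select_flood_repeaters(REACH)
print(frs)                               # ['S1', 'S2']
print(do_not_reflect_flags(REACH, frs))  # {'S1': False, 'S2': False, 'S3': True, 'S4': True}
```

In this full-mesh case, only two of the four ‘L+1’ spines re-flood the ‘L’ node's updates northbound instead of all four, which is the redundancy reduction the slide describes.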
- 17.
RIFT FEATURES
WEIGHTED BANDWIDTH LOAD-BALANCING
Problem Statement: Load-balance traffic across links based on link capacity
Weighted Bandwidth load-balancing example:
1. Each upstream node gets a value based on available bandwidth
• Upstream node BW = BW to upstream node + uplink BW of upstream node
• On X, upstream node I & J -> 2 x 10G + 4 x 40G = 180G
• Upstream node BW is converted to the exponent of the next power of 2
• On X, upstream node I & J -> 180G -> 8 (Note: 2^7 = 128 < 180 < 256 = 2^8)
• Exponent for I & J = 8
2. Received route’s metric is adjusted based on the above value (BAD – Bandwidth Adjusted Distance)
• BAD = original D * (1 + Max_Upstream_Exp - Current_Upstream_Exp)
• On X, upstream node I -> BAD = D * (1 + 8 - 8) = D
• On X, upstream node J -> BAD = D * (1 + 8 - 8) = D
• Equal BW load-balancing -> distance (metric) not adjusted
[Figure: nodes A, E, F, I, J, X, Y connected by 10G, 40G and 100G links]
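The arithmetic from this slide can be reproduced with a short Python sketch. The BAD formula is taken directly from the slide. The exponent is computed as the ceiling of log2 of the bandwidth in Gbps, using integer `bit_length` to avoid floating point. The second case, a degraded J with only 2 x 40G uplinks, is a hypothetical addition to show a metric actually changing.

```python
def upstream_bw(bw_to_upstream, upstream_uplink_bw):
    """Upstream node BW = BW to the upstream node + the upstream node's uplink BW (Gbps)."""
    return bw_to_upstream + upstream_uplink_bw


def bw_exponent(bw):
    """Exponent of the next power of two for an integer bandwidth >= 1
    (180 -> 8, because 2**7 < 180 <= 2**8)."""
    return (bw - 1).bit_length()


def bandwidth_adjusted_distance(distance, current_exp, max_exp):
    """BAD = D * (1 + Max_Upstream_Exp - Current_Upstream_Exp)."""
    return distance * (1 + max_exp - current_exp)


def adjust(distances, bandwidths):
    """Apply BAD to every upstream node, given {node: D} and {node: upstream BW}."""
    exps = {node: bw_exponent(bw) for node, bw in bandwidths.items()}
    max_exp = max(exps.values())
    return {node: bandwidth_adjusted_distance(d, exps[node], max_exp)
            for node, d in distances.items()}


# Slide example: on X, both I and J have 2 x 10G from X + 4 x 40G uplinks = 180G.
bw = upstream_bw(2 * 10, 4 * 40)
print(bw, bw_exponent(bw))                               # 180 8
print(adjust({"I": 10, "J": 10}, {"I": bw, "J": bw}))    # {'I': 10, 'J': 10}
# Hypothetical degraded J with only 2 x 40G uplinks: 20 + 80 = 100G -> exponent 7.
print(adjust({"I": 10, "J": 10}, {"I": bw, "J": upstream_bw(20, 80)}))  # {'I': 10, 'J': 20}
```

With equal upstream bandwidth, the metrics stay unchanged, as on the slide. A weaker upstream gets a larger distance and therefore attracts proportionally less traffic.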
- 18.
REQUIREMENT | BGP (modified for DC) | ISIS (modified for DC) | RIFT
01 Close to Zero Touch Provisioning
02 Link discovery/Automatic forming of trees/preventing cabling violations ⚠ ⚠
03 Minimal amount of routes/information on ToRs (cost-optimized)
04 High degree of ECMP (BGP needs lots of knobs, memory, own-AS-path violations) ⚠
05 Traffic engineering by Next-hops, Prefix modification
06 See all links in topology to support PCE/SR ⚠
07 Carry opaque configuration data (key-value) efficiently ⚠
08 Take a node out of production quickly and without disruption (overload)
09 Automatic disaggregation on failures to prevent black-holing
10 Minimal blast radius on failures
11 Fastest possible convergence on failures
RIFT FEATURES SUMMARY
DATACENTER FABRIC: KEY REQUIREMENTS
- 19.
INDUSTRY STATUS
Standardization
• Initiated by Antoni Przygienda (Juniper Networks)
• Standards Track Working Group Draft (I-D)
• Base for further work toward RFC
• https://tools.ietf.org/html/draft-ietf-rift-rift-06
Co-operation
• Join work at IETF WG (JNPR, CSCO, Nokia, Comcast)
• Contact authors, share opinion
• The packet data structures are public (GPB)
[Figure: standardization track: individual → I-D → RFC → STD]
Availability
• RIFT in Python: https://github.com/brunorijsman/rift-python
• RIFT trial code available from Juniper:
https://www.juniper.net/us/en/dm/free-rift-trial/
• Production-ready Juniper code: Q4’2019
Relevant drafts
• Policy-guided prefixes with RIFT:
https://tools.ietf.org/html/draft-atlas-rift-pgp-01
• RIFT YANG model:
https://tools.ietf.org/html/draft-ietf-rift-yang-00
• Segment Routing in Fat Trees (SRIFT):
https://tools.ietf.org/html/draft-zzhang-rift-sr-01
- 20.
SUMMARY: RIFT PROTOCOL ADVANTAGES
Link-State and Distance Vector: take the ‘best of both’
• Fastest possible convergence
• Automatic topology detection
• Minimal routes on TORs
• High degree of ECMP
• Fast de-commissioning of nodes
Link-State and Distance Vector: leave the ‘not-so-good’
• Excessive flooding
• Manual neighbor detection
Unique RIFT additions
• Zero-touch provisioning
• Automatic disaggregation on failure
• Minimal blast radius on failures
• Utilize all fabric paths without loops
• Support for non-ECMP paths
• Key-Value Store