MPLS in DC and inter-DC networks: the unified forwarding mechanism for network programmability at scale
Dmitry Afanasiev, fl0w@yandex-team.ru
Daniel Ginsburg, dbg@yandex-team.ru
Network Architects
About Us
What is Yandex
• Founded in 1993
• NASDAQ: YNDX, market cap ~$12.5B
• One of Europe's largest internet companies and the leading search provider in Russia
• Over 60% of the local search market
• Monthly user audience of over 90 million worldwide
• Services: search, music, video, cloud storage, news, weather, maps, traffic, email, ads ...
Our Infrastructure
• We're a fairly typical MS-DC (massively scalable data center)
• Several DCs in Russia and abroad, plus an MPLS backbone connecting them
• About 100k servers, and growing fast
• Mostly IPv6 internally, but we need to serve external IPv4
• The network architecture is somewhat outdated and needs rethinking
In Search of New Arch
• It needs to be:
– Scalable
– Flexible
– Programmable
• Lots of approaches are out there, and some get many things right…
• But none of them combines all the right pieces in the right way
• This is really surprising, because the right combination seems almost inevitable
Ideas to Build Upon
• Many of the ideas have been around for years (or even decades)
• Interconnection network topology: folded Clos
• Let the edge handle complexity
• Core just delivers packets edge to edge
• Overlay/underlay logical split
• Control: a mix of centralized and distributed, which needs a clean way to combine both
• Simple commodity network elements
• Hierarchy and automation to scale the network
What's Missing
• All these ideas are well known, well understood and almost universally accepted in the industry
• People are trying to implement them using a wild mix of data plane mechanisms
• And that introduces enormous complexity
• What's missing? A unified forwarding mechanism
Missing… or Overlooked?
• Life is much easier when we don't have to deal with a multitude of data planes and forwarding mechanisms
• Fortunately, there is already a well-known, well-understood, standardized forwarding plane mechanism on which we can implement all those ideas without compromising their value
• It has well-defined, standardized mappings to many other popular forwarding planes
• It's known as MPLS
Unified Forwarding: Why and How
Flexibility
• Different data plane mechanisms offer different features
• A unified data plane should be able to support all the useful features and produce combinations of them
• MPLS is very flexible:
– forwarding over a pre-signalled virtual circuit, à la ATM: this is what RSVP-TE does
– source routing over a previously discovered topology, à la Token Ring networks: see the Segment Routing proposal
– hierarchical LPM, à la IP: just split the address over several labels and let routers act on the topmost one (not that we suggest it is practical, but it is definitely possible)
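To make the source-routing mode concrete, here is a minimal Python sketch that is loosely in the spirit of Segment Routing adjacency labels. The topology, the label values and the names `ADJ_LABELS`, `build_stack` and `forward` are all hypothetical. The source encodes an explicit path as a label stack. Each node acts only on the topmost label, which it maps to a neighbor, and then removes that label.

```python
from typing import Dict, List

# Hypothetical per-node tables, as if derived from a previously discovered topology:
# each node maps a locally significant label to one of its neighbors.
ADJ_LABELS: Dict[str, Dict[int, str]] = {
    "A": {101: "B", 102: "C"},
    "B": {201: "C", 202: "D"},
    "C": {301: "D"},
    "D": {},
}


def build_stack(path: List[str]) -> List[int]:
    """At the source: turn an explicit node path into a label stack, top label first."""
    stack = []
    for here, nxt in zip(path, path[1:]):
        by_neighbor = {n: label for label, n in ADJ_LABELS[here].items()}
        if nxt not in by_neighbor:
            raise ValueError(f"{here} and {nxt} are not adjacent")
        stack.append(by_neighbor[nxt])
    return stack


def forward(src: str, stack: List[int]) -> List[str]:
    """In transit: each node consumes the top label and sends the packet to the mapped neighbor."""
    node, visited = src, [src]
    for label in stack:  # walking the stack top-down is the same as popping it repeatedly
        node = ADJ_LABELS[node][label]
        visited.append(node)
    return visited
```

For example, `build_stack(["A", "B", "D"])` gives `[101, 202]`, and `forward("A", [101, 202])` visits A, B, D. Transit nodes keep no per-path state. They only need their own adjacency table.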
An Abstract Note on Semantics
• The best way to implement arbitrary semantics is to strip all semantics from protocol headers and assign them externally
• Hardware works with protocol headers
• Control software defines the semantics
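The MPLS header shows how little the hardware needs to know. Per RFC 3032, each label stack entry is 32 bits: a 20-bit label, a 3-bit Traffic Class (formerly EXP), a 1-bit bottom-of-stack flag (S) and an 8-bit TTL. The Python sketch below encodes and decodes such a stack. The label values are arbitrary, and the helper names are illustrative rather than taken from any library. Nothing in the encoding says what a label *means*. That mapping lives entirely in control software.

```python
import struct
from typing import List, NamedTuple


class LabelEntry(NamedTuple):
    label: int    # 20-bit label value (0-15 are reserved by the standards)
    tc: int = 0   # 3-bit Traffic Class
    ttl: int = 64 # 8-bit time to live


def encode_stack(entries: List[LabelEntry]) -> bytes:
    """Encode a label stack, top entry first; S=1 is set only on the bottom entry."""
    if not entries:
        raise ValueError("empty label stack")
    out = bytearray()
    for i, e in enumerate(entries):
        if not (0 <= e.label < 1 << 20 and 0 <= e.tc < 8 and 0 <= e.ttl < 256):
            raise ValueError(f"field out of range: {e}")
        bottom = 1 if i == len(entries) - 1 else 0
        word = (e.label << 12) | (e.tc << 9) | (bottom << 8) | e.ttl
        out += struct.pack("!I", word)  # network byte order
    return bytes(out)


def decode_stack(data: bytes) -> List[LabelEntry]:
    """Decode entries until the one with S=1; whatever follows is payload."""
    entries = []
    for off in range(0, len(data) - 3, 4):
        (word,) = struct.unpack_from("!I", data, off)
        entries.append(LabelEntry(word >> 12, (word >> 9) & 0x7, word & 0xFF))
        if (word >> 8) & 1:
            return entries
    raise ValueError("no bottom-of-stack entry found")
```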
Combining Data Planes
• Why combine? To have the right features in the right place, or to produce useful combinations of features
• There are basically two ways to combine different data planes: stitch (interwork) them, or overlay them on top of each other
Stitching Data Planes
• It's a pain
• Might only be done for a subset of protocol features
• Need to translate between protocols (complex, never perfect, loses information)
• Need to provision interworking points: fragile, an operational nightmare, and they create bottlenecks
• Seems nobody really does this anymore… Or maybe we still have to sometimes?
Overlaying Data Planes
• We overlay to scale, to virtualize, or to augment one data plane with properties of another
• Overlaying is building hierarchy
• But with multiple data planes it is limited and ad hoc
• Often ugly: IP over Ethernet over VXLAN over IP over Ethernet
• MPLS is intrinsically hierarchical (overlayable, if you will)
Hierarchy is Your Friend
• Many hierarchical structures already exist in the network: topology, addressing, management and control
• Hierarchy is the most important and most reliable way to scale things
Overlay/Underlay Split is a Metaphor
• The ability to implement hierarchy natively lets us ditch the notion of a hard overlay/underlay boundary
• In a stack of DC label, ToR label, port label, slice label and VM label, where is the overlay/underlay boundary? Not in the packet
• Where the boundary lies depends only on how you structure your control
FEC is Hierarchical
• Can be as fine- or coarse-grained as one wishes; there's no network-imposed limitation
• Easy behavior aggregation: just add an extra label on top
• Easy behavior disaggregation: expose additional granularity by adding an extra label at the bottom
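Aggregation and disaggregation can be sketched in a few lines of Python. All labels, port names and helper names below are hypothetical. A transit node that keys only on the top label keeps one entry per aggregate, however many finer-grained FECs sit beneath it. Adding a label at the bottom refines the FEC without touching transit state.

```python
from typing import Dict, List


def aggregate(stack: List[int], label: int) -> List[int]:
    """Aggregation: put an extra label on top; transit nodes act only on this one."""
    return [label] + stack


def disaggregate(stack: List[int], label: int) -> List[int]:
    """Disaggregation: expose finer granularity with an extra label at the bottom."""
    return stack + [label]


# Hypothetical transit table keyed by the top label only (e.g. egress ToR labels).
TRANSIT: Dict[int, str] = {1000: "uplink-1", 2000: "uplink-2"}


def transit_lookup(stack: List[int]) -> str:
    return TRANSIT[stack[0]]


# Three VM-level FECs behind the same ToR share one transit entry.
vm_stacks = [aggregate([vm], 1000) for vm in (30001, 30002, 30003)]
# A further per-slice split at the bottom does not change the transit decision.
sliced = disaggregate(vm_stacks[0], 40007)
```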
How to Control MPLS
MPLS is Complex?
• The MPLS control plane is notoriously complex
• Good news: you don't have to use all of it; you can pick the good parts
• Classical distributed control is OK for transport
• Centralized control seems better for higher-level artifacts at the edge, sometimes called services
• Both styles can (and should) be combined
Enabling Combinability
• The device has to be a bit smarter than in OpenFlow (OF)
• It gets parts of the label stack from different control plane components
• It assembles the full stack from those parts, using local logic to follow assembly instructions provided by the control plane
• Assembly instructions come in the form of references by “name”
• Assembly uses late binding
Enabling Combinability: Example
• MPLS VPN (abstraction A) refers to MPLS tunnels (abstraction B) using next-hop resolution
• The resolution happens on the device itself, so the two control plane entities are loosely coupled: MPLS tunnels can change their paths, assigned labels, etc. without MP-BGP caring about it
• The VPN abstraction refers to the tunnel abstraction through next hops. A next hop is the name by which one control plane abstraction refers to another
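The loose coupling comes from late binding on the device, and a small Python sketch shows how. The class names, table layout, addresses and labels are hypothetical and are not any vendor's API. The VPN table stores only a service label and a next-hop name. The tunnel table maps that name to whatever transport labels the tunnel control plane currently provides. The two are joined at lookup time, so a tunnel update never touches the VPN route.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VpnRoute:
    vpn_label: int  # service label, e.g. learned via MP-BGP
    next_hop: str   # the "name" by which the VPN refers to a tunnel


class Device:
    """Hypothetical device agent: two independent tables, joined only at lookup time."""

    def __init__(self) -> None:
        self.vpn: Dict[Tuple[str, str], VpnRoute] = {}  # (vrf, prefix) -> route
        self.tunnels: Dict[str, List[int]] = {}         # next hop -> transport labels

    def resolve(self, vrf: str, prefix: str) -> List[int]:
        route = self.vpn[(vrf, prefix)]
        # Late binding: the tunnel is looked up now, not when the VPN route was installed.
        return self.tunnels[route.next_hop] + [route.vpn_label]
```

Usage: install `dev.vpn[("red", "10.0.0.0/24")] = VpnRoute(40001, "192.0.2.1")` and `dev.tunnels["192.0.2.1"] = [16001]`, and the resolved stack is `[16001, 40001]`. If the tunnel control plane later replaces that entry with `[16002, 17005]`, the next resolution yields `[16002, 17005, 40001]`, and the VPN table is never rewritten.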
Enabling Combinability: BGP 3107
• Recursive next-hop resolution with labeled routes (RFC 3107) is a powerful way to overlay one control plane abstraction on another
• It can express almost anything we currently want. Still, a more expressive mechanism is desirable
• BGP 3107 is the way to interact with MPLS networks that are entirely under classical control
Static Configuration
• If you can ensure that the labels at some point in the network always stay the same (because you assigned them that way), you can use static configuration on the other side
• This is the way to go when one wants to avoid any signaling dependencies
• Static configuration can be calculated and disseminated automatically
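One way to make labels "always stay the same" is to derive them deterministically from inventory data. Here is a Python sketch under that assumption. The label ranges, `SWITCH_LABEL_BASE`, `PORT_LABEL_BASE` and the helper names are a hypothetical operator-chosen plan, not something any protocol defines. Because both sides evaluate the same pure function over the same inventory, they agree on every label without signaling.

```python
from typing import Dict, List

# Hypothetical label plan: fixed ranges chosen by the operator, not by any protocol.
SWITCH_LABEL_BASE = 100_000
PORT_LABEL_BASE = 200_000
MAX_LABEL = (1 << 20) - 1  # labels are 20 bits wide


def switch_label(switch_id: int) -> int:
    label = SWITCH_LABEL_BASE + switch_id
    if not SWITCH_LABEL_BASE <= label < PORT_LABEL_BASE:
        raise ValueError("switch id outside the switch-label range")
    return label


def port_label(port_index: int) -> int:
    label = PORT_LABEL_BASE + port_index
    if not PORT_LABEL_BASE <= label <= MAX_LABEL:
        raise ValueError("port index outside the label space")
    return label


def static_config(switch_ids: List[int]) -> Dict[str, int]:
    """Render the static mapping that a peer (e.g. a legacy router) could be configured with."""
    return {f"switch-{s}": switch_label(s) for s in switch_ids}
```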
Where Should MPLS Start?
• On the host! Or even right from the application
• The hypervisor switch is the easiest point: software only, very flexible
• Naturally fits centralized control
• Helps to scale: lots of RAM, and each element keeps only the state it needs
• Modern CPUs can forward tens of Gbps without breaking a sweat
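Converting the bit rate into a packet rate, which is what software forwarding has to sustain, helps put the CPU claim in context. The arithmetic below uses standard Ethernet facts: a 64-byte minimum frame, and 20 bytes of per-frame wire overhead from the 7-byte preamble, 1-byte SFD and 12-byte inter-frame gap. It also uses 4 bytes per MPLS label. The function name is illustrative.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 7-byte preamble + 1-byte SFD + 12-byte minimum inter-frame gap.
WIRE_OVERHEAD_BYTES = 20
MPLS_LABEL_BYTES = 4  # one label stack entry


def packets_per_second(rate_bps: float, frame_bytes: int, labels: int = 0) -> float:
    """Line-rate packet rate for Ethernet frames of a given size (header + payload + FCS),
    optionally grown by a number of MPLS label stack entries."""
    bits_per_packet = (frame_bytes + labels * MPLS_LABEL_BYTES + WIRE_OVERHEAD_BYTES) * 8
    return rate_bps / bits_per_packet
```

At 10 Gbps this gives roughly 14.88 Mpps for 64-byte frames and roughly 0.81 Mpps for 1518-byte frames. For software forwarding, the packet rate at small packet sizes is therefore usually the harder budget, not the bit rate.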
Looks SDN-ish
• A simple forwarding plane (3 simple ops: push, swap, pop)
• A simple software agent on the device (receives parts of the label stack from different control plane components, assembles the full stack, and programs the HW)
• Centralized and distributed control, or anything in between
• Combinability of different control plane components, with late binding via names that the device resolves
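The Python sketch below assumes the three operations are the standard MPLS push, swap and pop. It shows how small the forwarding plane is. `Op`, `Action`, `apply` and `lsr_forward` are hypothetical names, and the table contents are made up. An ingress pushes a whole stack, and a transit LSR looks up only the top label.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Op(Enum):
    PUSH = "push"
    SWAP = "swap"
    POP = "pop"


@dataclass(frozen=True)
class Action:
    op: Op
    labels: Tuple[int, ...] = ()    # PUSH: labels to add (top first); SWAP: exactly one new label
    out_port: Optional[str] = None  # where to send the packet afterwards


def apply(stack: List[int], action: Action) -> List[int]:
    """Return the new label stack (top first) after one MPLS operation."""
    if action.op is Op.PUSH:
        return list(action.labels) + stack
    if not stack:
        raise ValueError("cannot swap or pop an empty stack")
    if action.op is Op.SWAP:
        (new_label,) = action.labels
        return [new_label] + stack[1:]
    return stack[1:]  # Op.POP


def lsr_forward(table: Dict[int, Action], stack: List[int]) -> Tuple[List[int], Optional[str]]:
    """A transit LSR: look up the top label only, apply the action, pick the port."""
    action = table[stack[0]]
    return apply(stack, action), action.out_port
```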
What SDN Is
• “Modularity based on abstraction is the way things get done” (Liskov)
• “SDN... not a revolutionary technology... just a way of organizing network functionality” (Shenker)
• “SDN is merely set of abstractions for control plane, not a specific set of mechanisms.” (Shenker)
• “Most lasting legacy of SDN is not better datacenters, but better ways of reasoning about network control” (Shenker)
How Ideas Map to MPLS
• Let the edge handle complexity: do it on the host
• Core just delivers packets edge to edge: hierarchy lets the core devices stay agnostic to changes at the edge
• Overlay/underlay logical split: just a way to implement hierarchy
• Control as a mix of centralized and distributed, with a clean way to combine both: yeah!
• Simple commodity network elements: cheap MPLS-capable silicon is finally there
NFV and Seamless MPLS
• The key point of Seamless MPLS (S-MPLS) was to extend MPLS to the access network and to separate transport and service in the MPLS network
• NFV describes how to host service nodes in the DC. If you don't have MPLS in the DC, it's no longer seamless
• The fix is obvious: extend MPLS into the DC
• Labels can carry additional metadata if one wants them to
Case Study: New Yandex DC
What Do We Need?
• Cheap and abundant bandwidth
• Scalable forwarding with minimal state
• Multitenancy (=> network virtualization)
• Efficient resource pooling
• Inter-DC traffic engineering
• Function chaining: load balancing, FW, etc.
• Interconnection with existing infrastructure
• Means to integrate all of the above
• Local response to some events, e.g. failures
• Automation at scale
What We Don't Need
We are trying to keep the design really simple. We don't need many functions that are often perceived as desirable:
• L2 (neither real nor emulated)
• VM mobility
– In scale-out applications, nodes coming and going is the norm; there is no need to move them around while preserving state and identity
– VM mobility increases complexity, as it depends on other features
• Multicast
• We don't have many topology changes
Components
• Host with vLER (MPLS-capable vRouter)
• Fabric switching elements: LSRs
• Centralized controller
• Legacy routers, which need to interwork with the fabric LSRs and the controller; BGP 3107 is the tool
Forwarding Model
• 3-label stack: the topmost label for the egress switch, the next for the egress port, the bottom one for the VM
• The vRouter uses {dst prefix, VRF} to impose the label stack
• The bottom label is processed by the destination vLER
• Expected state on a fabric switch: #switches_in_the_fabric + #local_access_ports
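Here is a hypothetical Python sketch of the vRouter side of this model. The controller installs, per VRF, IPv6 prefixes mapped to a 3-label stack of [egress switch, egress port, VM]. The vRouter picks the longest matching prefix within the VRF and imposes that stack. `VRouter`, `install`, `impose` and `fabric_switch_state` are illustrative names, and the prefixes and labels are made up. The last function restates the per-switch state count from the slide.

```python
import ipaddress
from typing import Dict, List, Tuple

Prefix = ipaddress.IPv6Network


class VRouter:
    """Hypothetical vLER: per-VRF longest-prefix match returning a 3-label stack."""

    def __init__(self) -> None:
        # vrf -> list of (prefix, [switch_label, port_label, vm_label]), pushed by the controller
        self.fib: Dict[str, List[Tuple[Prefix, List[int]]]] = {}

    def install(self, vrf: str, prefix: str, stack: List[int]) -> None:
        if len(stack) != 3:
            raise ValueError("expected [egress switch, egress port, VM] labels")
        self.fib.setdefault(vrf, []).append((ipaddress.IPv6Network(prefix), stack))

    def impose(self, vrf: str, dst: str) -> List[int]:
        addr = ipaddress.IPv6Address(dst)
        matches = [(p, s) for p, s in self.fib.get(vrf, []) if addr in p]
        if not matches:
            raise LookupError(f"no route to {dst} in VRF {vrf}")
        _, stack = max(matches, key=lambda m: m[0].prefixlen)  # longest prefix wins
        return list(stack)


def fabric_switch_state(n_switches: int, n_local_access_ports: int) -> int:
    """Entries per fabric switch as stated on the slide; independent of the number of VMs."""
    return n_switches + n_local_access_ports
```

For instance, a fabric of 100 switches with 48 local access ports per switch gives 148 entries per switch, however many VMs sit behind those ports.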
Control Plane
• iBGP 3107 (in-path route reflector with next-hop-self) inside the fabric for reachability and label distribution (draft-lapukhov…, but with iBGP and labels)
• iBGP 3107 to interwork with legacy routers
– Session with the connected network element, with next-hop-self, for the switch label
– Session with the controller for the remaining labels, which bind to the switch label via the next hop
• Label mappings at the edge of the fabric are stable, so they can be provisioned rather than signaled
• Internal fabric failures are handled locally
• Label mappings on vRouters are distributed centrally
Why Now and What’s Next?
Why Now?
“The world is changed… I smell it in the air”
• There are a lot of similar ideas in the industry
• Thinking seems to be converging on something
• But... a lot of ugly ad hoc solutions are popping up here and there
• Better to implement a good solution before the bad ones become entrenched
• It would be a shame and a missed opportunity to stick with VXLAN/… for years when we could have MPLS instead
What's Ready?
• Merchant silicon is finally MPLS-capable, and the price is almost right
• Modern CPUs can process tens of Mpps in software, making host-based switching feasible
• Several open source MPLS data plane implementations are emerging
• Several "classical" MPLS control plane components, such as BGP 3107, are very useful and have been around for quite a long time
Gaps
• All RFC 3107 implementations are broken with respect to multiple labels. Talk to your vendor
• Silicon is not perfect. Talk to your vendor
• We need a more expressive way than BGP 3107 to control late binding of control plane artifacts
• MPLS is perceived as a complex technology, but it's the current MPLS control plane that is complex
• MPLS is perceived as a WAN or metro technology
Thank you!
Questions?

 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 

MPLS in DC and inter-DC networks: the unified forwarding mechanism for network programmability at scale

  • 1. Dmitry Afanasiev, fl0w@yandex-team.ru Daniel Ginsburg, dbg@yandex-team.ru Network Architects MPLS in DC and inter-DC networks: the unified forwarding mechanism for network programmability at scale
  • 3. What is Yandex
      • Founded in 1993
      • NASDAQ: YNDX, market cap ~$12.5B
      • One of Europe's largest internet companies and the leading search provider in Russia
      • Over 60% of the local search market
      • Monthly user audience of over 90 million worldwide
      • Services: search, music, video, cloud storage, news, weather, maps, traffic, email, ads ...
  • 4. Our Infrastructure
      • We're a rather typical MS-DC
      • Several DCs in Russia and abroad + MPLS backbone to connect them
      • About 100k servers and growing fast
      • Mostly IPv6 internally, need to serve external IPv4
      • Network architecture is a bit outdated, needs rethinking
  • 5. In Search of New Arch
  • 6. In Search of New Arch
      • It needs to be:
          – Scalable
          – Flexible
          – Programmable
      • Lots of approaches out there, some get many things right…
      • But none combines all the right pieces in the right way
      • It's really surprising, because the right combination seems almost inevitable.
  • 7. Ideas to Build Upon
      • Many of the ideas have been around for years (or even decades)
      • Interconnection network topology – folded Clos
      • Let the edge handle complexity
      • Core just delivers packets edge to edge
      • Overlay/underlay logical split
      • Control: mix of centralized and distributed. Needs a nice way to combine both
      • Simple commodity network elements
      • Hierarchy and automation to scale the network
  • 8. What's missing
      • All these ideas are well known, understood and almost universally accepted in the industry
      • People are trying to implement them using a wild mix of data plane mechanisms
      • And it introduces enormous complexity
      • What's missing? A unified forwarding mechanism
  • 9. Missing… or overlooked?
      • Life is much easier when we don't have to deal with a multitude of data planes and forwarding mechanisms.
      • Fortunately, there is already a well-known, well-understood, standardized forwarding plane mechanism upon which we can implement all those ideas without compromising their value.
      • It has a well-defined, standardized mapping to many other popular forwarding planes.
      • It's known as MPLS.
  • 11. Flexibility
      • Different data plane mechanisms, different features
      • The unified data plane should be able to support all useful features and produce their combinations
      • MPLS is very flexible:
          – forwarding over a pre-signalled virtual circuit à la ATM: this is what RSVP-TE does
          – source routing over a previously discovered topology à la Token Ring networks: see the Segment Routing proposal
          – hierarchical LPM à la IP: just split the address over several labels and allow routers to act on the topmost one (not that we suggest it is practical, but it is definitely possible)
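As a rough illustration of the source-routing case, the sketch below models a packet whose label stack lists the hops to take: each node pops the top label and uses it to pick an outgoing adjacency. The topology, node names and label values are hypothetical, and labels here have only node-local meaning; it is a toy model of the idea, not Segment Routing as specified.

```python
from dataclasses import dataclass, field

# Hypothetical topology: at each node, a locally significant label selects
# the neighbor to forward to (an "adjacency" in source-routing terms).
ADJACENCY: dict[str, dict[int, str]] = {
    "A": {101: "B", 102: "C"},
    "B": {201: "D"},
    "C": {301: "D"},
    "D": {},
}


@dataclass
class Packet:
    labels: list[int]                              # index 0 is the top of the stack
    payload: str = ""
    path: list[str] = field(default_factory=list)  # nodes the packet visited


def source_route(packet: Packet, ingress: str) -> str:
    """Pop one label per hop and follow the adjacency it names.

    Returns the node where the stack runs out, i.e. the destination.
    """
    node = ingress
    packet.path.append(node)
    while packet.labels:
        top = packet.labels.pop(0)      # pop: this hop consumes its instruction
        node = ADJACENCY[node][top]     # the label alone chooses the next hop
        packet.path.append(node)
    return node


if __name__ == "__main__":
    pkt = Packet(labels=[102, 301], payload="hello")
    print(source_route(pkt, "A"), pkt.path)   # D ['A', 'C', 'D']
```

The ingress chooses the whole path by building the stack; transit nodes keep no per-flow state, only their local adjacency labels.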
  • 12. An Abstract Note on Semantics
      • The best way to implement arbitrary semantics is to get rid of any semantics in protocol headers and assign them externally
      • Hardware works with protocol headers
      • Control software defines the semantics
  • 13. Combining Data Planes
      • Why combine? To have the right features in the right place, or to produce useful combinations of features
      • There are basically two ways to combine different data planes: stitch (interwork) them, or overlay them on top of each other
  • 14. Stitching Data Planes
      • It's a pain
      • Might be done only for a subset of protocol features
      • Need to translate between protocols (complex, never perfect, loses information)
      • Need to provision interworking points: fragile, an operational nightmare, they create bottlenecks
      • Seems nobody really does this anymore… Or maybe we still have to sometimes?
  • 15. Overlaying Data Planes
      • Overlay to: scale, virtualize, augment one data plane with properties of another
      • Overlaying is building hierarchy
      • But with multiple data planes it is limited and ad hoc
      • Often ugly: IP over Ethernet over VXLAN over IP over Ethernet
      • MPLS is intrinsically hierarchical (overlayable, if you will)
  • 16. Hierarchy is your friend
      • Many hierarchical structures are already in the network: topology, addressing, management and control
      • Hierarchy is the most important and the most reliable way to scale things
  • 17. Overlay/underlay split is a metaphor
      • The ability to implement hierarchy natively lets us ditch the notion of a hard overlay/underlay boundary.
      • In a stack of DC-label, ToR-label, port-label, slice-label, vm-label, where's the overlay/underlay boundary? Not in the packet
      • Placement of the boundary depends only on how you structure your control
  • 18. FEC is hierarchical
      • Can be as fine- or coarse-grained as one wishes. There's no network-imposed limitation
      • Easy behavior aggregation: just add an extra label on top
      • Easy behavior disaggregation: one can expose additional granularity by adding an extra label at the bottom
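The last two slides can be made concrete with plain lists standing in for label stacks (index 0 is the top). The label values and the DC/ToR/port/VM naming are hypothetical. Aggregation pushes one more label on top, so a node that only looks at the top label sees many flows as one FEC; disaggregation appends a label at the bottom that only the final receiver has to understand. In this sketch the overlay/underlay boundary is nothing more than an index into the same stack.

```python
Stack = list[int]  # index 0 is the top of the label stack


def aggregate(stack: Stack, outer_label: int) -> Stack:
    """Coarser FEC: push an extra label on top of the existing stack."""
    return [outer_label] + stack


def disaggregate(stack: Stack, inner_label: int) -> Stack:
    """Finer FEC: add an extra label at the bottom of the stack."""
    return stack + [inner_label]


def top_fec(stack: Stack) -> int:
    """What a transit node that acts only on the topmost label keys on."""
    return stack[0]


def split(stack: Stack, boundary: int) -> tuple[Stack, Stack]:
    """Call the first `boundary` labels 'underlay' and the rest 'overlay'.

    The packet is identical whichever boundary is chosen; only the control
    plane's view of who owns which labels changes.
    """
    return stack[:boundary], stack[boundary:]


if __name__ == "__main__":
    # Hypothetical labels: ToR 1000, ports 2001/2002, VMs 3001/3002.
    flow_a = [1000, 2001, 3001]
    flow_b = [1000, 2002, 3002]
    dc = 500  # hypothetical DC-level label
    print(top_fec(aggregate(flow_a, dc)) == top_fec(aggregate(flow_b, dc)))  # True
    print(disaggregate(flow_a, 4001))        # [1000, 2001, 3001, 4001]
    print(split(aggregate(flow_a, dc), 2))   # ([500, 1000], [2001, 3001])
```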
  • 20. MPLS is complex?
      • MPLS control plane is notoriously complex
      • Good news: you don't have to use all of it, you can pick the good parts
      • Classical distributed control is OK for transport
      • Centralized control seems better for higher-level artifacts on the edge, sometimes called services
      • Both styles can (and should) be combined
  • 21. Enabling combinability
      • The device has to be a bit smarter than in OpenFlow (OF)
      • Gets parts of the label stack from different control plane components
      • Assembles the full stack from those parts, using local logic to follow assembly instructions provided by the control plane
      • Assembly instructions come in the form of references by “name”
      • Assembly uses late binding
  • 22. Enabling combinability – example
      • MPLS VPN (abstraction A) refers to MPLS tunnels (abstraction B), using next-hop resolution.
      • The resolution happens on the device itself, and the two control plane entities are loosely coupled: MPLS tunnels can change their paths, assigned labels, etc., without MP-BGP caring about it
      • The VPN abstraction refers to the tunnel abstraction using next-hops. A next-hop is the name by which one control plane abstraction refers to another
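A minimal sketch of this loose coupling, assuming two independent tables on the device: one owned by the tunnel control plane (next-hop name to label stack) and one owned by the VPN control plane ((VRF, prefix) to next-hop name plus VPN label). The table contents, names and label values are hypothetical. Because the full stack is assembled only when needed, a tunnel can be re-signalled without the VPN table changing.

```python
# Owned by the tunnel control plane: next-hop name -> label stack that
# reaches that next hop.
tunnel_table: dict[str, list[int]] = {
    "pe2": [16001],
}

# Owned by the VPN control plane: (VRF, prefix) -> (next-hop name, VPN label).
# It refers to tunnels only by name, never by their labels.
vpn_table: dict[tuple[str, str], tuple[str, int]] = {
    ("red", "2001:db8:1::/48"): ("pe2", 30010),
}


def assemble_stack(vrf: str, prefix: str) -> list[int]:
    """Late binding: resolve the next-hop name when the stack is built."""
    next_hop, vpn_label = vpn_table[(vrf, prefix)]
    transport = tunnel_table[next_hop]   # whatever the tunnel uses right now
    return transport + [vpn_label]       # VPN label sits at the bottom
```

For example, `assemble_stack("red", "2001:db8:1::/48")` yields `[16001, 30010]`; if the tunnel control plane later rewrites `tunnel_table["pe2"]` to `[16007, 16020]`, the same call yields `[16007, 16020, 30010]` with no change to `vpn_table`. The sketch resolves on every call for simplicity; an agent programming hardware would more plausibly recompute the stack whenever either table changes.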
  • 23. Enabling Combinability – BGP 3107
      • Recursive next-hop resolution with labeled routes (RFC 3107) is a powerful way to overlay one control plane abstraction over another
      • Able to express almost anything we currently want. Still, a more expressive way is desired
      • BGP 3107 is the way to interact with all classically controlled MPLS networks
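The recursion can be sketched as follows, under simplifying assumptions: routes are looked up by exact name rather than by longest-prefix match on the BGP next-hop address, and anything in a hypothetical `CONNECTED` set is treated as directly reachable. Each labeled route contributes its own label, and resolving its next hop contributes the labels that go above it, so the stack is built from the bottom up.

```python
# Hypothetical labeled routes: destination -> (BGP next hop, label bound to
# the destination by whoever advertised the route).
LABELED_ROUTES: dict[str, tuple[str, int]] = {
    "2001:db8:100::/48": ("ler-b", 30001),   # service prefix behind ler-b
    "ler-b": ("spine-1", 20002),             # ler-b itself, via spine-1
}
CONNECTED = {"spine-1"}  # directly attached neighbors: recursion stops here


def resolve(dest: str, max_depth: int = 8) -> tuple[str, list[int]]:
    """Return (directly connected neighbor, label stack with top first)."""
    if dest in CONNECTED:
        return dest, []
    if max_depth == 0:
        raise RecursionError(f"next-hop resolution too deep at {dest!r}")
    next_hop, label = LABELED_ROUTES[dest]
    neighbor, outer = resolve(next_hop, max_depth - 1)
    # Labels needed to reach the next hop go on top; ours goes beneath them.
    return neighbor, outer + [label]


if __name__ == "__main__":
    print(resolve("2001:db8:100::/48"))  # ('spine-1', [20002, 30001])
```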
  • 24. Static Configuration
      • If you can ensure that the labels at some point of the network always stay the same (because you assigned them to be so), you can use static configuration on the other side
      • The way to go when one wants to avoid any signaling dependencies
      • Static configuration can be calculated and disseminated automatically
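One way to make labels "always stay the same" without signaling is to derive them deterministically from inventory data, so the controller and every device can compute identical tables independently. The sketch below assumes a hypothetical scheme: sorted switch names map to one contiguous label block, access-port indices to another (reused on every switch, since a port label is only interpreted by its egress switch). The base values are arbitrary; the only MPLS facts relied on are that the label field is 20 bits wide and values 0 to 15 are reserved.

```python
MPLS_LABEL_MIN = 16              # values 0-15 are reserved
MPLS_LABEL_MAX = (1 << 20) - 1   # the label field is 20 bits wide


def allocate_static_labels(
    switches: list[str],
    ports_per_switch: int,
    switch_base: int = 100_000,  # hypothetical block for egress-switch labels
    port_base: int = 200_000,    # hypothetical block for egress-port labels
) -> dict[str, int]:
    """Deterministic label plan: the same inputs give the same plan anywhere."""
    plan: dict[str, int] = {}
    # Sorting makes the result independent of the order switches are listed in.
    for i, sw in enumerate(sorted(switches)):
        plan[f"switch:{sw}"] = switch_base + i
    for port in range(ports_per_switch):
        plan[f"port:{port}"] = port_base + port
    labels = list(plan.values())
    if len(set(labels)) != len(labels):
        raise ValueError("label blocks overlap")
    if not all(MPLS_LABEL_MIN <= v <= MPLS_LABEL_MAX for v in labels):
        raise ValueError("label outside the valid MPLS range")
    return plan


if __name__ == "__main__":
    controller_view = allocate_static_labels(["tor-2", "tor-1"], 4)
    switch_view = allocate_static_labels(["tor-1", "tor-2"], 4)
    print(controller_view == switch_view)                              # True
    print(controller_view["switch:tor-1"], controller_view["port:3"])  # 100000 200003
```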
  • 25. Where should MPLS start?
      • On the host! Or even right from the application
      • Hypervisor switch is the easiest point. SW only, very flexible.
      • Naturally fits centralized control
      • Helps to scale. Lots of RAM, each element keeps only the state it needs
      • Modern CPUs can forward tens of Gbps without breaking a sweat
  • 26. Looks SDNish
      • A simple forwarding plane (3 simple ops)
      • A simple software agent on the device (receives parts of the label stack from different control plane components, assembles the full stack, and programs the HW)
      • Centralized and distributed control, or anything in between
      • Combinability of different control plane components with late binding via names, which the device resolves
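The "3 simple ops" are push, swap and pop. A minimal sketch of a label switching router applying them, assuming a hypothetical incoming-label map (ILM) in which each entry lists the operations to apply to the stack and an outgoing interface; label values and interface names are made up.

```python
from typing import Optional

Op = tuple[str, Optional[int]]          # ("push" | "swap" | "pop", label or None)
Ilm = dict[int, tuple[list[Op], str]]   # in-label -> (ops, out interface)


def apply_ops(stack: list[int], ops: list[Op]) -> list[int]:
    """Apply push/swap/pop in order to a copy of the stack (index 0 = top)."""
    out = list(stack)
    for name, label in ops:
        if name == "push":
            out.insert(0, label)
        elif name == "swap":
            out[0] = label
        elif name == "pop":
            out.pop(0)
        else:
            raise ValueError(f"unknown MPLS op: {name}")
    return out


def lsr_forward(ilm: Ilm, stack: list[int]) -> tuple[str, list[int]]:
    """Look up the top label only, rewrite the stack, pick the interface."""
    ops, out_if = ilm[stack[0]]
    return out_if, apply_ops(stack, ops)


# Hypothetical ILM on one fabric switch.
ILM: Ilm = {
    100: ([("swap", 200)], "eth1"),                 # plain label switching
    101: ([("pop", None)], "eth2"),                 # expose the next label
    102: ([("swap", 300), ("push", 400)], "eth3"),  # steer into another LSP
}
```

For instance, `lsr_forward(ILM, [102, 7])` returns `("eth3", [400, 300, 7])`: the device never looks below the top label, which is what keeps the per-switch state small.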
  • 27. What SDN is
      • “Modularity based on abstraction is the way things get done” --Liskov
      • “SDN ...Not a revolutionary technology... ...just a way of organizing network functionality” -- Shenker
      • “SDN is merely set of abstractions for control plane, not a specific set of mechanisms.” -- Shenker
      • “Most lasting legacy of SDN is not better datacenters - But better ways of reasoning about network control” --Shenker
  • 28. How Ideas Map to MPLS
      • Let the edge handle complexity – do it on the host
      • Core just delivers packets edge to edge – hierarchy enables the devices to be agnostic to changes on the edge
      • Overlay/underlay logical split – just a way to implement hierarchy
      • Control: mix of centralized and distributed. Needs a nice way to combine both – yeah!
      • Simple commodity network elements – cheap MPLS-capable silicon is finally there
  • 29. NFV and Seamless MPLS
      • Key point of S-MPLS was to extend MPLS to access and separate transport and service in the MPLS network
      • NFV describes how to host service nodes in the DC. If you don't have MPLS in the DC, it's no longer seamless
      • Fix is obvious – extend MPLS into the DC
      • Labels can carry additional metadata if one wants them to
  • 30. Case Study: New Yandex DC
  • 31. What do we need?
      • Cheap and abundant bandwidth
      • Scalable forwarding with minimal state
      • Multitenancy (=> network virtualization)
      • Efficient resource pooling
      • Inter-DC traffic engineering
      • Function chaining: load balancing, FW, etc.
      • Interconnection with existing infrastructure
      • Means to integrate all of the above
      • Local response to some events, e.g. failures
      • Automation at scale
  • 32. What we don't need
      We are trying to keep the design really simple. We don't need many functions often perceived as desirable:
      • L2 (neither real nor emulated)
      • VM mobility
          – In scale-out applications, nodes coming and going is the norm; no need to move them around while preserving state and identity
          – VM mobility increases complexity, as it depends on other features
      • Multicast
      • We don't have too many changes in topology
  • 33. Components
      • Host with vLER (MPLS-capable vRouter)
      • Fabric switching elements – LSRs
      • Centralized controller
      • Legacy routers. Need to interwork with fabric LSRs and controller. BGP 3107 is the tool
  • 34. Forwarding model
      • 3-label stack: topmost for egress switch, next for egress port, bottom for VM
      • vRouter uses {dst prefix, VRF} to impose the label stack
      • Bottom label processed by destination vLER
      • Expected state on a fabric switch: #switches_in_the_fabric + #local_access_ports
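A sketch of the host side of this model and of the state arithmetic, under stated assumptions: the vRouter table is keyed by (VRF, prefix) and looked up by longest-prefix match within the VRF, and the VRF names, prefixes and label values are all hypothetical. The state function simply restates the slide's formula: one entry per fabric switch for the top label plus one per local access port for the middle label, with no term for the number of VMs.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Egress:
    switch_label: int  # top: steers the packet to the egress fabric switch
    port_label: int    # middle: selects the access port on that switch
    vm_label: int      # bottom: processed by the destination vLER to pick the VM


# Hypothetical vRouter table: (VRF, destination prefix) -> egress labels.
VROUTER_TABLE: dict[tuple[str, str], Egress] = {
    ("tenant-a", "2001:db8:a::/48"): Egress(1001, 2005, 3042),
    ("tenant-a", "2001:db8:a:1::/64"): Egress(1002, 2001, 3007),
    ("tenant-b", "2001:db8:a::/48"): Egress(1003, 2002, 3100),
}


def impose(vrf: str, dst: str) -> list[int]:
    """Longest-prefix match within the VRF, then build the 3-label stack."""
    addr = ipaddress.ip_address(dst)
    best_len, best = -1, None
    for (table_vrf, prefix), egress in VROUTER_TABLE.items():
        net = ipaddress.ip_network(prefix)
        if table_vrf == vrf and addr in net and net.prefixlen > best_len:
            best_len, best = net.prefixlen, egress
    if best is None:
        raise LookupError(f"no route to {dst} in VRF {vrf!r}")
    return [best.switch_label, best.port_label, best.vm_label]


def fabric_switch_state(num_fabric_switches: int, num_local_access_ports: int) -> int:
    """Label entries a fabric switch needs in this model."""
    return num_fabric_switches + num_local_access_ports


if __name__ == "__main__":
    print(impose("tenant-a", "2001:db8:a:1::10"))  # [1002, 2001, 3007]
    print(impose("tenant-a", "2001:db8:a:2::10"))  # [1001, 2005, 3042]
    print(impose("tenant-b", "2001:db8:a:1::10"))  # [1003, 2002, 3100]
    print(fabric_switch_state(64, 48))             # 112
```

With a hypothetical 64-switch fabric and 48 access ports per switch, each switch holds 112 entries whether it serves ten VMs or ten thousand; the same prefix in two VRFs resolves to different stacks, which is where multitenancy comes from.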
  • 35. Control plane
      • iBGP 3107 (in-path RR w/ NHS) inside the fabric for reachability and label distribution (draft-lapukhov…, but with iBGP and labels)
      • iBGP 3107 to interwork with legacy routers
          – Session with the connected network element, with NHS, for the switch label
          – Session with the controller for the remaining labels, which bind to the switch label via next hop
      • Label mappings on the edge of the fabric are stable, so they can be provisioned rather than signaled
      • Internal fabric failures are handled locally
      • Label mappings on vRouters are distributed centrally
  • 36. Why Now and What’s Next?
  • 37. Why Now?
      “The world is changed… I smell it in the air”
      • A lot of similar ideas in the industry
      • Seems that thinking converges on something
      • But ... a lot of ugly ad hoc solutions are popping up here and there
      • Better to implement a good solution before bad ones become entrenched
      • It would be a shame and a missed opportunity to stick with VXLAN/… for years when we could get MPLS instead
  • 38. What's Ready?
      • Merchant silicon is finally MPLS-capable. And the price is almost right.
      • Modern CPUs can process tens of Mpps in SW, making host-based switching feasible.
      • Several open source MPLS data plane implementations are emerging
      • Several "classical" MPLS control plane components, such as BGP 3107, are very useful and have been around for quite a long time.
  • 39. Gaps
      • All RFC 3107 implementations are broken (multiple labels). Talk to your vendor
      • Silicon is not perfect. Talk to your vendor
      • A more expressive way than BGP 3107 to control late binding of control plane artifacts
      • Perception of MPLS as a complex technology. It's the current MPLS control plane that is complex
      • Perception of MPLS as a WAN or metro technology