This document discusses network services in carrier networks. It begins with an agenda for a 168 slide, 40 minute presentation on multiservice IP next-generation networks (NGN). It then discusses concepts like quality of service (QoS), multicasting, and TCP performance in the context of modern networking technologies like HTTP/2, over-the-top services, and 100 gigabit Ethernet. The rest of the document provides details on implementing QoS, guidelines for QoS for video, the history and uses of multicasting, and fundamentals of multicast addressing.
2. PLAN FOR TODAY
168 slides in 40 minutes (a whole 14.3 seconds per slide ;) )
Discussion instead of a presentation
Lots of information “for later”
3.
In the age of “net neutrality”, ubiquitous HTTP/2, OTT, and 100GE networks, does it make sense to play around with QoS, multicast, and similar whims? ;)
4. MULTISERVICE IP NGN
[Figure: multiservice IP NGN diagram. Labels: Contribution, Distribution, Peering, IP, SNG, Studio / OB, Post Production, Ground Station, Terrestrial, Encoder, CDN, PE, Uplink, Residential Subscribers, Fixed location]
6. QOS GOLDEN RULES
Start with the goal in mind
There is no substitute for sufficient
bandwidth
Queuing and Scheduling can protect
voice and video from data
Only Call Admission Control can protect
voice from voice and video from video
Don’t mix UDP and TCP in the same
class
10. TCP LOSS
TCP design balance
Don’t over-run the receiver/network
Use available bandwidth
TCP will adjust to the correct rate
based on delay and drops
TCP drops packets!
12. TCP LOSS
There are 2 types of TCP loss
Detected by timeout (red area)
Detected by duplicate ACK (green
area)
13. UDP
UDP does not adjust to loss or delay
UDP is generally only used for real-time traffic where drops are preferred to delays
DNS
Voice
Video (VC and live broadcasts)
Financial applications (ticker)
Video games
Multicast (non-real time)
Content distribution
§ IPSec NAT-T
Does not count
Treat like TCP?
16. GENERAL QOS GUIDELINES
DiffServ QoS model
Works on aggregate traffic classes rather than individual flows
Highly scalable
Best effort traffic can reuse non-utilized bandwidth
Real-time traffic classes with preferential treatment (Voice, Video, bi-dir TP)
Strict Priority when no Voice services are provided, otherwise non-strict Priority
AF class with Voice when single PQ
Real-time traffic policed at ingress to avoid misconfiguration issues
Data services run as Best Effort traffic
Business traffic uses in-profile/out-profile QoS approach
17. QOS CHEAT SHEET
Do not mix UDP & TCP traffic in the same class
Do not mix Voice & Video traffic in the same class
Per-subscriber SLA for Voice and Data applications
Per-subscriber SLA not applicable to Video/IPTV
Over-the-top (Internet) Video traffic to be treated as default traffic
With Dual Priority queue
Use priority level 1 for Voice traffic
Use priority level 2 for Video traffic
With Single Priority queue
Use priority queue for Voice traffic
Use AF queue with minimum bandwidth guarantee for video
18. QUEUES DISTRIBUTION (A+B+C+D=100)
PQ: A% of Link BW
Class1 - Video: B% of Link BW (tail-drop)
Class2 - Business Critical: C% of Link BW (WRED-DSCP/EXP)
Class3 / Default: D% of Link BW
Multi-Play Application Traffic | DSCP / EXP
Broadcast Video | AF41 / 4*
VoD | AF42 / 1
Streaming TV | AF43 / 1
Ad Traffic | AF31 / 2
Content Distribution (Music, Video) | AF11 / 0
VoIP Bearer | EF / 5
Videoconferencing (Video/Audio Bearer) | CS5 / 5
VoIP Signaling (incl. video conferencing) | CS3 / 3
Prioritized Data Services (inc. Commercial Services) | AF21 / 2
Residential Data Services | CS0 / 0
Gaming | CS0 / 0
Other Data | CS0 / 0
Network Control - Routing | CS6 / 6
Service Provisioning, Control & Mgmt. | CS2 / 2
Network Management | CS2 / 2
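The DSCP names in the table are shorthand for 6-bit codepoints. As a minimal sketch (the function name `dscp_value` is illustrative, not taken from the slides), the standard numbering can be computed as follows: CSn is 8n, AFxy is 8x + 2y, and EF is 46.

```python
def dscp_value(name: str) -> int:
    """Return the 6-bit DSCP codepoint for a standard PHB name.

    Hypothetical helper for illustration: accepts EF, CS0..CS7,
    and AF11..AF43 (spaces are ignored, so "CS 0" also works).
    """
    name = name.upper().replace(" ", "")
    if name == "EF":
        return 46
    if name.startswith("CS") and name[2:].isdigit():
        cls = int(name[2:])
        if 0 <= cls <= 7:
            return cls * 8  # class selector: class in the top 3 bits
    if name.startswith("AF") and len(name) == 4 and name[2:].isdigit():
        cls, drop = int(name[2]), int(name[3])
        if 1 <= cls <= 4 and 1 <= drop <= 3:
            return cls * 8 + drop * 2  # class, then drop precedence
    raise ValueError(f"unknown DSCP name: {name}")
```

For example, `dscp_value("AF41")` gives the codepoint used for Broadcast Video in the table, and `dscp_value("EF")` the one used for VoIP bearer traffic.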
19. QOS GUIDELINES FOR VIDEO
Network SLAs
Delay: May affect Contribution
Jitter: Bounded by receiver buffer size (IP-STBs up to 200 ms, DCM up to 100 ms)
Packet-loss: Critical for compressed services. IPTV packet loss rate < 10^-6 (one noticeable artifact per hour of streaming @ 4 Mbps). No packet loss for Contribution services
Real-time Video Traffic
Not oversubscribed
Not congested
Video on Demand
Can be oversubscribed with CAC
Less priority than Broadcast Video
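The "one artifact per hour at 4 Mbps" figure can be checked with simple arithmetic. This is a rough sketch only: the 1316-byte payload (seven 188-byte MPEG-TS packets per IP packet) is an assumption that the slide does not state, and the function name is illustrative.

```python
def lost_packets_per_hour(bitrate_bps: float, loss_rate: float,
                          packet_payload_bytes: int = 1316) -> float:
    """Expected number of lost packets in one hour of streaming.

    packet_payload_bytes defaults to 1316 (7 x 188-byte TS packets),
    which is an assumption for this example, not a value from the slide.
    """
    packets_per_second = bitrate_bps / (packet_payload_bytes * 8)
    return packets_per_second * 3600 * loss_rate


# A 4 Mbps stream at a loss rate of 10^-6 loses on the order of one packet per hour.
print(round(lost_packets_per_hour(4_000_000, 1e-6), 2))
```

Under these assumptions the result is a little over one lost packet per hour, which is consistent with the slide's rule of thumb.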
20. RSVP
RSVP implementation could be
modified to address the problem for
private WANs
Requires routers to initiate
reservations
RSVP agent
RSVP and IOS
RSVP proxy
21. RSVP
RSVP AND QOS IN CISCO IOS ROUTERS
[Figure: two panels, "IntServ model" and "IntServ/DiffServ model". Each panel shows RSVP signaling and Call Admission Control (yes/no) in the control plane, and scheduling + policing of data in the data plane. Labels: RSVP, LLQ/CBWFQ]
22. RSVP
INTSERV/DIFFSERV—IOS MODEL INTERFACE QUEUING
[Figure: bar showing total link bandwidth from 0% to 100%; “usable” bandwidth (75%) and reserved remainder; priority (33% max); BW assigned to LLQ classes; ip rsvp bandwidth]
RSVP flows admitted/rejected based on ‘ip rsvp bandwidth’ only
RSVP flows assigned to priority queue based on LLQ classes (typically, DSCP)
BW reserved for LLQ/CBWFQ classes based on policy maps and service policy
Packets assigned to LLQ classes/queues based on class maps (typically, DSCP)
Provision priority queue to match RSVP bandwidth + L2 overhead
23. RSVP
INTSERV/DIFFSERV CISCO IOS MODEL: NOTES
LLQ/CBWFQ classes can be configured as usual
and bandwidth allocated to them on the interface
No bandwidth is reserved with ip rsvp bandwidth
Reservations accepted/rejected based
exclusively on value configured in ip rsvp
bandwidth
RSVP traffic assigned to queues based on LLQ
rules (RSVP is not involved in classification)
If non-RSVP real-time applications are present,
provision the PQ accordingly and ensure they
use a CAC mechanism to avoid oversubscription
To enable this model in IOS:
ip rsvp resource-provider none
ip rsvp data-packet classification none
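The admission rule described on this slide (accept or reject purely against the configured RSVP bandwidth, with queuing handled elsewhere) can be sketched as follows. This is a toy model, not IOS behavior: the class name, method names, and kbps units are all assumptions made for illustration.

```python
class RsvpAdmission:
    """Toy call admission control against a single bandwidth pool.

    Hypothetical model: the pool plays the role of the value configured
    with 'ip rsvp bandwidth'; no queuing or classification is modeled.
    """

    def __init__(self, rsvp_bandwidth_kbps: int):
        self.limit = rsvp_bandwidth_kbps
        self.reserved = {}  # flow id -> reserved kbps

    def request(self, flow_id: str, kbps: int) -> bool:
        """Admit the flow only if it fits in the remaining pool."""
        in_use = sum(self.reserved.values())
        if flow_id in self.reserved or in_use + kbps > self.limit:
            return False
        self.reserved[flow_id] = kbps
        return True

    def release(self, flow_id: str) -> None:
        """Tear down a reservation, returning its bandwidth to the pool."""
        self.reserved.pop(flow_id, None)
```

With a 256 kbps pool, three 80 kbps calls are admitted and a fourth is rejected until one of the earlier calls is released.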
26. A BRIEF HISTORY OF MULTICAST
Steven Deering, 1985, Stanford University
Yeah, he was way ahead of his time and too clever for all of us.
A solution for layer2 applications in the growing layer3 campus
network
-Think overlay broadcast domain
Broadcast Domain
- all members receive
- all members can source
- members dynamically come and go
27. A BRIEF HISTORY OF MULTICAST
RFC966 - 1985
Multi-destination delivery is useful to several
applications, including:
- distributed, replicated databases [6,9].
- conferencing [11].
- distributed parallel computation, including
distributed gaming [2].
All inherently many-to-many applications
No mention of one-to-many services such as Video/IPTV
28. A BRIEF HISTORY OF MULTICAST
Overlay Broadcast Domain Requirements
- Tree building and maintenance
- Network-based source discovery
- Source route information
- Overlay mechanism – tunneling
The first solution had it all
Distance Vector Multicast Routing Protocol
DVMRP, RFC1075 – 1988
29. A BRIEF HISTORY OF MULTICAST
PIM – Protocol Independent Multicast
“Independent” of which unicast routing protocol you run
It does require that you’re running one. :)
Uses local routing table to determine route to sources
Router-to-router protocol to build and maintain distribution trees
Source discovery handled one of two ways:
1) Flood-and-prune PIM-DM, Dense Mode
2) Explicit Join w/ Rendezvous Point (RP) PIM-SM,
Sparse Mode - The Current Standard
30. A BRIEF HISTORY OF MULTICAST
PIM-SM – Protocol Independent Multicast Sparse Mode
-Tree building and maintenance
- Network-based source discovery
- Source route information
- Overlay mechanism – tunneling
Long, Sordid IETF history
RFC4601 – 2006 (original draft was rewritten from
scratch)
Primary challenges to the final specification were in addressing network-based source discovery.
31. A BRIEF HISTORY OF MULTICAST
Today’s dominant applications are primarily one-to-many
IPTV, Contribution video over IP, etc.
Sources are well known
SSM – Source Specific Multicast
RFC3569, RFC4608 – 2003
-Tree building and maintenance
- Network-based source discovery
- Source route information
- Overlay mechanism – tunneling
Very simple and the preferred solution for one-to-many applications
32. MULTICAST USES
§ Any applications with multiple receivers
‒ One-to-many or many-to-many
§ Live video distribution
§ Collaborative groupware
§ Periodic data delivery—“push” technology
‒ Stock quotes, sports scores, magazines, newspapers, adverts
§ Server/Website replication
§ Reducing network/resource overhead
‒ More than multiple point-to-point flows
§ Resource discovery
§ Distributed interactive simulation (DIS)
‒ War games
‒ Virtual reality
33. MULTICAST CONSIDERATIONS
Multicast Is UDP-Based
Best effort delivery: Drops are to be expected; multicast applications should not expect reliable delivery of data and should be designed accordingly; reliable multicast is still an area for much research; expect to see more developments in this area; PGM, FEC, QoS
No congestion avoidance: Lack of TCP windowing and “slow-start” mechanisms can result in network congestion; if possible, multicast applications should attempt to detect and avoid congestion conditions
Duplicates: Some multicast protocol mechanisms (e.g., asserts, registers, and SPT transitions) result in the occasional generation of duplicate packets; multicast applications should be designed to expect occasional duplicate packets
Out of order delivery: Some protocol mechanisms may also result in out of order delivery of packets
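One common way for a receiving application to cope with duplicates and out-of-order delivery is an application-level sequence number. The sketch below assumes the sender adds such a number to every datagram (the slides do not specify one), and the class name is illustrative. It does not handle loss: a real receiver would also need a timeout to skip packets that never arrive.

```python
class ReorderBuffer:
    """Drop duplicates and release payloads in sequence order.

    Assumes each datagram carries an application-level sequence
    number starting at first_seq; that numbering is an assumption.
    """

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq
        self.pending = {}  # seq -> payload held until its turn

    def push(self, seq: int, payload):
        """Accept one datagram; return the payloads now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

Pushing sequence numbers 0, 2, 2, 1 delivers the first payload immediately, holds the third, ignores its duplicate, and releases the second and third together once the gap is filled.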
36. UNICAST VS. MULTICAST ADDRESSING
How do we address one packet to different destinations?
Unicast: a unique packet addressed to each destination.
Multicast: ..replicated at each node along the tree.
[Figure: source 10.1.1.1 (src addr: 10.1.1.1) sending to destinations 11.1.1.1, 12.1.1.1, and 13.1.1.1 in both cases]
37. MULTICAST ADDRESSING
IPv4 Header
Version | IHL | Type of Service | Total Length
Identification | Flags | Fragment Offset
Time to Live | Protocol | Header Checksum
Source Address
Destination Address
Options | Padding
Destination: 224.0.0.0 - 239.255.255.255 (Class D) Multicast Group Address Range
Source: 1.0.0.0 - 223.255.255.255 (Class A, B, C). Always the unique unicast origin address of the packet – same as unicast
38. MULTICAST ADDRESSING
Multicast Group addresses are NOT in the unicast route table.
A separate route table is maintained for active multicast trees in the network.
Multicast state entries are initiated by receivers signaling their request to join a group.
Sources do not need to join, they just send.
Multicast routing protocols build and maintain the trees, hop-by-hop, based on receiver membership and source reachability.
Source reachability is derived from the unicast route table.
Multicast relies on a dependable unicast infrastructure.
Class D Group addresses – 224/4
39. MULTICAST STATE
barn#show ip mroute 232.1.1.109
(207.109.83.5, 232.1.1.109), 3w1d/00:02:40, flags: s
Incoming interface: Ethernet 0/0, RPF nbr 207.109.83.33
Outgoing interface list:
Ethernet 1/0, Forward/Sparse, 3w1d/00:02:40
Ethernet 2/0, Forward/Sparse, 2w0d/00:02:33
barn#show ip route 232.1.1.109
% Network not in table
barn#
Multicast route entries are in (S,G) form.
Incoming interface points upstream
toward the root of the tree.
Outgoing interface list is where receivers
have joined downstream and where packets
will be replicated and forwarded downstream.
Multicast Group addresses
are NEVER in the unicast
route table.
40. MULTICAST ADDRESSING—224/4
§ Reserved link-local addresses
224.0.0.0–224.0.0.255
Transmitted with TTL = 1
Examples
224.0.0.1 All systems on this subnet
224.0.0.2 All routers on this subnet
224.0.0.5 OSPF routers
224.0.0.13 PIMv2 routers
224.0.0.22 IGMPv3
§ Other reserved addresses
224.0.1.0–224.0.1.255
Not local in scope (transmitted with TTL > 1)
Examples
224.0.1.1 NTP (Network Time Protocol)
224.0.1.32 Mtrace routers
224.0.1.78 Tibco Multicast1
41. MULTICAST ADDRESSING—224/4
§ Administratively scoped addresses
‒ 239.0.0.0–239.255.255.255
‒ Private address space
Similar to RFC1918 unicast addresses
Not used for global Internet traffic—scoped traffic
§ GLOP (honest, it’s not an acronym)
‒ 233.0.0.0–233.255.255.255
‒ Provides /24 group prefix per ASN
§ SSM (Source Specific Multicast) range
‒ 232.0.0.0–232.255.255.255
‒ Primarily targeted for Internet-style broadcast
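The ranges listed on the last two slides can be turned into a small lookup. This is a sketch using Python's standard `ipaddress` module; the category labels and the function name `classify_group` are illustrative, and the ranges are taken as the slides state them.

```python
import ipaddress

# Ranges as given on the slides; labels are illustrative.
RANGES = [
    ("reserved link-local", ipaddress.ip_network("224.0.0.0/24")),
    ("other reserved", ipaddress.ip_network("224.0.1.0/24")),
    ("SSM", ipaddress.ip_network("232.0.0.0/8")),
    ("GLOP", ipaddress.ip_network("233.0.0.0/8")),
    ("administratively scoped", ipaddress.ip_network("239.0.0.0/8")),
]


def classify_group(address: str) -> str:
    """Name the 224/4 sub-range an IPv4 address falls into."""
    ip = ipaddress.IPv4Address(address)
    if not ip.is_multicast:
        return "not multicast"
    for label, network in RANGES:
        if ip in network:
            return label
    return "other multicast"
```

For instance, `classify_group("232.1.1.109")` reports the SSM range and `classify_group("224.0.0.5")` (OSPF routers) the reserved link-local range.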
42. MULTICAST ADDRESSING
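IP Multicast MAC Address Mapping
[Figure: the 32-bit group address 239.255.0.1 maps to the 48-bit MAC address 01-00-5e-7f-00-01. The address has 28 bits after the 1110 Class D prefix; the low 23 bits are copied behind the fixed 25-bit MAC prefix, and the remaining 5 bits are lost.]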
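The mapping can be sketched in a few lines. The function name `multicast_mac` is illustrative; the logic copies the low 23 bits of the group address behind the fixed 01-00-5e prefix, so the 5 remaining group bits are dropped.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its Ethernet MAC address."""
    ip = ipaddress.IPv4Address(group)
    if not ip.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(ip) & 0x7FFFFF        # keep only the low 23 bits
    mac = 0x01005E000000 | low23      # fixed 25-bit prefix 01-00-5e-0...
    return "-".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(multicast_mac("239.255.0.1"))   # 01-00-5e-7f-00-01
print(multicast_mac("224.127.0.1"))   # same MAC: the 5 lost bits differ
```

Because 5 bits are lost, 32 different group addresses share each MAC address, which is why the second call prints the same result as the first.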
44. HOW ARE MULTICAST ADDRESSES ASSIGNED?
§ Static global group address assignment (GLOP)
‒ Temporary method to meet immediate needs
‒ Group range: 233.0.0.0–233.255.255.255
Your AS number is inserted in middle two octets
Remaining low-order octet used for group assignment
‒ Defined in RFC 2770, updated in RFC 3180
“GLOP Addressing in 233/8”
‒ SSM does not require group address “ownership”
§ Manual address allocation by the admin
‒ Is still the most common practice
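The GLOP rule above (AS number in the middle two octets) is easy to express in code. A minimal sketch, assuming a 16-bit AS number, since only two octets are available; the function name `glop_prefix` is illustrative.

```python
import ipaddress


def glop_prefix(asn: int) -> ipaddress.IPv4Network:
    """Return the 233.x.y.0/24 GLOP prefix for a 16-bit AS number."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("GLOP only encodes 16-bit AS numbers")
    high, low = asn >> 8, asn & 0xFF  # split the ASN into two octets
    return ipaddress.IPv4Network(f"233.{high}.{low}.0/24")


print(glop_prefix(5662))  # 233.22.30.0/24
```

AS 5662 is 0x161E, so the middle octets are 22 and 30, leaving the last octet for up to 256 group addresses.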
45. HOST-ROUTER SIGNALING:
INTERNET GROUP MANAGEMENT
PROTOCOL (IGMP)
§ How hosts tell routers about group membership
§ Routers solicit group membership from directly connected hosts
§ RFC 1112 specifies version 1 of IGMP
‒ Supported on Windows 95
§ RFC 2236 specifies version 2 of IGMP
‒ Supported on latest service pack for Windows and most
UNIX systems
§ RFC 3376 specifies version 3 of IGMP
‒ Supported in Windows XP and various UNIX systems
47. HOST-ROUTER SIGNALING: IGMP
Maintaining a Group
§ Router sends periodic queries to 224.0.0.1
§ One member per group per subnet reports
§ Other members suppress reports
[Figure: hosts H1, H2, H3 on one subnet; the router sends a Query, one host answers with a Report for 224.1.1.1, and the other two hosts suppress their reports for 224.1.1.1]
48. HOST-ROUTER SIGNALING: IGMP
Leaving a Group (IGMPv2)
§ Host sends leave message to 224.0.0.2
§ Router sends group-specific query to 224.1.1.1
§ No IGMP report is received within ~ 3 seconds
§ Group 224.1.1.1 times out
[Figure: hosts H1, H2, H3; step #1, a Leave for 224.1.1.1 is sent to 224.0.0.2; step #2, the router sends a Group Specific Query to 224.1.1.1]
49. HOST-ROUTER SIGNALING: IGMPV3
§ Adds include/exclude source lists
§ Enables hosts to listen only to a specified subset of the hosts
sending to the group
§ Requires new ‘IPMulticastListen’ API
§ New IGMPv3 stack required in the OS
§ Apps must be rewritten to use IGMPv3 include/exclude features
RFC 3376 – enables SSM
50. HOST-ROUTER SIGNALING: IGMPV3
§ 224.0.0.22 (IGMPv3 routers)
‒ All IGMPv3 hosts send reports to this address
Instead of the target group address as in IGMPv1/v2
‒ All IGMPv3 routers listen to this address
‒ Hosts do not listen or respond to this address
§ No report suppression
‒ All hosts on wire respond to queries
Host’s complete IGMP state sent in single response
‒ Response interval may be tuned over broad range
Useful when large numbers of hosts reside on subnet
New Membership Report Address
51. IGMPV3—JOINING A GROUP
§ Joining member sends IGMPv3 report to 224.0.0.22 immediately upon joining
[Figure: hosts H1 (1.1.1.10), H2 (1.1.1.11), H3 (1.1.1.12) and router rtr-a (1.1.1.1); the joining host sends a v3 Report to 224.0.0.22 with Group: 224.1.1.1, Exclude: (empty)]
52. IGMPV3—JOINING SPECIFIC SOURCE(S)
§ IGMPv3 report contains desired source(s) in the include list
§ Only “Included” source(s) are joined
[Figure: hosts H1 (1.1.1.10), H2 (1.1.1.11), H3 (1.1.1.12) and router rtr-a (1.1.1.1); a host sends a v3 Report to 224.0.0.22 with Group: 232.1.1.1, Include: 10.0.0.1]
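The include/exclude source lists can be modeled as a simple filter. This sketch is illustrative only: `GroupRecord` and `wants` are hypothetical names, not part of any IGMP API, and real group records carry more state than is shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupRecord:
    """One host's interest in a group (hypothetical, simplified model)."""
    group: str
    mode: str                      # "include" or "exclude"
    sources: frozenset = frozenset()


def wants(record: GroupRecord, group: str, source: str) -> bool:
    """Decide whether traffic from source to group matches the record."""
    if group != record.group:
        return False
    listed = source in record.sources
    # include: only listed sources; exclude: everything except listed sources
    return listed if record.mode == "include" else not listed


ssm_join = GroupRecord("232.1.1.1", "include", frozenset({"10.0.0.1"}))
print(wants(ssm_join, "232.1.1.1", "10.0.0.1"))  # True
print(wants(ssm_join, "232.1.1.1", "10.0.0.2"))  # False
```

An include list naming a single source gives exactly the (S, G) behavior that SSM relies on.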
53. IGMPV3—MAINTAINING STATE
§ Router sends periodic queries
§ All IGMPv3 members respond
§ Reports contain multiple group state records
[Figure: the router (1.1.1.1) sends a Query; H1 (1.1.1.10), H2 (1.1.1.11), and H3 (1.1.1.12) each answer with a v3 Report to 224.0.0.22]
54. MULTICAST L3 FORWARDING
§ Unicast routing is concerned about where the packet
is going
§ Multicast routing is concerned about where the packet came from
‒ Initially
Multicast Routing is Backwards from Unicast Routing
55. UNICAST VS. MULTICAST FORWARDING
§ Destination IP address directly indicates where to forward packet
§ Forwarding is hop-by-hop
‒ Unicast routing table determines interface and next-hop router to forward
packet
Unicast Forwarding
56. UNICAST VS. MULTICAST FORWARDING
§ Destination IP address (group) doesn’t directly indicate where to forward
packet
§ Forwarding is Outgoing Interface List dependent (OIF)
‒ Receivers must first be “connected” to the tree before traffic begins to flow
Connection messages (PIM joins) follow unicast routing
table toward multicast source
Build multicast distribution trees that determine where
to forward packets
Distribution trees rebuilt dynamically in case of network topology changes
Each router in the path maintains an OIF list per tree state
Multicast Forwarding
57. REVERSE PATH FORWARDING (RPF)
§ The multicast packet’s source address is checked against the unicast routing
table
§ This determines the interface and upstream router in the direction of the
source to which PIM joins are sent
§ This interface becomes the “Incoming” or RPF interface
‒ A router forwards a multicast datagram only if received on the RPF interface
The RPF Calculation
58. REVERSE PATH FORWARDING (RPF)
RPF Calculation
§ Based on source address
§ Best path to source found in unicast route table
§ Determines where to send join
§ Joins continue towards source to build multicast tree
§ Multicast data flows down tree
Unicast Route Table
Network | Interface
10.1.0.0/24 | E0
[Figure: router R1 with interfaces E0, E1, E2 among routers A through E; source SRC at 10.1.1.1; Join messages travel from R1 toward the source]
59. REVERSE PATH FORWARDING (RPF)
[Figure: the same topology with router R2 added (routers A through E, R1 with interfaces E0, E1, E2, source SRC at 10.1.1.1); the Joins continue hop by hop toward the source, with the same RPF calculation repeated at each router]
60. REVERSE PATH FORWARDING (RPF)
RPF Calculation
§ What if we have equal-cost paths?
‒ We can’t use both
§ Tie-breaker
‒ Use highest next-hop IP address
Unicast Route Table
Network | Intfc | Nxt-Hop
10.1.0.0/24 | E0 | 1.1.1.1
10.1.0.0/24 | E1 | 1.1.2.1
[Figure: router R1 with interfaces E0, E1, E2 among routers A through F; source SRC at 10.1.1.1; next hops 1.1.1.1 and 1.1.2.1; a single Join is sent]
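The RPF calculation, including the equal-cost tie-breaker, can be sketched as a lookup over a small route table. Several things here are assumptions for illustration: the function names, the tuple layout of a route, treating all listed routes as equal cost, and the use of 10.1.1.0/24 as the example prefix so that it covers the source 10.1.1.1.

```python
import ipaddress


def rpf_lookup(source, routes):
    """Return (interface, next_hop) toward source, or None if unreachable.

    routes is a list of (prefix, interface, next_hop) strings; all
    routes are assumed to have equal cost (an assumption of this sketch).
    """
    src = ipaddress.IPv4Address(source)
    matches = []
    for prefix, interface, next_hop in routes:
        network = ipaddress.IPv4Network(prefix)
        if src in network:
            matches.append((network.prefixlen, ipaddress.IPv4Address(next_hop), interface))
    if not matches:
        return None
    longest = max(m[0] for m in matches)                 # longest prefix match
    candidates = [m for m in matches if m[0] == longest]
    _, next_hop, interface = max(candidates, key=lambda m: m[1])  # highest next hop wins
    return interface, str(next_hop)


def rpf_check(source, arrival_interface, routes):
    """Accept a multicast packet only if it arrived on the RPF interface."""
    result = rpf_lookup(source, routes)
    return result is not None and result[0] == arrival_interface


routes = [("10.1.1.0/24", "E0", "1.1.1.1"), ("10.1.1.0/24", "E1", "1.1.2.1")]
print(rpf_lookup("10.1.1.1", routes))  # ('E1', '1.1.2.1')
```

With two equal-cost routes, the higher next hop (1.1.2.1) wins, so E1 becomes the RPF interface and packets from the source arriving on E0 fail the check.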
64. MULTICAST DISTRIBUTION TREES
Shared Distribution Tree
Notation: (*, G)
* = All Sources
G = Group
(RP) PIM Rendezvous Point
[Figure: routers A through F, with D as the RP; Source 1, Source 2, Receiver 1, Receiver 2; legend: Shared Tree, Source Tree]
65. MULTICAST DISTRIBUTION TREES
§ Source or shortest path trees
‒ Uses more memory O(S x G) but you get optimal paths from source to all
receivers; minimizes delay
§ Shared trees
‒ Uses less memory O(G) but you may get suboptimal paths
from source to all receivers; may introduce extra delay
Characteristics of Distribution Trees
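The memory trade-off can be made concrete with a back-of-the-envelope count. This is only a rough model: it assumes every group has the same number of active sources and counts one entry per tree, and the function name is illustrative.

```python
def mroute_entries(sources_per_group: int, groups: int, tree: str) -> int:
    """Upper bound on multicast route entries in a router.

    tree is "source" for (S, G) shortest path trees or "shared"
    for (*, G) shared trees. Uniform sources per group is assumed.
    """
    if tree == "source":
        return sources_per_group * groups  # one (S, G) entry per source and group
    if tree == "shared":
        return groups                      # one (*, G) entry per group
    raise ValueError("tree must be 'source' or 'shared'")


print(mroute_entries(10, 100, "source"))  # 1000
print(mroute_entries(10, 100, "shared"))  # 100
```

With 100 groups and 10 sources each, source trees need ten times the state of shared trees, in exchange for optimal paths.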
66. MULTICAST TREE CREATION
§ PIM join/prune control messages
‒ Used to create/remove distribution trees
§ Shortest path trees
‒ PIM control messages are sent toward the source
§ Shared trees
‒ PIM control messages are sent toward RP
68. MAJOR DEPLOYED PIM VARIANTS
§ PIM-SM
‒ ASM
Any Source Multicast/RP/SPT/shared tree
‒ SSM
Source Specific Multicast, no RP, SPT only
‒ BiDir
Bidirectional PIM, no SPT, shared tree only
71. PIM-SM SENDER REGISTRATION
(S, G) Traffic Begins Arriving at the RP via the Source Tree
RP Sends a Register-Stop Back to the First-Hop Router to Stop the Register Process
[Figure: Source, RP, and Receiver; legend: Traffic Flow, Shared Tree, Source Tree, (S, G) Register (unicast), (S, G) Register-Stop (unicast)]
73. PIM-SM SPT SWITCHOVER
Last-Hop Router Joins the Source Tree
Additional (S, G) State Is Created Along New Part of the Source Tree
[Figure: Source, RP, and Receiver; legend: Traffic Flow, Shared Tree, Source Tree, (S, G) Join]
74. PIM-SM SPT SWITCHOVER
Traffic Begins Flowing Down the New Branch of the Source Tree
Additional (S, G) State Is Created Along the Shared Tree to Prune Off (S, G) Traffic
[Figure: Source, RP, and Receiver; legend: Traffic Flow, Shared Tree, Source Tree, (S, G)RP-bit Prune]
78. PIM-SM—EVALUATION
§ Effective for sparse or “dense” distribution of multicast receivers
§ Advantages
‒ Traffic only sent down “joined” branches
‒ Can switch to optimal source-trees for high traffic sources dynamically
(sounds clever but it actually switches for all sources by default)
‒ Unicast routing protocol-independent
‒ Basis for interdomain, multicast routing
When used with MBGP, MSDP and/or SSM
79. SOURCE SPECIFIC MULTICAST - SSM
§ Assume a one-to-many multicast model
‒ Example: video/audio broadcasts, stock market data
§ Why does ASM need a shared tree?
‒ So that hosts and last-hop routers can learn who the active sources are (source discovery)
§ What if this was already known?
‒ Hosts could use IGMPv3 to signal exactly which (S, G) SPT to join
‒ The shared tree and RP wouldn’t be necessary
‒ Different sources could share the same group address and not interfere with each
other
§ Result: Source Specific Multicast (SSM)
§ RFC 3569: An Overview of Source Specific Multicast (SSM)
82. SSM—EVALUATION
§ Ideal for applications with one source sending to many receivers
§ Uses a simplified subset of the PIM-SM protocol
Simpler network operation
§ Solves multicast address allocation problems
Flows differentiated by both source and group
• Not just by group
Content providers can use same group ranges
• Since each (S,G) flow is unique
§ More secure
No “Bogus” source traffic
• Can’t consume network bandwidth
• Not received by host application
83. MANY-TO-MANY STATE PROBLEM
§ Creates huge amounts of (S,G) state
‒ State maintenance workloads skyrocket
High OIL fan-out makes the problem worse
‒ Router performance begins to suffer
§ Using shared trees only
‒ Provides some (S, G) state reduction
Results in (S, G) state only along SPT to RP
Frequently still too much (S, G) state
Need a solution that only uses (*, G) state
86. BIDIR PIM—EVALUATION
§ Ideal for many to many applications
§ Drastically reduces network mroute state
‒ Eliminates all (S,G) state in the network
SPTs from sources to RP eliminated
Source traffic flows both up and down shared tree
‒ Allows many-to-many applications to scale
Permits virtually an unlimited number of sources
88. PIM-SM ASM RP REQUIREMENTS
§ Group to RP mapping
‒ Consistent in all routers within the PIM domain
§ RP redundancy requirements
‒ Eliminate any single point of failure
89. HOW DOES THE NETWORK LEARN RP ADDRESS?
§ Static configuration
‒ Manually on every router in the PIM domain
§ AutoRP
‒ Originally a Cisco® solution
‒ Facilitated PIM-SM early transition
§ BSR
‒ draft-ietf-pim-sm-bsr
90. STATIC RPS
§ Hard-configured RP address
‒ When used, must be configured on every router
‒ All routers must have the same RP address
‒ RP failover not possible
Exception: if anycast RPs are used
§ Command
‒ ip pim rp-address <address> [group-list <acl>]
[override]
‒ Optional group list specifies group range
Default: range = 224.0.0.0/4 (includes auto-RP groups!)
‒ Override keyword “overrides” auto-RP information
Default: auto-RP learned info takes precedence
91. AUTO-RP—FROM 10,000 FEET
RP-Announcements Multicast to the Cisco Announce (224.0.1.39) Group
[Figure: routers A, B, C, D; candidate RPs (C-RP) 1.1.1.1 and 2.2.2.2 send Announce messages, which reach the two mapping agents (MA)]
92. AUTO-RP—FROM 10,000 FEET
RP-Discoveries Multicast to the Cisco Discovery (224.0.1.40) Group
[Figure: routers A, B, C, D; candidate RPs (C-RP) 1.1.1.1 and 2.2.2.2; two mapping agents (MA)]
98. L2 MULTICAST FRAME SWITCHING
Problem: Layer 2 Flooding of Multicast Frames
Typical L2 switches treat multicast traffic as unknown or broadcast and must “flood” the frame to every port
Static entries can sometimes be set to specify which ports should receive which group(s) of multicast traffic
Dynamic configuration of these entries would cut down on user administration
99. L2 MULTICAST FRAME SWITCHING
IGMPv1–v2 Snooping
Switches become “IGMP”-aware
IGMP packets intercepted by the NMP or by special hardware ASICs
Requires special hardware to maintain throughput
Switch must examine contents of IGMP messages to determine which ports want what traffic
IGMP membership reports
IGMP leave messages
Impact on low-end, Layer 2 switches
Must process all Layer 2 multicast packets
Admin load increases with multicast traffic load
Generally results in switch meltdown
100. L2 MULTICAST FRAME SWITCHING
Impact of IGMPv3 on IGMP Snooping
IGMPv3 reports sent to separate group (224.0.0.22)
Switches listen to just this group
Only IGMP traffic—no data traffic
Substantially reduces load on switch CPU
Permits low-end switches to implement IGMPv3 snooping
No report suppression in IGMPv3
Enables individual member tracking
IGMPv3 supports source-specific includes/excludes
101. SUMMARY—FRAME SWITCHES
IGMP Snooping
Switches with Layer 3-aware hardware/ASICs
High-throughput performance maintained
Increases cost of switches
Switches without Layer 3-aware hardware/ASICs
Suffer serious performance degradation or even meltdown!
Shouldn’t be a problem when IGMPv3 is implemented
103. MBGP OVERVIEW
MBGP: Multiprotocol BGP
Defined in RFC 2858 (extensions to BGP)
Can carry different types of routes
Unicast
Multicast
Both routes carried in same BGP session
Does not propagate multicast state info
That’s PIM’s job
Same path selection and validation rules
AS-Path, LocalPref, MED…
104. MBGP OVERVIEW
§ Separate BGP tables maintained
‒ Unicast prefixes for unicast forwarding
‒ Unicast prefixes for multicast RPF checking
§ AFI = 1, Sub-AFI = 1
‒ Contains unicast prefixes for unicast forwarding
‒ Populated with BGP unicast NLRI
§ AFI = 1, Sub-AFI = 2
‒ Contains unicast prefixes for RPF checking
‒ Populated with BGP multicast NLRI
105. MBGP OVERVIEW
MBGP Allows Divergent Paths and Policies
Same IP address holds dual significance
Unicast routing information
Multicast RPF information
For same IPv4 address two different NLRI with different next-hops
Can therefore support both congruent and incongruent topologies
107. MSDP – MULTICAST SOURCE DISCOVERY PROTOCOL
§ RFC 3618
§ ASM only
‒ RPs know about all sources in their domain
Sources cause a “PIM Register” to the RP
Tell RPs in other domains of their sources
Via MSDP SA (Source Active) messages
‒ RPs know about receivers in a domain
Receivers cause a “(*, G) Join” to the RP
RP can join the source tree in the peer domain
Via normal PIM (S, G) joins
‒ MSDP required for interdomain ASM source discovery
109. MSDP OVERVIEW
MSDP Example
[Figure: Domains A through E, each with an RP, connected as MSDP peers. A Register for (192.1.1.1, 224.2.2.2) reaches the source's RP, and Source Active (SA) messages for (192.1.1.1, 224.2.2.2) are passed between the RPs; a Source and a Receiver are shown]
114. MSDP WRT SSM—UNNECESSARY
Source in 232/8
Receiver Learns S and G Out of Band, i.e., Webpage
[Figure: Domains A through E, each with an RP; ASM MSDP peers (irrelevant to SSM); Source, Receiver, and multicast traffic]
115. MSDP WRT SSM—UNNECESSARY
Source in 232/8
Data flows natively along the interdomain source tree
[Figure: Domains A through E, each with an RP; ASM MSDP peers (irrelevant to SSM); Source, Receiver, and multicast traffic]
116. ANYCAST RP—OVERVIEW
§ Redundant RP technique for ASM which uses MSDP
for RP synchronization
§ Uses single defined RP address
Two or more routers have same RP address
RP address defined as a loopback interface
Loopback address advertised as a host route
Senders and receivers join/register with closest RP
Closest RP determined from the unicast routing table
Because RP is statically defined
§ MSDP session(s) run between all RPs
Informs RPs of sources in other parts of network
RPs join SPT to active sources as necessary
119. INTERNET IP MULTICAST
§ We can build multicast distribution trees.
‒ PIM
§ We can RPF on interdomain sources
‒ MBGP
§ We no longer need (or want) network-based source discovery
‒ SSM
§ So interdomain IP Multicast is in every home, right?
121. IPV4 VS. IPV6 MULTICAST
IP Service | IPv4 Solution | IPv6 Solution
Address Range | 32-Bit, Class D | 128-Bit (112-Bit Group)
Routing | Protocol-Independent, All IGPs and BGP4+ | Protocol-Independent, All IGPs and BGP4+ with v6 Mcast SAFI
Forwarding | PIM-DM, PIM-SM: ASM, SSM, BiDir | PIM-SM: ASM, SSM, BiDir
Group Management | IGMPv1, v2, v3 | MLDv1, v2
Domain Control | Boundary/Border | Scope Identifier
Interdomain Source Discovery | MSDP Across Independent PIM Domains | Single RP Within Globally Shared Domains
122. IPV6 MULTICAST ADDRESSES (RFC 3513)
Address layout (128 bits): FF (8 bits, 1111 1111) | Flags (4 bits) | Scope (4 bits) | Group ID (112 bits)
Flags =
T or Lifetime, 0 if Permanent, 1 if Temporary
P Proposed for Unicast-Based Assignments
Others Are Undefined and Must Be Zero
Scope =
1 = interface-local
2 = link
4 = admin-local
5 = site
8 = organization
E = global
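The flag and scope fields can be read straight out of the second byte of the address. A minimal sketch using the standard `ipaddress` module; the function name, the dictionary keys, and the fallback label for unlisted scope values are illustrative choices.

```python
import ipaddress

# Scope values as listed on the slide.
SCOPES = {
    0x1: "interface-local",
    0x2: "link",
    0x4: "admin-local",
    0x5: "site",
    0x8: "organization",
    0xE: "global",
}


def parse_ipv6_multicast(address: str) -> dict:
    """Extract the T and P flags and the scope from an IPv6 multicast address."""
    value = int(ipaddress.IPv6Address(address))
    if value >> 120 != 0xFF:
        raise ValueError(f"{address} is not an IPv6 multicast address")
    flags = (value >> 116) & 0xF
    scope = (value >> 112) & 0xF
    return {
        "temporary": bool(flags & 0x1),      # T flag
        "prefix_based": bool(flags & 0x2),   # P flag
        "scope": SCOPES.get(scope, "not listed"),
    }


print(parse_ipv6_multicast("ff02::1"))
```

For `ff02::1` (all nodes) both flags are clear and the scope is link.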
125. IPV6 ROUTING FOR MULTICAST
§ RPF-based on reachability to v6 source same as
with v4 multicast
§ RPF still protocol-independent
‒ Static routes, mroutes
‒ Unicast RIB: BGP, ISIS, OSPF, EIGRP, RIP, etc.
‒ Multiprotocol BGP (mBGP)
Support for v6 mcast subaddress family
127. RP MAPPING MECHANISMS FOR IPV6
§ Static RP assignment
§ BSR
§ Auto-RP—no current plans
§ Embedded RP
128. EMBEDDED RP ADDRESSING—RFC 3956
§ Proposed new multicast address type
Uses unicast-based multicast addresses (RFC 3306)
§ RP address is embedded in multicast address
§ Flag bits = 0RPT
R = 1, P = 1, T = 1 → Embedded RP address
§ Network-Prefix::RPadr = RP address
§ For each unicast prefix you own, you now also own:
16 RPs for each of the 16 multicast scopes (256 total) with 2^32 multicast groups assigned to each RP (2^40 total)
Address layout (field widths in bits): FF (8) | Flags (4) | Scope (4) | Rsvd (4) | RPadr (4) | Plen (8) | Network-Prefix (64) | Group-ID (32)
129. EMBEDDED RP ADDRESSING—EXAMPLE
Multicast Address with Embedded RP Address
FF76:0130:1234:5678:9abc::4321
Resulting RP Address
1234:5678:9abc::1
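The example can be reproduced by slicing the fields out of the 128-bit address. This sketch follows the field widths given for embedded RP addressing (Rsvd 4, RPadr 4, Plen 8, Network-Prefix 64, Group-ID 32); the function name `embedded_rp` is illustrative and error handling is kept minimal.

```python
import ipaddress


def embedded_rp(group: str) -> ipaddress.IPv6Address:
    """Derive the RP address embedded in an IPv6 multicast group address."""
    value = int(ipaddress.IPv6Address(group))
    flags = (value >> 116) & 0xF
    if value >> 120 != 0xFF or (flags & 0x7) != 0x7:
        raise ValueError("not an embedded-RP address (needs R = P = T = 1)")
    rp_id = (value >> 104) & 0xF           # RPadr: low 4 bits of the RP address
    plen = (value >> 96) & 0xFF            # significant bits of the prefix
    if not 1 <= plen <= 64:
        raise ValueError("prefix length must be between 1 and 64")
    prefix = (value >> 32) & ((1 << 64) - 1)
    prefix = (prefix >> (64 - plen)) << (64 - plen)  # zero bits beyond plen
    return ipaddress.IPv6Address((prefix << 64) | rp_id)


print(embedded_rp("FF76:0130:1234:5678:9abc::4321"))  # 1234:5678:9abc::1
```

Here Plen is 0x30 (48 bits), so the prefix 1234:5678:9abc is kept and RPadr 1 becomes the interface part of the RP address.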
130. MULTICAST LISTENER DISCOVERY—MLD
§ MLD is equivalent to IGMP in IPv4
§ MLD messages are transported over ICMPv6
§ Version number confusion
‒ MLDv1 corresponds to IGMPv2
RFC 2710
‒ MLDv2 corresponds to IGMPv3, needed for SSM
RFC 3810
§ MLD snooping
‒ draft-ietf-magma-snoop-12.txt
131. NOW YOU KNOW…
§ Why multicast?
§ Multicast fundamentals
§ PIM protocols
§ RP choices
§ Multicast at Layer 2
§ Interdomain IP multicast
§ IPv6 Multicast bits
133. MULTICAST VPN SOLUTIONS
Four major components
A. Encapsulation: IP, MPLS
B. P-core tree-building method: PIM, RSVP TE, mLDP
C. Auto-discovery of MVPN member PEs: BGP
D. PE-PE C-mroute exchange: PIM, BGP
• Multicast VPNs are built on a tunnel infrastructure through the provider core
• These are Multipoint Tunnels
• Multicast route signalling uses the Tunnel or an out of band signalling protocol, like BGP or PIM over TCP
[Figure: MPLS (LSM) mVPN building blocks (mLDP, p-to-mp TE, PIM, BGP) mapped to components A through D]
137. MULTICAST TREE TYPES AND BUILDING OPTIONS
P-Tree types
• Point-to-Multi Point(P2MP)
• Multi Point-to-Multi Point (MP2MP)
P-Tree building protocols
• PIM
• RSVP-TE
In contrast to PIM, leaves are specified at root, i.e. head-end driven
MPLS Encapsulation required
Extensions defined to build P2MP trees
Only option that supports constraint-based routing
• MLDP
Receiver-driven (like PIM)
MPLS encapsulation required
Extensions to LDP to support both MP2MP and P2MP LSPs
138. P2MP TUNNEL SETUP
MLDP
Each leaf node initiates P2MP LSP setup by sending mLDP Label Mapping message towards the root, using unicast routing
Label Mapping message carries the identity of the LSP, encoded as P2MP FEC
Each intermediate node along the path from a leaf to the root propagates mLDP Label Mapping towards the root, using unicast routing
[Figure: nodes R1 (CE), R2 (PE), R3 (P), R4 (PE), R5 (PE), R6 (CE), R7 (CE) across Source, Service Edge, Core, Distribution/Access, and Receivers; three Label Mapping P2MP messages (FEC: 200, Root: R2) with labels L5, L1, and L7]
139. LSM SIGNALING: P2MP TREE
RSVP-TE
The egress (leaf) receives a PIM Join.
The leaf sends a BGP A-D leaf to notify the ingress PE
The ingress sends RSVP-TE Path messages to the leaves
The leaves respond with RSVP-TE Resv messages
The core router received 6 updates.
MLDP
The egress (leaf) receives a PIM Join.
The leaf sends a MLDP label mapping to the ingress PE.
The core router received 3 update messages
140. CONTROL PLANE SCALE COMPARISON
Similarities
• Both are based on existing MPLS technology (LDP or RSVP TE)
• Both require changes to support Multicast
• Both support FRR
Differences
• RSVP-TE
Support bandwidth reservation
No MP2MP support
Periodic refresh of states
• MLDP
Support MP2MP LSPs
TCP based protocol - no periodic refresh of states
Less signaling and state to support an LSP, more scalable.
142. AUTO DISCOVERY
Auto discovery is a process of discovering which PEs support which VPNs
Auto discovery mechanism is independent of core tree building and customer mcast routes exchange methods
Candidate protocols are PIM and BGP
If PIM is also the P-Tree building protocol, it makes sense to use it also for auto discovery (as PIM is leaf driven)
BGP also effective for auto discovery
144. MULTICAST SIGNALING
EXCHANGING CUSTOMER MULTICAST ROUTES
The mechanism used for customer mcast route exchange is independent of the core tree building and auto discovery methods
In draft-ietf-l3vpn-2547bis-mcast-10 two options are specified:
Option 1: Per-mVPN PIM peering among the PEs
This is deployed today (draft-ietf-l3vpn-2547bis-mcast-10, a.k.a. draft-rosen)
Option 2: BGP
Analogous to the RFC4364 exchange of VPN-IPv4 routes, but with a new MVPN AFI/SAFI
145. MVPN
OTHER FACTORS TO WATCH
Aggregation
Aggregate traffic into a single tunnel: less state in P-routers
Build individual trees for each multicast group: optimal forwarding
Compromise: amount of P-router state vs. optimal forwarding
Migration from existing GRE MVPNs
Encapsulation: GRE -> MPLS
P-Tree building protocol: PIM -> RSVP-TE or mLDP
A change in tree building protocol and encapsulation method does not require a change in the method used today to exchange c-mcast routes (which is PIM)
PE routers still need to run PIM – even when P routers become PIM-free
147. COMPONENTS OF DELAY IN IP / MPLS NETWORKS
The dominant causes of delay in IP / MPLS networks are:
Propagation delay
Arising from speed-of-light delays on wide area links; ~5 ms per 1000 km for optical fibre
Queuing delays in switches and routers
Other components of delay are negligible for links of 1 Gbps and over
Serialization delay: ~10 µs for a 1500-byte packet at 1 Gbps
Switching delay: typically ~10 µs per hop
Since propagation delays are a fixed property of the topology, delay and jitter are minimized when queuing delays are minimized
Queuing delays depend upon the traffic profile
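The fixed delay components listed above can be checked with a short calculation. The sketch below uses the slide's figures (5 ms per 1000 km, ~10 µs switching per hop); the function names, the 2000 km path and the 5-hop count are assumptions chosen for illustration. The exact serialization delay for 1500 bytes at 1 Gbps works out to 12 µs, the same order as the slide's ~10 µs.

```python
FIBRE_DELAY_MS_PER_1000KM = 5.0   # figure quoted on the slide
SWITCHING_DELAY_US = 10.0         # "typically ~10 µs per hop" on the slide


def propagation_delay_ms(distance_km):
    """Speed-of-light delay over optical fibre."""
    return distance_km / 1000.0 * FIBRE_DELAY_MS_PER_1000KM


def serialization_delay_us(packet_bytes, link_bps):
    """Time needed to clock one packet onto the wire."""
    return packet_bytes * 8 / link_bps * 1e6


def fixed_delay_ms(distance_km, hops, packet_bytes=1500, link_bps=1e9):
    """Delay excluding queuing: propagation + per-hop serialization and switching."""
    per_hop_us = serialization_delay_us(packet_bytes, link_bps) + SWITCHING_DELAY_US
    return propagation_delay_ms(distance_km) + hops * per_hop_us / 1000.0


if __name__ == "__main__":
    # Hypothetical path: 2000 km of fibre, 5 hops, 1 Gbps links.
    print(propagation_delay_ms(2000))          # propagation part, in ms
    print(serialization_delay_us(1500, 1e9))   # per-packet serialization, in µs
    print(fixed_delay_ms(2000, 5))             # total fixed delay, in ms
```

On such a path propagation dominates: the per-hop terms add only about a tenth of a millisecond, which is why queuing is the only component left to optimise.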
148. IP / MPLS TRAFFIC CHARACTERISATION
[Figure: link utilisation (0% to 100%) over 24 hours, showing measured traffic, headroom for failure & growth, and micro-bursts]
Network traffic measurements are normally long term, i.e. in the order of minutes
Implicitly, the measured rate is an average over the measurement interval
In the short term, i.e. milliseconds, however, microbursts cause queueing, impacting delay, jitter and loss
What's the relationship between the measured load and the short-term microbursts?
How much bandwidth needs to be provisioned, relative to the measured load, to achieve a particular SLA target?
150. MULTI-HOP QUEUING
Multi-hop delay is not additive (1 Gbps) [TELKAMP]
1 hop: Avg 0.23 ms, P99.9 2.02 ms
2 hops: Avg 0.46 ms, P99.9 2.68 ms
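Why the average doubles over two hops while the 99.9th percentile does not can be illustrated with a small simulation. This is a sketch under strong assumptions: per-hop queuing delay is modelled as an independent exponential variable with the slide's 0.23 ms mean, which is not the traffic model used in [TELKAMP], so the absolute numbers will differ from the slide. The helper names are hypothetical.

```python
import random


def percentile(samples, q):
    """Simple empirical percentile (no interpolation)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def simulate(hops, mean_ms=0.23, n=100_000, seed=1):
    """Return (average, P99.9) of end-to-end queuing delay over `hops` hops."""
    rng = random.Random(seed)
    totals = [
        sum(rng.expovariate(1.0 / mean_ms) for _ in range(hops))
        for _ in range(n)
    ]
    return sum(totals) / n, percentile(totals, 0.999)


if __name__ == "__main__":
    for hops in (1, 2):
        avg, p999 = simulate(hops)
        print(f"{hops} hop(s): avg {avg:.2f} ms, P99.9 {p999:.2f} ms")
```

Averages add because expectation is linear; the tail does not, because a packet rarely hits a worst-case queue at both hops.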
151. QUEUING DELAY SUMMARY
Queuing simulation:
• Gigabit Ethernet (backbone) link
Overprovisioning percentage in the order of 10% is required to bound delay/jitter to less than 1 ms
• Lower speeds (<1G)
Overprovisioning factor is significant
• Higher speeds (2.5G/10G)
Overprovisioning factor becomes very small
P99.9 multi-hop delay/jitter is not additive
http://www.denog.de/meetings/denog1/pdf/013-Telkamp-How_Full_is_Full.pdf
153. PACKET LOSS IS A MAJOR PROBLEM FOR VIDEO
Video is very sensitive to packet drops.
The loss of an I-frame results in the loss of the reference frame on which subsequent P and B frames depend.
It is this dependency that causes cumulative picture degradation (“melt-down”) until the arrival of the next valid I-frame
• With a 50 ms outage, the probability of an I-frame loss is 34%
• The resulting visual impairment will last at least ~500-600 ms (GOP size)
• End-user QoE for a 50 ms optimized network is the same as for a 500 ms optimized network!
• BUT the cost/complexity of a 50 ms optimized network is much higher
Details of analysis can be found in “Not All Packets Are Equal: The Impact of Network Packet
Loss on Video Transport” , IEEE Internet Computing, Vol. 13, March/April 2009
[Figure: example video artefacts: slice error, pixelisation, ghosting]
154. MPEG : IMPACT OF PACKET LOSS
• A single packet loss can cause artifacts for the whole GOP period, 500 ms (I-frame packet loss)
• [GREENGRASS]: Jason Greengrass, John Evans, Ali C. Begen, “Not All Packets Are Equal: The Impact of Network Packet Loss on Video Transport”, IEEE Internet Computing, Nov 08
§ http://www.employees.org/~jevans/videopaper/videopaper.html
[Figure: duration of impairment (ms, 0 to 1200) versus duration of packet loss (ms, 0 to 500), with best-case and worst-case curves for SD-low, SD-high, HD-low and HD-high streams]
155. NETWORK TECHNIQUES FOR MANAGING LOSS
Fast IP routing protocol convergence
Implementation and protocol optimisations, available on all Cisco routing platforms
Delivers sub-second convergence times for unicast and multicast
Multicast-only Fast Reroute (MoFRR)
Enables creation of resilient multicast trees
Efficient approach to achieving sub-50 ms convergence times
Can be applied to topologies that can support multiple diverse multicast trees
MPLS Traffic Engineering (TE) Fast Reroute (FRR)
Enables pre-calculated backup Traffic Engineered tunnels
Used to protect against link and node failures, rerouting in under 100 ms [RFC4090]
Requires the additional complexity of MPLS-TE and additional provisioned bandwidth
IPoDWDM Proactive Protection
Reports optical-layer failures to the IP layer to speed up routing convergence events
Reduces convergence time to sub-20 ms
Requires router-integrated optical transponders, as offered in Cisco IPoDWDM solutions
156. FAST IGP CONVERGENCE
On a core network failure (link or node), the network converges (reroutes) based on global updates (classic IGP convergence) or, now, locally via LFA
Fast Convergence (FC)
✓ Lowest bandwidth requirements in working and failure cases
✓ Lowest solution cost and complexity
! Requires a fast converging network to minimise the visible impact of loss
✗ Is NOT hitless: ~200 ms loss of connectivity before connectivity is restored (less than ~50 ms with Loop-Free Alternate Fast Reroute; configuration is done once, globally, per system: one command line!)
[Figure: video source, core routers and edge distribution nodes; after a core failure the primary stream is replaced by the reconverged stream]
157. LOCAL FAST CONVERGENCE: MOFRR
The edge router chooses the best content based on local information.
Multicast-only Fast Reroute (MoFRR)
✓ Low bandwidth requirements in working and failure cases (for IPTV networks)
✓ Lowest solution cost and complexity
! Requires MoFRR counter mode capability at the edge and only benefits multicast
✗ Is NOT hitless (at present): ~35 ms loss of connectivity before the flow is restored
[Figure: video source, core routers and edge distribution nodes, with the failure point marked]
158. MULTICAST-ONLY FAST REROUTE
MOFRR
Local enhancement to PIM
Egress-only feature
No changes needed in the rest of the network
Phased approach
Control, data plane and RTP out-of-sequence triggers
Vidmon metrics
Not hitless today
~35 ms loss of connectivity before the flow is restored
Topology dependent
Requires ECMP paths
[Figure: MoFRR topology. Labels: Source, IP/MPLS Core Network, interfaces e1 and e2, 10.1.1.1, Receiver, MoFRR Recv, failure point X. Legend: IGMP Join, PIM Join, Mcast Tr]
159. MULTICAST (SSM) FAST
CONVERGENCE (ASR9K)
Tested with 2500 IGP prefixes and 250k BGP routes, IOS XR 3.9.1
Tests show for SecDistribution
MoFRR delivers consistent sub 50ms
Convergence @ no operational cost
160. TI-MOFRR
OVERCOMING THE ECMP LIMITATION
• Simple deployable solution
• 100% path diversity (i.e. TE-like ERO)
• Works in any ECMP or non-ECMP topology, such as mesh, ring, hub-spoke, star, etc.
• Consistent and predictable: sub-50 ms solution
• No loops or micro-loops in the network
• Foundation for lossless stream merge
161. EXPLICIT PATH VECTOR TLV
Bringing Path-Diversity to Multicast
It's like the RSVP-TE ERO
It allows explicit routing of PIM Joins or MLDP Joins
No loops or microloops
Explicit Path Vector TLV Encoding:
(Example)
Multicast Source IP: S = 10.0.0.1
R1: 11.0.0.1
R2: 12.0.0.1
R3: 13.0.0.1
R4: 14.0.0.1
[Figure: source S (IP: 10.0.0.1), routers R1 to R6 and receiver Rx; arrows labelled IGMP Join and PIM Join]
162. TI-MOFRR
CREATION OF PRIMARY/BACKUP TREE USING 2 MROUTES
[Figure: TI-MoFRR leaf multicast router receiving an IGMP Join (S1,G) and sending PIM Joins for (S1,G) and (S2,G)]
TI-MoFRR Egress Multicast Router:
1. Explicit Path Vector TLV used for PIM-Tree explicit routing
2. Original (S1,G) PIM Join forwarded towards S1 source to build Primary tree
3. Cloned (S2,G) PIM Join used to build backup PIM tree
163. TI-MOFRR NATIVE MULTICAST SOLUTION
SINGLE/DUAL-HOMED SOURCE
[Figure labels: PE1, PE3, PE4; flows (S1,G) and (S2,G)]
Ingress TI-MoFRR Functions:
1. Clone (S1,G)
2. Re-write S1 to S2 => (S2,G)
Egress TI-MoFRR Functions:
1. Perform MoFRR
2. Specify S1/S2 prefixes in MoFRR
3. Re-write S2 to S1 => (S1,G)
[Figure labels: Ingress Demarcation, Egress Demarcation, Protection Domain, Any Transport Between PEs, PE2, (S2,G)]
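The ingress and egress functions listed on this slide amount to a clone-and-rewrite at one edge and a select-and-restore at the other. The following sketch illustrates that idea only; the `Packet` type, the function names and the addresses in the usage example are hypothetical, and the health of the primary flow is passed in as a plain flag rather than detected.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    source: str
    group: str
    seq: int


def ingress(packet, s2):
    """Ingress TI-MoFRR: keep (S1,G) and emit a clone rewritten to (S2,G)."""
    return packet, replace(packet, source=s2)


def egress(primary, backup, s1, s2, primary_ok):
    """Egress TI-MoFRR: pick one copy and restore the original source S1."""
    chosen = primary if (primary_ok and primary is not None) else backup
    if chosen is None:
        return None                      # both copies lost
    if chosen.source == s2:
        chosen = replace(chosen, source=s1)
    return chosen


if __name__ == "__main__":
    # Hypothetical addresses for S1, S2 and G.
    original = Packet(source="10.0.0.1", group="232.1.1.1", seq=7)
    on_primary, on_backup = ingress(original, s2="10.0.0.2")
    delivered = egress(on_primary, on_backup, "10.0.0.1", "10.0.0.2", primary_ok=False)
    print(delivered)
```

Because the egress rewrites S2 back to S1, the receiver sees the same (S1,G) stream regardless of which tree delivered the packet.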
164. VIDMON
QUALITY-TRIGGERED TI-MOFRR RESILIENCY SOLUTION
[Figure: TI-MoFRR egress multicast router with Vidmon monitoring both the primary (S1,G) and the backup (S2,G) flow; example readings MDI(0:0:12) and MDI(24:34:223)]
Vidmon and TI-MoFRR Integration Solution Details:
1. Vidmon monitors MPEG MDI quality of Primary and Backup TI-MoFRR flow
2. Vidmon result at end of Monitoring Interval:
• Primary MDI is Poor with impairment
• Backup MDI is Good without impairment
3. Vidmon instructs TI-MoFRR switchover from Primary to Backup
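The decision in steps 2 and 3 can be sketched as a simple comparison at the end of each monitoring interval. This is an assumption-laden illustration, not Vidmon's actual logic: the MDI reading is modelled as a delay factor plus a media loss rate, and the thresholds and names (`Mdi`, `is_good`, `should_switch_to_backup`) are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Mdi:
    delay_factor_ms: float   # assumed jitter-related component
    media_loss_rate: float   # assumed lost media packets per second


def is_good(mdi, max_delay_factor_ms=50.0, max_loss_rate=0.0):
    """Hypothetical thresholds: any media loss counts as an impairment."""
    return (mdi.delay_factor_ms <= max_delay_factor_ms
            and mdi.media_loss_rate <= max_loss_rate)


def should_switch_to_backup(primary, backup):
    """Switch only when the primary is impaired and the backup is clean."""
    return (not is_good(primary)) and is_good(backup)


if __name__ == "__main__":
    primary = Mdi(delay_factor_ms=34.0, media_loss_rate=223.0)
    backup = Mdi(delay_factor_ms=0.0, media_loss_rate=0.0)
    print(should_switch_to_backup(primary, backup))
```

Requiring a clean backup before switching avoids moving from one impaired flow to another.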
165. LOCAL FAST CONVERGENCE
P2MP FAST REROUTE
The network reconverges (reroutes) based on local information (LOS) on a core link failure
Fast Reroute (FRR)
✓ Lowest bandwidth requirements in working and failure cases
! Medium solution cost and complexity
! Requires a fast converging network to minimise the visible impact of loss
✗ Is NOT hitless: ~50 ms loss of connectivity before connectivity is restored
[Figure: video source, core routers and edge distribution nodes; after a core link failure the primary stream is replaced by the FRR stream]
167. IPODWDM PROACTIVE PROTECTION
IP / optical integration makes it possible to identify a degraded link using optical data (pre-FEC BER) and to start protection (i.e. by signaling to the IGP) before traffic starts failing, achieving hitless protection in many cases
[Figure: two panels plotting BER and corrected bits against the FEC limit as optical impairments grow. Left panel, "Router has no visibility into optical transport network" (transponder, SR port on router): LOF occurs and data is lost during switchover from the working path to the protected path. Right panel, "Proactive protection" (WDM port on router): a protection trigger before the FEC limit gives a near-hitless switch from the working path to the protect path.]
Packet loss (ms) by fault type, without and with pre-FEC FRR:
Pre-FEC FRR   Fault             Highest    Lowest   Average
No            Optical-switch      11.47     11.54     11.37
No            Noise-injection   7404.00   1193.00   4305.00
No            Fibre-pull          28.81     18.52     21.86
No            PMD-injection      129.62    122.51    125.90
Yes           Optical-switch      11.50     11.18     11.37
Yes           Noise-injection      0.02      0.00      0.00
Yes           Fibre-pull          11.05      0.00      3.23
Yes           PMD-injection        0.08      0.00      0.02