3. We all understand ARP, right?
• Messages carried directly on Ethernet
EtherType 0x0806
• Device sends broadcast request
Who has x.x.x.x?
• Receivers check target against local addresses
• If it matches they send a unicast reply
• Result is cached
All nodes on the network need to process all ARP Requests.
High levels of ARP and you are going to have a bad day.
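To make the broadcast nature of the request concrete, here is a minimal sketch that builds an ARP "who has" frame by hand. The MAC and IP addresses are made-up documentation values, and `build_arp_request` is a hypothetical helper, not part of any tool mentioned in this talk.

```python
import socket
import struct

BROADCAST_MAC = bytes.fromhex("ffffffffffff")
ETHERTYPE_ARP = 0x0806


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Return an Ethernet frame carrying an ARP 'who has target_ip?' request."""
    # Ethernet header: broadcast destination, sender source, ARP EtherType.
    eth_header = BROADCAST_MAC + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode: request
        sender_mac,
        socket.inet_aton(sender_ip),
        bytes(6),                     # target MAC is unknown, left as zeros
        socket.inet_aton(target_ip),
    )
    return eth_header + arp_payload


# Illustrative addresses only (locally administered MAC, documentation IPs).
frame = build_arp_request(bytes.fromhex("020000000001"), "192.0.2.1", "192.0.2.2")
print(len(frame), frame[:6].hex(":"))
```

Because the destination is always ff:ff:ff:ff:ff:ff, every node on the LAN has to receive and inspect the frame before it can decide the request is not for it.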
4. IPv6 Neighbor Discovery
• Defined in RFC 4861: http://tools.ietf.org/html/rfc4861
• Messages are carried within ICMPv6
• Includes:
• Router and prefix discovery
• Address resolution and neighbor unreachability detection
• Redirect function
• Address resolution is most relevant from an IXP perspective
5. Router and prefix discovery
• The main point on router discovery (RD): “Don’t do it on the exchange”
• We have seen an increase in the number of members sending RAs
• Please check your config and make sure you have it disabled
• We are improving our instrumentation and will be getting more proactive
• This is an MoU violation, and will result in a chase
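A rough sketch of how stray RAs could be spotted in captured ICMPv6 payloads. The type numbers are the ones assigned to Neighbor Discovery in RFC 4861. The helper names are hypothetical, and this is not a description of how LINX's instrumentation actually works.

```python
# ICMPv6 type numbers used by Neighbor Discovery (RFC 4861).
ND_TYPES = {
    133: "Router Solicitation",
    134: "Router Advertisement",
    135: "Neighbor Solicitation",
    136: "Neighbor Advertisement",
    137: "Redirect",
}


def classify_nd(icmpv6: bytes) -> str:
    """Name the ND message type from the first byte of an ICMPv6 payload."""
    if not icmpv6:
        raise ValueError("empty ICMPv6 payload")
    return ND_TYPES.get(icmpv6[0], "not a Neighbor Discovery message")


def is_router_advertisement(icmpv6: bytes) -> bool:
    """True if the payload is an RA, which should not appear on the exchange."""
    return bool(icmpv6) and icmpv6[0] == 134


sample = bytes([134, 0, 0, 0])  # type 134, code 0, checksum zeroed for the example
print(classify_nd(sample), is_router_advertisement(sample))
```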
6. Neighbor Solicitation
• Analogous to ARP query message
“I know your IP, what’s your MAC?”
• ICMPv6 Type 135, Code 0
• Can be sent unicast to refresh neighbor cache
• Can be multicast to discover uncached neighbors
• Uses the low-order 24 bits of the target address to construct the multicast destination
Target: 2001:7f8:4::1553:2
Destination: ff02::1:ff53:2
Group MAC: 33:33:ff:53:00:02
• RFC recommends no more than 1 solicitation per second per target
• Unicast solicitation used to refresh a stale entry before removing it
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                       Target Address                          +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Options ...
+-+-+-+-+-+-+-+-+-+-+-+-
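The target-to-group mapping on this slide can be reproduced in a few lines. This is a minimal sketch using Python's standard `ipaddress` module. The function names are hypothetical, and the example reuses the target address from the slide.

```python
import ipaddress

# Solicited-node prefix ff02::1:ff00:0/104; the low 24 bits come from the target.
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_group(target: str) -> ipaddress.IPv6Address:
    """Multicast destination used when soliciting an uncached target."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)


def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Ethernet group MAC: 33:33 followed by the low 32 bits of the group."""
    return "33:33:" + group.packed[-4:].hex(":")


group = solicited_node_group("2001:7f8:4::1553:2")
print(group)                 # ff02::1:ff53:2
print(multicast_mac(group))  # 33:33:ff:53:00:02
```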
7. Neighbor Advertisement
• Analogous to ARP reply message
• ICMPv6 Type 136, Code 0.
• R, S & O flags to indicate advertisement type
R & O flags outside scope here
• Can be sent unsolicited [S=0] (like gratuitous ARP)
In which case it is sent to the all-nodes multicast address (ff02::1)
• IP source can be any address on same interface as target
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|S|O|                     Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                       Target Address                          +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Options ...
+-+-+-+-+-+-+-+-+-+-+-+-
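As a sketch of how the layout above maps onto bytes, the following parses the fixed part of a Neighbor Advertisement and pulls out the R, S and O flags. `parse_neighbor_advertisement` is a hypothetical helper. It does not verify the checksum or decode options, and the sample message is hand-built for illustration.

```python
import ipaddress
import struct


def parse_neighbor_advertisement(icmpv6: bytes) -> dict:
    """Decode the fixed 24-byte header of an ICMPv6 Neighbor Advertisement."""
    if len(icmpv6) < 24:
        raise ValueError("too short for a Neighbor Advertisement")
    # Type (1 byte), Code (1), Checksum (2), then 32 bits of flags + reserved.
    msg_type, code, _checksum, flags = struct.unpack("!BBHI", icmpv6[:8])
    if msg_type != 136 or code != 0:
        raise ValueError("not a Neighbor Advertisement")
    return {
        "router": bool(flags & 0x80000000),     # R flag, top bit
        "solicited": bool(flags & 0x40000000),  # S flag
        "override": bool(flags & 0x20000000),   # O flag
        "target": ipaddress.IPv6Address(icmpv6[8:24]),
        "options": icmpv6[24:],                 # left undecoded in this sketch
    }


# Hand-built solicited advertisement (S=1, O=1) with a zeroed checksum.
sample = (
    bytes([136, 0, 0, 0])
    + struct.pack("!I", 0x60000000)
    + ipaddress.IPv6Address("2001:7f8:4::1553:2").packed
)
print(parse_neighbor_advertisement(sample))
```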
9. Unknown unicast
• VPLS is just a virtual switch – still needs to learn MAC addresses
• Ports going down immediately flush database entries, causing short bursts of flooding while the MAC is relearnt
• Unidirectional flows can result in longer-term flooding if the destination ages out of the database
• Stale routes can direct traffic to unknown MACs, leading to extended flooding
• ARP can flush FDB entries on XOS (bug)
• We are investigating ways to better mitigate this
10. So why use multicast if it goes everywhere?
• A well-designed NIC will filter in hardware
• ARP queries go to a single (broadcast) destination and will always need to be punted up the stack
• Neighbor solicitations are distributed over a large number of multicast groups. Most of them can be filtered out in hardware
11. More on NIC Filtering
• Ideally a NIC would have enough filter space for all subscribed groups
• Reality is that space is limited
• Different cards take different approaches
• Fallback to promiscuous mode
• Promiscuous for all multicast
• Hash the group address, accept any groups that hash to same value
• Caveat emptor. Know your hardware limits.
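The hash-based approach can be illustrated with a toy model. This sketch assumes a 64-entry filter indexed by the top 6 bits of a CRC-32 over the destination MAC. That choice is an assumption made for illustration, not a description of any particular card, and `HashFilter` is a hypothetical class.

```python
import zlib

FILTER_BINS = 64  # assumed filter size; real hardware varies


def hash_bin(mac: str) -> int:
    """Map a MAC address to one of 64 bins using the top 6 bits of CRC-32."""
    raw = bytes.fromhex(mac.replace(":", ""))
    return zlib.crc32(raw) >> 26


class HashFilter:
    """Imperfect multicast filter: one bit per bin rather than one per group."""

    def __init__(self) -> None:
        self.bitmap = 0

    def subscribe(self, mac: str) -> None:
        self.bitmap |= 1 << hash_bin(mac)

    def accepts(self, mac: str) -> bool:
        # Any group that hashes into a set bin passes, subscribed or not.
        return bool((self.bitmap >> hash_bin(mac)) & 1)


nic = HashFilter()
nic.subscribe("33:33:ff:53:00:02")

# Count how many unrelated solicited-node groups would also be let through.
others = [f"33:33:ff:{octet:02x}:00:01" for octet in range(256)]
print(sum(nic.accepts(mac) for mac in others), "of", len(others), "other groups accepted")
```

Frames that slip through a filter like this still have to be discarded in software, which is why it pays to know what your hardware does once its exact-match entries run out.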
12. [linx-ops] LINX London Juniper LAN weirdness
• Nov 19th 2014 22:28 – Massive increase in non-unicast traffic
• Investigation shows member with fibre issue
• 2x10GE LAG, one link bouncing
• Member router not happy, sending massive numbers of neighbor solicitations
• Maxed out at around 3 kpps
• Caused instability for a number of other members
13. [linx-ops] LINX London Juniper LAN weirdness
• “IXPWatch” is good at spotting this for ARP
• Turns out not so good for IPv6 NS
• IPv6 NS stats were easily added to the report
• Detection and alerting still have room for improvement
14. A note on addressing on LINX peering LANs
• LINX recommended IPv6 Address:
2001:7f8:4:{LAN}::{ASN}:1/64
• LAN administered by LINX
• ASN converted to hex, not BCD
• Examples:
LINX (5459) on Juniper LAN
2001:7f8:4::1553:1
LINX (8714) on IXCardiff
2001:7f8:4:4::220a:1
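The scheme above is easy to get wrong by typing the ASN in decimal, so here is a small sketch of the conversion. `linx_peering_address` and its `host` parameter are hypothetical. The slide does not say how 32-bit ASNs are handled, so this sketch simply rejects them.

```python
import ipaddress


def linx_peering_address(asn: int, lan: int = 0, host: int = 1) -> ipaddress.IPv6Interface:
    """Build 2001:7f8:4:{LAN}::{ASN}:{host}/64 with the ASN written in hex."""
    if not 0 < asn <= 0xFFFF:
        # Assumption: only 16-bit ASNs fit the single {ASN} group shown.
        raise ValueError("this sketch only handles 16-bit ASNs")
    return ipaddress.IPv6Interface(f"2001:7f8:4:{lan:x}::{asn:x}:{host:x}/64")


print(linx_peering_address(5459))         # 2001:7f8:4::1553:1/64
print(linx_peering_address(8714, lan=4))  # 2001:7f8:4:4::220a:1/64
```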
15. So how does that work with Neighbor Solicitations?
• LINX recommended IPv6 Address
2001:7f8:4:{LAN}::{ASN}:1/64
• Solicited-node multicast group MAC address
33:33:ff:{A}:00:01
• A is the low-order octet of the ASN
• 5th byte is almost always zero
• 550+ unique member ASNs share 229 last octets
• Most group addresses match at least 2 members
• Some as high as 7
• Still much better than ARP
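With 550+ ASNs spread over 229 distinct last octets, the average works out to roughly 2.4 members per group. A sketch of how the sharing can be counted is below. Only 5459 and 8714 come from these slides. The third ASN is an arbitrary value picked to show a collision, and the helper names are hypothetical.

```python
from collections import Counter


def group_mac(asn: int) -> str:
    """Group MAC for a ...::{ASN}:1 address: only the ASN's low octet survives."""
    return f"33:33:ff:{asn & 0xFF:02x}:00:01"


def members_per_group(asns) -> Counter:
    """Count how many member ASNs land on each solicited-node group MAC."""
    return Counter(group_mac(asn) for asn in asns)


# 5459 = 0x1553 and 5715 = 0x1653 share the low octet 0x53; 8714 = 0x220a does not.
for mac, count in members_per_group([5459, 5715, 8714]).items():
    print(mac, count)
```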
17. How busy is IPv6?
• Around 0.7% of traffic on Juniper LAN
• Follows very similar diurnal pattern to IPv4
• Not just BGP and monitoring – real traffic
18. How does ARP vs NS look?
wat?
There are more neighbor solicitations than ARP requests on the Juniper LAN
19. How do the distributions compare?
• Median interval between repeated ARP requests is 8s
• Median for NS is only 4s
• ARP intervals more distributed
• NS has strong peaks at 1s, 3-5s
• Smaller peak at approx 60s
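For anyone wanting to repeat this kind of measurement on their own captures, here is a minimal sketch of the interval calculation. It assumes a "repeated request" means the same sender asking for the same target again. The slides do not spell out the definition used, and the event data is invented.

```python
from statistics import median


def median_repeat_interval(events):
    """Median gap in seconds between repeats of the same (sender, target) pair.

    events: iterable of (timestamp_seconds, sender, target) tuples.
    Returns None if no pair was seen more than once.
    """
    last_seen = {}
    intervals = []
    for timestamp, sender, target in sorted(events):
        key = (sender, target)
        if key in last_seen:
            intervals.append(timestamp - last_seen[key])
        last_seen[key] = timestamp
    return median(intervals) if intervals else None


# Invented sample: one sender retrying every second, another every four seconds.
events = [
    (0.0, "a", "x"), (1.0, "a", "x"), (2.0, "a", "x"),
    (10.0, "b", "y"), (14.0, "b", "y"),
]
print(median_repeat_interval(events))
```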
20. What is causing the difference?
ND may attempt to be more efficient than ARP, but it sure seems chatty
• Repeat offenders? Maybe…
Top 5% of senders account for 34% of requests*
• Down neighbors?
Strong peak at 1s suggests retries
About 80% of destinations down
• I think we have a winner…
* Based on analysis of peak hour flooded traffic
21. Could we / Should we do something?
• Obvious reaction might be to suggest higher RETRANS_TIMER value
• Before jumping to that conclusion we should ask
“Does it matter that there is more ND than ARP?”
• NS addressing makes it easier for nodes to cope
• Extending the timer also makes unreachability detection slower
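The trade-off can be put into rough numbers using the protocol constants from RFC 4861 (DELAY_FIRST_PROBE_TIME of 5 seconds, MAX_UNICAST_SOLICIT of 3, default RETRANS_TIMER of 1 second). This is a simplified sketch: it measures from the moment an entry enters the DELAY state, ignores implementation differences, and the function names are hypothetical.

```python
# Protocol constants from RFC 4861, section 10.
DELAY_FIRST_PROBE_TIME = 5.0  # seconds spent in DELAY before probing starts
MAX_UNICAST_SOLICIT = 3       # unicast probes sent before giving up
DEFAULT_RETRANS_TIMER = 1.0   # seconds between solicitations


def nud_detection_time(retrans_timer: float = DEFAULT_RETRANS_TIMER) -> float:
    """Seconds from entering DELAY until an unresponsive neighbor is dropped."""
    return DELAY_FIRST_PROBE_TIME + MAX_UNICAST_SOLICIT * retrans_timer


def retry_rate(retrans_timer: float = DEFAULT_RETRANS_TIMER) -> float:
    """Upper bound on solicitations per second while retrying one target."""
    return 1.0 / retrans_timer


for timer in (1.0, 4.0):
    print(f"RETRANS_TIMER={timer}s: detection {nud_detection_time(timer)}s, "
          f"at most {retry_rate(timer)} NS/s per target")
```

Under these assumptions, raising the timer from 1s to 4s cuts the per-target retry rate to a quarter but roughly doubles detection time, from 8s to 17s.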