Our job might be to build web applications, but we can't build apps that rely on networking if we don't know how these networks and the big network that connects them all (this thing called the Internet) actually work.
I'll walk through the basics of networking, then dive a lot deeper (from TCP/UDP to IPv4/6, source/destination ports, sockets, DNS and even BGP).
Prepare for an eye-opener when you realize how much a typical app relies on all of these (and many more) working flawlessly... and how you can prepare your app for failure in the chain.
8. Who am I ?
Wim Godden (@wimgtr)
Founder of Cu.be Solutions (https://cu.be)
Founder of Techpath Training Services (https://techpath.eu)
Open Source developer since 1997
Developer of PHPCompatibility, OpenX, ...
Speaker at PHP and Open Source conferences
9. Who are you ?
Developers ?
System engineers ?
Network engineers ?
Do you know how the Internet works ?
10. We’re dev/devops/sysops, not network engineers !
Know enough to build new stuff
Know enough to maintain existing stuff
What if...
Customer → Support Desk → Dev/devops
11. Do you know these ?
TCP
UDP
IP
DNS
BGP
MAC address
IPv4
IPv6
SYN
ACK
Source port
Destination port
Default gateway
Routing table
12. Basics : OSI model
Layer 1 : Physical → Wires, network card, wireless interface
Layer 2 : Data Link → Data protocol (ethernet, ...)
Layer 3 : Network → IP addressing
Layer 4 : Transport → TCP, UDP, ports, ...
Layer 5 : Session → TLS, L2TP, SOCKS, PPTP, ...
Layer 6 : Presentation → Serialization, data translation
Layer 7 : Application → HTTP, DNS, SMTP, ...
14. Basics : packets
Packets always consist of :
Header
Contents
Packets contain other packets :
Packet type #1 header
Packet type #1 contents
Packet type #2 header
Packet type #2 contents
Packet type #3 header
etc.
16. Basics : packets
Part 1 : Ethernet frame
Destination MAC (6 bytes) | Source MAC (6 bytes) | Type (2 bytes) | Payload (46 – 1500 bytes) | CRC (4 bytes)
Part 2 : IPv4 header (min. 160 bits = 20 bytes)
Bit offset | Fields (32 bits per row)
0   | Version | Header length | DSCP | ECN | Total length
32  | Identification | Flags | Fragment Offset
64  | Time To Live | Protocol | Header Checksum
96  | Source IP Address
128 | Destination IP Address
160 | Options (if required)
    | < Contents of the packet >
Part 3 : TCP/UDP/… header and data
17. Basics : TCP packet
Bit offset | Fields (32 bits per row)
0   | Source port | Destination port
32  | Sequence number
64  | Acknowledgment number
96  | Data offset | Flags | Window size
128 | Checksum | Urgent pointer
160 | Options (if required)
    | < Contents of the packet >
19. Sending on a local network
Pure forwarding of packets using a hub
Problem :
Multiple devices sending at same time
→ network collision
→ packets have to be retransmitted after a random back-off
Layer 1
20. Sending on a local network
Each network device (port) has a MAC address
Assigned by manufacturer
Can be overwritten (for VM or failover)
Same physical network → send packet to MAC address
Switch knows MAC address(es) of devices and forwards traffic
Layer 2
21. Sending IP traffic on local network
Requires IP addresses
Where to send ? We need to know MAC address
Uses ARP (Address Resolution Protocol) for lookup
Stores IP ↔ MAC relation in ARP table
What’s “local” ?
→ Same IP subnet
OK, what’s a subnet ?
Layer 3
16:58:56.933019 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.15 tell 192.168.0.12, length 28
16:58:56.938019 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.0.15 is-at 00:50:56:8b:6a:b7, length 46
22. IP addressing (IPv4)
IPv4 addressing = CIDR notation
xxx.xxx.xxx.xxx where 0 <= xxx <= 255
0.0.0.0 → 255.255.255.255
In reality :
8 bits 8 bits 8 bits 8 bits
11000000 00000100 00100000 00000001
192 . 4 . 32 . 1
Total amount of IP addresses available :
256 * 256 * 256 * 256 = 2^8 * 2^8 * 2^8 * 2^8 = 2^32 ≈ 4.3 billion
IP networking requires :
IP address
Subnet mask
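To make the "4 groups of 8 bits" concrete, here is a minimal PHP sketch. ipv4_to_binary() is a hypothetical helper name (not from the slides), and it assumes a 64-bit PHP build so that ip2long() returns a positive integer.

```php
// Convert a dotted IPv4 address into its four 8-bit groups.
// Hypothetical helper for illustration; assumes 64-bit PHP.
function ipv4_to_binary(string $ip): string
{
    $long = ip2long($ip);
    if ($long === false) {
        throw new InvalidArgumentException("Not a valid IPv4 address: $ip");
    }
    // %032b pads to the full 32 bits, str_split() cuts it into octets
    return implode(' ', str_split(sprintf('%032b', $long), 8));
}

echo ipv4_to_binary('192.4.32.1'), "\n"; // 11000000 00000100 00100000 00000001
```

The address used is the one from the slide, so the output can be compared bit by bit.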
23. Subnet mask
Defines the range to which the IP belongs
IPs within the same range can talk to each other directly (local)
IP range : 194.50.97.0 – 194.50.97.255
Subnet mask : 255.255.255.0
or
Subnet mask : /24
→ 194.50.97.5 and 194.50.97.20 are on the same local network
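The "same range" check is a bitwise AND of each address with the subnet mask. A minimal sketch, assuming 64-bit PHP; same_subnet() is a hypothetical helper name used only for this illustration.

```php
// Two addresses are "local" to each other when (address AND mask) is equal.
// Hypothetical helper for illustration.
function same_subnet(string $ipA, string $ipB, string $mask): bool
{
    $a = ip2long($ipA);
    $b = ip2long($ipB);
    $m = ip2long($mask);
    if ($a === false || $b === false || $m === false) {
        throw new InvalidArgumentException('Invalid IPv4 address or mask');
    }
    return ($a & $m) === ($b & $m);
}

var_dump(same_subnet('194.50.97.5', '194.50.97.20', '255.255.255.0')); // bool(true)
var_dump(same_subnet('194.50.97.5', '194.50.98.20', '255.255.255.0')); // bool(false)
```

This is the decision a host makes before sending: same subnet means "ask for the MAC address directly", anything else goes through a gateway.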
24. Subnet mask
Typical notation uses a “mask” :
192.168.0.0 → 192.168.0.255 = 192.168.0.0/24
IPv4 provides 2^32 addresses
A /24 mask gives 2^(32-24) or 2^8 addresses = 256 addresses
Local network ranges :
10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16
25. Given a range 194.7.1.0/24
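If you want 8 addresses for servers → use a /28
A /29 gives 2^(32-29) = 2^3 = 8 addresses, but 3 of those are taken (see below), so it is too small
/28 = 2^(32-28) = 2^4 = 2 * 2 * 2 * 2 = 16 addresses
Each subnet has 1 network address and 1 broadcast address
Each subnet needs a default gateway
16 – 3 = 13 usable addresses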
Subnet = 194.7.1.0/28 or 194.7.1.0/255.255.255.240
This subnet doesn't have to be at the beginning :
194.7.1.16/28, 194.7.1.32/28, etc.
Subnets always start at a multiple of their number of addresses
Combinations make perfect sense too
194.7.1.0/25 = 2^(32-25) = 2^7 = 128 194.7.1.0-194.7.1.127
194.7.1.128/27 = 2^(32-27) = 2^5 = 32 194.7.1.128-194.7.1.159
194.7.1.160/28 = 2^(32-28) = 2^4 = 16 194.7.1.160-194.7.1.175
194.7.1.176/28 = 2^(32-28) = 2^4 = 16 194.7.1.176-194.7.1.191
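The subnet arithmetic can be checked with a few lines of PHP. This is a sketch under assumptions: subnet_info() is a hypothetical helper name, the array keys are made up for the example, and it relies on 64-bit integers.

```php
// Compute size, mask, first and last address of a CIDR range.
// Hypothetical helper for illustration; assumes 64-bit PHP.
function subnet_info(string $cidr): array
{
    if (!preg_match('~^([\d.]+)/(\d{1,2})$~', $cidr, $m)
        || ($ip = ip2long($m[1])) === false
        || (int) $m[2] > 32
    ) {
        throw new InvalidArgumentException("Not a valid IPv4 CIDR range: $cidr");
    }
    $prefix  = (int) $m[2];
    $size    = 2 ** (32 - $prefix);                        // 2^(32 - prefix)
    $mask    = (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF;
    $network = $ip & $mask;                                // first address
    return [
        'network'   => long2ip($network),
        'broadcast' => long2ip($network + $size - 1),      // last address
        'mask'      => long2ip($mask),
        'size'      => $size,
        'aligned'   => $network === $ip, // does the range start on its own boundary ?
    ];
}

print_r(subnet_info('194.7.1.128/27')); // network 194.7.1.128, broadcast 194.7.1.159, size 32
print_r(subnet_info('194.7.1.20/28'));  // aligned = false : the /28 containing .20 starts at .16
```

The 'aligned' flag is the rule that subnets always start at a multiple of their number of addresses.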
26. A little gem : is an IP inside a range
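function ip_in_network($ip, $net_addr, $net_mask)
{
    // $net_mask is the prefix length, e.g. 24 for a /24
    if ($net_mask <= 0 || $net_mask > 32) {
        return false;
    }
    $ip_long = ip2long($ip);
    $net_long = ip2long($net_addr);
    if ($ip_long === false || $net_long === false) {
        return false; // not a valid IPv4 address
    }
    // Compare the first $net_mask bits of both addresses
    $ip_bin_string = sprintf("%032b", $ip_long);
    $net_bin_string = sprintf("%032b", $net_long);
    return (substr_compare($ip_bin_string, $net_bin_string, 0, $net_mask) === 0);
}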
27. IP addressing
“I think there is a world market for maybe five computers”
“640K is more memory than anyone will ever need”
“4.3 billion IP addresses is more than enough”
28. IP addressing (IPv6)
Created to solve lack of IP addresses (4.3 billion in IPv4)
Standard created in 90s (published in 1998)
Deployed on most major sites, but small sites behind
Addresses :
IPv4 address : 192.168.0.1
IPv6 address : 2001:0db8:0000:0000:0000:0000:0370:7334
Abbreviated : 2001:0db8::0370:7334
Can’t talk to each other !
Address space :
2^128 ≈ 340,282,366,920,938,463,463,374,607,431,770,000,000
Client deployment rates (source : Google) :
Global : 22.24% (13.12% in June 2017)
US : 35.32% (29.78% in June 2017)
Canada : 23.27% (16.58% in June 2017)
Belgium : 53.28% (48.42% in June 2017)
Should you use it ? YES ! (But don’t forget about firewalling !)
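If your app stores or compares IP addresses, it has to cope with both families. A minimal sketch using PHP's built-in filter_var(), inet_pton() and inet_ntop(); the helper names ip_version() and abbreviate_ipv6() are made up for this example. Note that inet_ntop() also drops leading zeros inside each group, so its output is shorter than the abbreviated form on the slide.

```php
// Tell IPv4 and IPv6 apart (returns 4, 6 or null). Hypothetical helper.
function ip_version(string $ip): ?int
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return 4;
    }
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false) {
        return 6;
    }
    return null;
}

// Shorten an IPv6 address by round-tripping it through its 16-byte binary form.
function abbreviate_ipv6(string $ip): string
{
    if (ip_version($ip) !== 6) {
        throw new InvalidArgumentException("Not a valid IPv6 address: $ip");
    }
    return inet_ntop(inet_pton($ip));
}

echo abbreviate_ipv6('2001:0db8:0000:0000:0000:0000:0370:7334'), "\n"; // 2001:db8::370:7334
```

Storing the binary form from inet_pton() is one way to avoid treating two spellings of the same address as different values.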
29. Sending IP traffic on local network
Client (192.168.0.15/24) asks : MAC for 192.168.0.2 ?
Server (192.168.0.2/24) answers : AA:BB:CC:DD:EE:FF
Client → Server : Let’s talk !
30. How do IP packets find their way ? → Routing !
Each (Layer 3) network node has a routing table
Can be viewed easily :
Linux : route or route -n
Windows : route print
Flags :
U = Up
G = Gateway
Non-G routes are routes defined by the network interface
Kernel IP routing table
Destination     Gateway          Genmask          Flags  Metric  Ref  Use  Iface
0.0.0.0         10.0.0.1         0.0.0.0          UG     204     0    0    eth1
10.0.0.0        0.0.0.0          255.255.0.0      U      204     0    0    eth1
10.0.64.0       192.168.201.101  255.255.192.0    UG     0       0    0    eth0
192.168.201.0   0.0.0.0          255.255.255.0    U      202     0    0    eth0
31. Sending IP traffic to remote device
Requires IP addresses
Where to send ?
Can not use ARP : MAC addresses are not shared beyond local network
Uses routing table
Matching route ? Send to the gateway specified
No matching route ? Send to default gateway
Provided by DHCP or
Set statically
Must be on same subnet → address found in ARP table
Layer 3
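The route lookup can be sketched in PHP. This is a simplified illustration, not how the kernel implements it: find_route() and the array layout are hypothetical, the entries are copied from the routing table on slide 30, the most specific (longest) mask wins, and metrics are ignored. The default gateway is simply the 0.0.0.0/0.0.0.0 entry, which matches everything.

```php
// Routing table from slide 30; gateway null = directly connected (non-G route)
$routes = [
    ['dest' => '0.0.0.0',       'mask' => '0.0.0.0',       'gateway' => '10.0.0.1',        'iface' => 'eth1'],
    ['dest' => '10.0.0.0',      'mask' => '255.255.0.0',   'gateway' => null,              'iface' => 'eth1'],
    ['dest' => '10.0.64.0',     'mask' => '255.255.192.0', 'gateway' => '192.168.201.101', 'iface' => 'eth0'],
    ['dest' => '192.168.201.0', 'mask' => '255.255.255.0', 'gateway' => null,              'iface' => 'eth0'],
];

// Hypothetical helper: return the matching route with the longest mask.
function find_route(string $ip, array $routes): ?array
{
    $target = ip2long($ip);
    if ($target === false) {
        throw new InvalidArgumentException("Not a valid IPv4 address: $ip");
    }
    $best = null;
    $bestMask = -1;
    foreach ($routes as $route) {
        $mask = ip2long($route['mask']);
        $matches = ($target & $mask) === (ip2long($route['dest']) & $mask);
        // A longer (more specific) mask is a larger number
        if ($matches && $mask > $bestMask) {
            $best = $route;
            $bestMask = $mask;
        }
    }
    return $best;
}

$route = find_route('10.0.65.3', $routes);
echo $route['gateway'] ?? 'direct', ' via ', $route['iface'], "\n"; // 192.168.201.101 via eth0
```

A route without a gateway means the destination is on the local subnet, so the packet goes straight to the MAC address found through ARP.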
32. Sending IP traffic to remote device
Client (192.168.0.15) → Router = default gateway (192.168.0.1) → Internet → Server (194.7.1.4)
Other address shown in the diagram : 98.12.31.42
Client asks : MAC for 192.168.0.1 ? → AA:BB:CC:DD:EE:FF
Frame sent by the client :
Destination : AA:BB:CC:DD:EE:FF
Contents : TCP packet to 194.7.1.4
See ARP table : arp -a
See default gateway : route -n (Linux) or route print (Windows)
34. Establishing a TCP connection
[Diagram : three-way handshake between Client and Server : SYN → SYN ACK → ACK → Data]
Labels in the diagram : Sequence no = 1000, Sequence no = 1001, Acknowledge no = 9000, Sequence no = 1002, Acknowledge no = 9001
35. Establishing a TCP connection
[Diagram : same handshake (SYN → SYN ACK → ACK → Data), Brussels ↔ Montreal, 45ms one way]
Timeline marks (ms) : 0, 45, 90, 135
36. Establishing a TCP connection
[Diagram : same handshake (SYN → SYN ACK → ACK → Data), for two distances]
Brussels ↔ Montreal, 45ms one way : timeline marks (ms) 0, 45, 90, 135
Brussels ↔ London, 10ms one way : timeline marks (ms) 10, 20, 30
37. TCP Window Size
[Diagram : Client (Brussels) ↔ Server (Montreal) : SYN, SYN ACK, ACK, DATA]
Receive window values shown : rwnd = 8192, rwnd = 8192, rwnd = 16384
sysctl net.ipv4.tcp_window_scaling
40. New vs existing connection
[Diagram : new connection, Client (Brussels) ↔ Server (Montreal), 45ms one way]
Sequence : SYN, SYN ACK, ACK, GET /url, Processing request, DATA (x4), ACK (x4), DATA (x8), ACK (x8)
Timeline marks (ms) : 0, 45, 90, 135, 235, 280, 325, 370, 415
41. New vs existing connection
[Diagram : existing connection, Client (Brussels) ↔ Server (Montreal), 45ms one way]
Sequence : GET /url, Processing request, DATA (x12), ACK (x12)
Timeline marks (ms) : 0, 45, 145, 180, 225
42. TCP Performance
Upgrade to latest Linux kernel or OS
Check window size
Reduce latency (move servers closer to client)
Reuse already established connections
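Why reusing connections matters, as back-of-the-envelope arithmetic in PHP. This is a deliberately simplified model (hypothetical helper name, no TLS, no packet loss, no server processing time) : on a new connection the request can only reach the server after three one-way trips, on an existing one after a single trip.

```php
// Time (ms) until the first request reaches the server.
// Simplified model for illustration only.
function time_to_first_request(int $oneWayMs, bool $newConnection): int
{
    // New connection : SYN + SYN ACK + ACK/request = 3 one-way trips
    // Existing connection : the request leaves right away = 1 one-way trip
    return ($newConnection ? 3 : 1) * $oneWayMs;
}

foreach (['Brussels - Montreal' => 45, 'Brussels - London' => 10] as $path => $latency) {
    printf(
        "%s : new = %dms, existing = %dms\n",
        $path,
        time_to_first_request($latency, true),
        time_to_first_request($latency, false)
    );
}
// Brussels - Montreal : new = 135ms, existing = 45ms
// Brussels - London : new = 30ms, existing = 10ms
```

The 135 and 30 match the last marks on the handshake timelines, which is exactly the cost you skip by keeping a connection open.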
44. SSL/TLS with Session Resumption
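[Diagram : TCP handshake followed by TLS handshake with session resumption, Client ↔ Server]
Messages shown : SYN, SYN ACK, ACK, ClientHello, ServerHello, ChangeCipherSpec, Finished, ChangeCipherSpec, Finished, DATA
Timeline marks (ms) : 0, 45, 90, 135, 180, 225, 270, 315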
45. TLS → HSTS
HSTS = HTTP Strict Transport Security
Remembers that a site is HTTPS-only
Prevents users from going to http:// then redirected to https://
Prevents leaking of session cookies over unsecured wifi
46. UDP
User Datagram Protocol
Unreliable Datagram Protocol
Connectionless
→ No 3-way handshake required
Simple packet structure
Packets might not arrive
Packets might arrive out of order
Ideal for streaming, gaming, ...
Bit offset | Fields (32 bits per row)
0   | Source port | Destination port
32  | Length | Checksum
    | < Contents of the packet >
47. TCP/UDP ports
TCP :
Bit offset | Fields (32 bits per row)
0   | Source port | Destination port
32  | Sequence number
64  | Acknowledgment number
96  | Data offset | Flags | Window size
128 | Checksum | Urgent pointer
160 | Options (if required)
    | < Contents of the packet >
UDP :
Bit offset | Fields (32 bits per row)
0   | Source port | Destination port
32  | Length | Checksum
    | < Contents of the packet >
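The UDP header is small enough to build by hand : four 16-bit fields in network byte order. A minimal sketch with PHP's pack() and unpack(); the function names are hypothetical, and the checksum is left at 0, which IPv4 accepts as "no checksum computed". In a real application the operating system fills in this header for you.

```php
// Build the 8-byte UDP header for a payload. Illustration only.
function build_udp_header(int $srcPort, int $dstPort, string $payload, int $checksum = 0): string
{
    // 'n' = unsigned 16-bit, big endian; Length = 8-byte header + payload
    return pack('n4', $srcPort, $dstPort, 8 + strlen($payload), $checksum);
}

// Read the four header fields back from a datagram.
function parse_udp_header(string $datagram): array
{
    if (strlen($datagram) < 8) {
        throw new InvalidArgumentException('A UDP datagram is at least 8 bytes long');
    }
    return unpack('nsource_port/ndestination_port/nlength/nchecksum', $datagram);
}

$header = build_udp_header(5000, 53, 'hello');
echo bin2hex($header), "\n";                 // 13880035000d0000
print_r(parse_udp_header($header . 'hello')); // source 5000, destination 53, length 13
```

Compare this with the TCP header : no sequence numbers, no acknowledgments, no window, which is why UDP has nothing to retransmit or reorder with.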
48. Source and Destination ports
Destination port : defined by service
HTTP : TCP port 80
HTTPS : TCP port 443
DNS : UDP port 53
Source port : for identification of a connection
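[Diagram : three connections from the same Client to the same Server]
Client source port 5000 → Server destination port 80
Client source port 5001 → Server destination port 80
Client source port 5002 → Server destination port 80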
See active connections with
source/destination ports :
netstat -n
49. Fetching a website
Need to fetch https://cu.be
TCP doesn’t know what cu.be is
→ needs an IP address
Looks up IP address through DNS
Open a socket
Connect to IP address on port 443
Send HTTPS request over the connection
Get data back
Get images, CSS, javascript over the same connection
Close the connection
Show the webpage
50. DNS lookups
Through a DNS server
Authoritative : in charge of the domain name
Recursive : asks the authoritative server, then caches for a while
→ Cache time is defined by TTL
Usually you will use a recursive server (owned by your provider)
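[Diagram : recursive lookup of cu.be]
Client → Recursive DNS server : IP for cu.be ?
Recursive DNS server → Root DNS server : IP for cu.be ? → Ask the .be DNS server
Recursive DNS server → .be DNS server : IP for cu.be ? → Ask the cu.be DNS server
Recursive DNS server → cu.be DNS server : IP for cu.be ? → 194.50.97.38
Recursive DNS server → Client : 194.50.97.38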
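The "cache for as long as the TTL says" behaviour of a recursive server can be sketched in a few lines of PHP. Everything here is hypothetical : CachingResolver is not a real library class, the lookup callable stands in for the walk from root to authoritative server, and the 300-second TTL is just an example value.

```php
// Minimal TTL cache in front of a (slow) lookup. Illustration only.
final class CachingResolver
{
    private $lookup;
    private $clock;
    private $cache = [];

    // $lookup : fn(string $name) => [string $ip, int $ttlSeconds]
    // $clock  : fn() => int unix timestamp (injectable so it can be tested)
    public function __construct(callable $lookup, ?callable $clock = null)
    {
        $this->lookup = $lookup;
        $this->clock = $clock ?? 'time';
    }

    public function resolve(string $name): string
    {
        $now = ($this->clock)();
        if (isset($this->cache[$name]) && $this->cache[$name]['expires'] > $now) {
            return $this->cache[$name]['ip']; // still fresh : no network traffic
        }
        [$ip, $ttl] = ($this->lookup)($name);
        $this->cache[$name] = ['ip' => $ip, 'expires' => $now + $ttl];
        return $ip;
    }
}

// Stub lookup that pretends to ask the authoritative server
$resolver = new CachingResolver(function (string $name): array {
    return ['194.50.97.38', 300];
});
echo $resolver->resolve('cu.be'), "\n"; // 194.50.97.38
```

This is also why a DNS change can take a while to be visible everywhere : every cache along the way keeps the old answer until its TTL runs out.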
51. DNS lookups
Actual lookups depend on type of DNS record
DNS holds lots of things :
A record = pointer to IPv4 addresses
AAAA record = pointer to IPv6 addresses
CNAME records = aliases pointing to another name (which has the A/AAAA records)
MX records = mail servers
NS records = DNS servers
TXT = various stuff (anti-spam mostly)
2 tools to debug DNS :
dig
nslookup
52. DNS fallback
Each domain has (should have) at least 2 DNS servers
Order is not important (round robin)
DNS = UDP based (port 53)
→ no acknowledgment
→ timeout after x seconds
→ tries other DNS server(s)
→ Can also work on TCP, but less often used
53. Sockets
The layer between your application and TCP, UDP, ...
Abstracts syntax
Makes it easy to switch between protocols
Provides an easy interface
No need to know implementation
Send a stream of data → split up in packets
Receive lots of data → converted from packets to string
See open sockets ?
→ netstat (-n)
54. Packets over the Internet
Client (192.168.0.15) → Router (192.168.0.1) → Internet → Server (194.7.1.4)
BGP = Border Gateway Protocol
BGP decides how packets are routed between networks
Each public network has an AS (Autonomous System) number
AS3356 = Level3
AS39628 = Cu.be
Each AS announces its subnets over BGP to its uplink providers :
“AS39628 here… you can reach 194.50.97.0/24 through me”
57. BGP routing
Looks up the IP range of destination → AS number
Looks at shortest number of AS hops in BGP routing table
If multiple routes found → calculate based on preference settings
Send packet to BGP gateway
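The selection steps above can be sketched as a sort. This follows the simplified order on the slide (fewest AS hops first, then preference); real BGP implementations evaluate more attributes and weigh local preference before AS path length. pick_bgp_route(), the array layout and the AS numbers 64512 and 64513 (taken from the private AS range) are assumptions made for this example.

```php
// Pick a route : shortest AS path first, highest preference as tie-breaker.
// Simplified illustration, not a real BGP decision process.
function pick_bgp_route(array $candidates): ?array
{
    usort($candidates, function (array $a, array $b): int {
        return [count($a['as_path']), -$a['preference']]
           <=> [count($b['as_path']), -$b['preference']];
    });
    return $candidates[0] ?? null;
}

// Three hypothetical ways to reach 194.50.97.0/24 (AS39628)
$candidates = [
    ['via' => 'uplink A', 'as_path' => [64512, 3356, 39628], 'preference' => 100],
    ['via' => 'uplink B', 'as_path' => [3356, 39628],        'preference' => 100],
    ['via' => 'uplink C', 'as_path' => [64513, 39628],       'preference' => 200],
];
echo pick_bgp_route($candidates)['via'], "\n"; // uplink C
```

Note that "shortest" counts networks, not kilometres or milliseconds, so the chosen path is not necessarily the fastest one.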
58. The problem with mobile devices
Mobile devices switch between towers
Good mobile network → no problem
Poor mobile network → IP changes, lost packets, …
Three-way handshake is time consuming for slow connections
→ Use HTTP/2
→ Keep connections active
Apache :
KeepAlive on
KeepAliveTimeout 15
Nginx :
keepalive_timeout 60
Latency + jitter
59. HTTP
It’s what we use every day ;-)
There’s a “new” version : HTTP/2
Based on SPDY, which was developed by Google
Designed for speed
Multiple simultaneous requests/responses in 1 connection
Binary format (pro : more efficient – con : harder to debug)
TLS/SSL encryption is standard
Built-in prioritization
Server Push
Header compression
Try it out
Deploy it !
60. HTTP/2 – get it running
Apache (v2.4+)
Needs mod_http2
Add “Protocols h2 http/1.1” either globally or to a VirtualHost
Choose a strong SSLCipherSuite !
Nginx (v1.9.5+)
Add “http2” to the listen line
Make sure “ssl_prefer_server_ciphers” is set to on
Make sure the “ssl_ciphers” are set correctly
61. See IP information
ip addr : shows IPs, MAC addresses, port status, etc.
ifconfig : similar output, but includes packet and byte count
route (-n) : shows routing table
netstat (-n) : shows active connections
netstat -l -p : shows listening ports and processes
tcpdump : command-line based Wireshark
62. Network trouble example
Customer X
150.000 visits/day
News ticker :
XML feed from other site (owned by same customer)
Cached for 15 min
63. Customer X – fetching the feed
if (filectime(APP_DIR . '/tmp/cacheFile.xml') < time() - 900) {
    unlink(APP_DIR . '/tmp/cacheFile.xml');
    file_put_contents(
        APP_DIR . '/tmp/cacheFile.xml',
        file_get_contents('http://www.scrambledsitename.be/xml/feed.xml')
    );
}
$xmlfeed = ParseXmlFeed(APP_DIR . '/tmp/cacheFile.xml');
What's wrong with this code ?
64. Customer X – no feed without the source
Feed source
66. Customer X : timeout
default_socket_timeout : 60 sec by default
Each visitor : 60 sec wait time
People keep hitting refresh → more load
More active connections → more load
Apache hits maximum connections → entire site down
72. Customer X : process early
$context = stream_context_create(
    array(
        'http' => array(
            'timeout' => 5 // seconds, instead of default_socket_timeout
        )
    )
);
if (filectime(APP_DIR . '/tmp/cacheFile.xml') < time() - 900) {
    $feed = file_get_contents(
        'http://www.scrambledsitename.be/xml/feed.xml',
        false,
        $context
    );
    // Only overwrite the cache when the fetch succeeded
    if ($feed !== false) {
        file_put_contents(
            APP_DIR . '/tmp/cacheFile.xml',
            ParseXmlFeed($feed)
        );
    }
}
73. Customer X : file_[get|put]_contents atomicity
if (filectime(APP_DIR . '/tmp/cacheFile.xml') < time() - 900) {
    $feed = file_get_contents(
        'http://www.scrambledsitename.be/xml/feed.xml',
        false,
        $context
    );
    if ($feed !== false) {
        file_put_contents(
            APP_DIR . '/tmp/cacheFile.xml',
            ParseXmlFeed($feed)
        );
    }
}
Relying on user → concurrent requests → possible data corruption
Better : run every 15min through cronjob
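Whether the refresh runs from a cronjob or not, the cache file itself should never be visible half-written. One common approach (an assumption on my part, the slides don't prescribe it) is to write to a temporary file in the same directory and rename() it over the old one. write_cache_atomically() is a hypothetical helper name.

```php
// Replace a cache file in one step so readers never see a partial file.
// Hypothetical helper for illustration.
function write_cache_atomically(string $cacheFile, string $contents): bool
{
    // The temp file must live on the same filesystem as the target
    $tmp = tempnam(dirname($cacheFile), 'feed');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $contents) !== strlen($contents)
        || !rename($tmp, $cacheFile)
    ) {
        @unlink($tmp); // clean up; the old cache file stays in place
        return false;
    }
    return true;
}

// Stand-in path for this example; the slides use APP_DIR . '/tmp/cacheFile.xml'
$cacheFile = sys_get_temp_dir() . '/cacheFile.xml';
var_dump(write_cache_atomically($cacheFile, '<feed></feed>')); // bool(true)
```

In the cronjob you would only call this when the fetch returned something other than false, so a failing feed source leaves the previous cache untouched.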
74. Network resources
Use timeouts for all :
fopen
curl
SOAP
…
Data source trusted ?
→ setup a webservice
→ let them push updates when their feed changes
→ less load on data source
→ no timeout issues
Logging → early detection
75. Dealing with timeouts
Possible options :
Show an error to the user, then bail out
Retry the request
(and bail out if it fails again)
Ignore the timeout if you can
Fall back to a cached version
Don’t show the data you were trying to collect
None of these are perfect, but all of them are better than waiting 60 seconds and then
showing an unhandled error !
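Two of these options combined (retry once, then fall back to a cached version) could look like the sketch below. fetch_with_fallback() is a hypothetical helper; it assumes your fetch callable throws a RuntimeException on timeout and that the cache callable returns null when nothing is cached.

```php
// Try the live source a limited number of times, then fall back to the cache.
// Illustration only; both callables are supplied by the caller.
function fetch_with_fallback(callable $fetch, callable $readCache, int $attempts = 2)
{
    for ($i = 1; $i <= $attempts; $i++) {
        try {
            return $fetch();
        } catch (RuntimeException $e) {
            // Log every failure : this is your early warning
            error_log("Fetch attempt $i failed: " . $e->getMessage());
        }
    }
    // May be null : in that case, don't show the data at all
    return $readCache();
}

$data = fetch_with_fallback(
    function () { throw new RuntimeException('timed out after 5 seconds'); },
    function () { return 'cached feed'; }
);
echo $data, "\n"; // cached feed
```

Keep the number of attempts low : every retry adds another timeout to the time your user spends waiting.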
76. Sending HTTP requests : rights and wrongs
Right :
Use a library
Examples : guzzle/guzzle, rmccue/requests, kriswallsmith/buzz (also available for React), nategood/httpful
Sort-of-ok :
Using curl
Wrong :
file_get_contents (or similar) on a URL
fsockopen to port 80, then sending ‘GET / HTTP/1.0’, …
78. Connecting to services
Always handle failures on connection
Fallback to cache
Fallback to secondary service
At least show a nice error message
Did I mention logging and alerting ?
Another example :
$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();
79. Connecting to services
The same connection, with failure handling :
try {
    $connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
    // Only ask for a channel when the connection was actually made
    $channel = $connection->channel();
} catch (AMQPTimeoutException $e) {
    // Do something nice for the user… they’re your user after all
}
80. Async for multiple or slow requests
Need multiple pieces of data → handle them asynchronously
PHP has amazing asynchronous libraries
Pthreads
ReactPHP
Icicle
Amp
...
Slow requests → asynchronous again or queue them
RabbitMQ
ZeroMQ
...
82. Tools to simulate bad networks - Linux
IPTables
iptables -A INPUT -m statistic --mode random --probability 0.1 -j DROP
iptables -A OUTPUT -m statistic --mode random --probability 0.1 -j DROP
TC (Traffic Control)
tc qdisc add dev eth0 root netem delay 50ms 20ms distribution normal
tc qdisc change dev eth0 root netem reorder 0.02 duplicate 0.05 corrupt 0.01
Comcast (https://github.com/tylertreat/comcast)
“Simulating shitty network connections so you can build better systems”
Uses IPTables + TC in an intelligent way
85. Failover, disaster recovery are great...
… if they work !
Should be tested at least once per year
If it doesn’t work, top priority to fix it
Includes :
Network failover
Network configuration recovery from backup
System failover
System restore from backup
Don't recognize these terms ?
Or they seem familiar, but you don't know exactly what they do ?
→ You're in the right place.
Layer 7 is where web applications reside, but as you can see it builds on top of 6 other layers → might be useful to know more about those
Every piece of information that we want to send over the network is sent in a packet or, in most cases, packets.
A packet is transmitted over the wire or wireless as electrical or radio signals and converted back to bits and bytes on the other end.
Types :
IPv4, ARP, Wake-on-LAN, VLANs, IPv6, HSR, …
DSCP & ECN = Type of Service / Quality of Service
Protocol = TCP, ICMP, etc.
CIDR = Classless Inter-Domain Routing
3 way handshake
< 100ms = instantaneous
100ms – 300ms = laggy
> 300ms = sluggish
> 1sec = mental context switch
Importance of not closing connections !
Also, Linux kernels 3.7+ have TCP Fast Open = send data in SYN request
Need to send lots of data → need to send lots of packets
Flow control : set window size
Old maximum = 64KByte
Newer Linux kernels = 1GByte
If we need to send lots of data
Window size is important, but not only thing
Sending sequentially means waiting 90ms after every packet !
Slow start allows you to send multiple packets
Doubles number of packets sent every time
Exponential growth
When packet is lost → back to the previous value
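The effect of slow start is easy to put in numbers. A rough sketch in PHP, assuming an initial burst of 4 packets (the "x4" then "x8" on the new-connection timeline), no packet loss and a receive window that never gets in the way; round_trips_to_send() is a hypothetical helper.

```php
// How many round trips does it take to send $packets when the number of
// packets per round trip doubles every time ? Simplified model, no loss.
function round_trips_to_send(int $packets, int $initialWindow = 4): int
{
    if ($initialWindow < 1) {
        throw new InvalidArgumentException('Initial window must be at least 1 packet');
    }
    $window = $initialWindow;
    $trips = 0;
    while ($packets > 0) {
        $packets -= $window; // send one full window, wait for the ACKs
        $window *= 2;        // exponential growth
        $trips++;
    }
    return $trips;
}

echo round_trips_to_send(12), "\n";  // 2  (4 + 8)
echo round_trips_to_send(100), "\n"; // 5  (4 + 8 + 16 + 32 + 64)
```

On a 90ms round trip, 100 packets cost about 450ms on a fresh connection. An already established connection keeps its larger window and needs fewer round trips, which is another reason not to close connections.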
Show MX records
Show DNS records (set Q=NS)
Show TXT records
Server on which feed located : crashed
Fine for few minutes (cache)
15 minutes : file_get_contents uses default_socket_timeout
Better, not perfect.
What else is wrong ?
Multiple visitors hit expiring cache
→ file delete
→ xml feed hit a lot