2. TIPC FEATURES
Service Addressing
Some similarity to Unix Domain Sockets, but cluster-wide and with more features
Service addresses are translated on-the-fly to system internal port numbers and node addresses
A socket can be bound to multiple service addresses
UDP or L2 Based Messaging Service with Three Modes
Datagram mode with unicast, anycast, multicast
Connection mode with stream or message oriented transport
Message bus mode with unicast, anycast, multicast, broadcast
Service and Topology Tracking
Subscription/event functionality for node and service addresses
Using this, users can continuously track presence of nodes, sockets, addresses and connections
Feedback about service availability or cluster topology changes is immediate
Fully automatic neighbor discovery
Implemented as Linux Kernel Driver
Present in mainstream Linux (kernel.org) and major distros
Namespace / container support
Accessed via regular socket API
3. TIPC == SIMPLICITY
No Need to Configure or Lookup Addresses
Addresses refer to services - not locations
Service addresses are always valid - can be hard-coded
No Need to Configure Node Identities
But you may if you want to
The only configuration needed is to tell each node which interfaces to use
No need to actively monitor processes or nodes
No need for users to do active heart-beating
Users will learn about changes - if they want to know
Easy synchronization when starting an application process (see the sketch after this list)
First, bind to own service address, if any
Second, subscribe for service addresses you want to track
Third, start communicating as services become available
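A minimal C sketch of this three-step pattern against the kernel socket API in <linux/tipc.h>; service type 42 and range 0..10 are arbitrary example values, TIPC_SERVICE_ADDR is the newer spelling of TIPC_ADDR_NAME, and error handling is omitted:

#include <linux/tipc.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
        int sd = socket(AF_TIPC, SOCK_RDM, 0);        /* data socket */
        int top = socket(AF_TIPC, SOCK_SEQPACKET, 0); /* topology socket */
        struct sockaddr_tipc a;
        struct tipc_subscr s;

        /* 1: bind own service address <42:2>, visible cluster-wide */
        memset(&a, 0, sizeof(a));
        a.family = AF_TIPC;
        a.addrtype = TIPC_SERVICE_ADDR;
        a.scope = TIPC_CLUSTER_SCOPE;
        a.addr.name.name.type = 42;
        a.addr.name.name.instance = 2;
        bind(sd, (struct sockaddr *)&a, sizeof(a));

        /* 2: subscribe for the service range <42:0..10> we depend on */
        a.addr.name.name.type = TIPC_TOP_SRV;   /* built-in topology service */
        a.addr.name.name.instance = TIPC_TOP_SRV;
        connect(top, (struct sockaddr *)&a, sizeof(a));
        memset(&s, 0, sizeof(s));
        s.seq.type = 42;
        s.seq.lower = 0;
        s.seq.upper = 10;
        s.timeout = TIPC_WAIT_FOREVER;
        s.filter = TIPC_SUB_SERVICE;
        send(top, &s, sizeof(s), 0);

        /* 3: start communicating as TIPC_PUBLISHED events arrive on 'top' */
        return 0;
}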
4. A service address consists of two parts, assigned by the developer
A 32-bit service type number – typically hard-coded
A 32-bit service instance number – typically calculated by the user at run time
A service address is always qualified by a scope indicator
Indicating lookup scope on the calling side
node != 0 indicates that lookup should be performed only on that node
node == 0 indicates cluster global lookup
Indicating visibility scope on the binding side
Dedicated values for node local or cluster global visibility
SERVICE ADDRESSING

struct tipc_service_addr {
        uint32_t type;
        uint32_t instance;
};

[Figure: two server processes bind(type = 42, instance = 1, scope = cluster) and bind(type = 42, instance = 2, scope = cluster); a client process reaches the latter with sendto(type = 42, instance = 2, node = 0).]
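A hedged C rendering of the figure: the server binds <42:2> with cluster visibility, and the client sends by service address, letting TIPC find a matching socket. Error handling is omitted:

#include <linux/tipc.h>
#include <string.h>
#include <sys/socket.h>

/* Server side: bind service address <42:2> with cluster visibility */
int server(void)
{
        int sd = socket(AF_TIPC, SOCK_RDM, 0);
        struct sockaddr_tipc a;

        memset(&a, 0, sizeof(a));
        a.family = AF_TIPC;
        a.addrtype = TIPC_SERVICE_ADDR;
        a.scope = TIPC_CLUSTER_SCOPE;
        a.addr.name.name.type = 42;
        a.addr.name.name.instance = 2;
        return bind(sd, (struct sockaddr *)&a, sizeof(a)) ? -1 : sd;
}

/* Client side: send to <42:2>; domain (node) == 0 means cluster lookup */
int client(int sd)
{
        struct sockaddr_tipc dst;

        memset(&dst, 0, sizeof(dst));
        dst.family = AF_TIPC;
        dst.addrtype = TIPC_SERVICE_ADDR;
        dst.addr.name.name.type = 42;
        dst.addr.name.name.instance = 2;
        dst.addr.name.domain = 0;
        return sendto(sd, "hello", 5, 0,
                      (struct sockaddr *)&dst, sizeof(dst));
}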
5. No restrictions on how to bind service addresses
Different service addresses can be bound to same socket
Same service address can be bound to different sockets
“Anycast” lookup with round-robin selection
Service address ranges can be bound to a socket
Only one service address per socket in message bus mode
SERVICE BINDING

struct tipc_service_range {
        uint32_t type;
        uint32_t lower;
        uint32_t upper;
};

[Figure: one server process binds the range bind(type = 42, lower = 2, upper = 20, scope = cluster); another binds bind(type = 42, instance = 2, scope = cluster) and the node-local bind(type = 666, instance = 17, scope = node); the client's sendto(type = 42, instance = 2, node = 0) matches both sockets, and anycast selects one round-robin.]
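Binding a whole range to one socket only changes the address type; a sketch, where TIPC_SERVICE_RANGE is the modern name for TIPC_ADDR_NAMESEQ:

#include <linux/tipc.h>
#include <string.h>
#include <sys/socket.h>

/* Bind the whole range <42:2..20> to one socket, as in the figure */
int bind_range(int sd)
{
        struct sockaddr_tipc a;

        memset(&a, 0, sizeof(a));
        a.family = AF_TIPC;
        a.addrtype = TIPC_SERVICE_RANGE;
        a.scope = TIPC_CLUSTER_SCOPE;
        a.addr.nameseq.type = 42;
        a.addr.nameseq.lower = 2;
        a.addr.nameseq.upper = 20;
        return bind(sd, (struct sockaddr *)&a, sizeof(a));
}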
6. LOCATION TRANSPARENCY
Client never needs to know location of server
Translation from service address to socket address is performed on-the-fly at the source node
Each node holds a replica of the global binding table for this translation
Users can still indicate an explicit socket address if they want to
struct tipc_socket_addr {
        uint32_t port;
        uint32_t node;
};

[Figure: three nodes, #9a6004c1, #1a6b7ce0 and #c1f10e72, with server sockets on ports 123456, 98765 and 763456; the servers bind the same service addresses as in the previous figure, and the client's sendto(type = 42, instance = 2, node = 0) is resolved to a <port:node> pair already at the source node.]
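Two hedged variants of the client side shown earlier: restricting the lookup to one node via the domain field, or skipping lookup entirely by using a socket address learned previously (e.g. from a received message or a topology event). The node and port values are the illustrative ones from the figure; note that the kernel header names the first tipc_socket_addr field ref rather than port:

/* Lookup on one specific node only (0 would mean cluster-wide) */
dst.addrtype = TIPC_SERVICE_ADDR;
dst.addr.name.name.type = 42;
dst.addr.name.name.instance = 2;
dst.addr.name.domain = 0x1a6b7ce0;

/* Or address the socket directly (TIPC_SOCKET_ADDR == old TIPC_ADDR_ID) */
dst.addrtype = TIPC_SOCKET_ADDR;
dst.addr.id.ref = 123456;        /* the "port" in the struct above */
dst.addr.id.node = 0x1a6b7ce0;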
7. Reliable transport socket to socket
Receive buffer overload protection
No end-to-end flow control
Messages may still be rejected by receiving socket
Rejected messages may be dropped or returned to sender
Configurable in sending socket
If returned, message is truncated and equipped with an error code
Unicast, Anycast or Multicast
Depends on indicated address type
DATAGRAM MODE

[Figure: as in the earlier figure, two server processes bind instances 1 and 2 of service type 42 with cluster scope, and the client sends with sendto(type = 42, instance = 2, node = 0).]
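The drop-or-return choice is made per sending socket; a sketch using the TIPC_DEST_DROPPABLE socket option:

#include <linux/tipc.h>
#include <sys/socket.h>

/* 0: undeliverable messages come back truncated, with an error code;
 * 1: they are silently dropped */
int set_droppable(int sd, int droppable)
{
        return setsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE,
                          &droppable, sizeof(droppable));
}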
8. CONNECTION MODE
Established by using service address
One-way setup (a.k.a. “0-RTT”) using data-carrying messages
Traditional TCP-style setup/shutdown also available
Stream- or message oriented
End-to-end flow control for buffer overflow protection
No socket level sequence numbers, acknowledgments or retransmissions
Link layer takes care of that
Connection breaks immediately if peer becomes unavailable
Leverages link level heartbeats and kernel/socket cleanup functionality
No socket level “keepalive” heartbeats needed
[Figure: two nodes, each hosting two processes whose sockets are pairwise connected across the node boundary.]
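A sketch of the traditional setup path: connect() takes a service address, so the connection inherits location transparency. The 0-RTT alternative mentioned above instead carries the setup in the first data message(s) on a SOCK_SEQPACKET socket:

#include <linux/tipc.h>
#include <string.h>
#include <sys/socket.h>

/* Connect to whoever serves <42:1>, wherever it runs */
int connect_service(void)
{
        int sd = socket(AF_TIPC, SOCK_STREAM, 0);
        struct sockaddr_tipc srv;

        memset(&srv, 0, sizeof(srv));
        srv.family = AF_TIPC;
        srv.addrtype = TIPC_SERVICE_ADDR;
        srv.addr.name.name.type = 42;     /* assumed example service */
        srv.addr.name.name.instance = 1;
        srv.addr.name.domain = 0;         /* cluster-wide lookup */

        if (connect(sd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
                return -1;
        return sd;
}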
9. Communication Groups - brokerless bus instances
User instantiated
Same addressing properties (service addressing) as datagram mode
Different traffic properties - no dropped or rejected messages
Four different message distribution methods
Delivery and sequence order guaranteed, even between different distribution methods
Leveraging L2 broadcast / UDP multicast when possible and deemed favorable
End-to-end flow control
Messages never dropped because of destination buffer overflow
Same mechanism covers all distribution methods
Point-to-multipoint - "sliding window" algorithm
Multipoint-to-point - "coordinated sliding window"
MESSAGE BUS MODE
Available from Linux 4.14
10. Members are sockets
Groups are closed - members can only exchange messages with other sockets in the same group
Each socket has two addresses: a <port:node> tuple bound by the system and a <group:member> tuple bound by the user
<group:member> is a TIPC service address, i.e., the same as <type:instance>
Member sockets may optionally deliver join/leave events for other members in the group
Membership events are just empty messages delivered along with the source member's two addresses
The TIPC binding table serves as registry and distribution channel for member identities and events
GROUP MEMBERSHIP

[Figure: a member socket joins with join(<group:member>) and leaves with leave(); the distributed TIPC binding table propagates the change, and the other members receive it as recvmsg(OOB, <group:member>, <port:node>) on join and recvmsg(OOB | EOR, <group:member>, <port:node>) on leave.]
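A sketch of joining a group via the TIPC_GROUP_JOIN socket option (Linux 4.14+). The group and member numbers are illustrative <group:member> values, and TIPC_GROUP_MEMBER_EVTS requests the join/leave events described above:

#include <linux/tipc.h>
#include <string.h>
#include <sys/socket.h>

int join_group(int sd)
{
        struct tipc_group_req req;

        memset(&req, 0, sizeof(req));
        req.type = 4711;                    /* group  == service type     */
        req.instance = 17;                  /* member == service instance */
        req.scope = TIPC_CLUSTER_SCOPE;
        req.flags = TIPC_GROUP_MEMBER_EVTS; /* deliver join/leave events  */

        return setsockopt(sd, SOL_TIPC, TIPC_GROUP_JOIN, &req, sizeof(req));
}

/* Leaving again: setsockopt(sd, SOL_TIPC, TIPC_GROUP_LEAVE, NULL, 0); */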
12. Users can subscribe for contents of the global address binding table
Receive events at each change matching the range in the subscription
There is a match when a bound/unbound instance or range overlaps with the range subscribed for
Received events contain the bound socket’s service address and socket address
SERVICE TRACKING
[Figure: the same three-node cluster; one server binds bind(type = 42, lower = 2, upper = 20, scope = cluster), another bind(type = 42, instance = 2, scope = cluster), and the client issues subscribe(type = 42, lower = 0, upper = 10), receiving an event for each matching binding.]
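A hedged sketch of the tracking client in the figure: it connects to TIPC's built-in topology service and receives one event per matching change in the binding table. Recent kernels detect the byte order of the request, so host order is used here; classic examples htonl() all the fields:

#include <linux/tipc.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

void track_service(void)
{
        int sd = socket(AF_TIPC, SOCK_SEQPACKET, 0);
        struct sockaddr_tipc topsrv;
        struct tipc_subscr sub;
        struct tipc_event evt;

        /* The topology service is itself reached via a service address */
        memset(&topsrv, 0, sizeof(topsrv));
        topsrv.family = AF_TIPC;
        topsrv.addrtype = TIPC_SERVICE_ADDR;
        topsrv.addr.name.name.type = TIPC_TOP_SRV;
        topsrv.addr.name.name.instance = TIPC_TOP_SRV;
        connect(sd, (struct sockaddr *)&topsrv, sizeof(topsrv));

        /* Subscribe for bindings overlapping <42:0..10> */
        memset(&sub, 0, sizeof(sub));
        sub.seq.type = 42;
        sub.seq.lower = 0;
        sub.seq.upper = 10;
        sub.timeout = TIPC_WAIT_FOREVER;
        sub.filter = TIPC_SUB_SERVICE;
        send(sd, &sub, sizeof(sub), 0);

        /* One event per publication/withdrawal; carries both addresses */
        while (recv(sd, &evt, sizeof(evt), 0) == sizeof(evt))
                printf("%s <%u:%u..%u> socket <%u:%x>\n",
                       evt.event == TIPC_PUBLISHED ? "up" : "down",
                       evt.s.seq.type, evt.found_lower, evt.found_upper,
                       evt.port.ref, evt.port.node);
}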
13. Special case of service tracking
Uses the same mechanism, based on the service binding table contents
Represented by the built-in service type zero (== “node availability”)
It is also possible to subscribe for availability of individual links
CLUSTER TOPOLOGY TRACKING
[Figure: a three-node cluster in which a client issues subscribe(type = 0, lower = 0, upper = ~0) to track the availability of all nodes.]
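The same sketch tracks node availability by subscribing to the built-in type 0 (newer headers name this constant TIPC_NODE_STATE); only the subscription fields change, reusing the topology socket sd and struct sub from the previous sketch. The filter choice mainly affects event granularity:

sub.seq.type = 0;            /* built-in "node availability" type */
sub.seq.lower = 0;
sub.seq.upper = ~0;          /* all node addresses                */
sub.timeout = TIPC_WAIT_FOREVER;
sub.filter = TIPC_SUB_PORTS; /* one event per node up/down        */
send(sd, &sub, sizeof(sub), 0);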
14. NODE TO NODE LINKS
“L2.5” reliable link layer
Guarantees delivery and sequentiality for all packets
Acts as trunk for multiple connections, and keeps track of those
Keeps track of peer node’s address bindings in local replica of the binding table
Supervised by heartbeats at low traffic
Failure detection tolerance configurable from 50 ms to 10 s (default 1.5 s)
"Lost service address" events are issued for the peer node's bindings when contact is lost
All connections to the peer node are broken when contact is lost
Several links per node pair
Load sharing or active-standby - but max two active
Disturbance-free failover to the remaining link, if any
[Figure: two nodes, each hosting three socket-owning processes; all inter-node traffic shares the node-to-node link.]
15. NEIGHBOR DISCOVERY
Nodes have a 128-bit node identity
By default assigned by the system (from Linux 4.16)
Can also be set by the user, e.g. to a host name or a UUID
The identity is internally hashed into a guaranteed-unique 32-bit node address
This is the node address used by the protocol
Clusters have a 32-bit cluster identity
Can be assigned by the user if something other than the default value is needed
All nodes using the same cluster identity will establish mutual links
One link per interface, maximum two active links per node pair
Cluster identity determines network membership
Neighbor discovery by UDP multicast or L2 broadcast
If there is no broadcast/multicast support, discovery can be performed via explicitly configured IP addresses
[Figure: two separate networks. The nodes with cluster id 4711 use host names as node identities (goethe, schiller, heine, brandes, ibsen), each hashed into a 32-bit node address such as 2f1c0ab4. The nodes with cluster id 110956 use UUIDs as node identities, likewise hashed into addresses such as 8fa4ab00.]
16. Sort all cluster nodes into a circular list
All nodes use the same algorithm and criteria
Select the next ⌈√N⌉ - 1 downstream nodes in the list as the "local domain" to be actively monitored
CPU load increases by ~√N
Distribute a record describing the local domain to all other nodes in the cluster
Select and monitor a set of "head" nodes outside the local domain so that no node is more than two active monitoring hops away
There will be ⌈√N⌉ - 1 such nodes
Guarantees failure discovery even at accidental network partitioning
Each node now monitors 2 x (⌈√N⌉ - 1) neighbors
6 neighbors in a 16 node cluster
56 neighbors in an 800 node cluster
All nodes use this algorithm
In total 2 x (⌈√N⌉ - 1) x N actively monitored links
96 links in a 16 node cluster
44,800 links in an 800 node cluster
[Figure: (⌈√N⌉ - 1) local domain destinations + (⌈√N⌉ - 1) remote "head" destinations, times N nodes = 2 x (⌈√N⌉ - 1) x N actively monitored links.]
SCALABILITY
Overlapping Ring Monitoring Algorithm
Since Linux 4.7, TIPC comes with a unique auto-adaptive hierarchical neighbor monitoring algorithm.
This makes it possible to establish full-mesh clusters of 1000 nodes with a failure discovery time of 1.5 sec
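The per-node and cluster-wide monitoring load quoted above follows directly from the 2 x (⌈√N⌉ - 1) formula; a small self-checking C sketch (compile with -lm) reproduces the slide's numbers:

#include <math.h>
#include <stdio.h>

/* Neighbors actively monitored per node under the overlapping-ring
 * scheme: (ceil(sqrt(N)) - 1) local-domain nodes plus as many remote
 * "head" nodes. Illustrative arithmetic only. */
static unsigned monitored_neighbors(unsigned n)
{
        unsigned d = (unsigned)ceil(sqrt((double)n));
        return 2 * (d - 1);
}

int main(void)
{
        unsigned sizes[] = { 16, 800 };
        for (int i = 0; i < 2; i++) {
                unsigned n = sizes[i];
                printf("N=%u: %u neighbors/node, %u monitored links\n",
                       n, monitored_neighbors(n), monitored_neighbors(n) * n);
        }
        return 0; /* prints 6/96 for N=16 and 56/44800 for N=800 */
}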
17. PERFORMANCE
Lower latency than TCP
~33% faster than TCP inter-node
2 times faster than TCP intra-node for 64 byte messages
7 times faster than TCP intra-node for 64 kB messages
TIPC transmits socket-to-socket instead of via the loopback interface
Throughput still somewhat lower than TCP
~65-90% of max TCP throughput inter-node
Seems to be environment dependent
But 25-30% better than TCP intra-node
We are working on this…
18. ARCHITECTURE

[Figure: layered architecture. User land: user apps in C, Python, Go etc., a C library, and the socket API. L4: connection handling and flow control (sockets). L3: destination lookup via the binding table (topology and service) and the node table. L2/internal: per-neighbor links handling fragmentation/bundling, retransmission and congestion control, plus link aggregation, synchronization/failover, and neighbor discovery/supervision. External: carrier media through media plugins for Ethernet, Infiniband, UDP and VxLAN.]
19. API
Socket API
The original TIPC API
TIPC C API
Simpler and more intuitive
Available as libtipc from the tipcutils package at SourceForge
Python, Perl, Ruby, D, Go
But not yet for Java
ZeroMQ
Not yet with full features
More to come…
20. WHEN TO USE TIPC
TIPC does not replace IP based transport protocols
It is a complement to be used under certain conditions
It is an IPC mechanism
TIPC may be a good option if you
Need a high-performing, configuration-free, brokerless message bus
Want startup synchronization and service discovery for free
Have application components that need to keep continuous watch on each other
Need short latency times
Have traffic that is heavily intra-node or intra-subnet
Don't want to bother with cluster configuration
Are inside a security perimeter
Or can use IPsec or MACsec
21. WHO IS USING TIPC?
Ericsson mobile and fixed core network systems
IMS, PGW, SGW, HSS…
Routers/switches such as SSR, AXE
Hundreds of installed sites
Tens of thousands of nodes
Tens of millions of subscribers
WindRiver
Mission-critical system for Sikorsky Aircraft's helicopters
Cisco
onePK, IOS-XE Software, NX-OS Software
Mirantis
OpenStack
Nokia, Huawei and numerous other companies and institutions
22. MORE INFORMATION
TIPC home page
http://tipc.sourceforge.net
TIPC project page
http://sourceforge.net/project/tipc
TIPC Demo/Test/Utility programs
http://sourceforge.net/project/tipc/files
TIPC Communication Groups
https://www.slideshare.net/JonMaloy/tipc-communication-groups
TIPC Overlapping Ring Neighbor Monitoring
https://www.youtube.com/watch?v=ni-iNJ-njPo
TIPC protocol specification (somewhat dated)
http://tipc.sourceforge.net/doc/draft-spec-tipc-10.html
TIPC programmer’s guide (somewhat dated)
http://tipc.sourceforge.net/doc/tipc_2.0_prog_guide.html