TIPC
Transparent Inter Process Communication
“Cluster Domain Sockets”
by Jon Maloy
TIPC FEATURES
Service Addressing
▪ Similar to Unix Domain Sockets, but cluster-wide and with more features
▪ Service addresses are translated on the fly to system-internal port numbers and node addresses
▪ A socket can be bound to multiple service addresses
UDP- or L2-Based Messaging Service with Three Modes
▪ Datagram mode with unicast, anycast, multicast
▪ Connection mode with stream- or message-oriented transport
▪ Message bus mode with unicast, anycast, multicast, broadcast
Service and Topology Tracking
▪ Subscription/event functionality for node and service addresses
▪ Lets users continuously track the presence of nodes, sockets, addresses and connections
▪ Feedback about service availability or cluster topology changes is immediate
▪ Fully automatic neighbor discovery
Implemented as a Linux Kernel Driver
▪ Present in mainline Linux (kernel.org) and major distros
▪ Namespace / container support
▪ Accessed via the regular socket API
TIPC == SIMPLICITY
No Need to Configure or Look Up Addresses
▪ Addresses refer to services, not locations
▪ Service addresses are always valid and can be hard-coded
No Need to Configure Node Identities
▪ But you may if you want to
▪ You must still tell each node which interfaces to use
No need to actively monitor processes or nodes
▪ No need for users to do active heartbeating
▪ Users learn about changes if they subscribe to them
Easy synchronization when starting an application process
▪ First, bind to your own service address, if any
▪ Second, subscribe to the service addresses you want to track
▪ Third, start communicating as services become available
SERVICE ADDRESSING
A service address consists of two parts, assigned by the developer
▪ A 32-bit service type number, typically hard-coded
▪ A 32-bit service instance number, typically calculated by the user at run time
A service address is always qualified by a scope indicator
▪ Indicating lookup scope on the calling side
  ▪ node != 0 indicates that lookup should be performed only on that node
  ▪ node == 0 indicates cluster-global lookup
▪ Indicating visibility scope on the binding side
  ▪ Dedicated values for node-local or cluster-global visibility
struct tipc_service_addr {
    uint32_t type;
    uint32_t instance;
};
Server Process: bind(type = 42, instance = 2, scope = cluster)
Server Process: bind(type = 42, instance = 1, scope = cluster)
Client Process: sendto(type = 42, instance = 2, node = 0)
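The lookup and visibility scope rules can be illustrated with a toy in-memory model. This is only a sketch and not the TIPC socket API: the `Binding` class, the `lookup` function, the scope constants and the node numbers 1 to 3 are hypothetical names and values chosen for illustration. The type and instance values are borrowed from the slides' diagrams.

```python
from dataclasses import dataclass

# Hypothetical visibility scopes for this sketch (not the kernel's constants).
SCOPE_NODE = "node"
SCOPE_CLUSTER = "cluster"


@dataclass(frozen=True)
class Binding:
    type: int       # 32-bit service type
    instance: int   # 32-bit service instance
    scope: str      # visibility chosen on the binding side
    node: int       # node on which the bound socket lives


def lookup(bindings, caller_node, type_, instance, node=0):
    """Return the bindings that a caller on caller_node can reach.

    node == 0 means cluster-global lookup; node != 0 restricts the
    lookup to that node. Node-local bindings are only visible to
    callers on the same node.
    """
    matches = []
    for b in bindings:
        if (b.type, b.instance) != (type_, instance):
            continue
        if node != 0 and b.node != node:
            continue
        if b.scope == SCOPE_NODE and b.node != caller_node:
            continue
        matches.append(b)
    return matches


bindings = [
    Binding(42, 2, SCOPE_CLUSTER, node=1),
    Binding(42, 1, SCOPE_CLUSTER, node=2),
    Binding(666, 17, SCOPE_NODE, node=2),
]

# A client on node 3 doing a cluster-global lookup of <42:2>.
print(lookup(bindings, caller_node=3, type_=42, instance=2))
```

In this model the client on node 3 finds the `<42:2>` binding on node 1, while the node-scoped `<666:17>` binding is only visible to callers on node 2.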
SERVICE BINDING
No restrictions on how to bind service addresses
▪ Different service addresses can be bound to the same socket
▪ The same service address can be bound to different sockets
  ▪ “Anycast” lookup with round-robin selection
▪ Service address ranges can be bound to a socket
▪ Only one service address per socket in message bus mode
struct tipc_service_range {
    uint32_t type;
    uint32_t lower;
    uint32_t upper;
};
Server Process: bind(type = 42, lower = 2, upper = 20, scope = cluster)
Server Process: bind(type = 42, instance = 2, scope = cluster)
                bind(type = 666, instance = 17, scope = node)
Client Process: sendto(type = 42, instance = 2, node = 0)
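The binding rules can be sketched with a toy binding table that accepts ranges and rotates among matching sockets on anycast lookup. This is an illustrative model under stated assumptions, not how the kernel implements it: `BindingTable`, its methods and the socket names `server-a` and `server-b` are hypothetical, and the rotation order here is simply binding order.

```python
from dataclasses import dataclass, field


@dataclass
class BindingTable:
    """Toy binding table mapping (type, lower..upper) ranges to socket names."""
    entries: list = field(default_factory=list)   # (type, lower, upper, socket)
    cursor: dict = field(default_factory=dict)    # round-robin state per address

    def bind(self, type_, lower, upper, socket_name):
        # A single instance is modeled as a range with lower == upper.
        self.entries.append((type_, lower, upper, socket_name))

    def matches(self, type_, instance):
        """All sockets whose bound range covers the given instance."""
        return [s for (t, lo, hi, s) in self.entries
                if t == type_ and lo <= instance <= hi]

    def anycast(self, type_, instance):
        """Pick one matching socket, rotating among the candidates."""
        candidates = self.matches(type_, instance)
        if not candidates:
            return None
        i = self.cursor.get((type_, instance), 0)
        self.cursor[(type_, instance)] = i + 1
        return candidates[i % len(candidates)]


table = BindingTable()
table.bind(42, 2, 20, "server-a")    # range binding
table.bind(42, 2, 2, "server-b")     # same service address, different socket
table.bind(666, 17, 17, "server-b")  # second address on the same socket

print([table.anycast(42, 2) for _ in range(4)])
```

Both sockets cover `<42:2>`, so repeated lookups of that address alternate between them, while `<42:5>` is covered only by the range binding.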
LOCATION TRANSPARENCY
Client never needs to know the location of the server
▪ Translation from service address to socket address is performed on the fly at the source node
▪ Each node holds a replica of the global binding table for translation
▪ Users can still specify an explicit socket address if they want to
struct tipc_socket_addr {
    uint32_t port;
    uint32_t node;
};
Nodes: #9a6004c1, #1a6b7ce0, #c1f10e72
Socket ports: 123456, 98765, 763456
Server Process: bind(type = 42, lower = 2, upper = 20, scope = cluster)
Server Process: bind(type = 42, instance = 2, scope = cluster)
                bind(type = 666, instance = 17, scope = node)
Client Process: sendto(type = 42, instance = 2, node = 0)
DATAGRAM MODE
Reliable transport socket to socket
▪ Receive buffer overload protection
▪ No end-to-end flow control
▪ Messages may still be rejected by the receiving socket
Rejected messages may be dropped or returned to sender
▪ Configurable in the sending socket
▪ If returned, the message is truncated and tagged with an error code
Unicast, Anycast or Multicast
▪ Depends on the indicated address type
Server Process: bind(type = 42, instance = 2, scope = cluster)
Server Process: bind(type = 42, instance = 1, scope = cluster)
Client Process: sendto(type = 42, instance = 2, node = 0)
CONNECTION MODE
Established by using service address
▪ One-way setup (a.k.a. “0-RTT”) using data-carrying messages
▪ Traditional TCP-style setup/shutdown also available
Stream- or message-oriented
▪ End-to-end flow control for buffer overflow protection
▪ No socket-level sequence numbers, acknowledgments or retransmissions
  ▪ The link layer takes care of that
Connection breaks immediately if the peer becomes unavailable
▪ Leverages link-level heartbeats and kernel/socket cleanup functionality
▪ No socket-level “keepalive” heartbeats needed
[Figure: two nodes hosting four processes, each with a socket]
MESSAGE BUS MODE
Available from Linux 4.14
Communication Groups: brokerless bus instances
▪ User-instantiated
▪ Same addressing properties (service addressing) as datagram mode
▪ Different traffic properties: no dropped or rejected messages
▪ Four different message distribution methods
▪ Delivery and sequence order guaranteed, even between different distribution methods
▪ Leverages L2 broadcast / UDP multicast when possible and deemed favorable
End-to-end flow control
▪ Messages are never dropped because of destination buffer overflow
▪ Same mechanism covers all distribution methods
▪ Point-to-multipoint: “sliding window” algorithm
▪ Multipoint-to-point: “coordinated sliding window”
GROUP MEMBERSHIP
Members are sockets
▪ Groups are closed: members can only exchange messages with other sockets in the same group
▪ Each socket has two addresses: a <port:node> tuple bound by the system and a <group:member> tuple bound by the user
▪ <group:member> is a TIPC service address, i.e., the same as <type:instance>
▪ Member sockets may optionally deliver join/leave events for other members in the group
▪ Membership events are just empty messages delivered along with the source member’s two addresses
▪ The TIPC binding table serves as registry and distribution channel for member identities and events
join(<group:member>) -> TIPC Distributed Binding Table -> recvmsg(OOB, <group:member>, <port:node>);
leave() -> TIPC Distributed Binding Table -> recvmsg(OOB|EOR, <group:member>, <port:node>);
TIPC Distributed Binding Table -> recvmsg(OOB|EOR, <group:member>, <port:node>);
GROUP MESSAGING
[Figure: four panels (Unicast, Anycast, Multicast, Broadcast), each showing group members 28, 60, 34 and 7]
Unicast: sendto(SOCKET, <port:node>);
Anycast: sendto(SERVICE, <group:member>);
Multicast: sendto(MCAST, <group:member>);
Broadcast: send();
Receiving members: recvmsg(<group:member>, <port:node>);
Received messages are delivered with both source addresses
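The four distribution methods of a communication group can be sketched with a toy in-memory group. This is a conceptual model, not the TIPC API: the `Group` class and its methods are hypothetical, the `(port, node)` socket addresses are made up, and two simplifications are assumed, namely that a sender never receives its own message and that anycast picks the first matching member. The member numbers 28, 60, 34 and 7 come from the slide's figure.

```python
class Group:
    """Toy closed group; members are keyed by their (port, node) address."""

    def __init__(self):
        self.members = {}   # (port, node) -> member instance number
        self.inbox = {}     # (port, node) -> list of received messages

    def join(self, sock, member):
        self.members[sock] = member
        self.inbox[sock] = []

    def _deliver(self, src, dst, payload):
        # Every message carries both source addresses: <member> and <port:node>.
        self.inbox[dst].append((self.members[src], src, payload))

    def _bound_to(self, member, exclude):
        return [s for s, m in self.members.items()
                if m == member and s != exclude]

    def unicast(self, src, dst_sock, payload):
        self._deliver(src, dst_sock, payload)

    def anycast(self, src, member, payload):
        targets = self._bound_to(member, exclude=src)
        if not targets:
            return None
        self._deliver(src, targets[0], payload)   # simplification: first match
        return targets[0]

    def multicast(self, src, member, payload):
        for dst in self._bound_to(member, exclude=src):
            self._deliver(src, dst, payload)

    def broadcast(self, src, payload):
        for dst in self.members:
            if dst != src:
                self._deliver(src, dst, payload)


g = Group()
a, b, c, d = (101, 1), (102, 1), (103, 2), (104, 2)
for sock, member in zip((a, b, c, d), (28, 60, 34, 7)):
    g.join(sock, member)

g.broadcast(a, "hello")
print(g.inbox[b])
```

After the broadcast from member 28, every other member's inbox holds one message tagged with the sender's member number and socket address.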
SERVICE TRACKING
Users can subscribe to the contents of the global address binding table
▪ Receive events at each change matching the range in the subscription
There is a match when
▪ A bound/unbound instance or range overlaps with the range subscribed to
Received events contain the bound socket’s service address and socket address
Nodes: #9a6004c1, #1a6b7ce0, #c1f10e72
Socket ports: 123456, 98765, 763456
Server Process: bind(type = 42, lower = 2, upper = 20, scope = cluster)
Server Process: bind(type = 42, instance = 2, scope = cluster)
Client Process: subscribe(type = 42, lower = 0, upper = 10)
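The matching rule for subscriptions amounts to a closed-range overlap test on bindings of the same service type. The sketch below assumes that reading; `overlaps`, `matching_bindings` and the socket names are hypothetical, and the last binding is an extra entry added only to show a non-match.

```python
def overlaps(lower_a, upper_a, lower_b, upper_b):
    """True if the closed ranges [lower_a, upper_a] and [lower_b, upper_b] intersect."""
    return lower_a <= upper_b and lower_b <= upper_a


def matching_bindings(subscription, bindings):
    """Return the bindings that would generate an event for the subscription."""
    sub_type, sub_lower, sub_upper = subscription
    return [b for b in bindings
            if b[0] == sub_type and overlaps(b[1], b[2], sub_lower, sub_upper)]


subscription = (42, 0, 10)            # subscribe(type = 42, lower = 0, upper = 10)
bindings = [
    (42, 2, 20, "socket-a"),          # range 2..20 overlaps 0..10
    (42, 2, 2, "socket-b"),           # single instance 2 lies inside 0..10
    (666, 17, 17, "socket-b"),        # different service type
    (42, 11, 20, "socket-c"),         # hypothetical: range entirely above 10
]

for event in matching_bindings(subscription, bindings):
    print(event)
```

Only the first two bindings produce events here. The range 2..20 matches even though it extends beyond the subscribed range, because overlap rather than containment is what counts.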
CLUSTER TOPOLOGY TRACKING
Special case of service tracking
▪ Uses the same mechanism, based on service binding table contents
▪ Represented by the built-in service type zero (== “node availability”)
▪ It is also possible to subscribe to the availability of individual links
Nodes: #9a6004c1, #1a6b7ce0, #c1f10e72
Client Process: subscribe(type = 0, lower = 0, upper = ~0)
NODE TO NODE LINKS
“L2.5” reliable link layer
▪ Guarantees delivery and sequentiality for all packets
▪ Acts as a trunk for multiple connections, and keeps track of those
▪ Keeps track of the peer node’s address bindings in the local replica of the binding table
Supervised by heartbeats at low traffic
▪ Failure detection tolerance configurable from 50 ms to 10 s; default 1.5 s
▪ “Lost service address” events issued for bindings from the peer node when contact is lost
▪ Breaks all connections to the peer node when contact is lost
Several links per node pair
▪ Load sharing or active-standby, but at most two active
▪ Disturbance-free failover to the remaining link, if any
[Figure: two nodes hosting six processes, each with a socket]
NEIGHBOR DISCOVERY
Nodes have a 128-bit node identity
▪ By default assigned by the system (from Linux 4.16)
▪ Can also be set by the user, e.g. a host name or a UUID
▪ The identity is internally hashed into a guaranteed unique 32-bit node address
▪ This is the node address used by the protocol
Clusters have a 32-bit cluster identity
▪ Can be assigned by the user if anything other than the default value is needed
▪ All nodes using the same cluster identity will establish mutual links
▪ One link per interface, maximum two active links per node pair
Cluster identity determines network
▪ Neighbor discovery by UDP multicast or L2 broadcast
▪ If there is no broadcast/multicast support, discovery can be performed via explicitly configured IP addresses
Cluster id  Node id                                Node #
4711        goethe                                 2f1c0ab4
4711        schiller                               78fca34
4711        heine                                  8cfba40
4711        brandes                                c7f413cb
4711        ibsen                                  f5430cba
110956      95719650-3c19-11e8-b467-0ed5f89f718b   8fa4ab00
110956      6c5719a38-38a6-33b8-b467-0ed5f89f718b  97df4a1b
110956      48719650-ba63-12c8-b467-0ed5f89f77f2   6f774bc4
110956      83719650-4c7b-14b8-b467-0ed5f89f717a   016a3f02
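The rule that nodes sharing a cluster identity establish mutual links can be illustrated by grouping the example nodes by cluster id and listing the node pairs within each group. This is a counting sketch only: `node_pairs` is a hypothetical helper, it models no discovery traffic, and it counts node pairs rather than links, since a pair may have one link per interface.

```python
from collections import defaultdict
from itertools import combinations


def node_pairs(nodes):
    """Group (cluster_id, node_id) entries and list the pairs within each cluster."""
    by_cluster = defaultdict(list)
    for cluster_id, node_id in nodes:
        by_cluster[cluster_id].append(node_id)
    return {cid: list(combinations(sorted(ids), 2))
            for cid, ids in by_cluster.items()}


# Cluster and node identities as shown in the slide's example.
nodes = [
    (4711, "goethe"), (4711, "schiller"), (4711, "heine"),
    (4711, "brandes"), (4711, "ibsen"),
    (110956, "95719650-3c19-11e8-b467-0ed5f89f718b"),
    (110956, "6c5719a38-38a6-33b8-b467-0ed5f89f718b"),
    (110956, "48719650-ba63-12c8-b467-0ed5f89f77f2"),
    (110956, "83719650-4c7b-14b8-b467-0ed5f89f717a"),
]

pairs = node_pairs(nodes)
for cluster_id, cluster_pairs in pairs.items():
    print(cluster_id, len(cluster_pairs))
```

The five nodes in cluster 4711 form ten pairs and the four nodes in cluster 110956 form six. No pair spans the two clusters, even if all nine nodes sit on the same network.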
SCALABILITY
Overlapping Ring Monitoring Algorithm
Since Linux 4.7, TIPC comes with a unique auto-adaptive hierarchical neighbor monitoring algorithm.
This makes it possible to establish full-mesh clusters of 1000 nodes with a failure discovery time of 1.5 s.
➢ Sort all cluster nodes into a circular list
  ▪ All nodes use the same algorithm and criteria
➢ Select the next ⌈√N⌉ - 1 downstream nodes in the list as the “local domain” to be actively monitored
  ▪ CPU load increases by ~√N
➢ Distribute a record describing the local domain to all other nodes in the cluster
➢ Select and monitor a set of “head” nodes outside the local domain so that no node is more than two active monitoring hops away
  ▪ There will be ⌈√N⌉ - 1 such nodes
  ▪ Guarantees failure discovery even at accidental network partitioning
➢ Each node now monitors 2 x (⌈√N⌉ - 1) neighbors
  • 6 neighbors in a 16-node cluster
  • 56 neighbors in an 800-node cluster
➢ All nodes use this algorithm
➢ In total 2 x (⌈√N⌉ - 1) x N actively monitored links
  • 96 links in a 16-node cluster
  • 44,800 links in an 800-node cluster
[Figure: (⌈√N⌉ - 1) local domain destinations + (⌈√N⌉ - 1) remote “head” destinations, x N = 2 x (⌈√N⌉ - 1) x N actively monitored links]
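The neighbor and link counts quoted for the overlapping ring algorithm can be reproduced with a few lines of arithmetic. The sketch assumes the square root is rounded up, which is what the 800-node figures imply. The function names are hypothetical, and `local_domain` only models the "next downstream nodes in a sorted ring" step, not head-node selection or record distribution.

```python
import math


def ceil_sqrt(n):
    """Smallest integer k with k * k >= n, for n >= 1."""
    return math.isqrt(n - 1) + 1


def monitored_neighbors(n):
    """Per-node count: local domain plus an equal number of head nodes."""
    return 2 * (ceil_sqrt(n) - 1)


def monitored_links(n):
    """Cluster-wide count of actively monitored links."""
    return monitored_neighbors(n) * n


def local_domain(nodes, self_addr):
    """Next ceil(sqrt(N)) - 1 downstream nodes in the sorted circular list."""
    ring = sorted(nodes)
    k = ceil_sqrt(len(ring)) - 1
    i = ring.index(self_addr)
    return [ring[(i + j) % len(ring)] for j in range(1, k + 1)]


for n in (16, 800):
    print(n, monitored_neighbors(n), monitored_links(n))

# Node 14 in a 16-node ring numbered 0..15 wraps around the end of the list.
print(local_domain(range(16), 14))
```

For comparison, if every node actively monitored every other node, the same way of counting would give N x (N - 1) monitored links, which is 639,200 for 800 nodes against 44,800 here.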
PERFORMANCE
Latency better than TCP
▪ ~33% faster than TCP inter-node
▪ 2 times faster than TCP intra-node for 64-byte messages
▪ 7 times faster than TCP intra-node for 64 kB messages
  ▪ TIPC transmits socket-to-socket instead of via the loopback interface
Throughput still somewhat lower than TCP
▪ ~65-90% of max TCP throughput inter-node
  ▪ Seems to be environment-dependent
▪ But 25-30% better than TCP intra-node
▪ We are working on this
ARCHITECTURE
[Figure: TIPC architecture stack]
User Land: User App, C Library, Python, Go
Socket API: Socket
L4: Connection Handling, Flow Control
L3: Destination Lookup
L2/Internal: Link Aggregation / Synchronization / Failover / Neighbor Discovery / Supervision
L2/Internal: Fragmentation / Bundling / Retransmission / Congestion Control
External: Carrier Media
Components shown: Socket, Binding Table, Topology Service, Node Table, Node, Link
Media Plugins: Ethernet, Infiniband, VxLAN, UDP
API
Socket API
▪ The original TIPC API
TIPC C API
▪ Simpler and more intuitive
▪ Available as libtipc from the tipcutils package at SourceForge
Python, Perl, Ruby, D, Go
▪ But not yet for Java
ZeroMQ
▪ Not yet with full features
More to come…
WHEN TO USE TIPC
TIPC does not replace IP-based transport protocols
▪ It is a complement to be used under certain conditions
▪ It is an IPC
TIPC may be a good option if you
▪ Need a high-performing, configuration-free, brokerless message bus
▪ Want startup synchronization and service discovery for free
▪ Have application components that need to keep continuous watch on each other
▪ Need short latency times
▪ Have traffic that is heavily intra-node or intra-subnet
▪ Don’t want to bother with cluster configuration
▪ Are inside a security perimeter
  ▪ Or can use IPSec or MACSec
WHO IS USING TIPC?
Ericsson mobile and fixed core network systems
▪ IMS, PGW, SGW, HSS…
▪ Routers/switches such as SSR, AXE
▪ Hundreds of installed sites
▪ Tens of thousands of nodes
▪ Tens of millions of subscribers
WindRiver
▪ Mission-critical system for Sikorsky Aircraft’s helicopters
Cisco
▪ onePK, IOS-XE Software, NX-OS Software
Mirantis
▪ OpenStack
Nokia, Huawei and numerous other companies and institutions
MORE INFORMATION
TIPC home page
http://tipc.sourceforge.net
TIPC project page
http://sourceforge.net/project/tipc
TIPC Demo/Test/Utility programs
http://sourceforge.net/project/tipc/files
TIPC Communication Groups
https://www.slideshare.net/JonMaloy/tipc-communication-groups
TIPC Overlapping Ring Neighbor Monitoring
https://www.youtube.com/watch?v=ni-iNJ-njPo
TIPC protocol specification (somewhat dated)
http://tipc.sourceforge.net/doc/draft-spec-tipc-10.html
TIPC programmer’s guide (somewhat dated)
http://tipc.sourceforge.net/doc/tipc_2.0_prog_guide.html

TIPC Overview

  • 1. TIPC Transparent Inter Process Communication “Cluster Domain Sockets” by Jon Maloy
  • 2. TIPC FEATURES Service Addressing  Some similarity to Unix Domain Sockets, but cluster wide and with more features  Service addresses are translated on-the-fly to system internal port numbers and node addresses  A socket can be bound to multiple service addresses UDP or L2 Based Messaging Service with Three Modes  Datagram mode with unicast, anycast, multicast  Connection mode with stream or message oriented transport  Message bus mode with unicast, anycast, multicast, broadcast Service and Topology Tracking  Subscription/event functionality for node and service addresses  Using this, users can continuously track presence of nodes, sockets, addresses and connections  Feedback about service availability or cluster topology changes is immediate  Fully automatic neighbor discovery Implemented as Linux Kernel Driver  Present in main stream Linux (kernel.org) and major distros  Name space / container support  Accessed via regular socket API
  • 3. TIPC == SIMPLICITY
      No Need to Configure or Look Up Addresses
        • Addresses refer to services, not locations
        • Service addresses are always valid and can be hard-coded
      No Need to Configure Node Identities
        • But you may if you want to
        • Each node must be told which interfaces to use
      No Need to Actively Monitor Processes or Nodes
        • No need for users to do active heart-beating
        • Users learn about changes if they want to know
      Easy Synchronization When Starting an Application Process
        • First, bind to your own service address, if any
        • Second, subscribe for the service addresses you want to track
        • Third, start communicating as services become available
  • 4. SERVICE ADDRESSING
      A service address consists of two parts, assigned by the developer
        • A 32-bit service type number, typically hard-coded
        • A 32-bit service instance number, typically calculated by the user at run time
      A service address is always qualified by a scope indicator
        • Indicating lookup scope on the calling side
            • node != 0 indicates that lookup should be performed only on that node
            • node == 0 indicates cluster-global lookup
        • Indicating visibility scope on the binding side
            • Dedicated values for node-local or cluster-global visibility
      struct tipc_service_addr {
          uint32_t type;
          uint32_t instance;
      };
      [Figure: two server processes and one client process]
        Server side: bind(type = 42, instance = 2, scope = cluster); bind(type = 42, instance = 1, scope = cluster)
        Client side: sendto(type = 42, instance = 2, node = 0)
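The addressing rules on slide 4 can be sketched in C against the Linux UAPI header `<linux/tipc.h>`. This is a minimal sketch, not the author's code: the helper names `service_dest` and `service_binding` are made up for illustration, and it assumes the long-standing `sockaddr_tipc` field names (`addr.name.name.type`, `addr.name.name.instance`, `addr.name.domain`) and the constants `TIPC_ADDR_NAME`, `TIPC_CLUSTER_SCOPE` and `TIPC_NODE_SCOPE`.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Hypothetical helper: build a service address to use as a sendto()
 * destination. lookup_node == 0 asks for cluster-global lookup; any
 * other value restricts the lookup to that node. */
struct sockaddr_tipc service_dest(uint32_t type, uint32_t instance,
                                  uint32_t lookup_node)
{
    struct sockaddr_tipc sa;

    memset(&sa, 0, sizeof(sa));
    sa.family = AF_TIPC;
    sa.addrtype = TIPC_ADDR_NAME;          /* <type:instance> addressing */
    sa.addr.name.name.type = type;
    sa.addr.name.name.instance = instance;
    sa.addr.name.domain = lookup_node;     /* lookup scope on the calling side */
    return sa;
}

/* Hypothetical helper: build the same kind of address for bind(), where
 * the scope field carries the visibility of the binding instead. */
struct sockaddr_tipc service_binding(uint32_t type, uint32_t instance,
                                     int cluster_wide)
{
    struct sockaddr_tipc sa = service_dest(type, instance, 0);

    sa.scope = cluster_wide ? TIPC_CLUSTER_SCOPE : TIPC_NODE_SCOPE;
    return sa;
}
```

The results would be passed to `sendto()` and `bind()` on an `AF_TIPC` socket, cast to `struct sockaddr *`.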
  • 5. SERVICE BINDING
      No restrictions on how to bind service addresses
        • Different service addresses can be bound to the same socket
        • The same service address can be bound to different sockets
        • “Anycast” lookup with round-robin selection
        • Service address ranges can be bound to a socket
        • Only one service address per socket in message bus mode
      struct tipc_service_range {
          uint32_t type;
          uint32_t lower;
          uint32_t upper;
      };
      [Figure: two server processes and one client process]
        Server side: bind(type = 42, lower = 2, upper = 20, scope = cluster); bind(type = 42, instance = 2, scope = cluster); bind(type = 666, instance = 17, scope = node)
        Client side: sendto(type = 42, instance = 2, node = 0)
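To make the binding rules on slide 5 concrete, here is a toy model of a binding table with round-robin anycast lookup. It is a sketch under stated assumptions, not the kernel's implementation: `struct binding`, `struct binding_table` and `anycast_lookup` are hypothetical names, and the single shared cursor is a simplification of whatever selection state TIPC keeps internally.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a toy binding table: a service range bound to a socket. */
struct binding {
    uint32_t type, lower, upper;
    uint32_t port;                 /* illustrative socket identifier */
};

struct binding_table {
    const struct binding *entries;
    size_t count;
    size_t next;                   /* round-robin cursor */
};

/* Return the port of the next binding that covers <type:instance>,
 * starting the search at the cursor; return 0 if nothing matches. */
uint32_t anycast_lookup(struct binding_table *t, uint32_t type,
                        uint32_t instance)
{
    for (size_t i = 0; i < t->count; i++) {
        size_t idx = (t->next + i) % t->count;
        const struct binding *b = &t->entries[idx];

        if (b->type == type && b->lower <= instance && instance <= b->upper) {
            t->next = (idx + 1) % t->count;   /* resume after this entry */
            return b->port;
        }
    }
    return 0;
}
```

With the three bindings from the figure loaded into the table, repeated lookups of type 42, instance 2 alternate between the socket that bound the range 2..20 and the socket that bound instance 2.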
  • 6. LOCATION TRANSPARENCY
      Client never needs to know the location of the server
        • Translation from service address to socket address is performed on the fly at the source node
        • A replica of the global binding table is kept on each node for translation
        • Users can still specify an explicit socket address if they want to
      struct tipc_socket_addr {
          uint32_t port;
          uint32_t node;
      };
      [Figure: Node #9a6004c1, Node #1a6b7ce0, Node #c1f10e72; sockets with port=123456, port=98765, port=763456]
        Server side: bind(type = 42, lower = 2, upper = 20, scope = cluster); bind(type = 42, instance = 2, scope = cluster); bind(type = 666, instance = 17, scope = node)
        Client side: sendto(type = 42, instance = 2, node = 0)
  • 7. DATAGRAM MODE
      Reliable transport socket to socket
        • Receive buffer overload protection
        • No end-to-end flow control
        • Messages may still be rejected by the receiving socket
      Rejected messages may be dropped or returned to sender
        • Configurable in the sending socket
        • If returned, the message is truncated and equipped with an error code
      Unicast, Anycast or Multicast
        • Depends on the indicated address type
      [Figure: two server processes and one client process]
        Server side: bind(type = 42, instance = 2, scope = cluster); bind(type = 42, instance = 1, scope = cluster)
        Client side: sendto(type = 42, instance = 2, node = 0)
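The "configurable in sending socket" point on slide 7 most plausibly refers to the `TIPC_DEST_DROPPABLE` socket option in `<linux/tipc.h>`; that mapping is an assumption, since the slide does not name the option. A minimal sketch with a hypothetical helper name:

```c
#include <sys/socket.h>
#include <linux/tipc.h>

/* Hypothetical helper: ask TIPC to return undeliverable messages to the
 * sender instead of dropping them. Assumes TIPC_DEST_DROPPABLE is the
 * option the slide means. Returns 0 on success, -1 on error. */
int return_rejected_messages(int sd)
{
    int droppable = 0;   /* 0 = do not drop, return to sender */

    return setsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE,
                      &droppable, sizeof(droppable));
}
```

The sender then has to be prepared to read returned messages, together with their error code, from the same socket.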
  • 8. CONNECTION MODE
      Established by using a service address
        • One-way setup (a.k.a. “0-RTT”) using data-carrying messages
        • Traditional TCP-style setup/shutdown also available
      Stream- or message-oriented
        • End-to-end flow control for buffer overflow protection
        • No socket-level sequence numbers, acknowledgements or retransmissions
        • The link layer takes care of that
      Connection breaks immediately if the peer becomes unavailable
        • Leverages link-level heartbeats and kernel/socket cleanup functionality
        • No socket-level “keepalive” heartbeats needed
      [Figure: two nodes, each hosting processes with sockets]
  • 9. MESSAGE BUS MODE (available from Linux 4.14)
      Communication Groups: brokerless bus instances
        • User-instantiated
        • Same addressing properties (service addressing) as datagram mode
        • Different traffic properties: no dropped or rejected messages
        • Four different message distribution methods
        • Delivery and sequence order guaranteed, even between different distribution methods
        • Leveraging L2 broadcast / UDP multicast when possible and deemed favorable
      End-to-end flow control
        • Messages are never dropped because of destination buffer overflow
        • The same mechanism covers all distribution methods
        • Point-to-multipoint: “sliding window” algorithm
        • Multipoint-to-point: “coordinated sliding window”
  • 10. GROUP MEMBERSHIP
      Members are sockets
        • Groups are closed: members can only exchange messages with other sockets in the same group
        • Each socket has two addresses: a <port:node> tuple bound by the system and a <group:member> tuple bound by the user
        • <group:member> is a TIPC service address, i.e., the same as <type:instance>
        • Member sockets may optionally deliver join/leave events for other members in the group
        • Membership events are just empty messages delivered along with the source member’s two addresses
        • The TIPC binding table serves as registry and distribution channel for member identities and events
      [Figure: membership event flow via the TIPC Distributed Binding Table]
        join(<group:member>) → TIPC Distributed Binding Table → recvmsg(OOB, <group:member>, <port:node>);
        leave() → TIPC Distributed Binding Table → recvmsg(OOB|EOR, <group:member>, <port:node>);
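Joining a group as described on slide 10 can be sketched with `struct tipc_group_req` and the `TIPC_GROUP_JOIN` socket option, both present in `<linux/tipc.h>` from Linux 4.14. The helper names below are hypothetical, and the choice of cluster scope is an assumption made for the example.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Hypothetical helper: fill in a join request for <group:member>.
 * want_events != 0 asks for join/leave events about other members. */
struct tipc_group_req group_request(uint32_t group, uint32_t member,
                                    int want_events)
{
    struct tipc_group_req req;

    memset(&req, 0, sizeof(req));
    req.type = group;                  /* group id == service type */
    req.instance = member;             /* member id == service instance */
    req.scope = TIPC_CLUSTER_SCOPE;    /* assumption: cluster-wide group */
    req.flags = want_events ? TIPC_GROUP_MEMBER_EVTS : 0;
    return req;
}

/* Hypothetical helper: join the group on an existing AF_TIPC socket.
 * Returns 0 on success, -1 on error. */
int join_group(int sd, const struct tipc_group_req *req)
{
    return setsockopt(sd, SOL_TIPC, TIPC_GROUP_JOIN, req, sizeof(*req));
}
```

Leaving is the mirror operation with `TIPC_GROUP_LEAVE`; closing the socket also ends the membership.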
  • 12. SERVICE TRACKING
      Users can subscribe for contents of the global address binding table
        • Receive events at each change matching the range in the subscription
      There is a match when
        • The bound/unbound instance or range overlaps with the range subscribed for
      Received events contain the bound socket’s service address and socket address
      [Figure: Node #9a6004c1, Node #1a6b7ce0, Node #c1f10e72; sockets with port=123456, port=98765, port=763456]
        Server side: bind(type = 42, lower = 2, upper = 20, scope = cluster); bind(type = 42, instance = 2, scope = cluster)
        Client side: subscribe(type = 42, lower = 0, upper = 10)
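The matching rule on slide 12 is an interval-overlap test. The sketch below spells it out; `struct service_range` and `subscription_matches` are illustrative names, not part of the TIPC API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for a <type, lower, upper> service range. */
struct service_range {
    uint32_t type, lower, upper;
};

/* A binding matches a subscription when the types are equal and the two
 * closed intervals [lower, upper] share at least one instance number. */
bool subscription_matches(const struct service_range *sub,
                          const struct service_range *bound)
{
    return sub->type == bound->type &&
           bound->lower <= sub->upper &&
           sub->lower <= bound->upper;
}
```

Applied to the figure: the subscription for type 42, range 0..10 matches both the binding for range 2..20 and the binding for instance 2, so the client would receive an event for each.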
  • 13. CLUSTER TOPOLOGY TRACKING
      Special case of service tracking
        • Uses the same mechanism, based on service binding table contents
        • Represented by the built-in service type zero (== “node availability”)
        • It is also possible to subscribe for availability of individual links
      [Figure: Node #9a6004c1, Node #1a6b7ce0, Node #c1f10e72]
        Client side: subscribe(type = 0, lower = 0, upper = ~0)
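The `subscribe(type = 0, lower = 0, upper = ~0)` call in the figure corresponds roughly to filling in a `struct tipc_subscr` from `<linux/tipc.h>`. The sketch below only builds the request and classifies events; the helper names are hypothetical, and using `TIPC_SUB_PORTS` as the filter and `TIPC_WAIT_FOREVER` as the timeout are choices made for the example.

```c
#include <stdint.h>
#include <string.h>
#include <linux/tipc.h>

/* Hypothetical helper: subscription covering every node in the cluster. */
struct tipc_subscr node_subscription(void)
{
    struct tipc_subscr s;

    memset(&s, 0, sizeof(s));
    s.seq.type = 0;                  /* built-in "node availability" type */
    s.seq.lower = 0;
    s.seq.upper = ~0u;               /* all node addresses */
    s.timeout = TIPC_WAIT_FOREVER;   /* never expire */
    s.filter = TIPC_SUB_PORTS;       /* one event per matching binding */
    return s;
}

/* Hypothetical helper: readable label for a received topology event. */
const char *node_event_name(const struct tipc_event *ev)
{
    switch (ev->event) {
    case TIPC_PUBLISHED:
        return "node up";
    case TIPC_WITHDRAWN:
        return "node down";
    default:
        return "other";
    }
}
```

In a real program the subscription is written to a `SOCK_SEQPACKET` socket connected to the topology service (service type `TIPC_TOP_SRV`), and `struct tipc_event` records are read back from the same socket.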
  • 14. NODE TO NODE LINKS
      “L2.5” reliable link layer
        • Guarantees delivery and sequentiality for all packets
        • Acts as a trunk for multiple connections, and keeps track of those
        • Keeps track of the peer node’s address bindings in the local replica of the binding table
      Supervised by heartbeats at low traffic
        • Failure detection tolerance configurable from 50 ms to 10 s, default 1.5 s
        • “Lost service address” events are issued for bindings from the peer node when contact is lost
        • Breaks all connections to the peer node when contact is lost
      Several links per node pair
        • Load sharing or active-standby, but at most two active
        • Disturbance-free failover to the remaining link, if any
      [Figure: two nodes connected by links, each hosting processes with sockets]
  • 15. NEIGHBOR DISCOVERY
      Nodes have a 128-bit node identity
        • By default assigned by the system (from Linux 4.16)
        • Can also be set by the user, e.g. a host name or a UUID
        • The identity is internally hashed into a guaranteed unique 32-bit node address
        • This is the node address used by the protocol
      Clusters have a 32-bit cluster identity
        • Can be assigned by the user if anything different from the default value is needed
        • All nodes using the same cluster identity will establish mutual links
        • One link per interface, maximum two active links per node pair
      Cluster identity determines network
        • Neighbor discovery by UDP multicast or L2 broadcast
        • If there is no broadcast/multicast support, discovery can be performed by explicitly configured IP addresses
      [Figure: two example clusters]
        Cluster id 4711:
          Node id: goethe, Node #: 2f1c0ab4
          Node id: schiller, Node #: 78fca34
          Node id: heine, Node #: 8cfba40
          Node id: brandes, Node #: c7f413cb
          Node id: ibsen, Node #: f5430cba
        Cluster id 110956:
          Node id: 95719650-3c19-11e8-b467-0ed5f89f718b, Node #: 8fa4ab00
          Node id: 6c5719a38-38a6-33b8-b467-0ed5f89f718b, Node #: 97df4a1b
          Node id: 48719650-ba63-12c8-b467-0ed5f89f77f2, Node #: 6f774bc4
          Node id: 83719650-4c7b-14b8-b467-0ed5f89f717a, Node #: 016a3f02
  • 16. SCALABILITY: Overlapping Ring Monitoring Algorithm
      Since Linux 4.7, TIPC comes with a unique auto-adaptive hierarchical neighbor monitoring algorithm. This makes it possible to establish full-mesh clusters of 1000 nodes with a failure discovery time of 1.5 s.
        • Sort all cluster nodes into a circular list
            • All nodes use the same algorithm and criteria
        • Select the next [√N] - 1 downstream nodes in the list as the “local domain” to be actively monitored
            • CPU load increases by ~√N
        • Distribute a record describing the local domain to all other nodes in the cluster
        • Select and monitor a set of “head” nodes outside the local domain so that no node is more than two active monitoring hops away
            • There will be [√N] - 1 such nodes
            • Guarantees failure discovery even at accidental network partitioning
        • Each node now monitors 2 x (√N - 1) neighbors
            • 6 neighbors in a 16 node cluster
            • 56 neighbors in an 800 node cluster
        • All nodes use this algorithm
        • In total 2 x (√N - 1) x N actively monitored links
            • 96 links in a 16 node cluster
            • 44,800 links in an 800 node cluster
      [Figure: (√N - 1) local domain destinations + (√N - 1) remote “head” destinations, x N nodes = 2 x (√N - 1) x N actively monitored links]
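The neighbor and link counts on slide 16 can be reproduced with a few lines of integer arithmetic. The sketch below assumes that [√N] means the square root rounded up, which is the reading that yields the slide's figure of 56 neighbors for 800 nodes. The function names are illustrative, and the full-mesh count is added only for comparison.

```c
#include <stdint.h>

/* Smallest integer r with r * r >= n, i.e. sqrt(n) rounded up. */
uint32_t ceil_sqrt(uint32_t n)
{
    uint32_t r = 0;

    while ((uint64_t)r * r < n)
        r++;
    return r;
}

/* Neighbors actively monitored by each node: local domain plus heads. */
uint32_t monitored_neighbors(uint32_t n)
{
    if (n < 2)
        return 0;                    /* nothing to monitor */
    return 2 * (ceil_sqrt(n) - 1);
}

/* Actively monitored links in the whole cluster, as counted on the slide. */
uint64_t monitored_links(uint32_t n)
{
    return (uint64_t)monitored_neighbors(n) * n;
}

/* For comparison: every node monitoring every other node directly. */
uint64_t full_mesh_monitored_links(uint32_t n)
{
    return n < 2 ? 0 : (uint64_t)n * (n - 1);
}
```

For N = 800 this gives 44,800 monitored links, against 639,200 if every node supervised all 799 peers itself.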
  • 17. PERFORMANCE
      Latency times better than on TCP
        • ~33% faster than TCP inter-node
        • 2 times faster than TCP intra-node for 64 byte messages
        • 7 times faster than TCP intra-node for 64 kB messages
        • TIPC transmits socket-to-socket instead of via the loopback interface
      Throughput still somewhat lower than TCP
        • ~65-90% of max TCP throughput inter-node
        • Seems to be environment dependent
        • But 25-30% better than TCP intra-node
        • We are working on this…
  • 18. ARCHITECTURE
      [Figure: layered architecture diagram; labels recovered from the figure]
        Layers:
          • User Land
          • L4: Connection Handling, Flow Control
          • L3: Destination Lookup
          • L2/Internal: Link Aggregation/Synchronization/Failover/Neighbor Discovery/Supervision
          • L2/Internal: Fragmentation/Bundling/Retransmission/Congestion Control
          • External: Carrier Media
        Components: User App, C Library, Python, Go, Socket, Binding Table, Topology Service, Node Table, Node, Link, Media Plugins (Ethernet, Infiniband, VxLAN, UDP)
  • 19. API
      Socket API
        • The original TIPC API
      TIPC C API
        • Simpler and more intuitive
        • Available as libtipc from the tipcutils package at SourceForge
      Python, Perl, Ruby, D, Go
        • But not yet for Java
      ZeroMQ
        • Not yet with full features
      More to come…
  • 20. WHEN TO USE TIPC
      TIPC does not replace IP-based transport protocols
        • It is a complement to be used under certain conditions
        • It is an IPC
      TIPC may be a good option if you
        • Need a high-performing, configuration-free, brokerless message bus
        • Want startup synchronization and service discovery for free
        • Have application components that need to keep continuous watch on each other
        • Need short latency times
        • Have traffic that is heavily intra-node or intra-subnet
        • Don’t want to bother with cluster configuration
        • Are inside a security perimeter
            • Or can use IPSec or MACSec
  • 21. WHO IS USING TIPC?
      Ericsson mobile and fixed core network systems
        • IMS, PGW, SGW, HSS…
        • Routers/switches such as SSR, AXE
        • Hundreds of installed sites
        • Tens of thousands of nodes
        • Tens of millions of subscribers
      WindRiver
        • Mission-critical system for Sikorsky Aircraft’s helicopters
      Cisco
        • onePK, IOS-XE Software, NX-OS Software
      Mirantis
        • OpenStack
      Nokia, Huawei and numerous other companies and institutions
  • 22. MORE INFORMATION
        • TIPC home page: http://tipc.sourceforge.net
        • TIPC project page: http://sourceforge.net/project/tipc
        • TIPC Demo/Test/Utility programs: http://sourceforge.net/project/tipc/files
        • TIPC Communication Groups: https://www.slideshare.net/JonMaloy/tipc-communication-groups
        • TIPC Overlapping Ring Neighbor Monitoring: https://www.youtube.com/watch?v=ni-iNJ-njPo
        • TIPC protocol specification (somewhat dated): http://tipc.sourceforge.net/doc/draft-spec-tipc-10.html
        • TIPC programmer’s guide (somewhat dated): http://tipc.sourceforge.net/doc/tipc_2.0_prog_guide.html