Seminar Report

InfiniBand

CONTENTS

 Introduction
 Overview
 Design Considerations
 Architecture
 Layers
 Communication Stack
 Specification
 Comparison
 Benefits & Limitations
 Conclusion
 References

INTRODUCTION
Throughout the past decade of fast paced computer
development, the traditional Peripheral Component Interconnect
architecture has continued to be the dominant input/output
standard for most internal back-plane and external peripheral
connections. However, the PCI bus, with its shared-bus approach, is now beginning to lag noticeably. Performance limitations, poor bandwidth, and reliability issues are surfacing within the higher market tiers, especially as the PCI bus is quickly becoming an outdated technology.

Is PCI No Longer Viable?
The Peripheral Component Interconnect architecture
currently provides the primary I/O functionality for nearly all
classes of computers and derivative accessory support products.
PCI is traditionally considered the primary standard for I/O
compatibility with an economical balance of both price and
performance.
The PCI bus still proves viable for most desktop
computing situations. However, at the upper ranges of computing
products, PCI devices are introducing severe limitations to both
performance and reliability. Many desktop users recognize PCI as
the standard 32-bit, 33-MHz connector found within today’s
desktop computers. In reality, the PCI architecture extends
support for 64-bit connections with up to 66 MHz transfer rates
for higher-end workstations and servers. While this does yield
more impressive bandwidth, the base concerns regarding
reliability and quality of service are still present. The popularity
of expansive network implementation (WAN and even the
Internet) requires the ability of the underlying interconnect
architecture to offer more than an implied 99.9+% uptime while
retaining high performance characteristics. PCI simply does not
offer these features.
Next generation PCI-X specifications appear more
promising on paper, but high-end application of these
technologies could prove both expensive and inefficient. The 133
MHz PCI-X standard offers more available bandwidth, but limits
one device per controller terminating connection. In order to support the large number of devices required by upper-tier computing systems, an internal PCI-X controller network would require several channel controllers per end connection. Even with this in mind, while these future PCI technologies offer sustained real-world bandwidth up to 1 Gigabyte per second, they still lack
the advanced integrity checking and correction features needed
for maximum system reliability.

Solution : InfiniBand
InfiniBand is an architecture and specification for data
flow between processors and I/O devices that promises greater
bandwidth and almost unlimited expandability. InfiniBand is
expected to gradually replace the existing Peripheral Component
Interconnect (PCI). Offering throughput of up to 2.5 gigabytes
per second and support for up to 64,000 addressable devices, the
architecture also promises increased reliability, better sharing of
data between clustered processors, and built-in security. The
InfiniBand Architecture spec was released by the InfiniBand
Trade Association in October of 2000. It was based on the merging of two separate, competing I/O initiatives: Intel, Dell, Hitachi, and Sun Microsystems' Next Generation I/O and Compaq, HP, IBM
and 3Com's Future I/O.

OVERVIEW
InfiniBand is a high-speed, serial, channel-based,
switched-fabric, message-passing architecture that can have
server, fibre channel, SCSI RAID, router, and other end nodes,
each with its own dedicated fat pipe. Each node can talk to any
other node in a many-to-many configuration. Redundant paths
can be set up through an InfiniBand fabric for fault tolerance, and
InfiniBand routers can connect multiple subnets.
The figure below shows the simplest configuration of an
InfiniBand installation, where two or more nodes are connected
to one another through the fabric. A node represents either a host
device such as a server or an I/O device such as a RAID
subsystem. The fabric itself may consist of a single switch in the
simplest case or a collection of interconnected switches and
routers. Each connection between nodes, switches, and routers is
a point-to-point, serial connection.

Figure. InfiniBand Basic Fabric Topology
InfiniBand has a uni-directional 2.5Gbps (250MB/sec
using 10 bits per data byte encoding, called 8B/10B similar to
3GIO) wire speed connection, and uses either one differential
signal pair per direction, called 1X, or 4 (4X), or 12 (12X) for
bandwidth up to 30Gbps per direction (12 x 2.5Gbps). Bidirectional throughput with InfiniBand is often expressed in
MB/sec, yielding 500 MB/sec for 1X, 2GB/sec for 4X, and
6GB/sec for 12X, respectively.
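
As a quick sanity check on these figures, the short C program below works out the usable data rates from the numbers quoted above (2.5 Gbps signalling per lane, 8B/10B encoding, and 1X/4X/12X link widths). It is an illustrative sketch, not material taken from the InfiniBand specification.

```c
#include <stdio.h>

/* Derive usable InfiniBand data rates from the figures above:
 * 2.5 Gbit/s signalling per lane, 8B/10B encoding (8 data bits for
 * every 10 line bits), and link widths of 1, 4, and 12 lanes. */
int main(void)
{
    const double signal_gbps = 2.5;        /* raw signalling rate per lane, one direction */
    const double encoding    = 8.0 / 10.0; /* 8B/10B: usable fraction of the raw rate     */
    const int widths[] = { 1, 4, 12 };

    for (int i = 0; i < 3; i++) {
        double raw_gbps   = signal_gbps * widths[i];            /* raw rate, one direction    */
        double usable_mbs = raw_gbps * encoding * 1000.0 / 8.0; /* usable MB/s, one direction */
        double bidir_mbs  = 2.0 * usable_mbs;                   /* both directions combined   */
        printf("%2dX: %4.1f Gbit/s raw, %4.0f MB/s usable, %4.0f MB/s bidirectional\n",
               widths[i], raw_gbps, usable_mbs, bidir_mbs);
    }
    return 0;
}
```

Running it reproduces the 250 MB/sec per direction, 500 MB/sec, 2 GB/sec, and 6 GB/sec figures given above.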
Each bi-directional 1X connection consists of four
wires, two for send and two for receive. Both fiber and copper are
supported: Copper can be in the form of traces or cables, and
fiber distances between nodes can be as far as 300 meters or
more. Each InfiniBand subnet can host up to 64,000 nodes.

DESIGN CONSIDERATIONS
Large scale networks such as corporate or enterprise
area networks and even the Internet demand a maximum in both
performance and quality of service. The primary driving force
behind InfiniBand is the concepts of quality of service (QoS) and
reliability, availability, and serviceability (RAS). Computer
systems for mission-critical scenarios such as e-commerce, financial institutions, database applications, and telecommunications require not only high performance, but also stability and virtually 24/7 uptime.
IBA is envisioned as a replacement not only for the
current internal PCI, but the external network layer as well.
InfiniBand signaling technology offers not only PCB level
communication, but can also traverse across currently available
copper and fiber optic communication mediums. The overall
approach with IBA is to replace the myriad of internal and
external bus specifications with a single unified architecture.

The design considerations are:
• Transport level connections delivered in hardware
• Reliability
• Scalability
• Fault tolerance
• Hot swappability
• Quality of service (QoS)
• Clustering capabilities (RDMA and message passing)
• Multiple generations of compatibility to 10Gb/s
• I/O connectivity
• Multi-Protocol capabilities
• Inherent Redundancy
• Failover and active stand-by
• In-Band Management
• Interconnect manageability
• Error detection
• Software Backwards Compatibility

ARCHITECTURE
InfiniBand breaks through the bandwidth and fanout
limitations of the PCI bus by migrating from the traditional
shared bus architecture into a switched fabric architecture. Instead
of the traditional memory mapped I/O interface bus, the
InfiniBand architecture is developed with a switched serial fabric
approach. This switched nature allows for low-latency, high
bandwidth characteristics of the IBA architecture. Clustered

systems and networks require a connectivity standard that allows
for fault tolerant interconnects. A traditional shared bus lacks the
ability to perform advanced fault detection and correction, and
PCI is designed more for performance than reliability. As noted
earlier, the I/O architecture must also provide for scalability while
maintaining optimum connectivity. IBA is an attempt to satisfy
each of these pressing concerns, but break through the limitations
present with the shared bus design.
The negative effect produced by such technology is the
massive pin count required for proper device operation. A
traditional 64-bit PCI slot requires 90 pins for operation, thus
limiting the motherboard’s layout considerations and increasing
the requirements for more of the board’s printed surface area.
When designing a more complex motherboard, the slots can span
long distances from the central PCI controller. When
implementing multiple devices in this type of configuration, the
ability to determine proper signal timing and termination is
severely impeded. The final result is often a potential degradation
in performance and stability. A further detracting issue is that to
allow for inter-system communications, a network layer must be
utilized, such as Ethernet (LAN/WAN) or Fibre Channel
(network storage).

Fabric Versus Shared Bus

Feature                  Fabric        Bus
Topology                 Switched      Shared
Pin Count                Low           High
Number of End Points     Many          Limited
Max Signal Length        Kilometers    Inches
Reliability              Excellent     Moderate
Highly Scalable          Yes           No
Fault Tolerant           Yes           No

The switched fabric architecture of InfiniBand is designed
around a completely different approach as compared to the
limited capabilities of the shared bus. IBA specifies a point-to-point (PTP) communication protocol for primary connectivity.
Being based upon PTP, each link along the fabric terminates at
one connection point (or device). The underlying transport addressing standard is derived from the IP addressing method employed by advanced networks. Each InfiniBand device is
assigned a specific IP address, thus the load management and
signal termination characteristics are clearly defined and more
efficient. To add more TCA connection points or end nodes, the
simple addition of a dedicated IBA Switch is required. Unlike the
shared bus, each TCA and IBA Switch can be interconnected via
multiple data paths in order to sustain maximum aggregate device
bandwidth and provide fault tolerance by way of multiple
redundant connections.

The main components in the InfiniBand architecture are:
• Channel adapters
• Switches
• Routers
The InfiniBand specification classifies the channel
adapters into two categories: Host Channel Adapters (HCA) and
Target Channel Adapters (TCA). HCAs are present in servers or
even desktop machines and provide an interface that is used to
integrate the InfiniBand with the operating system. TCAs are
present on I/O devices such as a RAID subsystem or a JBOD
subsystem. Host and Target Channel adapters present an interface
to the layers above them that allow those layers to generate and
consume packets. In the case of a server writing a file to a storage
device, the host is generating the packets that are then consumed
by the storage device.
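
For a concrete, if anachronistic, illustration of what the HCA's operating-system interface looks like in practice, the sketch below uses the Linux libibverbs API, which appeared after this report was written; the helper name open_first_hca is an invention for the example.

```c
#include <stdio.h>
#include <infiniband/verbs.h>

/* Open the first HCA found on the host and allocate a protection
 * domain that later memory registrations and QPs can hang off. */
struct ibv_context *open_first_hca(struct ibv_pd **pd_out)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs)
        return NULL;
    if (num == 0) {
        ibv_free_device_list(devs);
        fprintf(stderr, "no InfiniBand devices found\n");
        return NULL;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]); /* first HCA on the host */
    ibv_free_device_list(devs);
    if (!ctx)
        return NULL;

    *pd_out = ibv_alloc_pd(ctx); /* protection domain for later QPs and MRs */
    return ctx;
}
```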
Each channel adapter may have one or more ports. A
channel adapter with more than one port may be connected to
multiple switch ports. This allows for multiple paths between a
source and a destination, resulting in performance and reliability
benefits. By having multiple paths available in getting the data
from one node to another, the fabric is able to achieve transfer
rates at the full capacity of the channel, avoiding congestion
issues that arise in the shared bus architecture. Furthermore,
having alternative paths results in increased reliability and

availability since another path is available for routing of the data
in the case of failure of one of the links.
Two more unique features of the InfiniBand Architecture
are the ability to share storage devices across multiple servers and
the ability to perform third-party I/O. Third-party I/O is a term
used to refer to the ability of two storage devices to complete an
I/O transaction without the direct involvement of a host other
than in setting up the operation. This feature is extremely
important from the performance perspective since many such I/O
operations between two storage devices can be totally off-loaded
from the server, thereby eliminating the unnecessary CPU
utilization that would otherwise be consumed.
Switches simply forward packets between two of their
ports based on the established routing table and the addressing
information stored on the packets. A collection of end nodes
connected to one another through one or more switches form a
subnet. Each subnet must have at least one Subnet Manager that
is responsible for the configuration and management of the
subnet.

Figure: System Area Network based on the InfiniBand Architecture
Routers are like switches in the respect that they simply
forward packets between their ports. The difference between the
routers and the switches, however, is that a router is used to
interconnect two or more subnets to form a larger multi-domain
system area network. Within a subnet, each port is assigned a
unique identifier by the subnet manager called the Local ID or
LID. In addition to the LID, each port is assigned a globally
unique identifier called the GID. Switches make use of the LIDs
for routing packets from the source to the destination, whereas
Routers make use of the GIDs for routing packets across
domains. One more feature of the InfiniBand Architecture that is
not available in the current shared bus I/O architecture is the
ability to partition the ports within the fabric that can
communicate with one another. This is useful for partitioning the
available storage across one or more servers for management
reasons and/or for security reasons.
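
To make the LID/GID distinction concrete, the sketch below (again using the later libibverbs API as an assumed stand-in for the IBA software interface) reads both identifiers for port 1 of an already-opened HCA.

```c
#include <stdio.h>
#include <infiniband/verbs.h>

/* Print the subnet-local LID and the global GID of port 1 on an
 * already-opened HCA context (see the earlier open_first_hca sketch). */
int print_port_ids(struct ibv_context *ctx)
{
    struct ibv_port_attr attr;
    union ibv_gid gid;

    if (ibv_query_port(ctx, 1, &attr))   /* InfiniBand port numbers start at 1 */
        return -1;
    if (ibv_query_gid(ctx, 1, 0, &gid))  /* GID index 0: the port's default GID */
        return -1;

    printf("LID: 0x%04x (used by switches within the subnet)\n", attr.lid);
    printf("GID subnet prefix: 0x%016llx (network byte order; used by routers)\n",
           (unsigned long long)gid.global.subnet_prefix);
    return 0;
}
```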

LAYERS
InfiniBand is specified as a layered architecture with
physical, link, network, transport, and a number of upper layer
protocols that can be thought of as an application layer. The link
and transport layers define, switch, and deliver InfiniBand
packets and handle load balancing and quality-of-service
functions. The network layer gets involved if data is exchanged
among subnets, handling the movement of packets among
routers. The physical layer defines the hardware components.

The first area of examination is the physical layer of the
IBA specification. IBA allows for multiple connect paths scaling
up to 30 Gigabit per second in performance. Since the full duplex
serial communications nature of IBA specifies a requirement of
only four wires, a high-speed 12x implementation requires only a
moderate 48 wires (or pathways). Other specifications of the IBA physical layer include provisions for custom back-plane I/O connectors and hot-swapping capabilities.

The link and transport layers of InfiniBand are perhaps the
most important aspect of this new interconnect standard. At the
packet communications level, two specific packet types for data
transfer and network management are specified. The management
packets provide operational control over device enumeration,
subnet directing, and fault tolerance. Data packets transfer the
actual information with each packet deploying a maximum of
four kilobytes of transaction information. Within each specific
device subnet the packet direction and switching properties are
directed via a Subnet Manager with 16 bit local identification
addresses.
The link layer also allows for the Quality of Service
characteristics of InfiniBand. The primary QoS consideration is
the usage of the Virtual Lane (VL) architecture for interconnectivity. Even though a single IBA data path may be
defined at the hardware level, the VL approach allows for 16
logical links. With 15 independent levels (VL0-14) and one
management path (VL15) available, the ability to configure
device specific prioritization is available. Since management
requires the most priority, VL15 retains the maximum priority.
The ability to assert a priority driven architecture lends not only
to Quality of Service, but performance as well.
To further assure reliability and performance, InfiniBand
offers a credit-based flow control management routine and two
phases for data integrity checking. With the credit approach, each
receiving node on the IBA network provides the originating device with a value representing the maximum amount of data that can be transferred efficiently with no packet loss. The credit data is
transferred via a dedicated link along the IBA fabric. No packets
are sent until the sending device is signaled that data transfer is
possible via the receiving devices primary communications
buffer. To ensure data integrity, two CRC values are forwarded
within each network packet. For link level checking, a 16-bit
variant CRC value is assigned for each data field and is recalculated at each IBA fabric hop. A 32-bit invariant CRC value
is also assigned, but it only provides checking for static data that
does not change between each IBA hop point for end-to-end
integrity checking.
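
The credit mechanism can be pictured with the toy model below; it is a conceptual sketch of the idea described above and does not reflect the actual IBA link-layer packet formats.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of link-level, credit-based flow control: the receiver
 * advertises how many buffers it can accept, and the sender transmits
 * only while it still holds credits, so no packet is ever pushed into
 * a full receive buffer. */
struct credit_link {
    uint32_t credits;  /* receive buffers the far end has advertised */
};

/* Receiver side: advertise newly freed receive buffers back to the sender. */
static void grant_credits(struct credit_link *link, uint32_t freed_buffers)
{
    link->credits += freed_buffers;
}

/* Sender side: transmit one packet only if a credit is available. */
static bool try_send_packet(struct credit_link *link)
{
    if (link->credits == 0)
        return false;     /* stall until the receiver grants more credits */
    link->credits--;
    /* ... place the packet on the wire here ... */
    return true;
}
```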
InfiniBand’s network layer provides for the routing of
packets from one subnet to another. Each routed packet features a
Global Route Header (GRH) and a 128-bit IP address for both
source and destination nodes. The network layer also embeds a
standard 64-bit unique global identifier for each device along all
subnets. By the intricate switching of these identifier values, the
IBA network allows for data traversal across multiple logical
subnets.
The last layer of the InfiniBand architecture is responsible
for the actual transporting of packets. The transport layer controls
several key aspects including packet delivery, channel multiplexing, and base transport servicing. Basic network fabric
characteristics, such as Maximum Transfer Unit (MTU) size and

Base Transport Header (BTH) direction, are also directly handled
at this underlying transport layer.

COMMUNICATION STACK
In order to achieve better performance and scalability at a
lower cost, system architects have come up with the concept of
clustering, where two or more servers are connected together to
form a single logical server. In order to achieve the most benefit
from the clustering of multiple servers, the protocol used for
communication between the physical servers must provide high
bandwidth and low latency. Unfortunately, full-fledged network
protocols such as TCP/IP, in order to achieve good performance
across both LANs and WANs, have become so complex that they
incur considerable latency and require many thousands of lines of
code for their implementation.
The Virtual Interface (VI) Architecture (VIA) specification was developed to overcome these issues. The VI Architecture is a server
messaging protocol whose focus is to provide a very low latency
link between the communicating servers. In transferring a block
of data from one server to another, latency arises in the form of
overhead and delays that are added to the time needed to transfer
the actual data. If we were to break down the latency into its
components, the major contributors would be: a) the overhead of
executing network protocol code within the operating system, b)
context switches to move in and out of kernel mode to receive
and send out the data, and c) excessive copying of data between
the user level buffers and the NIC memory.
Since VIA was only intended to be used for
communication across the physical servers of a cluster (in other
words across high-bandwidth links with very high reliability), the
specification can eliminate much of the standard network
protocol code that deals with special cases. Also, because of the
well-defined environment of operation, the message exchange
protocol was defined to avoid kernel mode interaction and allow
for access to the NIC from user mode.
Finally, because of the direct access to the NIC,
unnecessary copying of the data into kernel buffers was also
eliminated since the user is able to directly transfer data from
user-space to the NIC. In addition to the standard send/receive
operations that are typically available in a networking library, the
VIA provides Remote Direct Memory Access operations where
the initiator of the operation specifies both the source and
destination of a data transfer, resulting in zero-copy data transfers
with minimum involvement of the CPUs.
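
The zero-copy path depends on the user registering its buffers with the channel adapter ahead of time. The sketch below shows such a registration using the later libibverbs API (an assumption made for illustration; the protection domain comes from an open_first_hca-style setup as shown earlier).

```c
#include <stdlib.h>
#include <infiniband/verbs.h>

/* Register a user-space buffer with the HCA so that it can be the
 * source or target of zero-copy RDMA transfers. The returned memory
 * region carries the local/remote keys that work requests later use
 * to name this buffer. */
struct ibv_mr *register_rdma_buffer(struct ibv_pd *pd, size_t size)
{
    void *buf = malloc(size);
    if (!buf)
        return NULL;

    return ibv_reg_mr(pd, buf, size,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_WRITE |
                      IBV_ACCESS_REMOTE_READ);
}
```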
InfiniBand uses basically the VIA primitives for its
operation at the transport layer. In order for an application to
communicate with another application over the InfiniBand it must
first create a work queue that consists of a queue pair (QP), each
of which consists of a Send Queue and Receive Queue. In order
for the application to execute an operation, it must place a work
queue element (WQE) in the work queue. From there the
operation is picked up for execution by the channel adapter.
Therefore, the Work Queue forms the communications medium
between applications and the channel adapter, relieving the
operating system from having to deal with this responsibility.

Figure . InfiniBand Communication Stack
Each process may create one or more QPs for
communications purposes with another application. Since both
the protocol and the structures are all very clearly defined, queue
pairs can be implemented in hardware, thereby off-loading most of
the work from the CPU.
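
A hedged sketch of this queue-pair setup, using the modern libibverbs API rather than the report-era verbs, is shown below; the helper name create_rc_qp and the queue depths are illustrative choices.

```c
#include <infiniband/verbs.h>

/* Create a queue pair (send queue plus receive queue) bound to a
 * single completion queue. A real application would still move the
 * QP through the INIT/RTR/RTS states and exchange addressing
 * information with the peer before posting work requests. */
struct ibv_qp *create_rc_qp(struct ibv_context *ctx, struct ibv_pd *pd,
                            struct ibv_cq **cq_out)
{
    struct ibv_cq *cq = ibv_create_cq(ctx, 64, NULL, NULL, 0); /* room for 64 CQEs */
    if (!cq)
        return NULL;

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap     = {
            .max_send_wr  = 32,   /* outstanding WQEs allowed on the send queue    */
            .max_recv_wr  = 32,   /* outstanding WQEs allowed on the receive queue */
            .max_send_sge = 1,
            .max_recv_sge = 1,
        },
        .qp_type = IBV_QPT_RC,    /* reliable connection transport service */
    };

    *cq_out = cq;
    return ibv_create_qp(pd, &attr); /* WQEs posted to this QP bypass the kernel */
}
```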
At the target end, the Receive Queue of the target WQE
allocates the data to a location. Then, the WQEs close and QPs
shut down. Packet headers identify the data as well as its source,
destination, and priority for QOS applications, and verify its
integrity. Packets are sent to either a single target or multicast to
multiple targets. Once a WQE has been processed properly, a
completion queue element (CQE) is created and placed in the
completion queue. The advantage of using the completion queue
for notifying the caller of completed WQEs is that it reduces the interrupts that would otherwise be generated.
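
Harvesting completions then amounts to polling the completion queue, as in the sketch below (libibverbs naming assumed; drain_completions is an illustrative helper).

```c
#include <stdio.h>
#include <infiniband/verbs.h>

/* Drain completed work requests from a completion queue by polling,
 * the interrupt-free notification path described above. */
int drain_completions(struct ibv_cq *cq)
{
    struct ibv_wc wc[16];
    int handled = 0;

    for (;;) {
        int n = ibv_poll_cq(cq, 16, wc);  /* non-blocking; 0 means the CQ is empty */
        if (n <= 0)
            break;
        for (int i = 0; i < n; i++) {
            if (wc[i].status != IBV_WC_SUCCESS)
                fprintf(stderr, "WQE %llu failed: %s\n",
                        (unsigned long long)wc[i].wr_id,
                        ibv_wc_status_str(wc[i].status));
            handled++;
        }
    }
    return handled;
}
```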
The list of operations supported by the InfiniBand
architecture at the transport level for Send Queues are as follows:
Send/Receive: supports the typical send/receive operation where
one node submits a message and another node receives that
message. One difference between the implementation of the
send/receive operation under the InfiniBand architecture and
traditional networking protocols is that the InfiniBand defines the
send/receive operations as operating against queue pairs.
RDMA-Write: this operation permits one node to write data
directly into a memory buffer on a remote node. The remote node
must of course have given appropriate access privileges to the
node ahead of time and must have memory buffers already
registered for remote access.
RDMA-Read: this operation permits one node to read data
directly from the memory buffer of a remote node. The remote
node must of course have given appropriate access privileges to
the node ahead of time.
RDMA Atomics: this operation name actually refers to two
different operations. The Compare & Swap operation allows a
node to read a memory location and if its value is equal to a
specified value, then a new value is written in that memory
location. The Fetch Add atomic operation reads a value and

returns it to the caller, then adds a specified number to that value and saves the result back at the same address.
For Receive Queue the only type of operation is:
Post Receive Buffer: identifies a buffer into which a client may send data, or from which it may read data, through a Send, RDMA-Write, or RDMA-Read operation.
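
The operations above correspond to opcodes carried on posted work requests. The sketch below posts an RDMA-Write using the later libibverbs API (assumed here for illustration; connection setup, memory registration, and the exchange of the remote address and rkey are omitted).

```c
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Post an RDMA-Write WQE: push 'len' bytes from a locally registered
 * buffer straight into remote memory, with no CPU involvement on the
 * remote node. remote_addr and rkey must have been obtained from the
 * peer beforehand. */
int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                    void *buf, uint32_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = len,
        .lkey   = mr->lkey,                       /* local key from ibv_reg_mr() */
    };
    struct ibv_send_wr wr, *bad = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* IBV_WR_SEND, IBV_WR_RDMA_READ,
                                                     IBV_WR_ATOMIC_FETCH_AND_ADD, etc.
                                                     select the other operations */
    wr.send_flags          = IBV_SEND_SIGNALED;   /* ask for a CQE on completion */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;                /* remote access key */

    return ibv_post_send(qp, &wr, &bad);
}
```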
When a QP is created, the caller may associate with the QP
one of five different transport service types. A process may create
and use more than one QP, each of a different transport service
type. The InfiniBand transport service types are:
• Reliable Connection (RC): reliable transfer of data between two entities.
• Unreliable Connection (UC): unreliable transfer of data between two entities. Like RC, only two entities are involved in the data transfer, but messages may be lost.
• Reliable Datagram (RD): the QP can send and receive messages from one or more QPs using a reliable datagram channel (RDC) between each pair of reliable datagram domains (RDDs).
• Unreliable Datagram (UD): the QP can send and receive messages from one or more QPs; however, the messages may get lost.
• Raw Datagram: a data link layer service which provides the QP with the ability to send and receive raw datagram messages that are not interpreted by the transport.
SPECIFICATIONS
InfiniBand Specifications
Layered Protocol
Multiple Layer Connectivity
Packet-based Communications
Multicast Capable
Packet and End Node Fault Tolerance
Subnet Management Capability
Varied Link Speeds – 1x, 4x, 12x
2.5 – 30 Gigabit per Second Transfers
PCB, Copper, and Fiber Mediums
Support for Remote DMA
Quality of Service Considerations

COMPARISON
Feature                    InfiniBand            PCI-X             Fibre Channel    Gigabit Ethernet
Bandwidth (Gbit/sec)       4 / 16 / 48           8.51              4                1
Pin Count                  4 / 16 / 48           90                4                4
Max Signal Length          Kilometers            Centimeters       Kilometers       Kilometers
Transport Medium           PCB, Copper, Fiber    PCB               Copper, Fiber    Copper, Fiber
Maximum Packet Payload     4 KB                  Bus, no packets   2 KB             9 KB

The original comparison also lists simultaneous peer-to-peer connections (15 virtual lanes for InfiniBand), native Virtual Interface support, multicast support, end-to-end flow control, memory partitioning, QoS, reliable connectivity, and scalable connectivity as rows; the InfiniBand Architecture supports each of these natively.

Several current and near future interconnect standards are
competing for mass-market acceptance. Competitive alternatives
to InfiniBand include PCI-X, Gigabit Ethernet, Fibre Channel,
and Rapid I/O. PCI-X will likely represent the next generation in
back plane connectivity for the majority of computer users.
Even though certain performance deficiencies are better
negotiated by PCI-X, the shared bus nature of this technology
limits scalability and reliability. PCI-X requires a separate
controller for each 133 MHz device. However, IBA allows for
64,000 nodes per subnet through the use of multiple high-port-density IBA Switches.
As compared to Gigabit Ethernet technology, IBA offers
several significant advantages, especially concerning the physical
layer device (PHY) model. InfiniBand specifies a minimal 0.25 watts per port for operation over copper up to a maximum of sixteen meters.
In contrast, Gigabit Ethernet defines a minimum of 2 watts per port, as this type of connection is best utilized at distances of 100 meters or more. The low-power pathways of IBA allow for more efficient hub designs with lower amperage demands.

BENEFITS & LIMITATIONS

 BENEFITS
 Offloads communication processing from the OS & CPU: Because InfiniBand channel adapters have their own intelligence, they offload much of the communications processing from the operating system and CPU.
 Scalability: The switched nature of InfiniBand delivers excellent scalability; it can support an ever-growing number of simultaneous, collision-free connections with virtually no increase in latency.
 Many-to-many communication: InfiniBand's many-to-many
communications means that anything you add to the network,
be it a CPU, fibre channel SAN, or a simple SCSI drive, can
share its resources with any other node.
 Reliability: Reliability is assured as data can take many paths
across the fabric.
 Easier fault tolerance & troubleshooting: Fault isolation and
troubleshooting are easier in a switched environment since it's
easier to isolate problems to a single connection.
 Support clustering: InfiniBand will help clustering take off as
it will provide a standard high-speed, low-latency interconnect
that can be used across tens or hundreds of servers.

 LIMITATIONS
• Long & difficult transition: The transition to InfiniBand will be long and difficult.
• Interoperability problems: As usual, vendors are promising interoperability and compatibility with existing hardware and software. This always sounds great, but the reality is often one of longer timetables and of vendors adding their own proprietary functionality, so that interoperability becomes a problem.
• Software implementations take longer: Software implementations typically take longer than originally anticipated.
• Management features problematic
• Initial unanticipated performance problems: Initial, unanticipated performance problems and numerous bugs will have to be ironed out.

CONCLUSION
Mellanox and related companies are now positioned to
release InfiniBand as a multifaceted architecture within several
market segments. The most notable application area is enterprise
class network clusters and Internet data centers. These types of
applications require extreme performance with the maximum in
fault tolerance and reliability. Other computing system uses
include Internet service providers, co-location hosting, and large
corporate networks. IBA will also see marketing in the embedded
device arena for high-speed routers, switches, and massive
telecommunications devices.
At least for the initial introduction, InfiniBand is positioned
as a complementary architecture. IBA will move through a
transitional period where future PCI, IBA, and other interconnect
standards can be offered within the same system or network. The
understanding of PCI limitations (even PCI-X) should allow
InfiniBand to be an aggressive market contender as higher-class
systems move towards the conversion to IBA devices.
Currently, Mellanox is developing the IBA software
interface standard using Linux as their internal OS choice. The
open source nature of Linux allows for easy code modification.
At release, InfiniBand will also be supported by Microsoft
Windows and the future Windows Whistler. Expect Unix support
to be available as well, considering the relative ease of porting
Linux-based drivers.
Another key concern is the costs of implementing
InfiniBand at the consumer level. Industry sources are currently
projecting IBA prices to fall somewhere between the currently
available Gigabit Ethernet and Fibre Channel technologies.
Assuming these estimates prove accurate, InfiniBand could be
positioned as the dominant I/O connectivity architecture at all
upper tier levels. These same sources also offered current rumors
that InfiniBand is being researched at the vendor level as
a possibility for a future I/O standard in mid-range PC desktops and
workstations. Considering the competitive price point of IBA,
this projection offers enormous potential at all market levels. This
is definitely a technology to watch.

REFERENCES
 William T. Futral, InfiniBand Architecture: Development and Deployment. A Strategic Guide to Server I/O Solutions, Intel Press, 2001.
 Odysseas Pentakalos, An Introduction to the InfiniBand Architecture.
 Jeffrey Burt, InfiniBand: What's Next?
 Mellanox Technologies, "Understanding PCI Bus, 3GIO, and InfiniBand Architecture."
 Scott Tyler Shafer, InfiniBand in Blade Servers.
 Compaq Computer Corporation, "PCI-X: An Evolution of the PCI Bus," September 1999.
 http://www.extremetech.com/
 http://searchstorage.techtarget.com/
 http://isp.webopedia.com/

28

Mais conteúdo relacionado

Mais procurados

Cost Efficient H.320 Video Conferencing over ISDN including ...
Cost Efficient H.320 Video Conferencing over ISDN including ...Cost Efficient H.320 Video Conferencing over ISDN including ...
Cost Efficient H.320 Video Conferencing over ISDN including ...
Videoguy
 
Performance Evaluation of Soft RoCE over 1 Gigabit Ethernet
Performance Evaluation of Soft RoCE over 1 Gigabit EthernetPerformance Evaluation of Soft RoCE over 1 Gigabit Ethernet
Performance Evaluation of Soft RoCE over 1 Gigabit Ethernet
IOSR Journals
 
The New Compact, Feature Rich Pbx
The New Compact, Feature Rich PbxThe New Compact, Feature Rich Pbx
The New Compact, Feature Rich Pbx
Alphonze Omondi
 
Wimax Whitepaper
Wimax  WhitepaperWimax  Whitepaper
Wimax Whitepaper
Amal Shah
 
Nueva presentacion cables y conectores
Nueva presentacion cables y conectoresNueva presentacion cables y conectores
Nueva presentacion cables y conectores
elyoarabia
 
Fcamp may2010-tech2-fpga high speed io trends-alteraTrends & Challenges in De...
Fcamp may2010-tech2-fpga high speed io trends-alteraTrends & Challenges in De...Fcamp may2010-tech2-fpga high speed io trends-alteraTrends & Challenges in De...
Fcamp may2010-tech2-fpga high speed io trends-alteraTrends & Challenges in De...
FPGA Central
 

Mais procurados (18)

Cost Efficient H.320 Video Conferencing over ISDN including ...
Cost Efficient H.320 Video Conferencing over ISDN including ...Cost Efficient H.320 Video Conferencing over ISDN including ...
Cost Efficient H.320 Video Conferencing over ISDN including ...
 
To Infiniband and Beyond
To Infiniband and BeyondTo Infiniband and Beyond
To Infiniband and Beyond
 
IBM System Storage SAN768B-2 and SAN384B-2
IBM System Storage SAN768B-2 and SAN384B-2IBM System Storage SAN768B-2 and SAN384B-2
IBM System Storage SAN768B-2 and SAN384B-2
 
A COMPARISON OF FOUR SERIES OF CISCO NETWORK PROCESSORS
A COMPARISON OF FOUR SERIES OF CISCO NETWORK PROCESSORSA COMPARISON OF FOUR SERIES OF CISCO NETWORK PROCESSORS
A COMPARISON OF FOUR SERIES OF CISCO NETWORK PROCESSORS
 
Fiber optics in-buildings infrastructure paper - OEA- Lebanon
Fiber optics in-buildings infrastructure paper - OEA- LebanonFiber optics in-buildings infrastructure paper - OEA- Lebanon
Fiber optics in-buildings infrastructure paper - OEA- Lebanon
 
Performance Evaluation of Soft RoCE over 1 Gigabit Ethernet
Performance Evaluation of Soft RoCE over 1 Gigabit EthernetPerformance Evaluation of Soft RoCE over 1 Gigabit Ethernet
Performance Evaluation of Soft RoCE over 1 Gigabit Ethernet
 
Storage Performance Takes Off
Storage Performance Takes OffStorage Performance Takes Off
Storage Performance Takes Off
 
The New Compact, Feature Rich Pbx
The New Compact, Feature Rich PbxThe New Compact, Feature Rich Pbx
The New Compact, Feature Rich Pbx
 
Resume
ResumeResume
Resume
 
2002023
20020232002023
2002023
 
wp233
wp233wp233
wp233
 
Wimax Whitepaper
Wimax  WhitepaperWimax  Whitepaper
Wimax Whitepaper
 
Pub00138 r2 cip_adv_tech_series_ethernetip
Pub00138 r2 cip_adv_tech_series_ethernetipPub00138 r2 cip_adv_tech_series_ethernetip
Pub00138 r2 cip_adv_tech_series_ethernetip
 
5. profinet network design andy gilbert
5. profinet network design   andy gilbert5. profinet network design   andy gilbert
5. profinet network design andy gilbert
 
latencyin fiber optic networks
latencyin fiber optic networkslatencyin fiber optic networks
latencyin fiber optic networks
 
Nueva presentacion cables y conectores
Nueva presentacion cables y conectoresNueva presentacion cables y conectores
Nueva presentacion cables y conectores
 
Pag 1002
Pag 1002Pag 1002
Pag 1002
 
Fcamp may2010-tech2-fpga high speed io trends-alteraTrends & Challenges in De...
Fcamp may2010-tech2-fpga high speed io trends-alteraTrends & Challenges in De...Fcamp may2010-tech2-fpga high speed io trends-alteraTrends & Challenges in De...
Fcamp may2010-tech2-fpga high speed io trends-alteraTrends & Challenges in De...
 

Destaque

Memristor report
Memristor reportMemristor report
Memristor report
Akash Garg
 
Sensors on 3 d digitization seminar report
Sensors on 3 d digitization seminar reportSensors on 3 d digitization seminar report
Sensors on 3 d digitization seminar report
Vishnu Prasad
 

Destaque (10)

Memristor report
Memristor reportMemristor report
Memristor report
 
SEMINAR REPORT ON GSM ARCHITECTURE
SEMINAR REPORT ON GSM ARCHITECTURESEMINAR REPORT ON GSM ARCHITECTURE
SEMINAR REPORT ON GSM ARCHITECTURE
 
Chitti%2027
Chitti%2027Chitti%2027
Chitti%2027
 
Technical Report On Barcode
Technical Report On BarcodeTechnical Report On Barcode
Technical Report On Barcode
 
Spintronics Report
Spintronics  ReportSpintronics  Report
Spintronics Report
 
The eye-gaze-communication-system-1.doc(updated)
The eye-gaze-communication-system-1.doc(updated)The eye-gaze-communication-system-1.doc(updated)
The eye-gaze-communication-system-1.doc(updated)
 
Seminar report
Seminar reportSeminar report
Seminar report
 
Sensors on 3 d digitization seminar report
Sensors on 3 d digitization seminar reportSensors on 3 d digitization seminar report
Sensors on 3 d digitization seminar report
 
Transparent electronics
Transparent electronicsTransparent electronics
Transparent electronics
 
Bhanuprakash123
Bhanuprakash123Bhanuprakash123
Bhanuprakash123
 

Semelhante a Infini band

High Performance Communication for Oracle using InfiniBand
High Performance Communication for Oracle using InfiniBandHigh Performance Communication for Oracle using InfiniBand
High Performance Communication for Oracle using InfiniBand
webhostingguy
 
PCIX Gigabit Ethernet Card Design
PCIX Gigabit Ethernet Card DesignPCIX Gigabit Ethernet Card Design
PCIX Gigabit Ethernet Card Design
Mohamad Tisani
 
InfiniBand Essentials Every HPC Expert Must Know
InfiniBand Essentials Every HPC Expert Must KnowInfiniBand Essentials Every HPC Expert Must Know
InfiniBand Essentials Every HPC Expert Must Know
Mellanox Technologies
 
Www ccnav5 net_ccna_1_chapter_4_v5_0_exam_answers_2014
Www ccnav5 net_ccna_1_chapter_4_v5_0_exam_answers_2014Www ccnav5 net_ccna_1_chapter_4_v5_0_exam_answers_2014
Www ccnav5 net_ccna_1_chapter_4_v5_0_exam_answers_2014
Đồng Quốc Vương
 

Semelhante a Infini band (20)

InfiniBand
InfiniBandInfiniBand
InfiniBand
 
Infiband
InfibandInfiband
Infiband
 
iSCSi , FCSAN ,FC protocol stack and Infini band
iSCSi , FCSAN ,FC protocol stack and Infini bandiSCSi , FCSAN ,FC protocol stack and Infini band
iSCSi , FCSAN ,FC protocol stack and Infini band
 
The Next Frontier in AI Networking.pdf
The Next Frontier in AI Networking.pdfThe Next Frontier in AI Networking.pdf
The Next Frontier in AI Networking.pdf
 
Sunoltech
SunoltechSunoltech
Sunoltech
 
Seminar on Infiniband
Seminar on Infiniband Seminar on Infiniband
Seminar on Infiniband
 
Infini band presentation_shekhar
Infini band presentation_shekharInfini band presentation_shekhar
Infini band presentation_shekhar
 
Infini Band
Infini BandInfini Band
Infini Band
 
InfiniBand Presentation
InfiniBand PresentationInfiniBand Presentation
InfiniBand Presentation
 
High Performance Communication for Oracle using InfiniBand
High Performance Communication for Oracle using InfiniBandHigh Performance Communication for Oracle using InfiniBand
High Performance Communication for Oracle using InfiniBand
 
Networking hardware (2)
Networking hardware (2)Networking hardware (2)
Networking hardware (2)
 
PCIX Gigabit Ethernet Card Design
PCIX Gigabit Ethernet Card DesignPCIX Gigabit Ethernet Card Design
PCIX Gigabit Ethernet Card Design
 
InfiniBand In-Network Computing Technology and Roadmap
InfiniBand In-Network Computing Technology and RoadmapInfiniBand In-Network Computing Technology and Roadmap
InfiniBand In-Network Computing Technology and Roadmap
 
A Comparison of Four Series of CISCO Network Processors
A Comparison of Four Series of CISCO Network ProcessorsA Comparison of Four Series of CISCO Network Processors
A Comparison of Four Series of CISCO Network Processors
 
A Comparison of Four Series of CISCO Network Processors
A Comparison of Four Series of CISCO Network ProcessorsA Comparison of Four Series of CISCO Network Processors
A Comparison of Four Series of CISCO Network Processors
 
InfiniBand in the Enterprise Data Center.pdf
InfiniBand in the Enterprise Data Center.pdfInfiniBand in the Enterprise Data Center.pdf
InfiniBand in the Enterprise Data Center.pdf
 
Thunderbolt seminar report
Thunderbolt seminar reportThunderbolt seminar report
Thunderbolt seminar report
 
InfiniBand Essentials Every HPC Expert Must Know
InfiniBand Essentials Every HPC Expert Must KnowInfiniBand Essentials Every HPC Expert Must Know
InfiniBand Essentials Every HPC Expert Must Know
 
Industry Brief: Tectonic Shift - HPC Networks Converge
Industry Brief: Tectonic Shift - HPC Networks ConvergeIndustry Brief: Tectonic Shift - HPC Networks Converge
Industry Brief: Tectonic Shift - HPC Networks Converge
 
Www ccnav5 net_ccna_1_chapter_4_v5_0_exam_answers_2014
Www ccnav5 net_ccna_1_chapter_4_v5_0_exam_answers_2014Www ccnav5 net_ccna_1_chapter_4_v5_0_exam_answers_2014
Www ccnav5 net_ccna_1_chapter_4_v5_0_exam_answers_2014
 

Último

Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
ssuserdda66b
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Último (20)

UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 

Infini band

  • 1. Seminar Report InfiniBand CONTENTS  Introduction  Overview  Design Considerations  Architecture  Layers  Communication Stack  Specification  Comparison  Benefits & Limitations  Conclusion  References 1
  • 2. Seminar Report InfiniBand INTRODUCTION Throughout the past decade of fast paced computer development, the traditional Peripheral Component Interconnect architecture has continued to be the dominant input/output standard for most internal back-plane and external peripheral connections. However, these days, the PCI bus, using shared-bus approach is beginning to be noticeably lagging. Performance limitations, poor bandwidth, and reliability issues are surfacing within the higher market tiers, especially as the PCI bus is quickly becoming an outdated technology Is PCI No Longer Viable? The Peripheral Component Interconnect architecture currently provides the primary I/O functionality for nearly all classes of computers and derivative accessory support products. PCI is traditionally considered as the primary standard for I/O compatibility with an economical balance of both price and performance. The PCI bus still proves viable for most desktop computing situations. However, at the upper ranges of computing products, PCI devices are introducing severe limitations to both performance and reliability. Many desktop users recognize PCI as the standard 32-bit, 33-MHz connector found within today’s desktop computers. In reality, the PCI architecture extends 2
  • 3. Seminar Report InfiniBand support for 64-bit connections with up to 66 MHz transfer rates for higher-end workstations and servers. While this does yield more impressive bandwidth, the base concerns regarding reliability and quality of service are still present. The popularity of expansive network implementation (WAN and even the Internet) requires the ability of the underlying interconnect architecture to offer more than an implied 99.9+% uptime while retaining high performance characteristics. PCI simply does not offer these features. Next generation PCI-X specifications appear more promising on paper, but high-end application of these technologies could prove both expensive and inefficient. The 133 MHz PCI-X standard offers more available bandwidth, but limits one device per controller terminating connection. In-order to support the large number of devices required by upper-tier computing systems, an internal PCI-C controller network would require several channel controllers per end connection. Even with this in mind, while these future PCI technologies offers sustained real-world bandwidth up to 1 Gigabit per second, they still lack the advanced integrity checking and correction features needed for maximum system reliability. Solution : InfiniBand InfiniBand is an architecture and specification for data flow between processors and I/O devices that promises greater bandwidth and almost unlimited expandability. InfiniBand is expected to gradually replace the existing Peripheral Component 3
  • 4. Seminar Report InfiniBand Interconnect (PCI). Offering throughput of up to 2.5 gigabytes per second and support for up to 64,000 addressable devices, the architecture also promises increased reliability, better sharing of data between clustered processors, and built-in security.The InfiniBand Architecture spec was released by the InfiniBand Trade Association in October of 2000. It was based on the mating of two separate competing I/O initiatives: Intel, Dell, Hitachi, and Sun Microsystem's Next Generation I/O and Compaq, HP, IBM and 3Com's Future I/O. OVERVIEW InfiniBand is a high-speed, serial, channel-based, switched-fabric, message-passing architecture that can have server, fibre channel, SCSI RAID, router, and other end nodes, each with its own dedicated fat pipe. Each node can talk to any other node in a many-to-many configuration. Redundant paths can be set up through an InfiniBand fabric for fault tolerance, and InfiniBand routers can connect multiple subnets. Figure, below, shows the simplest configuration of an InfiniBand installation, where two or more nodes are connected to one another through the fabric. A node represents either a host device such as a server or an I/O device such as a RAID subsystem. The fabric itself may consist of a single switch in the simplest case or a collection of interconnected switches and routers. Each connection between nodes, switches, and routers is a point-to-point, serial connection. 4
  • 5. Seminar Report InfiniBand Figure. InfiniBand Basic Fabric Topology InfiniBand has a uni-directional 2.5Gbps (250MB/sec using 10 bits per data byte encoding, called 8B/10B similar to 3GIO) wire speed connection, and uses either one differential signal pair per direction, called 1X, or 4 (4X), or 12 (12X) for bandwidth up to 30Gbps per direction (12 x 2.5Gbps). Bidirectional throughput with InfiniBand is often expressed in MB/sec, yielding 500 MB/sec for 1X, 2GB/sec for 4X, and 6GB/sec for 12X, respectively. Each bi-directional 1X connections consists of four wires, two for send and two for receive. Both fiber and copper are supported: Copper can be in the form of traces or cables, and fiber distances between nodes can be as far as 300 meters or more. Each InfiniBand subnet can host up to 64,000 nodes. 5
  • 6. Seminar Report InfiniBand DESIGN CONSIDERATIONS Large scale networks such as corporate or enterprise area networks and even the Internet demand a maximum in both performance and quality of service. The primary driving force behind InfiniBand is the concepts of quality of service (QoS) and reliability, availability, and serviceability (RAS). Computer systems for mission critical scenarios such as e-commerce, financial institutes, data base applications, and telecommunications require not only high-performance, but stability and virtually 24/7 (hours/days) uptime. IBA is envisioned as a replacement not only for the current internal PCI, but the external network layer as well. InfiniBand signaling technology offers not only PCB level communication, but can also traverse across currently available copper and fiber optic communication mediums. The overall approach with IBA is to replace the myriad of internal and external bus specifications with a single unified architecture. 6
  • 7. Seminar Report InfiniBand The design considerations are: • Transport level connections delivered in hardware • Reliability • Scalability • Fault tolerance • Hot swappability • Quality of service (QoS) • Clustering capabilities (RDMA and message passing) • Multiple generations of compatibility to 10Gb/s • I/O connectivity • Multi-Protocol capabilities • Inherent Redundancy • Failover and active stand-by • In-Band Management • Interconnect manageability • Error detection • Software Backwards Compatibility ARCHITECTURE InfiniBand breaks through the bandwidth and fanout limitations of the PCI bus by migrating from the traditional shared bus architecture into a switched fabric architecture. Instead of the traditional memory mapped I/O interface bus, the InfiniBand architecture is developed with a switched serial fabric approach. This switched nature allows for low-latency, high bandwidth characteristics of the IBA architecture. Clustered 7
  • 8. Seminar Report InfiniBand systems and networks require a connectivity standard that allows for fault tolerant interconnects. A traditional shared bus lacks the ability to perform advanced fault detection and correction, and PCI is designed more for performance than reliability. As noted earlier, the I/O architecture must also provide for scalability while maintaining optimum connectivity. IBA is an attempt to satisfy each of these pressing concerns, but break through the limitations present with the shared bus design. The negative effect produced by such technology is the massive pin count required for proper device operation. A traditional 64-bit PCI slot requires 90 pins for operation, thus limiting the motherboard’s layout considerations and increasing the requirements for more of the board’s printed surface area. When designing a more complex motherboard, the slots can span long distances from the central PCI controller. When implementing multiple devices in this type of configuration, the ability to determine proper signal timing and termination is severely impeded. The final result is often a potential degradation in performance and stability. A further detracting issue is that to allow for inter-system communications, a network layer must be utilized, such as Ethernet (LAN/WAN) or Fibre Channel (network storage). 8
  • 9. Seminar Report InfiniBand Fabric Versus Shared Bus Feature Fabric Bus Topology Switched Shared Pin Count Low High Many Limited Kilometers Inches Excellent Moderate Highly Scalable Yes No Fault Tolerant Yes No Number of End Points Max Signal Length Reliability The switched fabric architecture of InfiniBand is designed around a completely different approach as compared to the limited capabilities of the shared bus. IBA specifies a point-topoint (PTP) communication protocol for primary connectivity. Being based upon PTP, each link along the fabric terminates at one connection point (or device). The actual underlying transport addressing standard is derived from the impressive IP method employed by advanced networks. Each InfiniBand device is assigned a specific IP address, thus the load management and signal termination characteristics are clearly defined and more efficient. To add more TCA connection points or end nodes, the simple addition of a dedicated IBA Switch is required. Unlike the shared bus, each TCA and IBA Switch can be interconnected via multiple data paths in order to sustain maximum aggregate device bandwidth and provide fault tolerance by way of multiple redundant connections. 9
  • 10. Seminar Report InfiniBand The main components in the InfiniBand architecture are: • Channel adapters • Switches • Routers The InfiniBand specification classifies the channel adapters into two categories: Host Channel Adapters (HCA) and Target Channel Adapters (TCA). HCAs are present in servers or even desktop machines and provide an interface that is used to integrate the InfiniBand with the operating system. TCAs are present on I/O devices such as a RAID subsystem or a JBOD subsystem. Host and Target Channel adapters present an interface to the layers above them that allow those layers to generate and consume packets. In the case of a server writing a file to a storage device, the host is generating the packets that are then consumed by the storage device. Each channel adapter may have one or more ports. A channel adapter with more than one port, may be connected to multiple switch ports. This allows for multiple paths between a source and a destination, resulting in performance and reliability benefits. By having multiple paths available in getting the data from one node to another, the fabric is able to achieve transfer rates at the full capacity of the channel, avoiding congestion issues that arise in the shared bus architecture. Furthermore, having alternative paths results in increased reliability and 10
  • 11. Seminar Report InfiniBand availability since another path is available for routing of the data in the case of failure of one of the links. Two more unique features of the InfiniBand Architecture are the ability to share storage devices across multiple servers and the ability to perform third-party I/O. Third-party I/O is a term used to refer to the ability of two storage devices to complete an I/O transaction without the direct involvement of a host other than in setting up the operation. This feature is extremely important from the performance perspective since many such I/O operations between two storage devices can be totally off-loaded from the server, thereby eliminating the unnecessary CPU utilization that would otherwise be consumed. Switches simply forward packets between two of their ports based on the established routing table and the addressing information stored on the packets. A collection of end nodes connected to one another through one or more switches form a subnet. Each subnet must have at least one Subnet Manager that is responsible for the configuration and management of the subnet. 11
Figure: System Area Network based on the InfiniBand Architecture

Routers are like switches in that they simply forward packets between their ports. The difference is that a router is used to interconnect two or more subnets to form a larger, multi-domain system area network. Within a subnet, each port is assigned a unique identifier by the Subnet Manager called the Local ID (LID). In addition to the LID, each port is assigned a globally unique identifier called the GID. Switches use LIDs for routing packets from source to destination within a subnet, whereas routers use GIDs for routing packets across domains (the short sketch below shows how host software can read both identifiers).

One more feature of the InfiniBand Architecture that is not available in the current shared bus I/O architecture is the ability to partition the ports within the fabric that may communicate with one another. This is useful for partitioning the available storage across one or more servers for management and/or security reasons.
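As an illustration, and not something drawn from the original specification text, the sketch below uses the libibverbs API, which later became the common way to program InfiniBand hosts, to print the LID and the GID of the first HCA's first port. The choice of device 0, port 1, and GID index 0 is arbitrary, and error handling is kept minimal.

```c
/* Minimal sketch: print the LID and GID of the first HCA's first port.
 * Build (assuming the libibverbs development files): gcc lid_gid.c -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) {
        fprintf(stderr, "no InfiniBand devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_port_attr port;
    union ibv_gid gid;

    ibv_query_port(ctx, 1, &port);   /* LID: 16-bit, assigned by the subnet manager */
    ibv_query_gid(ctx, 1, 0, &gid);  /* GID: 128-bit, globally unique */

    printf("port 1 LID: 0x%04x\n", port.lid);
    printf("port 1 GID: ");
    for (int i = 0; i < 16; i++)
        printf("%02x%s", gid.raw[i], (i % 2 && i != 15) ? ":" : "");
    printf("\n");

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```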
LAYERS

InfiniBand is specified as a layered architecture with physical, link, network, and transport layers, plus a number of upper-layer protocols that can be thought of as an application layer. The link and transport layers define, switch, and deliver InfiniBand packets and handle load balancing and quality-of-service functions. The network layer gets involved when data is exchanged among subnets, handling the movement of packets among routers. The physical layer defines the hardware components.

The first area of examination is the physical layer of the IBA specification. IBA allows for multiple connection paths scaling up to 30 gigabits per second. Since the full-duplex serial nature of IBA requires only four wires per link, even a high-speed 12x implementation requires only a moderate 48 wires (or pathways). Other specifications of the IBA physical layer include provisions for custom back-plane I/O connectors and hot-swapping capabilities.
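The quoted figures follow from simple lane arithmetic: one 1x lane signals at 2.5 gigabits per second over four wires (one differential pair in each direction), and the 4x and 12x widths aggregate lanes. A few illustrative lines of C reproduce the numbers:

```c
#include <stdio.h>

int main(void)
{
    const double lane_gbps = 2.5;   /* raw signalling rate of one 1x lane */
    const int wires_per_lane = 4;   /* full-duplex serial: one differential pair each way */
    const int widths[] = { 1, 4, 12 };

    /* Prints 2.5/10/30 Gb/s and 4/16/48 wires for the 1x, 4x and 12x widths. */
    for (int i = 0; i < 3; i++)
        printf("%2dx link: %5.1f Gb/s, %2d wires\n",
               widths[i], widths[i] * lane_gbps, widths[i] * wires_per_lane);
    return 0;
}
```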
The link and transport layers of InfiniBand are perhaps the most important aspects of this new interconnect standard. At the packet level, two packet types are specified: data transfer and network management. Management packets provide operational control over device enumeration, subnet routing, and fault tolerance. Data packets carry the actual information, with each packet holding a maximum of four kilobytes of transaction payload. Within each subnet, packet direction and switching properties are directed by a Subnet Manager using 16-bit local identification (LID) addresses.

The link layer also provides the quality-of-service characteristics of InfiniBand. The primary QoS mechanism is the Virtual Lane (VL) architecture. Even though a single IBA data path is defined at the hardware level, the VL approach allows for 16 logical links: 15 independent data lanes (VL0-VL14) and one management lane (VL15), so device-specific prioritization can be configured. Since management traffic requires the highest priority, VL15 retains maximum priority. This priority-driven architecture contributes not only to quality of service but to performance as well.

To further assure reliability and performance, InfiniBand offers a credit-based flow-control scheme and two stages of data-integrity checking. With the credit approach, each receiving node on the IBA network informs the originating device of the maximum amount of data it can accept without packet loss. The credit information is carried on a dedicated link along the IBA fabric, and no packets are sent until the sending device is signaled that the receiving device's communications buffer can accept the transfer. To ensure data integrity, two CRC values are carried in each packet. For link-level checking, a 16-bit variant CRC covers each data field and is recalculated at every IBA fabric hop. A 32-bit invariant CRC is also included; it covers only the fields that do not change between hops and thus provides end-to-end integrity checking.
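The credit scheme can be pictured with a toy model: the receiver advertises how many buffer slots it has free, and the sender transmits only while it holds unused credits, so packets are never dropped for lack of buffer space. The C sketch below is a deliberate simplification of that idea (applied per virtual lane in the real protocol) and does not reflect the actual wire format or credit units defined by the specification.

```c
#include <stdio.h>

/* Toy model of credit-based, lossless flow control on one virtual lane. */
struct vl_link {
    int credits;    /* buffer slots the receiver has advertised as free */
};

/* Receiver side: advertise credits as receive buffers drain. */
void grant_credits(struct vl_link *l, int freed_buffers)
{
    l->credits += freed_buffers;
}

/* Sender side: a packet may only be transmitted while a credit is available. */
int try_send(struct vl_link *l)
{
    if (l->credits == 0)
        return 0;       /* stall instead of dropping the packet */
    l->credits--;
    return 1;           /* packet placed on the wire */
}

int main(void)
{
    struct vl_link l = { .credits = 0 };
    grant_credits(&l, 2);   /* receiver reports two free buffers */
    printf("send #1: %s\n", try_send(&l) ? "sent" : "stalled");
    printf("send #2: %s\n", try_send(&l) ? "sent" : "stalled");
    printf("send #3: %s\n", try_send(&l) ? "sent" : "stalled");  /* no credit left */
    return 0;
}
```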
InfiniBand's network layer provides for the routing of packets from one subnet to another. Each routed packet carries a Global Route Header (GRH) with a 128-bit, IPv6-style address for both the source and the destination node. The network layer also makes use of a standard 64-bit globally unique identifier for each device across all subnets. Through switching on these identifier values, the IBA network allows data to traverse multiple logical subnets.

The last layer of the InfiniBand architecture is responsible for the actual transport of packets. The transport layer controls several key aspects including packet delivery, channel multiplexing, and base transport servicing. Basic network fabric characteristics, such as Maximum Transfer Unit (MTU) size and Base Transport Header (BTH) handling, are also dealt with directly at this layer.

COMMUNICATION STACK

In order to achieve better performance and scalability at a lower cost, system architects came up with the concept of clustering, where two or more servers are connected together to form a single logical server. To get the most benefit from clustering multiple servers, the protocol used for communication between the physical servers must provide high bandwidth and low latency. Unfortunately, full-fledged network protocols such as TCP/IP, in order to achieve good performance across both LANs and WANs, have become so complex that they incur considerable latency and require many thousands of lines of code to implement. The Virtual Interface (VI) Architecture (VIA) specification was created to overcome these issues.

The VI Architecture is a server messaging protocol whose focus is to provide a very low-latency link between the communicating servers. In transferring a block of data from one server to another, latency arises in the form of overhead and delays added to the time needed to transfer the actual data. Broken down into its components, the major contributors to latency are: a) the overhead of executing network protocol code within the operating system, b) context switches to move in and out of kernel mode to receive
and send out the data, and c) excessive copying of data between user-level buffers and the NIC memory.

Since VIA was intended only for communication across the physical servers of a cluster (in other words, across high-bandwidth links with very high reliability), the specification can eliminate much of the standard network protocol code that deals with special cases. Also, because of the well-defined operating environment, the message exchange protocol was defined to avoid kernel-mode interaction and to allow access to the NIC from user mode. Finally, because of the direct access to the NIC, unnecessary copying of data into kernel buffers is also eliminated, since the user can transfer data directly from user space to the NIC. In addition to the standard send/receive operations typically available in a networking library, VIA provides Remote Direct Memory Access (RDMA) operations, where the initiator of the operation specifies both the source and the destination of a data transfer, resulting in zero-copy transfers with minimal CPU involvement.

InfiniBand essentially uses the VIA primitives for its operation at the transport layer. For an application to communicate with another application over the InfiniBand fabric, it must first create a work queue in the form of a queue pair (QP), consisting of a Send Queue and a Receive Queue. To execute an operation, the application places a work queue element (WQE) on the work queue. From there the
operation is picked up for execution by the channel adapter. The work queue therefore forms the communications medium between applications and the channel adapter, relieving the operating system of this responsibility.

Figure: InfiniBand Communication Stack

Each process may create one or more QPs for communicating with other applications. Since both the protocol and the structures are very clearly defined, queue pairs can be implemented in hardware, off-loading most of the work from the CPU. At the target end, a WQE on the target's Receive Queue identifies the location where the incoming data is placed; once the transfer completes, the WQEs are retired and the QPs can be shut down. Packet headers identify the data as well as its source, destination, and priority (for QoS purposes), and carry the checks used to verify its integrity. Packets are sent either to a single target or multicast to multiple targets. Once a WQE has been processed, a completion queue element (CQE) is created and placed on the completion queue. The advantage of using the completion queue
for notifying the caller of completed WQEs is that it reduces the number of interrupts that would otherwise be generated.

The operations supported by the InfiniBand architecture at the transport level for Send Queues are as follows:

Send/Receive: supports the typical send/receive operation, where one node submits a message and another node receives it. One difference between the send/receive operation under the InfiniBand architecture and traditional networking protocols is that InfiniBand defines send/receive operations as operating against queue pairs.

RDMA-Write: permits one node to write data directly into a memory buffer on a remote node. The remote node must, of course, have granted the appropriate access privileges ahead of time and must have memory buffers already registered for remote access.

RDMA-Read: permits one node to read data directly from the memory buffer of a remote node. The remote node must, of course, have granted the appropriate access privileges ahead of time.

RDMA Atomics: this name actually refers to two different operations. The Compare & Swap operation allows a node to read a memory location and, if its value equals a specified value, write a new value into that location. The Fetch & Add operation reads a value and
returns it to the caller, then adds a specified number to that value and saves the result back at the same address.

For the Receive Queue the only type of operation is:

Post Receive Buffer: identifies a buffer that a client may send data to, or receive data from, through a Send, RDMA-Write, or RDMA-Read operation.

When a QP is created, the caller associates it with one of five transport service types. A process may create and use more than one QP, each of a different transport service type. The InfiniBand transport service types are:
• Reliable Connection (RC): reliable transfer of data between two entities.
• Unreliable Connection (UC): unreliable transfer of data between two entities. As with RC, only two entities are involved in the data transfer, but messages may be lost.
• Reliable Datagram (RD): the QP can send and receive messages from one or more QPs using a reliable datagram channel (RDC) between each pair of reliable datagram domains (RDDs).
• Unreliable Datagram (UD): the QP can send and receive messages from one or more QPs, but messages may be lost.
• Raw Datagram: a data-link-layer service that gives the QP the ability to send and receive raw datagram messages that are not interpreted by the transport.
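The queue-pair model described above maps almost directly onto the verbs API (libibverbs) that is commonly used to program InfiniBand hosts today. The sketch below is offered as an illustration rather than as part of the original specification text: it creates a Reliable Connection QP from a completion queue, registers a memory buffer, and fills in an RDMA-Write work request. The remote address and rkey are placeholders, and the connection-establishment steps needed before the request could actually be posted are omitted, so this is a structural sketch, not an end-to-end program.

```c
/* Structural sketch of the queue-pair model using libibverbs.
 * Creates the verbs resources and shows how an RDMA-Write work request (a WQE)
 * is described; actually posting it requires a connected QP, whose out-of-band
 * setup is omitted here.  Build with: gcc qp_sketch.c -libverbs */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) { fprintf(stderr, "no HCA found\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                      /* protection domain */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);  /* completion queue  */

    /* A queue pair = Send Queue + Receive Queue; RC = Reliable Connection service. */
    struct ibv_qp_init_attr qp_attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap     = { .max_send_wr = 16, .max_recv_wr = 16,
                     .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_RC,
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &qp_attr);

    /* Register a buffer so the HCA may access it directly from user space. */
    char *buf = malloc(4096);
    strcpy(buf, "hello over RDMA");
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                   IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
    if (!pd || !cq || !qp || !mr) { fprintf(stderr, "resource creation failed\n"); return 1; }

    /* Describe an RDMA-Write WQE: push the local buffer into remote memory without
     * involving the remote CPU.  remote_addr and rkey would come from the peer
     * during connection setup; the zeros below are placeholders. */
    struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = 4096, .lkey = mr->lkey };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,       /* generate a CQE on completion */
        .wr.rdma.remote_addr = 0,              /* placeholder */
        .wr.rdma.rkey        = 0,              /* placeholder */
    };
    (void)wr;   /* not posted here; see the comment below */
    /* Once the QP is connected:   struct ibv_send_wr *bad;
     *                             ibv_post_send(qp, &wr, &bad);
     * and completions are reaped with:
     *     struct ibv_wc wc; while (ibv_poll_cq(cq, 1, &wc) == 0) ;            */

    printf("QP %u created on %s (type RC)\n", qp->qp_num, ibv_get_device_name(devs[0]));

    ibv_dereg_mr(mr); free(buf);
    ibv_destroy_qp(qp); ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd); ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

In a complete program the two sides would exchange LIDs, QP numbers, and rkeys out of band, transition their QPs to the ready-to-send state, and then post work requests and reap CQEs exactly as outlined in the comments.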
SPECIFICATIONS

InfiniBand Specifications
• Layered Protocol
• Multiple Layer Connectivity
• Packet-based Communications
• Multicast Capable
• Packet and End Node Fault Tolerance
• Subnet Management Capability
• Varied Link Speeds – 1x, 4x, 12x
• 2.5 – 30 Gigabit per Second Transfers
• PCB, Copper, and Fiber Mediums
• Support for Remote DMA
• Quality of Service Considerations
COMPARISON

Feature                  InfiniBand            PCI-X              Fibre Channel    Gigabit Ethernet
Bandwidth (Gbit/sec)     4 / 16 / 48           8.51               4                1
Pin Count                4 / 16 / 48           90                 4                4
Max Signal Length        Kilometers            Centimeters        Kilometers       Kilometers
Transport Medium         PCB, Copper, Fiber    PCB                Copper, Fiber    Copper, Fiber
Maximum Packet Payload   4 KB                  Bus, No Packets    2 KB             9 KB

InfiniBand also provides simultaneous peer-to-peer connections across its 15 virtual lanes, a native virtual interface, multicast support, end-to-end flow control, memory partitioning, QoS, and both reliable and scalable connectivity; the shared-bus PCI-X generally lacks these fabric-oriented features, while Fibre Channel and Gigabit Ethernet support only subsets of them.
Several current and near-future interconnect standards are competing for mass-market acceptance. Competitive alternatives to InfiniBand include PCI-X, Gigabit Ethernet, Fibre Channel, and RapidIO. PCI-X will likely represent the next generation in back-plane connectivity for the majority of computer users. Even though certain performance deficiencies are better handled by PCI-X, the shared-bus nature of this technology limits scalability and reliability. PCI-X requires a separate controller for each 133 MHz device, whereas IBA allows for 64,000 nodes per subnet through the use of multiple-port IBA switches.

Compared to Gigabit Ethernet, IBA offers several significant advantages, especially concerning the physical layer device (PHY) model. InfiniBand specifies a minimal 0.25 watts per port for operation over copper at distances of up to sixteen meters. In contrast, Gigabit Ethernet requires a minimum of 2 watts per port, as that type of connection is designed for distances of 100 meters or more. The low-power pathways of IBA allow for more efficient hub designs with lower amperage demands.
BENEFITS & LIMITATIONS

BENEFITS

• Offloads communication processing from the OS & CPU: InfiniBand channel adapters have their own intelligence, so they offload a great deal of communications processing from the operating system and CPU.
• Scalability: The switched nature of InfiniBand delivers a great deal of scalability; it can support an ever larger number of simultaneous, collision-free connections with virtually no increase in latency.
• Many-to-many communication: InfiniBand's many-to-many communications mean that anything you add to the network, be it a CPU, a Fibre Channel SAN, or a simple SCSI drive, can share its resources with any other node.
• Reliability: Reliability is assured because data can take many paths across the fabric.
• Easier fault tolerance & troubleshooting: Fault isolation and troubleshooting are easier in a switched environment, since it is easier to isolate problems to a single connection.
• Supports clustering: InfiniBand will help clustering take off, as it provides a standard high-speed, low-latency interconnect that can be used across tens or hundreds of servers.
LIMITATIONS

• Long & difficult transition: The adoption process will be long and difficult.
• Interoperability problems: As usual, vendors are promising interoperability and compatibility with existing hardware and software. This always sounds great, but the reality is often one of longer timetables and of vendors adding their own proprietary functionality, so that interoperability becomes a problem.
• Software implementations take longer: Software implementations typically take longer than originally anticipated.
• Management features problematic.
• Initial performance problems: Unanticipated performance problems and numerous bugs will have to be ironed out.

CONCLUSION

Mellanox and related companies are now positioned to release InfiniBand as a multifaceted architecture within several market segments. The most notable application area is enterprise-class network clusters and Internet data centers. These types of applications require extreme performance with the maximum in fault tolerance and reliability. Other computing system uses include Internet service providers, co-location hosting, and large
corporate networks. IBA will also be marketed in the embedded device arena for high-speed routers, switches, and massive telecommunications devices.

At least for its initial introduction, InfiniBand is positioned as a complementary architecture. IBA will move through a transitional period in which future PCI, IBA, and other interconnect standards can be offered within the same system or network. The understanding of PCI's limitations (even PCI-X's) should allow InfiniBand to be an aggressive market contender as higher-class systems move towards the conversion to IBA devices.

Currently, Mellanox is developing the IBA software interface standard using Linux as its internal OS of choice. The open-source nature of Linux allows for easy code modification. At release, InfiniBand will also be supported by Microsoft Windows and the future Windows Whistler. Expect Unix support to be available as well, considering the relative ease of porting Linux-based drivers.

Another key concern is the cost of implementing InfiniBand at the consumer level. Industry sources currently project IBA prices to fall somewhere between those of the currently available Gigabit Ethernet and Fibre Channel technologies. Assuming these estimates prove accurate, InfiniBand could be positioned as the dominant I/O connectivity architecture at all upper-tier levels. The same sources also report that InfiniBand is being researched at the vendor level as a possible future I/O standard for mid-range PC desktops and
workstations. Considering the competitive price point of IBA, this projection holds considerable potential at all market levels. This is definitely a technology to watch.

REFERENCES

• William T. Futral, InfiniBand Architecture: Development and Deployment. A Strategic Guide to Server I/O Solutions, Intel Press, 2001.
• Odysseas Pentakalos, "An Introduction to the InfiniBand Architecture."
• Jeffrey Burt, "InfiniBand: What's Next?"
• Mellanox Technologies, "Understanding PCI Bus, 3GIO, and InfiniBand Architecture."
• Scott Tyler Shafer, "InfiniBand in Blade Servers."
• Compaq Computer Corporation, "PCI-X: An Evolution of the PCI Bus," September 1999.
• http://www.extremetech.com/
• http://searchstorage.techtarget.com/
• http://isp.webopedia.com/