1. Project
experimentation
Evaluation of the RINA prototype in the iLab.t
virtual wall and Experimenta testbeds
GENI-FIRE Workshop, Belgium, October 2013
Investigating RINA as an Alternative to TCP/IP
2. Project at a glance
• What? Main goals
– To advance the state of the art of RINA towards an architecture reference model and specifications that are closer to enabling implementations deployable in production scenarios.
– The design and implementation of a RINA prototype on top of Ethernet will enable the experimentation and evaluation of RINA in comparison to TCP/IP.
Who? 4 partners
5 activities:
WP1: Project management
WP2: Architecture, Use cases and Requirements
WP3: Software Design and Implementation
WP4: Deployment into OFELIA testbed, Experimentation and Validation
WP5: Dissemination, Standardisation and Exploitation
IRATI - Investigating RINA as an Alternative to TCP/IP
Budget
Total Cost: 1.126.660 €
EC Contribution: 870.000 €
Duration: 2 years
Start Date: 1st January 2013
External Advisory Board
Juniper Networks, ATOS,
Cisco Systems, Telecom Italia, BU
3. RINA is an..
Innovative approach to computer networking
using inter-process communications (IPC), a set
of techniques for the exchange of data among
multiple threads in processes running on one or
more computers connected to a network.
Ref.: J. Day, “Patterns in Network Architecture: A Return to Fundamentals”, Prentice Hall, 2008.
The RINA principle:
Networking is not a layered set of
different functions but rather a single
layer (DIF) of distributed IPC’s that
repeats over different scopes.
4. RINA Architecture
[Figure: recursive DIFs A to F covering different scopes, each containing numbered IPC Processes]
• A structure of recursive layers that provide IPC (Inter-Process Communication) services to applications on top
• There’s a single type of layer that repeats as many times as required by the network designer
• Separation of mechanism from policy
• All layers have the same functions, with different scope and range.
– Not all instances of layers may need all functions, but don’t need more.
• A Layer is a Distributed Application that performs and manages IPC (a Distributed IPC Facility, DIF)
• This yields a theory and an architecture that scales indefinitely,
– i.e. any bounds imposed are not a property of the architecture itself.
5. Architectural model
[Figure: two systems acting as hosts and one acting as a router. Each system runs a Management Agent; the hosts also run Application Processes. One IPC Process per system forms the DIF. Underneath, Shim IPC Processes form a Shim DIF over TCP/UDP and a Shim DIF over Ethernet.]
[Figure: components of an IPC Process behind the IPC API, ordered by increasing timescale (functions performed less often) and complexity]
Data Transfer: SDU Delimiting, Data Transfer, Relaying and Multiplexing, SDU Protection
Data Transfer Control (linked to Data Transfer through a State Vector): Transmission Control, Retransmission Control, Flow Control
Layer Management: RIB, RIB Daemon, CACEP, CDAP Parser/Generator, Enrollment, Authentication, Flow Allocation, Resource Allocation, Forwarding Table Generator
6. Flow of RINA (IRATI) R&D and
experimentation activities
(feedback between activities not shown for clarity reasons)
[Figure: flow of activities and their outputs; labels recovered from the diagram]
Activities: Research on RINA reference model; Research on policies for different areas; Design and development of simulators; Study different use cases and deployment options; Prototyping; Experimentation and validation
Outputs: Core RINA specs; Policy specs; Simulators; Use case analysis; Prototypes; Data and conclusions
Policy areas: DIF creation, Data transfer, Management, Security, Multiplexing, Application discovery, Enrollment, Routing, Resource allocation
Different platforms: Java VM, Linux OS, Android OS, NetFPGA
Coexisting with different technologies: TCP/UDP/IP, VLANs, WiFi, MPLS, LTE
7. Phase 1: Basic Functionality - UNIX-like OS
• Ongoing
• Validate basic RINA functionality
• Define the requirements of a RINA deployment within a local area network (weak security requirements, support of legacy applications, best-effort QoS, flat addressing scheme)
• The target platform will be Debian with RINA in the kernel stack
[Figure: Single-island deployment with corresponding RINA DIFs]
[Figure: Multi-island experiment on iLab.t virtual wall and i2CAT’s Experimenta testbed]
8. Phase 2: Scalability and JunOS
• Planned July 2014
• Target different deployment scenarios
– single network provider with different network hierarchies, different levels of QoS, multiple network service providers, etc.
• Assume that all the networks are either RINA or Ethernet capable (i.e. no IP)
• The UNIX-like OS and JunOS will be the target platforms of this phase
[Figure: Single island with Juniper router and multiple RINA nodes within the Virtual Wall]
9. Phase 3: IP gateway and interoperability
• Planned Dec 2014
• Interoperability between RINA prototypes, developed outside of the project and deployed in a RINA network surrounded by an IP network
• At this stage we will collaborate with the Pouzin Society through Boston University
[Figure: OFELIA. Interoperability between the PSOC and IRATI RINA prototypes]
10. Requirement
Requirement | IRATI
Resource discovery | Availability of nodes, potential VM capabilities (CPU, memory, HD, interfaces), being able to design the L2 connectivity graph between virtual interfaces, VLAN tagging support
Resource reservation | All resources available at once.
Resource provisioning | Instantiation of VMs and configuration of L2 switches (creation of the required VLANs) to set up the connectivity between VMs. Configuration of the VLANs in the interfaces of the VMs would be nice to have.
Experiment control | Experimenters log into the different VMs to set up different configurations of the RINA software, execute different test applications, etc.
Monitoring | Monitoring of traffic per VLAN, as well as the resource utilization of the virtual machines (CPU, memory, virtual interfaces). Utilities for easily capturing and crafting ARP and Ethernet frames would be nice to have.
Permanent storage | The virtual machines hosting the IRATI prototype require 15-20 GB of storage to host the OS, RINA binaries plus traces, logs and state.
Identity management | Allow different experimenters within the project to set up independent slices
Authorization | Individual access to different slices (some of them can be shared between multiple researchers)
SLA management | --
First Level Support | Status information on the virtual machines and connectivity amongst them, plus the ability to request corrective actions if something breaks
Dataplane interconnection | For RINA over Ethernet: L2 interconnection with VLAN-tagging support; it would be nice to be able to choose different loss and delay distributions for the links. For RINA over TCP/UDP: L3 interconnection (IPv4 at a minimum, IPv6 nice to have), with access to the Internet (interop with other RINA prototypes)
11. Federation issue: VLAN transparency
• IRATI requires VLAN tags, which it uses as the DIF name
• The iLab.t virtual wall uses VLANs to separate experiments.
• The central switch does not support double tagging (802.1ad); all frames with Ethertype 0x8100 are dropped by the central switch, so VLANs cannot be used inside experiments.
– Solution: patched the Linux kernels (version 3.9.6) and NIC device drivers of the machines to use Ethertype 0x7100 instead of 0x8100 for 802.1Q traffic inside vwall.
• Need an additional machine to do “Ethertype translation” between the i2CAT Experimenta and iLab.t virtual wall testbeds.
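The “Ethertype translation” step can be illustrated with a small sketch. This is not the IRATI translation machine's code; it only shows the idea on raw frame bytes, assuming an untruncated Ethernet II frame whose Ethertype sits at bytes 12-13. The function name and the `to_vwall` flag are hypothetical; the two Ethertype values are the ones given on the slide.

```python
ETHERTYPE_8021Q = 0x8100   # standard 802.1Q tag, dropped by the central switch
ETHERTYPE_VWALL = 0x7100   # replacement value used inside the virtual wall


def translate_ethertype(frame: bytes, to_vwall: bool) -> bytes:
    """Rewrite the outer Ethertype of a VLAN-tagged frame; leave other frames alone."""
    if to_vwall:
        old, new = ETHERTYPE_8021Q, ETHERTYPE_VWALL
    else:
        old, new = ETHERTYPE_VWALL, ETHERTYPE_8021Q
    if len(frame) < 14:
        return frame  # too short to carry an Ethernet header
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: Ethertype.
    if int.from_bytes(frame[12:14], "big") != old:
        return frame
    return frame[:12] + new.to_bytes(2, "big") + frame[14:]
```

A frame entering the virtual wall would be passed through with `to_vwall=True`, and a frame leaving it with `to_vwall=False`; the VLAN tag itself (and so the DIF name) is untouched.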
12. Thanks for your attention!
Sergi Figuerola
(sergi.figuerola@i2cat.net)
Editor's notes
IPC Process Components

Data transfer service API
This is the only externally visible API for application processes using the IPC Process services. This API allows applications to make themselves available through a DIF and to request and use IPC services to other applications. The abstract API has six operations (implementations may have more operations for convenience of use and to adapt to the specifics of each operating system, but still logically providing the same operations):
• portId _allocateFlow(destAppName, List<qosParams>). This operation enables an application to allocate a flow to a destination application (identified by destAppName), specifying a list of desired QoS parameters. The operation returns a handle to the flow, the portId, used in other operations to read/write SDUs (Service Data Units, the user data) to the flow.
• void _write(portId, sdu). Sends an SDU through the flow identified by portId. SDUs are buffers of user data with a certain length. SDUs are delivered to the destination application as they were written by the source application.
• sdu _read(portId). Reads an SDU from the flow identified by portId.
• void _registerApplication(appName, List<DIFName>). Registers the application identified by appName to the DIFs identified in the list of DIF names. This operation advertises the application within a DIF, so that flows can be allocated to it (it is always up to the application to take the final decision, refusing or accepting them).
• void _unregisterApplication(appName, List<DIFName>). Unregisters an application from a set of DIFs, or from all the DIFs if the second argument is not present.
More information about the data transfer service API is available in the “Data Transfer Service Definition” specification, pages 179-192 of the “RINA specification handbook”.
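The operations listed above can be sketched as a toy, single-process class. This is only an illustration of the call pattern, not the IRATI API: the class name `ToyIpcApi`, the Python method names, and the in-memory queues are all assumptions, QoS parameters are accepted but ignored, and a flow is a simple loopback queue so that the same port id is used for writing and reading.

```python
from collections import deque
from itertools import count


class ToyIpcApi:
    """In-memory toy of the five operations listed in the notes (hypothetical)."""

    def __init__(self, dif_names):
        self._registered = {name: set() for name in dif_names}  # DIF name -> app names
        self._flows = {}            # port id -> queue of SDUs, in write order
        self._port_ids = count(1)   # source of fresh port ids

    def register_application(self, app_name, dif_names):
        for dif in dif_names:
            self._registered[dif].add(app_name)

    def unregister_application(self, app_name, dif_names=None):
        # No second argument means "all DIFs", as in the description.
        for dif in (self._registered if dif_names is None else dif_names):
            self._registered[dif].discard(app_name)

    def allocate_flow(self, dest_app_name, qos_params):
        # qos_params is accepted but ignored in this toy.
        if not any(dest_app_name in apps for apps in self._registered.values()):
            raise LookupError(f"{dest_app_name} is not registered in any DIF")
        port_id = next(self._port_ids)
        self._flows[port_id] = deque()
        return port_id

    def write(self, port_id, sdu):
        self._flows[port_id].append(bytes(sdu))

    def read(self, port_id):
        # SDU boundaries are preserved; raises IndexError when nothing is queued.
        return self._flows[port_id].popleft()
```

A caller would register a destination name in a DIF, allocate a flow to that name, and then use the returned port id with `write` and `read`; allocating a flow to an unregistered name raises `LookupError` in this sketch.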
SDU Delimiting
The first step in this processing path is to delimit the SDUs posted by the application, since the data transfer protocol may implement concatenation and/or fragmentation of the SDUs in order to achieve better data transport efficiency and/or to better adapt to the DIF characteristics.
More information about the SDU Delimiting component is available in the “Specification template for a DIF delimiting module” specification, pages 193-194 of the “RINA specification handbook”.

Error and Flow Control Protocol (EFCP)
The Error and Flow Control Protocol (EFCP) is split into two parts: the Data Transfer Protocol (DTP) and the Data Transfer Control Protocol (DTCP), loosely coupled through the use of a state vector. DTP performs the mechanisms that are tightly coupled to the transported SDU, such as fragmentation, reassembly, sequencing, addressing, concatenation and separation. DTCP performs the mechanisms that are loosely coupled to the transported SDU, such as transmission control, retransmission control and flow control.
When a flow is allocated, an instance of DTP and its associated state vector are created. The flows that require flow control, transmission control or retransmission control will have a companion DTCP instance allocated.
The string of octets exchanged between two protocol machines is referred to as a Protocol Data Unit (PDU). PDUs comprise two parts: Protocol Control Information (PCI) and user data. PCI is the part understood by the DIF, while the user data is incomprehensible to the DIF and is passed to its user. The PDUs generated by EFCP are passed to the relaying and multiplexing components.
RINA’s EFCP is based on delta-t, designed by Richard Watson in 1981 [9]. Watson proved that the necessary and sufficient condition for reliable synchronization is to bound three timers: Maximum Packet Lifetime (MPL), maximum time to acknowledge and maximum time to keep retransmitting.
In other words: SYNs and FINs in TCP are unnecessary, allowing for a simpler and more secure data transfer protocol.
More information about the EFCP component is available in the “Error and Flow Control Protocol Specification: Data Transfer + Data Transfer Control” specification, pages 195-232 of the “RINA specification handbook”.

Relaying and Multiplexing Task (RMT)
The role of the Relaying task is to forward the PDUs passing through the IPC Process to the destination EFCP Protocol Machine (PM) by checking the destination address in the PCI. The decision on forwarding is based on the routing information and the Quality of Service agreed. The Multiplexing task multiplexes PDUs from different EFCP instances onto the points of attachment of lower ranking (N-1) DIFs. There are several policies that decide when and where PDUs are forwarded (management of queues, scheduling, length of queues). These policies affect the delivered Quality of Service.
More information about the RMT component is available in the “Relaying and Multiplexing Task” specification, pages 233-240 of the “RINA specification handbook”.

SDU Protection
SDU Protection includes all the checks necessary to determine whether or not a PDU should be processed further (for incoming PDUs) or to protect the contents of the PDU while in transit to another IPC Process that is a member of the DIF (for outgoing PDUs). It may include, but is not limited to, checksums, CRCs, encryption and Hop Count/Time To Live mechanisms. The SDU Protection mechanisms to be applied may change hop by hop (since they depend on the characteristics of the underlying DIFs). In RINA, Deep Packet Inspection is unnecessary and often impossible.
More information about the SDU Protection component is available in the “Specification Template for a DIF SDU Protection module” specification, pages 241-244 of the “RINA specification handbook”.
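As an illustration of the SDU Protection checks described above, the following sketch combines two of the mechanisms the notes mention, a CRC and a hop count. It is not the format defined in the RINA specifications: the 5-byte header layout, the function names and the default hop count are assumptions made for the example.

```python
import struct
import zlib

# Hypothetical protection header: hop count (1 byte) + CRC32 of the PDU (4 bytes).
HEADER = struct.Struct("!BI")


def protect(pdu: bytes, hop_count: int = 16) -> bytes:
    """Outgoing direction: prepend the hop count and a CRC32 over the PDU."""
    return HEADER.pack(hop_count, zlib.crc32(pdu)) + pdu


def check(frame: bytes):
    """Incoming direction: return (hop_count, pdu), or None if the CRC fails."""
    hop_count, crc = HEADER.unpack_from(frame)
    pdu = frame[HEADER.size:]
    if zlib.crc32(pdu) != crc:
        return None
    return hop_count, pdu


def relay(frame: bytes):
    """At a relaying IPC Process: drop corrupted or expired PDUs, else re-protect."""
    checked = check(frame)
    if checked is None:
        return None
    hop_count, pdu = checked
    if hop_count <= 1:
        return None  # hop budget exhausted
    return protect(pdu, hop_count - 1)
```

Because protection is recomputed at every hop in `relay`, a different policy could be applied on each underlying DIF, which is the hop-by-hop property the notes describe.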
The Resource Information Base (RIB) and the RIB Daemon
The Resource Information Base (RIB) is the logical representation of the objects that capture the information that defines an application's state. Looking at the IPC Process, this means objects that represent information about mappings of addresses, resource allocation, connectivity, available applications, security credentials, established flows, forwarding and routing tables, and so on. The RIB Daemon is the task that controls access to the RIB, and also optimizes the operations on the RIB performed by other components of the IPC Process.
More information about the RIB and RIB Daemon components is available in the “Specification of Managed Objects for the Demo DIF” specification, pages 281-289 of the “RINA specification handbook”.

The Common Distributed Application Protocol (CDAP) and the Common Application Connection Establishment Phase (CACEP)
The Common Distributed Application Protocol, CDAP, is the canonical application protocol, similar to an assembly language that can be used to build all distributed applications. CDAP provides six primitives to operate on remote objects: create, delete, read, write, start and stop. IPC Processes use CDAP to modify the RIBs of other IPC Processes, which triggers changes in the behaviour of those IPC Processes. CDAP is modelled after OSI’s CMIP, the Common Management Information Protocol. Any existing application protocol can use the DIF (it can be transported by a flow); however, we only use CDAP inside the DIF, to test our theory that there is only one application protocol.
More information about CACEP and CDAP is available in the “Common Application Establishment Phase” and “CDAP - Common Distributed Application Protocol” specifications, pages 106-118 and pages 119-160 of the “RINA specification handbook”, respectively.

The Enrollment Task
All communication goes through three phases: Enrollment, Allocation (Establishment), and Data Transfer. RINA is no exception.
Enrollment is the procedure by which an IPC Process joins an existing DIF and obtains enough information to start operating as a member of this DIF. Enrollment starts when the joining IPC Process establishes an application connection with another IPC Process that is already a member of the DIF. During the application connection establishment, the IPC Process that is a DIF member may want to authenticate the joining process, depending on the DIF security requirements. The CACE component (Common Application Connection Establishment) is the one in charge of establishing and releasing application connections. Several authentication modules can be plugged into CACE to implement different authentication policies. Once the application connection has been established, the joining IPC Process needs to acquire the DIF static information: which QoS classes are supported and what their characteristics are, which policies the DIF supports, and other parameters such as the DIF’s MPL or maximum PDU size.
More information about the Enrollment task component is available in the “Basic Enrollment” specification, pages 251-256 of the “RINA specification handbook”.

The Flow Allocator (FA)
Flow allocation is the component responsible for managing a flow’s lifecycle: allocation, monitoring and deallocation. Unlike with TCP, in RINA port allocation and data transfer are separate functions, meaning that a single flow can be supported by one or more data transport connections (in TCP a port number is mapped to one and only one TCP connection; the port numbers identify the TCP connection). The Flow Allocator (FA) component handles the flow allocation/deallocation requests.
Among its tasks it has to:
i) find the IPC Process through which the destination application is accessible;
ii) map the requested QoS to policies that will be associated with the flow;
iii) negotiate the flow allocation with the destination IPC Process FA (access control permissions, policies associated with the flow);
iv) create one or more DTP and optionally DTCP instances to support the flow;
v) monitor the DTP/DTCP instances to ensure the requested QoS is maintained during the flow lifetime, and take specific actions to correct any misbehaviours;
vi) deallocate the resources associated with the flow once the flow is terminated.
More information about the FA component is available in the “Flow Allocator” specification, pages 257-268 of the “RINA specification handbook”.

The Forwarding Table Generator (Routing)
The Forwarding Table Generator (or Routing) is the IPC Process component that exchanges connectivity information with other IPC Processes of the DIF and applies an algorithm to generate the forwarding table used by the Relaying and Multiplexing Task (connectivity as well as QoS and resource allocation information is used to generate the forwarding table). Several algorithms and kinds of information may be required to generate the forwarding table, depending on the QoS classes supported by the DIF.
More information about the routing component is available as one of the specifications proposed by the IRATI consortium. It can be found in section 6 of this document.

The Resource Allocator (RA)
The Resource Allocator is the component that decides how the resources in the IPC Process are allocated (dimensioning of the queues, creation/suspension/deletion of queues, creation/deletion of N-1 flows, and others).
More information about the RA component is available in the “RINA Reference model part 3: Distributed InterProcess Communication” document, pages 79-80 of the “RINA specification handbook”.
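To make the Forwarding Table Generator description more concrete, here is a minimal sketch of turning connectivity information into a forwarding table. It assumes one particular policy, shortest path by additive link cost (Dijkstra's algorithm), which is a choice made for this example and not necessarily the routing policy specified by IRATI; QoS and resource allocation inputs are left out. The function name and the `links` data layout are hypothetical.

```python
import heapq


def forwarding_table(links, source):
    """Map each reachable destination to the next hop from `source`.

    links: dict of node -> dict of neighbour -> non-negative link cost.
    """
    best = {source: 0}             # lowest cost found so far per node
    next_hop = {}                  # destination -> first hop on the best path
    heap = [(0, source, None)]     # (cost, node, first hop used to reach node)
    done = set()
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in done:
            continue               # stale entry, a cheaper path was already settled
        done.add(node)
        if first is not None:
            next_hop[node] = first
        for neighbour, weight in links.get(node, {}).items():
            new_cost = cost + weight
            if neighbour not in done and new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                hop = neighbour if node == source else first
                heapq.heappush(heap, (new_cost, neighbour, hop))
    return next_hop
```

With IPC Process addresses as the node names, the returned dictionary plays the role of the table that the Relaying and Multiplexing Task would consult when checking the destination address in the PCI.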
Shim IPC Process over TCP/UDP
This IPC Process wraps a TCP/UDP layer and presents it with the IPC API, allowing "normal" IPC Processes to be overlaid on IP layers.
More information about the shim DIF over TCP/UDP component is available in the “Specification for shim IPC Processes over IP layers” document, pages 273-280 of the “RINA specification handbook”.

Shim IPC Process over 802.1q
This IPC Process wraps an Ethernet layer and presents it with the IPC API, allowing "normal" IPC Processes to be overlaid on 802.1q layers (VLANs).
More information about the shim IPC Process over 802.1q component is available as one of the specifications proposed by the IRATI consortium. It can be found in section 6 of this document.

The management agent
The Management agent is used by the DIF Management System (DMS) to monitor the state of the DIF and to make configuration changes, including policy changes relating to QoS and security.
With feedback between all the different activities
Figure 1 Single-island deployment (left) with corresponding RINA DIFs (right): The scenario comprises a single OFELIA island, and uses a slice with OpenFlow switches and the server's virtual machines (VMs). Some VMs will be used as RINA routers, others as RINA hosts and one as the OpenFlow controller programming the underlying Ethernet topology. The VMs with RINA software will run the modified UNIX-like kernel with the RINA network stack implemented by IRATI, acting either as endpoints or intermediate nodes (routers) or both. On top of the endpoint nodes, applications such as traffic generators (iperf), video streaming clients/servers (vlc), HTTP servers (apache) or others will be deployed in order to make use of the network stack. An additional VM, in this case with a standard kernel image, will be used to control the network infrastructure via OpenFlow. This VM might be coordinated with other hosts and services in order to carry out the experimentation and obtain the data traffic traces accordingly.

Figure 2 Multi-island experiment with several RINA internetworks: Figure 2 depicts one possible scenario for simulating a multi-domain environment. Two or more OFELIA islands could be used for it. In essence, this scenario is an extension of Scenario 1, since the OFELIA interconnection is an L2 bridge across all the islands. However, this scenario would allow links with much higher delay than the local ones, making it possible to instantiate RINA-enabled VMs on the boundaries of the domains acting as transit nodes or routers. Last, but not least, once the TCP/IP-RINA gateway can be used during the final stages of the project, OFELIA islands could be used as a multi-domain and multi-technology network, mixing islands using the RINA stack and other islands using the legacy TCP/IP stack.
The IRATI project also faces the challenge of implementing part of the RINA stack in a real production routing box by means of open SDKs. In this case, IRATI proposes to make use of i2CAT's Juniper MX480 IP router and Juniper's JunOS SDK capabilities to develop a highly experimental RINA network stack implementation on the routing device. This new implementation will enrich experimentation by allowing analysis of the interoperability of the network stack across different devices and, at the same time, will give IRATI first-hand information on the challenges that need to be faced when porting the RINA stack to current IP-based accelerated routing hardware. Besides this, phase 2 will also conduct some scalability tests with the UNIX-like RINA prototype, using the Virtual Wall facility available within OFELIA. The Virtual Wall enables the instantiation of several VMs interconnected by Ethernet segments.
The international scenario is a challenging one, encompassing experimentation inside and outside the OFELIA FIRE facility. The objectives of this test scenario are, on one side, to demonstrate that interoperation with another RINA stack implementation, the Java implementation carried out by the Pouzin Society, is possible; and, on the other side, to demonstrate that RINA stack implementations can coexist and communicate within the current Internet regardless of the level of adoption (the Java stack being developed by the Pouzin Society works on top of IP/UDP). In this context, the depicted scenario proposes to connect RINA DIFs on top of UDP/IP between endpoints behind an OFELIA island and hosts residing on US infrastructure. To do so, a UDP/IP gateway will be set up on the edge of the L2 RINA network inside the OFELIA island, and a completely standard UDP connection through the Internet (without the use of tunnels) will be made between the gateway and the endpoint running the UDP/IP prototype.