An Improved Self-Reconfigurable Interconnection Scheme for a Coarse Grain Reconfigurable Architecture

Muhammad Ali Shami and Ahmed Hemani
School of ICT, Royal Institute of Technology, KTH, Stockholm, Sweden
Email: shami@kth.se, hemani@kth.se




Abstract: An improved dynamic, partial and self reconfigurable interconnection network
(the Hybrid-2 network) is presented for the Dynamically Reprogrammable Resource Array (DRRA),
which is a Coarse Grain Reconfigurable Architecture (CGRA). To justify the design decisions,
the Hybrid-2 implementation is compared against possible implementations using Multiplexers,
a NoC and Crossbars, and against the previously published Hybrid-1 interconnection network.
Results show that the Hybrid-2 network takes 1.08x, 0.104x, 0.212x and 0.681x the area, and
1x, 0.037x, 0.026x and 0.107x the configuration bits, of the Multiplexer, NoC, Crossbar and
Hybrid-1 implementations respectively. The Hybrid-2 network is also 2.87x and 5.86x faster
than the Multiplexer and Hybrid-1 networks.

978-1-4244-8971-8/10$26.00 © 2010 IEEE

I. INTRODUCTION

   The flexibility of a reconfigurable architecture comes from a) its ability to reconfigure
the computational logic and b) its ability to reconfigure the interconnection network that
connects the computational logic blocks with each other. The interconnection network is
therefore a key component of any Coarse Grain Reconfigurable Architecture (CGRA). This paper
presents an improved interconnection network for the Dynamically Reprogrammable Resource
Array (DRRA), which is a CGRA fabric. The old interconnection network, published in [10], is
referred to as Hybrid-1, and the new interconnection network presented in this paper is
referred to as Hybrid-2 in the rest of the paper. The DRRA fabric has the following
properties:
   1) Creation of Coarse Grain Instructions (CGIs): The interconnection network enables the
      creation of coarse grain instructions by connecting two or more computational resources
      with each other. The maximum size of a CGI that can be created depends on the maximum
      connectivity of the reconfigurable system.
   2) Arbitrary Parallelism: The interconnection network allows many such CGIs to be created
      and run in parallel. This is a property of a computational fabric such as an FPGA. A
      CGRA which has this property is called a CGRA fabric.
   3) Implementation of large sub-systems: In addition to the creation of CGIs, the
      interconnection network is also able to compose bigger systems from these CGIs by
      connecting them together. This is also a property of a computational fabric.
   4) Local Connectivity: To reduce delay and energy consumption, the interconnection network
      has local connectivity, which is limited to 3-hop communication.
   5) Non-blocking and Point-to-Point/Multipoint: The DRRA interconnection network is a
      non-blocking, Point-to-Point and Point-to-Multipoint network.
   6) Sliding Window Connectivity: Local connectivity in non-overlapping segments restricts
      the interconnection network to CGIs of a fixed maximum size. By providing connectivity
      in overlapping segments, a sliding window style of local connectivity is created, which
      allows the creation of CGIs of arbitrary size.
   7) Dynamic Reconfiguration: Dynamic reconfiguration allows the system to reconfigure the
      network at run-time. For dynamic reconfiguration, the number of configuration bits and
      the configuration time should be low. The DRRA network is reconfigurable at run-time on
      a cycle basis.
   8) Partial Reconfiguration: An interconnection network which allows configuration of only
      a segment of the network is a partially reconfigurable interconnection network.
      Configuring only a segment of the network requires fewer configuration bits, and allows
      a part of the network to be configured without disturbing the connectivity of its
      surroundings. In DRRA, even a single network connection can be reconfigured without
      disturbing the other network connections.
   9) Self Reconfiguration: The DRRA interconnection network is self reconfigurable, which
      means that the CGIs, which are created by combining CGRA resources, can themselves
      reconfigure the interconnection network. This allows the algorithms running on a CGI to
      reprogram the interconnection network, and hence the CGIs, according to their needs. It
      also reduces the configuration time, since the main configuration manager does not have
      to generate and send the configurations. This improvement eliminates the need for a
      separate configuration network for the interconnection network.
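
Properties 4 and 6 can be illustrated with a small sketch. It assumes, as the paper describes later for DRRA, that a resource reaches columns up to 3 hops away on either side. The function names, the 8-column fabric width used in the usage note, and the `segment` helper (a hypothetical non-overlapping scheme with segments of 4 columns) are illustrative choices and not part of the DRRA design.

```python
HOPS = 3  # local connectivity limit stated in property 4


def window(col, num_cols, hops=HOPS):
    """Columns reachable from `col` with a sliding window of +/- `hops` columns."""
    lo = max(0, col - hops)
    hi = min(num_cols - 1, col + hops)
    return list(range(lo, hi + 1))


def segment(col, hops=HOPS):
    """Hypothetical alternative: fixed, non-overlapping segments of hops + 1 columns."""
    size = hops + 1
    start = (col // size) * size
    return list(range(start, start + size))


def can_chain(cols, num_cols):
    """True if each column in `cols` lies inside the window of the previous one."""
    return all(b in window(a, num_cols) for a, b in zip(cols, cols[1:]))
```

With an 8-column fabric, `window(3, 8)` covers columns 0 to 6, whereas `segment(3)` stops at column 3, so column 4 is reachable only in the sliding window scheme. Because neighbouring windows overlap, `can_chain([0, 3, 6, 7], 8)` holds and a chain of resources can span the whole fabric, which is the sense in which overlapping windows allow CGIs of arbitrary size.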
   Properties 1 to 7 were implemented with Hybrid-1 in the DRRA fabric. Hybrid-2 implements
properties 8 and 9 in addition to properties 1 to 7. Property 7 has also been improved by
reducing the dynamic reconfiguration time of the interconnection fabric. This paper makes two
main contributions:
   • An improvement over the existing Hybrid-1 interconnection scheme of the DRRA fabric. The
     improvement not only includes new and improved functionality (properties 7, 8 and 9) but
     also a redesigned switchbox that reduces the number of configuration bits and the
     configuration memory size.
   • A quantitative comparison with Multiplexer, Crossbar and NoC based interconnect schemes,
     and with Hybrid-1.
   Section II discusses the related work. Section III contains a brief introduction to DRRA.
Section IV presents the different implementations of the DRRA interconnection network.
Section V presents the results, while Section VI concludes the paper.

II. RELATED WORK

   Two decades of research on CGRAs have produced a number of CGRA architectures with
different interconnection properties and implementation styles. These architectures have been
reviewed in [3] and [1]. This section discusses the interconnection schemes of some of these
architectures. ADRS [7] is a CGRA with a multiplexer based mesh network with topologies such
as nearest neighbor connectivity, next hop connectivity, extra connections to a central
register file, and vertical buses. REMARC [8] also has multiplexer based nearest neighbor
connectivity, along with full row and column BUS connectivity. Multiplexer based networks are
good at providing Point-to-Multipoint connectivity, but this comes at the cost of long wires
and a high capacitance to drive. This has been recognized by ADRS, which proposes a full
custom transistor [7] to disconnect those segments of the wires which are not used in a
specific network configuration. A crossbar provides full connectivity but requires the
maximum number of configuration bits and is not scalable. Colt uses a crossbar to communicate
between the data ports and an array of 4x4 elements, which are connected in a mesh network
with nearest neighbor connectivity. The VIRAM [6] processor also uses a crossbar for
communication between DRAM banks and vector lanes. The crossbar is not scalable and has a
huge area and configuration overhead. Chameleon [4] and Imagine [5] use a circuit switched
NoC for their interconnection network. Recently, a Multistage Interconnection Network (MIN)
[2] has also been proposed for CGRAs. This network provides arbitrary routing by connecting
together different stages of the network. Since creating a communication path in a NoC based
network requires the involvement of many geographically distributed switches, a self
reconfigurable network cannot be created with this approach. MorphoSys [11] has three levels
of hybrid interconnection network: the first level offers nearest neighbor connectivity,
while the second and third levels consist of local and global buses.
   Interconnect exploration for the mapping of algorithms helps to find the best routing and
interconnection scheme. This paper explores implementation styles for the DRRA
interconnection network discussed in the introduction, in order to find the best
implementation in terms of area, configuration bits and power. The DRRA interconnection
network differs from the architectures mentioned above because it is a computational fabric
like an FPGA, and allows the creation of a number of arbitrary size partitions executing
different algorithms.

III. DRRA

   The Dynamically Reprogrammable Resource Array (DRRA) is a CGRA fabric, as shown in
Figure 1, which consists of a pool of a) arithmetic/logic (mDPU) [9], b) storage (RFile) and
c) control (sequencer) resources. These resources are seamlessly partitionable to compose
Coarse Grain Instructions (CGIs). The arithmetic resources are used to create the data-path
of the CGI. Two or more mDPUs can be connected together to create a complex data-path which
matches the granularity of the algorithm. The RFile provides not only the storage but also
enough memory ports to feed this complex data-path. The sequencers control these resources by
instantiating them in the appropriate mode. The sequencers have an instruction memory of only
64 words.

Fig. 1. Dynamically Reprogrammable Resource Array (DRRA) Fabric

   In DRRA, a CGI is composed by configuring the interconnection network which connects these
arithmetic and storage resources with each other. Our goal is to design an interconnection
network which can create a CGI as complex as a Radix-4 FFT butterfly or bigger. To compose
such big data-paths, we found that a sliding window communication of 3 hops is required. A
3-hop communication window means that every DRRA resource can communicate with every other
DRRA resource up to 3 columns away in either the right or the left direction, as shown in
Figure 1. The sliding window means that these communication windows slide with respect
to DRRA columns in such a way that they overlap. Figure 1 shows a 2x8 DRRA fabric which is
created with these properties. It is important to mention that this fabric is only a
fragment; in 90nm technology, a 10x10mm chip can accommodate 324 DRRA cells.

IV. INTERCONNECTION IMPLEMENTATION EXPLORATION

   An interconnection network for an architecture is designed with two main considerations:
a) the functionality of the interconnection network and b) the physical overheads, e.g. area,
power, speed and configuration bits. An interconnection network with the functionality
discussed in the introduction can be implemented in multiple implementation styles. It is
therefore important to explore all of these implementation styles to find their physical
overheads. For this exploration, we have implemented the network in the Multiplexer,
Crossbar, NoC, Hybrid-1 and Hybrid-2 implementation styles. The implementation details and
results are discussed in the subsections below.

A. Multiplexer Based DRRA Network

   A DRRA interconnection network, as discussed in the introduction, can be implemented using
multiplexers. Every resource input in the DRRA fabric can receive data from resources up to 3
columns away on both sides, as shown in Figure 2. This creates an interconnection window of 7
columns. This window of connectivity moves with the resources, which is why it is called a
sliding window. Each column has four resources with two outputs from every resource. Every
single input therefore selects one out of 56 (7x4x2) possible outputs and requires a
multiplexer of size 56x1. Since a column has 12 inputs, twelve 56x1 multiplexers are required
for every column in a multiplexer based DRRA interconnection network. A 56x1 multiplexer
requires 6 bits to configure, therefore a DRRA column requires 72 bits to configure. This
interconnection scheme is partially, dynamically and self reconfigurable, and does not
require a dedicated interconnect reconfiguration network. A sequencer can configure one input
per cycle by providing 6 bits. A complete DRRA column having 12 inputs can be configured in 6
cycles. Since all the sequencers can program their interconnects in parallel, it takes at
most 6 cycles to completely program this interconnection network in DRRA. A configuration
memory can be designed for each DRRA column and connected to both of its sequencers. This
enables the two sequencers in a DRRA column to configure all four switchboxes just by writing
to the memory. The memory is organized as 12x10 (12 rows and 10 columns). The first four
column bits select the input multiplexer which is to be configured, while the remaining 6
bits configure the 56x1 multiplexer.

Fig. 2. Multiplexer Based DRRA Interconnection Network

   A multiplexer based network has two problems: a) the large multiplexers cause routing
congestion during floorplanning, and b) a Point-to-Multipoint connection results in every
output driving all the inputs (7x12) in the interconnection window, as shown in Figure 2.
This not only increases the length of the interconnection wire, but also increases the
driving load of the output. The result is a slower interconnection network which consumes
more energy. The wire length can be reduced by driving every output in either the right or
the left direction only. That still results in driving 42 inputs, which is a large load.

B. Circuit Switched Network (NoC)

   A circuit switched network can be created for this kind of fabric, as shown in Figure 3. A
fully non-blocking, sliding window interconnection network with 3-hop connectivity requires
48 rows. Every column has 12 inputs and 8 outputs. These 20 inputs/outputs are connected to
the 48 rows. This results in 480 4-way switches. Every NoC switch requires four configuration
bits, resulting in 1920 bits of configuration memory in every column.

Fig. 3. Circuit Switched NoC Based DRRA Interconnection Network

   The problem with this network is that, to establish a physical communication channel
between two resources, the geographically distributed switchboxes in the path between these
two resources have to be configured. This can be done only by an external configuration unit,
since the sequencers can configure only local switchboxes. Self reconfiguration of this
network is therefore not possible. This kind of NoC can also communicate beyond 3 hops. Such
communication is blocking, and the synthesis tool reports a lower clock frequency. To avoid
this, the NoC switches have to be pipelined, which increases their power consumption and
area.
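
The sizes quoted in Sections IV-A and IV-B follow from a few products, which the sketch below reproduces as a sanity check. All constants are taken from the text above. The function and key names are illustrative. The NoC switch count of 480 is used as stated in the paper and is not derived here.

```python
import math

# Figures stated in Sections IV-A and IV-B.
HOPS = 3                    # reach on each side, in columns
RESOURCES_PER_COLUMN = 4
OUTPUTS_PER_RESOURCE = 2
INPUTS_PER_COLUMN = 12
SEQUENCERS_PER_COLUMN = 2   # each configures one input per cycle


def mux_network_cost():
    """Per-column cost of the multiplexer based network."""
    window_columns = 2 * HOPS + 1
    sources = window_columns * RESOURCES_PER_COLUMN * OUTPUTS_PER_RESOURCE
    select_bits = math.ceil(math.log2(sources))
    fanout = window_columns * INPUTS_PER_COLUMN
    return {
        "window_columns": window_columns,
        "mux_size": sources,
        "select_bits": select_bits,
        "bits_per_column": INPUTS_PER_COLUMN * select_bits,
        "cycles_per_column": INPUTS_PER_COLUMN // SEQUENCERS_PER_COLUMN,
        "fanout_both_directions": fanout,
        "fanout_one_direction": fanout // 2,
    }


def noc_config_bits(switches=480, bits_per_switch=4):
    """Per-column configuration memory of the circuit switched NoC."""
    return switches * bits_per_switch


if __name__ == "__main__":
    print(mux_network_cost())
    print(noc_config_bits())
```

The multiplexer figures come out as a 56x1 multiplexer with 6 select bits, 72 bits and 6 cycles per column, and a fan-out of 84 inputs (42 when each output drives one direction only). The NoC needs 1920 bits per column.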
C. Crossbar Based Network

   A crossbar based sliding window network can be created by cascading small crossbars, as
shown in Figure 4. To provide connectivity to resources up to 3 hops away on both sides,
48x56 crossbars are required. This results in a configuration memory of 2688 bits per column.
These crossbars are used in a sliding window fashion, i.e. every crossbar is connected to
every other crossbar up to 3 hops away to create a 3-hop sliding window network. A crossbar
based network can be used for communication beyond 3 hops, but that communication is blocking
and lowers the system clock frequency because of the longer network delay.

Fig. 4. Crossbar Based DRRA Interconnection Network

   The problems with this implementation are its huge size, its number of configuration bits
and its large network delay. A crossbar has 2688 possible connections to configure. If self
reconfiguration requires one cycle to configure one connection, the two sequencers need 1344
cycles to completely configure the crossbar.

D. Hybrid-1 Network with Crossbars and BUSes

   A single column of the DRRA Hybrid-1 interconnection network using crossbars and buses is
shown in Figure 5. This interconnection network is organized in horizontal and vertical BUSes
with 14x12 crossbars, called H2V crossbars, at the intersections. The horizontal BUSes
consist of the outputs of the DRRA resources, which are connected to the inputs of the H2V
crossbars in sliding window fashion as discussed before. These crossbars receive inputs from
resources up to 3 hops (3 columns) away on both sides. Each column has four H2V crossbars.
One H2V crossbar requires 14x12=168 bits to configure, so a single DRRA column requires
4x168=672 bits to configure. This memory is configured by an external configuration unit
through an interconnect configuration network, so self reconfiguration is not possible for
this network. The horizontal inputs of the H2V crossbars are configured to connect to the 12
vertical BUSes, which are in turn connected to the inputs of the resources. This organization
of the interconnection network, with H2V crossbar based switchboxes, prevents an output of a
DRRA resource from driving all the inputs in the 7-column communication window. However, this
interconnection network suffers from the delay of the crossbar based switchboxes.

Fig. 5. DRRA Hybrid-1 Network

E. Hybrid-2 Network with Tri-state Multiplexers and BUSes

   Two problems are identified in the Hybrid-1 interconnection network: a) it needs more
configuration bits than the Multiplexer based network, and b) its network delay is also
greater than that of the Multiplexer based network because of the crossbar based switchboxes.
The Hybrid-1 interconnection network is therefore improved by redesigning the switchboxes.
Figure 6 shows the Hybrid-2 interconnection network with the new switchbox design. This
switchbox consists of twelve 14x1 multiplexers, each connected to a tri-state buffer. Each
tri-state buffer is permanently connected to one of the twelve vertical buses.

Fig. 6. DRRA Hybrid-2 Network
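
The configuration costs quoted for the crossbar and Hybrid-1 networks, and the 48 bits the paper gives for one Hybrid-2 switchbox, can be checked with the short sketch below. It assumes one configuration bit per crosspoint for the crossbars, which is consistent with the quoted totals but is not stated explicitly in the paper. The variable names are illustrative.

```python
import math

SEQUENCERS_PER_COLUMN = 2   # each assumed to configure one connection per cycle

# Section IV-C: 48x56 crossbar, assuming one configuration bit per crosspoint.
crossbar_bits = 48 * 56
crossbar_cycles = crossbar_bits // SEQUENCERS_PER_COLUMN

# Section IV-D: four 14x12 H2V crossbars per column, one bit per crosspoint.
h2v_bits = 14 * 12
hybrid1_bits_per_column = 4 * h2v_bits

# Section IV-E: twelve 14x1 multiplexers per switchbox, binary-encoded select.
h2_select_bits = math.ceil(math.log2(14))
hybrid2_switchbox_bits = 12 * h2_select_bits

if __name__ == "__main__":
    print(crossbar_bits, crossbar_cycles)      # 2688 1344
    print(h2v_bits, hybrid1_bits_per_column)   # 168 672
    print(h2_select_bits, hybrid2_switchbox_bits)  # 4 48
```

Replacing a 14x12 crossbar (168 bits) with twelve 14x1 multiplexers (48 bits) is where the reduction in configuration bits per switchbox comes from.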
   This design has three advantages over the previous design: a) the number of configuration
bits is reduced, b) the area of the switchbox is reduced and c) the delay of the switchbox is
also reduced. The new switchbox requires 48 bits to configure in this interconnection
network. Since all four switchboxes drive the same vertical buses, their tri-state drivers
are mutually exclusive. We can use this property to create a memory organized as 12x10 bits
(12 rows and 10 columns). Every row corresponds to the output connected to one of the
vertical BUSes. The first two column bits select one of the four switchboxes in the column,
the next 4 bits select the vertical BUS which is to be driven, and the last 4 bits select the
horizontal BUS which drives the selected vertical BUS.
   1) Self Reconfiguration: The new configuration memory has very few bits to configure and
is designed as a two-port memory to allow connectivity with the two sequencers present in the
same DRRA column. This allows the sequencers to program the configuration memory, hence
creating a self reconfigurable system. Using the sequencers, we can dynamically and partially
reprogram the interconnection network without the need for an external configuration unit.
The external reconfiguration network for the interconnects has therefore been completely
removed. The interconnect configurations are stored inside the sequencer when the
program/configware is stored. The application mapping flow is shown in Figure 7. A DRRA
program/configware contains Memory, Data-path and Interconnect instructions. This program is
loaded into the DRRA sequencer. When the sequencer starts, it executes the interconnect
instructions to configure the interconnection network. Once the network is configured, the
data-path and memory instructions are executed. During execution of the algorithm, the
sequencer can issue new Interconnect instructions to reconfigure the interconnection network.
To configure one complete DRRA column, twelve inputs are configured by the two sequencers. A
sequencer takes a single cycle to configure one input, hence it takes 6 cycles to completely
configure a DRRA column. Since all the DRRA columns are configured independently by their own
sequencers, a complete DRRA fabric, no matter how big, can be configured in 6 cycles.

Fig. 7. Application Mapping Flow

V. RESULTS

   The interconnect implementations described above are synthesized for DRRA using TSMC 90nm
technology in Cadence RTL Compiler. Table I contains the area, configuration bits,
configuration cycles and network delay of these implementations after synthesis.

                                   TABLE I
                COMPARISON BETWEEN DIFFERENT IMPLEMENTATIONS

   Implementation   Area (Gates)   Cfg. Bits   Cfg. Cycles   Network Delay (ps)
   MUX              8402           120         6             707
   NoC              87840          1920        Variable      Variable
   Crossbar         43008          2688        1344          Variable
   Hybrid-1         13416          672         6*CND         1443
   Hybrid-2         9147           120         6             246

   This data shows that:
   1) Multiplexer based networks are the best in terms of area, configuration bits and number
      of cycles to configure the network. However, Multiplexer based networks are slow
      because of the long Point-to-Multipoint wires. This problem has been recognized by ADRS
      as well, which uses pass transistor based full custom switches to break the wires [7].
   2) Crossbar and NoC based solutions are very expensive in terms of area, configuration
      bits and configuration cycles. In NoC based solutions, the configuration of a link
      depends on the number of switches in the path. Partial and dynamic reconfiguration can
      be supported in NoC and Crossbar based networks using an external configuration
      network. Self reconfiguration cannot be supported in a NoC because the sequencers
      cannot reconfigure the geographically distributed switches involved in establishing a
      communication channel between two resources. The number of configuration cycles in NoC
      and Crossbar based interconnection networks also depends on the number of
      switches/crossbars involved and on the configuration network delay (CND).
   3) Hybrid-1 is better than the NoC and Crossbar based networks. However, it takes more
      area and configuration bits than the Multiplexer based network, and it is also slower
      than the Multiplexer based network. The number of cycles to configure a DRRA column
      depends on the Configuration Network Delay (CND) of the Hybrid-1 network.
   4) The Hybrid-2 network, as can be seen in Table I, has almost the same area and
      configuration bits as a Multiplexer based network. Since the network is
self reconfigurable, the configuration network delay in                                ACKNOWLEDGMENT
     this network is one. Hence it takes only 6 cycles to           The Author is thankful to Swedish Research Council and
     completely reconfigure a DRRA column. Furthermore            Higher Education Commission of Pakistan for funding this
     all DRRA columns can be reconfigured in parallel,            research.
     therefore it takes only 6 cycles to completely reconfigure
     the whole DRRA fabric. Hybrid-2 network is also faster                                    R EFERENCES
     than the Multiplexer based network. Hybrid-2 network,        [1] M. Baron. Trends in use of reconfigurable platforms. In 41st Pro-
     in reality is a Multiplexer based network with tri-state         ceedings of Design Automation Conference, pages 415–415. IEEE, July
                                                                      2004.
     buffers. The increase in size of the area is because of      [2] R. Ferreira, M. Laure, A. C. Beck, T. Lo, M. Rutzig, and L. Carro. A
     these tri-state buffers. Using this Hybrid-2 approach, we        low cost and adaptable routing network for reconfigurable systems. In
     have broken down the long Point-to-Multipoint wires              Proc. IEEE Int. Symp. Parallel & Distributed Processing IPDPS 2009,
                                                                      pages 1–8, 2009.
     of Multiplexer based network into Point-to-Point wires       [3] R. Hartenstein. A decade of reconfigurable computing: a visionary
     using switchboxes. This doesn’t affect the Point-to-             retrospective. In Design, Automation and Test in Europe, pages 642–649.
     Multipoint capability of the network. This approach is           IEEE, March 2001.
                                                                  [4] P. M. Heysters. Coarse-Grained Reconfigurable Processors; Flexibility
     better than [7] in which pass transistor based switch            Meets Efficiency. PhD Thesis, ISBN:90-365-2076-2, Neitherlands, 2003.
     was used to break the long wires of Multiplexer based        [5] B. Khailany, W. J. Dally, U. J. Kapasi, P. Mattson, J. Namkoong, J. D.
     network. Furthermore, the switchboxes in Hybrid-2 have           Owens, B. Towles, A. Chang, and S. Rixner. Imagine: media processing
                                                                      with streams. IEEE MICRO, 21(2):35–46, 2001.
     been designed completely in standard cell technology         [6] C. E. Kozyrakis, S. Perissakis, D. Patterson, T. Anderson, K. Asanovic,
     which keeps the design flow simple and reduce the time            N. Cardwell, R. Fromm, J. Golbus, B. Gribstad, K. Keeton, R. Thomas,
     to market.                                                       N. Treuhaft, and K. Yelick. Scalable processors in the billion-transistor
                                                                      era: Iram. Computer, 30(9):75–78, 1997.
  5) DRRA with the Hybrid-2 network was synthesized and floorplanned
     in a 90nm technology using Cadence RTL Compiler and SoC
     Encounter. With this network, the 2x8 DRRA fabric shown in
     Figure 1 runs at a frequency of 720MHz and can support a peak
     local bandwidth of 138GB/s.

          VI. CONCLUSION AND FUTURE WORK

   An improved interconnection network, Hybrid-2, for the Dynamically
Reprogrammable Resource Array (DRRA) has been presented. To justify
the design decisions, an interconnect exploration was carried out by
implementing the same network in Multiplexer, NoC and Crossbar based
styles. The Hybrid-2 network was then compared against the
Multiplexer, NoC, Crossbar and previously published Hybrid-1
networks. Results show that the Hybrid-2 network takes 1.08x, 0.104x,
0.212x and 0.681x the area, and 1x, 0.037x, 0.026x and 0.107x the
configuration bits, of the Multiplexer, NoC, Crossbar and Hybrid-1
implementations respectively. The Hybrid-2 network is 2.87x and 5.86x
faster than the Multiplexer and Hybrid-1 networks respectively. It
also takes the minimum number of cycles to configure or reconfigure
a complete DRRA column.
   A future version of the interconnection network with an adjustable
sliding window is planned. By lowering the clock frequency, the width
of the sliding window can be increased to allow mapping of more
complex data paths than is possible today. Similarly, at higher clock
frequencies this width can be reduced. The future version of DRRA
will also have voltage frequency scaling and a power shut off
methodology. This may result in some parts of DRRA working in a
different voltage/frequency range or being completely turned off. The
DRRA switchboxes will be improved to handle such situations by adding
level shifters or isolators.

 [7] Z. Kwok and S. J. E. Wilton. Register file architecture
     optimization in a coarse-grained reconfigurable architecture. In
     Proc. 13th Annual IEEE Symp. Field-Programmable Custom Computing
     Machines FCCM 2005, pages 35–44, 2005.
 [8] T. Miyamori and K. Olukotun. Remarc: reconfigurable multimedia
     array coprocessor. IEICE Transactions on Information and Systems,
     82(5):389–397, November 1998.
 [9] M. A. Shami and A. Hemani. Morphable dpu: Smart and efficient data
     path for signal processing applications. In Proc. IEEE Workshop
     Signal Processing Systems SiPS 2009, pages 167–172, 2009.
[10] M. A. Shami and A. Hemani. Partially reconfigurable
     interconnection network for dynamically reprogrammable resource
     array. In IEEE 8th International Conference on ASIC, pages
     122–125. IEEE, October 2009.
[11] H. Singh, M.-H. Lee, G. Lu, F. J. Kurdahi, N. Bagherzadeh, and
     E. M. Chaves Filho. Morphosys: an integrated reconfigurable
     system for data-parallel and computation-intensive applications.
     IEEE Transactions on Computers, 49(5):465–481, 2000.
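
The headline figures quoted in the conclusion can be restated with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original evaluation. The ratios, the 720MHz clock and the 138GB/s bandwidth are copied from the text. The function names, the reading of 1GB as 10^9 bytes, and the count of 96 inputs (8 columns of the 2x8 fabric times the 12 inputs per column given in Section IV) are assumptions made here for illustration.

```python
# Hybrid-2 cost relative to each baseline, copied from the conclusion.
# A ratio r means Hybrid-2 uses r times the resource of that baseline.
AREA_RATIO = {"Multiplexer": 1.08, "NoC": 0.104, "Crossbar": 0.212, "Hybrid-1": 0.681}
CONFIG_BITS_RATIO = {"Multiplexer": 1.0, "NoC": 0.037, "Crossbar": 0.026, "Hybrid-1": 0.107}

# Figures quoted for the 2x8 fabric in 90nm.
CLOCK_HZ = 720e6
PEAK_BANDWIDTH_BYTES_PER_S = 138e9   # assumption: 1 GB = 10**9 bytes

# Assumption: 8 columns (2x8 fabric) with 12 resource inputs per column.
FABRIC_COLUMNS = 8
INPUTS_PER_COLUMN = 12


def saving_percent(ratio):
    """Percentage saved by Hybrid-2 versus a baseline (negative means overhead)."""
    return round((1.0 - ratio) * 100.0, 1)


def baseline_multiple(ratio):
    """How many times more of the resource the baseline needs than Hybrid-2."""
    return round(1.0 / ratio, 2)


def bytes_per_cycle(bandwidth_bytes_per_s, clock_hz):
    """Bytes moved per clock cycle at the quoted peak bandwidth."""
    return bandwidth_bytes_per_s / clock_hz


def bytes_per_input_per_cycle():
    """Peak bytes per cycle divided over all resource inputs of the fabric."""
    total_inputs = FABRIC_COLUMNS * INPUTS_PER_COLUMN
    return bytes_per_cycle(PEAK_BANDWIDTH_BYTES_PER_S, CLOCK_HZ) / total_inputs


if __name__ == "__main__":
    for name in AREA_RATIO:
        print(name,
              "area saving %:", saving_percent(AREA_RATIO[name]),
              "config-bit saving %:", saving_percent(CONFIG_BITS_RATIO[name]))
    print("bytes per cycle:", round(bytes_per_cycle(PEAK_BANDWIDTH_BYTES_PER_S, CLOCK_HZ), 2))
    print("bytes per input per cycle:", round(bytes_per_input_per_cycle(), 2))
```

Read this way, Hybrid-2 costs about 8% more area than the Multiplexer network but needs roughly 27 times fewer configuration bits than the NoC and roughly 38 times fewer than the Crossbar. The bandwidth figure works out to about 2 bytes per input per cycle, which would be consistent with 16-bit inputs; that word width is an inference from the arithmetic, not a value stated in this section.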

  • 1. An Improved Self-Reconfigurable Interconnection Scheme for a Coarse Grain Reconfigurable Architecture Muhammad Ali Shami Ahmed Hemani School of ICT School of ICT Royal Institute of Technology, KTH Royal Institute of Technology, KTH Stockholm, Sweden Stockholm, Sweden Email: shami@kth.se Email: hemani@kth.se Abstract—An improved Dynamic, Partial and self reconfig- compose bigger systems using these CGIs by connecting urable interconnection network (Hybrid-2 Network) is presented them together. This is also a property of a computational for Dynamically Reprogrammable Resource Array (DRRA), fabric. which is a Coarse Grain Reconfiguration Architecture (CGRA). To justify the design decision, Hybrid-2 network implementa- 4) Local Connectivity: To reduce delay and energy con- tion is compared against the possible implementations using sumption, the interconnection network has local connec- Multiplexer, NoC, Crossbar and already published Hybrid-1 tivity which is limited to 3-hops communication. interconnection network. Results shows that newly presented 5) Non-blocking and Point to Point/Multi-Point: The Hybrid-2 Interconnection network take (1.08x, 0.104x, 0.212x and DRRA interconnection network is a Non-blocking, 0.681x) the area, (1x, 0.037x, 0.026x and 0.107x) the configuration bits of Multiplexer, NoC, Crossbar and Hybrid-1 Implementation Point-to-Point and Point-to-Multipoint network. respectively. Hybrid-2 network is also 2.87x and 5.86x faster than 6) Sliding Window connectivity: The local connectivity in Multiplexer and Hybrid-1 networks. non-overlapping segments restricts the interconnection network to create a fix maximum size CGI. By having I. I NTRODUCTION connectivity in overlapping segments, a sliding window Flexibility of a reconfigurable architecture comes from a) its style local connectivity is created which allows creation ability to reconfigure computational logic and b) the ability to of arbitrary size CGIs. 
reconfigure the interconnection network to connect the compu- 7) Dynamic Reconfiguration: Dynamic reconfiguration of tational logic blocks with each other. Interconnection network, a network allows the system to reconfigure the network in any Coarse Grain Reconfigurable Architecture (CGRA), is at run-time. For dynamic reconfiguration, the number a key component which makes a reconfigurable architecture of configuration bits and the configuration time should flexible. This paper presents an improved interconnection be low. The DRRA network is reconfigurable during network for Dynamically Reprogrammable Resource Array runtime on cycle basic. (DRRA) which is a CGRA fabric. The old interconnection 8) Partial Reconfiguration: The interconnection network network, published in [10], will be referred as Hybrid-1. which allows configuration of only a segment of the net- Moreover the new interconnection network, presented in this work is Partially reconfigurable interconnection network. paper, will be referred as Hybrid-2 in rest of the paper. The Configuring only a segment of the network results in DRRA fabric has the following properties; fewer bits generation, and allow configuration of a part 1) Creation of Coarse Grain Instruction (CGI): The in- of the network without disturbing the network connectiv- terconnection network enables creation of coarse grain ity in the surrounding. In DRRA, even a single network instructions by connecting two or more computational connection can be reconfigured without disturbing the resources with each other. The maximum size of the other network connections. CGI, which can be created, depends on the maximum 9) Self Reconfiguration: DRRA interconnection network connectivity of the reconfigurable system. 
is self configurable which means that the CGIs, which 2) Arbitrary Parallelism: The interconnection network al- are created by the combination of the CGRA resources, lows creation of many such CGIs and run them in can reconfigure the interconnection network. This allows parallel. This is the property of the computational fabric the algorithms running on a CGI to reprogram the like FPGA. A CGRA which has this property is called interconnection network and hence the CGIs according a CGRA fabric. to their need. It also reduces the configuration time 3) Implementation of large sub-system: In addition to cre- since the main configuration manager doesn’t have to ation of CGI, the interconnection network is also able to generate and send the configurations. This improvement 978-1-4244-8971-8/10$26.00 c 2010 IEEE
  • 2. eliminates the need for a separate configuration network for the interconnection network. Properties 1,2,3,4,5,6 and 7 were implemented with Hybrid- 1 in DRRA fabric. The Hybrid-2 implements properties 8 and 9, in addition to properties 1-7, in DRRA fabric. The property 7 has also been improved by reducing the dynamic reconfiguration time of the interconnection fabric. This paper has two main contribution; • An improvement over existing Hybrid-1 interconnection scheme of DRRA fabric. The improvement not only includes new and improved functionality (property 7, 8 and 9) but also includes a redesigned switchbox to reduce the number of configuration bits and configuration memory size. • A quantitative comparison to other Multiplexer, Cross- Fig. 1. Dynamically Reprogrammable Resource Array(DRRA) Fabric bar and NoC based interconnect schemes including the Hybrid-1. Section-2 discusses the related work. Section-3 contains a interconnection network. The first level offer nearest neighbor brief introduction to DRRA. Section-4 presents the different connectivity, second and third level consists of local and global implementations of DRRA interconnection network. Section-5 buses. presents the results while Section-6 concludes the paper. Interconnect exploration for mapping of algorithms helps to find the best routing and interconnection scheme. This paper II. R ELATED W ORK is an effort in exploring the implementation style for DRRA Two decade of research on CGRAs has produced a number Interconnection network discussed in introduction section to of CGRA architectures with different interconnection prop- find the best implementation for area, configuration bits, and erties and their implementation styles. These architectures power. The DRRA interconnection network is different from have been reviewed in [3] and [1]. This section will discuss the above mentioned architectures because it is a computa- the interconnection schemes in some of these architectures. 
tional fabric like an FPGA, and allows creation of a number ADRS[7] is a CGRA with a multiplexer based mesh network arbitrary size partitions executing different algorithms. with topologies like nearest neighboring connectivity, next III. DRRA hop connectivity, extra connection to central register file and vertical busses etc. REMAC [8] also has a Multiplexer based Dynamically Reprogrammable Resources Array (DRRA) nearest neighbor connectivity along with full row and column is a CGRA fabric, as shown in Figure 1, which consists BUS connectivity. Multiplexer based networks are good to of pool of a)Arithmatic/Logic (mDPU)[9], b)Storage (RFile) provide Point-to-Multipoint connectivity, but this comes at the and c)Control (Sequencers) Resources. These resources are cost of long wires and high capacitance to drive. This has been seamlessly partitionable to compose Coarse Grain Instructions recognized by ADRS and they have proposed a full custom (CGIs). The arithmetic resources are used to create the data- transistor[7] to disconnect these segments of the wires which path for the CGI. Two or more mDPUs can be connected are not used during a specific network configuration. Crossbar together to create a complex data-path which matches the provides full connectivity but requires maximum number of granularity of the algorithm. The RFile not only provides the configuration bits and is not scalable. Colt uses a crossbar to storage, but enough memory ports to feed this complex data- communicate between data port and array of 4x4 elements path. The sequencers are used to control these resources by which are connected in mesh network with nearest neighbor instantiating them in appropriate mode. The sequencers have connectivity. VIRAM [6] processor also uses a crossbar for an instruction memory of 64 words only. communication between DRAM banks and vector lanes. 
The In DRRA a CGI is composed by configuring the in- crossbar is not scalable and has huge area and configuration terconnection network which connects these arithmetic and overhead. Chameleon[4] and Imagine[5] use circuit switched storage resources with each other. Our goal is to design an NoC for their interconnection network. Recently Multistage interconnection network which can create a CGI as complex Interconnection Network (MIN)[2] has also been proposed for as Radix-4 FFT butterfly or bigger. To compose such big CGRA. This network is created to provide arbitrary routing data-paths, we found that a sliding window communication by connecting together different stages of the network. Since of 3-hops would be required. 3-hops communication window creating a communication path in a NoC based network will means that every DRRA resource can communication with require involvement of many geologically distributed switches, every other DRRA resource in either right or left direction up creating a self reconfigurable network is not possible by using to 3-columns away as shown in Figure 1. The Sliding window this approach. MorphoSys[11] has a three level of Hybrid means that these communication windows slides with respect
to DRRA columns in a way that they are overlapping.

Figure 1 shows a 2x8 fabric of DRRA which is created with these properties. It is important to mention that this fabric is a fragment; in 90nm technology, a 10x10mm chip can accommodate 324 DRRA cells.

IV. INTERCONNECTION IMPLEMENTATION EXPLORATION

An interconnection network for an architecture is designed with two main considerations: a) the functionality of the interconnection network and b) the physical overheads, e.g. area, power, speed, and configuration bits. An interconnection network with the functionality discussed in the introduction section can be implemented in multiple implementation styles. Hence it becomes important to explore all these implementation styles to find their physical overheads. For this exploration, we have implemented the network in Multiplexer, Crossbar, NoC, Hybrid-1 and Hybrid-2 implementation styles. The implementation details and results are discussed in the sections/subsections below.

A. Multiplexer Based DRRA Network

A DRRA interconnection network, as discussed in the introduction, can be implemented using multiplexers. Every resource input in the DRRA fabric can receive data from resources up to 3 columns away on both sides, as shown in Figure 2. This creates an interconnection window of 7 columns. This window of connectivity moves with the resources, which is why it is called a sliding window. Each column has four resources with two outputs from every resource. Every input therefore selects one out of 56 (7x4x2) possible outputs, which requires a multiplexer of size 56x1. Since a column has 12 inputs, twelve 56x1 multiplexers are required for every column in a multiplexer based DRRA interconnection network. A 56x1 multiplexer requires 6 bits to configure, therefore a DRRA column requires 72 bits to configure. This interconnection scheme is partial, dynamic and self reconfigurable, and doesn't require a dedicated interconnect reconfiguration network. A sequencer can configure one input per cycle by providing 6 bits. A complete DRRA column having 12 inputs can be configured in 6 cycles by the column's two sequencers. Since all the sequencers can program their interconnects in parallel, it takes 6 cycles at most to completely program this interconnection network in DRRA.

Fig. 2. Multiplexer Based DRRA Interconnection Network

A configuration memory for one DRRA column can be designed which is connected to both sequencers. This enables the two sequencers in a DRRA column to configure all four switch-boxes just by configuring the memory. The memory is organized as 12x10 (12 rows and 10 columns). The first four column bits select the input multiplexer which is to be configured, while the remaining 6 bits configure the 56x1 multiplexer.

The multiplexer based network has two problems: a) the large multiplexers cause routing congestion during floorplanning, and b) a Point-to-Multipoint connection results in every output driving all the inputs (7x12) in the interconnection window, as shown in Figure 2. This increases not only the length of the interconnection wire but also the driving load of the output, which results in a slower interconnection network that consumes more energy. The wire length can be broken by driving every output in either the right or the left direction only. That would still result in driving 42 inputs, which is huge.

B. Circuit Switch Network (NoC)

A circuit switch network can be created for this kind of fabric as shown in Figure 3. A fully non-blocking, sliding window interconnection network with 3-hops connectivity requires 48 rows. Every column has 12 inputs and 8 outputs. These 20 inputs/outputs are connected to these 48 rows. This results in 480 4-way switches. Every NoC switch requires four configuration bits, resulting in 1920 bits of configuration memory in every column.

Fig. 3. Circuit Switched NoC Based DRRA Interconnection Network

The problem with this network is that if a physical communication channel is to be established between two resources, the geographically distributed switchboxes in the path between these two resources have to be configured. This can be done only by an external configuration unit, since the sequencers can only configure local switchboxes. So self reconfiguration of this network is not possible. This kind of NoC can also communicate beyond 3 hops. This communication will be blocking and the synthesis tool will report a lower clock frequency. To avoid this, the NoC switches will have to be pipelined, which will increase their power consumption
and area.

C. Crossbar Based Network

A crossbar based sliding window network can be created by connecting small crossbars together in cascade, as shown in Figure 4. To provide connectivity to resources on both sides up to 3 hops away, 48x56 crossbars are required. This results in a configuration memory of 2688 bits per column. These crossbars are used in sliding window fashion, i.e. every crossbar is connected to every other crossbar up to 3 hops away to create a 3-hops sliding window network. A crossbar based network can be used for communication beyond 3 hops, but that communication will be blocking and will decrease the system clock because of the longer network delay.

Fig. 4. Crossbar Based DRRA Interconnection Network

The problem with this implementation is its huge size, its number of configuration bits and its large network delay. A crossbar has to configure 2688 possible connections. If self reconfiguration requires one cycle to configure one connection, it will take the two sequencers 1344 cycles to completely configure the crossbar.

D. Hybrid-1 Network with Crossbars and BUSes

A single column of the DRRA Hybrid-1 interconnection network using crossbars and buses is shown in Figure 5. This interconnection network is organized in horizontal and vertical buses with 14x12 crossbars, called H2V crossbars, at the intersections. The horizontal buses consist of the outputs of the DRRA resources, which are connected to the inputs of the H2V crossbars in sliding window fashion as discussed before. These crossbars receive inputs from resources on both sides up to 3 hops (3 columns) away. Each column has four H2V crossbars. One H2V crossbar requires 14x12=168 bits to configure. A single DRRA column requires 4x168=672 bits to configure. This memory is configured by an external configuration unit through an interconnect configuration network, so self reconfiguration is not possible for this network. The horizontal inputs to the H2V crossbars are configured to connect to the 12 vertical buses, which are then connected to the inputs of the resources. This organization of the interconnection network, with H2V crossbar based switchboxes, prevents an output from a DRRA resource from driving all the inputs in the 7-column communication window. However, this interconnection network suffers from the delay of the crossbar based switchboxes.

Fig. 5. DRRA Hybrid-1 Network

E. Hybrid-2 Network with Tri-state Multiplexers and BUSes

Two problems are identified in the Hybrid-1 type interconnection network: a) it needs more configuration bits than the multiplexer based network, and b) its network delay is also greater than that of the multiplexer based network because of the crossbar based switchboxes. Therefore the Hybrid-1 interconnection network is improved by redesigning the switchboxes. Figure 6 shows the Hybrid-2 interconnection network with the newer switchbox design. This switchbox consists of twelve 14x1 multiplexers, each of which is connected to a tri-state buffer. Each tri-state buffer is permanently connected to one of the twelve vertical buses.

Fig. 6. DRRA Hybrid-2 Network
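The per-column configuration-bit counts quoted for the implementation styles follow from simple arithmetic on the figures given in the text. A minimal sketch, assuming each multiplexer select is binary encoded; the function and key names are illustrative and not part of DRRA:

```python
from math import ceil, log2

HOPS = 3                   # sliding-window reach on each side, in columns
RESOURCES_PER_COLUMN = 4
OUTPUTS_PER_RESOURCE = 2
INPUTS_PER_COLUMN = 12


def select_bits(n_sources: int) -> int:
    """Bits needed to binary-encode one choice among n_sources (assumed encoding)."""
    return ceil(log2(n_sources))


def config_bits_per_column() -> dict:
    """Recompute the configuration-bit figures quoted in Section IV."""
    window_columns = 2 * HOPS + 1  # 7-column sliding window
    mux_sources = window_columns * RESOURCES_PER_COLUMN * OUTPUTS_PER_RESOURCE
    return {
        "mux_sources": mux_sources,                           # inputs of each multiplexer
        "mux": INPUTS_PER_COLUMN * select_bits(mux_sources),  # twelve 56x1 multiplexers
        "noc": 480 * 4,                                       # 480 switches, 4 bits each
        "crossbar": 48 * 56,                                  # one bit per crosspoint
        "hybrid1": 4 * (14 * 12),                             # four 14x12 H2V crossbars
        "hybrid2_switchbox": 12 * select_bits(14),            # twelve 14x1 multiplexers
    }


if __name__ == "__main__":
    for name, bits in config_bits_per_column().items():
        print(f"{name}: {bits}")
```

Changing `HOPS` in this sketch shows how quickly the multiplexer size grows with the width of the sliding window.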
The Hybrid-2 switchbox has three advantages over the Hybrid-1 design: a) the configuration bits are reduced, b) the area of the switchbox is reduced and c) the delay of the switchbox is also reduced. The new switchbox requires 48 bits to configure in this interconnection network. Since all four switchboxes drive the same vertical buses, their tri-state drivers are mutually exclusive. We can use this property to create a memory organized as 12x10 bits (12 rows and 10 columns). Every row corresponds to the output connected to one of the vertical buses. The first two column bits select one of the four switchboxes in the column, the next 4 bits select the vertical bus which is to be driven, and the last 4 bits select the horizontal bus which will drive the selected vertical bus.

1) Self Reconfiguration: The new configuration memory has very few bits to configure and is designed as a two-port memory to allow connectivity with the two sequencers present in the same DRRA column. This allows the sequencers to program the configuration memory, hence creating a self reconfigurable system. Using the sequencers, we can dynamically and partially reprogram the interconnection network without the need for an external configuration unit. So the external reconfiguration network for interconnects has been completely removed. The interconnect configurations are stored inside the sequencer during storage of the program/configware. The application mapping flow is shown in Figure 7. A DRRA program/configware contains Memory, Data-path and Interconnect instructions. This program is loaded into the DRRA sequencer. When the sequencer starts, it executes the interconnect instructions to configure the interconnection network. Once the network is configured, the data-path and memory instructions are executed. During execution of the algorithm, the sequencer can issue new Interconnect instructions to reconfigure the interconnection network. To configure one complete DRRA column, twelve inputs are configured by the two sequencers. A sequencer takes a single cycle to configure one input, hence it takes 6 cycles to completely configure a DRRA column. Since all the DRRA columns are configured independently by their own sequencers, a complete DRRA fabric, no matter how big, can be configured in 6 cycles.

Fig. 7. Application Mapping Flow

V. RESULTS

The above mentioned interconnect implementations are synthesized for DRRA using TSMC 90nm technology in Cadence RTL Compiler. Table I contains the Area, Configuration Bits, Configuration Cycles and Network Delay of these implementations after synthesis.

TABLE I
COMPARISON BETWEEN DIFFERENT IMPLEMENTATIONS

Implementation   Area (Gates)   Cfg. Bits   Cfg. Cycles   Network Delay (ps)
MUX              8402           120         6             707
NoC              87840          1920        Variable      Variable
Crossbar         43008          2688        1344          Variable
Hybrid-1         13416          672         6*CND         1443
Hybrid-2         9147           120         6             246

This data shows that:

1) Multiplexer based networks are the best in terms of area, configuration bits and number of cycles to configure the network. However, Multiplexer based networks are slow because of the long Point-to-Multipoint wires. This problem has been realized by ADRS as well. To remove this problem they have designed pass transistor based full custom switches to break the wires [7].

2) Crossbar and NoC based solutions are very expensive in terms of area, configuration bits and configuration cycles. In NoC based solutions, the configuration of a link depends on the number of switches in the path. Partial and dynamic reconfiguration can be supported in NoC and Crossbar based networks using an external configuration network. Self reconfiguration cannot be supported in NoC because the sequencers cannot reconfigure the geographically distributed switches involved in establishing a communication channel between two resources. The configuration cycles in NoC and Crossbar based interconnection networks also depend on the number of switches/crossbars involved and the configuration network delay (CND).

3) The Hybrid-1 network is better than the NoC and Crossbar based networks. However, it takes more area and configuration bits than the Multiplexer based network. It is also slower than the Multiplexer based network. The number of cycles to configure a DRRA column depends on the Configuration Network Delay (CND) of the Hybrid-1 network.

4) The Hybrid-2 network, as can be seen in Table I, has almost the same area and configuration bits as a Multiplexer based network. Since the network is
self reconfigurable, the configuration network delay (CND) in this network is one. Hence it takes only 6 cycles to completely reconfigure a DRRA column. Furthermore, all DRRA columns can be reconfigured in parallel, therefore it takes only 6 cycles to completely reconfigure the whole DRRA fabric. The Hybrid-2 network is also faster than the Multiplexer based network. The Hybrid-2 network is, in reality, a Multiplexer based network with tri-state buffers. The increase in area is because of these tri-state buffers. Using this Hybrid-2 approach, we have broken down the long Point-to-Multipoint wires of the Multiplexer based network into Point-to-Point wires using switchboxes. This doesn't affect the Point-to-Multipoint capability of the network. This approach is better than [7], in which a pass transistor based switch was used to break the long wires of a Multiplexer based network. Furthermore, the switchboxes in Hybrid-2 have been designed completely in standard cell technology, which keeps the design flow simple and reduces the time to market.

5) DRRA with the Hybrid-2 network is synthesized and floorplanned in 90nm using Cadence RTL Compiler and SoC Encounter. Using this network, the 2x8 fabric of DRRA shown in Figure 1 runs at a frequency of 720MHz and can support a peak local bandwidth of 138GB/s.

VI. CONCLUSION AND FUTURE WORK

An improved implementation of the Hybrid-2 interconnection network for the Dynamically Reprogrammable Resource Array has been presented. To justify the design decisions, an interconnect exploration is done by implementing the same network using Multiplexer, NoC and Crossbar based networks. The Hybrid-2 network is then compared against the Multiplexer, NoC, Crossbar and previously published Hybrid-1 networks. Results show that the newly presented network takes (1.08x, 0.104x, 0.212x and 0.681x) the area and (1x, 0.037x, 0.026x and 0.107x) the configuration bits of the Multiplexer, NoC, Crossbar and Hybrid-1 implementations respectively. The Hybrid-2 network is 2.87x and 5.86x faster than the Multiplexer and Hybrid-1 networks respectively. The Hybrid-2 network also takes the minimum number of cycles to configure/reconfigure a complete DRRA column.

A future version of the interconnection network with an adjustable sliding window has been planned. By lowering the clock frequency, the width of the sliding window can be increased to allow mapping of more complex data paths than is possible today. Similarly, at higher clock frequencies this width can be reduced. The future version of DRRA will have voltage frequency scaling and a power shut-off methodology. This may result in some parts of DRRA working in a different voltage/frequency range or being completely turned off. The DRRA switchboxes will be improved to handle such situations by having level shifters or isolators.

ACKNOWLEDGMENT

The Author is thankful to the Swedish Research Council and the Higher Education Commission of Pakistan for funding this research.

REFERENCES

[1] M. Baron. Trends in use of reconfigurable platforms. In 41st Proceedings of Design Automation Conference, pages 415–415. IEEE, July 2004.
[2] R. Ferreira, M. Laure, A. C. Beck, T. Lo, M. Rutzig, and L. Carro. A low cost and adaptable routing network for reconfigurable systems. In Proc. IEEE Int. Symp. Parallel & Distributed Processing IPDPS 2009, pages 1–8, 2009.
[3] R. Hartenstein. A decade of reconfigurable computing: a visionary retrospective. In Design, Automation and Test in Europe, pages 642–649. IEEE, March 2001.
[4] P. M. Heysters. Coarse-Grained Reconfigurable Processors; Flexibility Meets Efficiency. PhD Thesis, ISBN:90-365-2076-2, Netherlands, 2003.
[5] B. Khailany, W. J. Dally, U. J. Kapasi, P. Mattson, J. Namkoong, J. D. Owens, B. Towles, A. Chang, and S. Rixner. Imagine: media processing with streams. IEEE MICRO, 21(2):35–46, 2001.
[6] C. E. Kozyrakis, S. Perissakis, D. Patterson, T. Anderson, K. Asanovic, N. Cardwell, R. Fromm, J. Golbus, B. Gribstad, K. Keeton, R. Thomas, N. Treuhaft, and K. Yelick. Scalable processors in the billion-transistor era: Iram. Computer, 30(9):75–78, 1997.
[7] Z. Kwok and S. J. E. Wilton. Register file architecture optimization in a coarse-grained reconfigurable architecture. In Proc. 13th Annual IEEE Symp. Field-Programmable Custom Computing Machines FCCM 2005, pages 35–44, 2005.
[8] T. Miyamori and K. Olukotun. Remarc: reconfigurable multimedia array coprocessor. IEICE Transactions on Information and Systems, 82(5):389–397, November 1998.
[9] M. A. Shami and A. Hemani. Morphable dpu: Smart and efficient data path for signal processing applications. In Proc. IEEE Workshop Signal Processing Systems SiPS 2009, pages 167–172, 2009.
[10] M. A. Shami and A. Hemani. Partially reconfigurable interconnection network for dynamically reprogrammable resource array. In IEEE 8th International Conference on ASIC, pages 122–125. IEEE, October 2009.
[11] H. Singh, M.-H. Lee, G. Lu, F. J. Kurdahi, N. Bagherzadeh, and E. M. Chaves Filho. Morphosys: an integrated reconfigurable system for data-parallel and computation-intensive applications. IEEE Transactions on Computers, 49(5):465–481, 2000.
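The ratios quoted in the conclusion can be cross-checked against Table I. The sketch below recomputes them from the table values; the dictionary layout and names are illustrative only. The area and speed ratios follow directly from the table. The quoted configuration-bit ratios (0.037x, 0.026x, 0.107x) appear to correspond to the 72 multiplexer select bits per column from Section IV-A rather than to the 120 bits listed in Table I, so both variants are computed here.

```python
# Values copied from Table I; None marks entries reported as "Variable".
TABLE_I = {
    "MUX":      {"area": 8402,  "cfg_bits": 120,  "delay_ps": 707},
    "NoC":      {"area": 87840, "cfg_bits": 1920, "delay_ps": None},
    "Crossbar": {"area": 43008, "cfg_bits": 2688, "delay_ps": None},
    "Hybrid-1": {"area": 13416, "cfg_bits": 672,  "delay_ps": 1443},
    "Hybrid-2": {"area": 9147,  "cfg_bits": 120,  "delay_ps": 246},
}
OTHERS = ("MUX", "NoC", "Crossbar", "Hybrid-1")
SELECT_BITS_ONLY = 72  # twelve 6-bit selects (Section IV-A), without the row address

hybrid2 = TABLE_I["Hybrid-2"]

# Hybrid-2 area relative to each other implementation.
area_ratios = {n: hybrid2["area"] / TABLE_I[n]["area"] for n in OTHERS}

# Configuration-bit ratios, once from Table I and once from the 72-bit count.
cfg_ratios_table = {n: hybrid2["cfg_bits"] / TABLE_I[n]["cfg_bits"] for n in OTHERS}
cfg_ratios_72 = {n: SELECT_BITS_ONLY / TABLE_I[n]["cfg_bits"]
                 for n in ("NoC", "Crossbar", "Hybrid-1")}

# Speedup of Hybrid-2 over the networks that have a fixed network delay.
speedup = {n: TABLE_I[n]["delay_ps"] / hybrid2["delay_ps"]
           for n in OTHERS if TABLE_I[n]["delay_ps"] is not None}

if __name__ == "__main__":
    for label, values in (("area", area_ratios),
                          ("cfg bits (Table I)", cfg_ratios_table),
                          ("cfg bits (72-bit count)", cfg_ratios_72),
                          ("speedup", speedup)):
        print(label, {k: round(v, 4) for k, v in values.items()})
```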