Energy Aware Design Methodologies for Application Specific NoC

Naveen Choudhary, M. S. Gaur, V. Laxmi
Department of Computer Engineering
Malaviya National Institute of Technology
Jaipur, India
naveenc121@yahoo.com, gaurms@mnit.ac.in, vlaxmi@mnit.ac.in

Virendra Singh
SERC
Indian Institute of Science
Bangalore, India
viren@serc.iisc.ernet.in

978-1-4244-8971-8/10$26.00 © 2010 IEEE

Abstract: Network-on-Chip (NoC) has emerged as a solution for the communication framework of high-performance nanoscale architectures. One important aspect, in addition to deadlock-free routing, is low power consumption. In view of varied communication requirements, application specific SoC design is increasingly important. Customized NoC architectures are more suitable for a particular application, and do not necessarily conform to regular topologies. In this work, a methodology that uses a priori knowledge of the application's communication characteristics for the design of a customized and energy optimized irregular NoC is proposed.

Keywords: NP-hard; NoC; Optimization; SoC; Core Graph.

I. INTRODUCTION

Network-on-Chip [1, 2, 7] has been proposed as the solution for the on-chip communication challenges of future SoC architectures. Early works [2, 13] in NoC favored the use of standard topologies such as meshes, tori, k-ary n-cubes or fat trees under the assumption that the wires can be well structured in such topologies. However, most application specific SoCs are heterogeneous, with each core having different size, functionality and communication requirements. Thus, standard topologies can have a structure that poorly matches the application traffic, leading to large wiring complexity after floorplanning as well as significant energy and area overhead. Moreover, for most SoCs the system is designed with static (or semi-static) mapping of tasks to processors and hardware cores, and hence the communication traffic characteristics of the SoC are well characterized at design time. Therefore, networks with irregular topology tailored to the application's requirements are expected to have an edge over networks with regular topology. Application specific custom topology mapping and design have been explored in [8, 9, 10, 19]. In this paper, two genetic algorithm based heuristics for the design of customized energy efficient irregular NoC based on the applied routing function are proposed.

The irregular NoC communication model and architecture are defined in Section II. The proposed energy efficient design methodologies for customized NoC are presented in Section III. The Genetic Algorithm (GA) used in the proposed methodologies is described in Section IV. Section V presents some experimental results, followed by a brief conclusion in Section VI.

II. IRREGULAR NOC COMMUNICATION MODEL AND ARCHITECTURE

In the following paragraphs, the communication model, the associated NoC architecture and the routing function applicable to the customized irregular NoC are described.

Figure 1. Application specific communication model in NoC

A. Communication Model

Task graphs [9, 11] are generally used to model the behavior of complex multi-core SoC applications on an abstract level. The tasks $T_i$ are mapped to a set of IP cores $v_j$, which communicate through unidirectional point-to-point abstract channels. The generic communication model is shown in Figure 1 and the related definitions are presented as follows.

Definition 1. Core Graph is a directed graph $G(V, E)$ with each vertex $v_i \in V$ representing an IP core and a directed edge $e_{i,j} \in E$ representing the communication between the cores $v_i$ and $v_j$. The weight of the edge $e_{i,j}$, denoted by $b_{i,j}$, represents the desired average bandwidth requirement of the communication from $v_i$ to $v_j$.

Definition 2. NoC topology graph is a directed graph $N(U, F)$ with each vertex $\upsilon_i \in U$ representing a node/tile in the topology and a directed edge $f_{i,j} \in F$ representing a direct communication channel between vertices $\upsilon_i$ and $\upsilon_j$. The weight of the edge $f_{i,j}$, denoted by $Ab_{i,j}$, represents the available link/channel bandwidth across the edge $f_{i,j}$.

B. Chip Layout & NoC Energy Model

Floorplanning can be done using non-slicing floorplanners such as B*-Trees [12]. The energy model [9] for the Network-on-Chip is defined as follows:

$$E_{bit}(t_i, t_j) = n_{hops} \times E_{rbit} + (n_{hops} - 1) \times E_{lbit}$$

where $E_{bit}(t_i, t_j)$ is the average dynamic energy consumption for sending one bit of data from tile $t_i$ to tile $t_j$, $n_{hops}$ is the
number of routers the bit traverses from tile $t_i$ to tile $t_j$, and $E_{rbit}$ and $E_{lbit}$ are the energy consumed by a router and a link respectively for transporting one bit of data.

C. Routing in Irregular NoC

The popular routing algorithms for irregular topologies, such as up*/down* routing [5], Left-Right routing [6] and L-turn routing [6], use the turn model [4] to avoid deadlock. In this paper, minimal (shortest) paths are used for communication, and the up*/down* or Left-Right routing function is used to provide deadlock-free escape paths [3] that avoid deadlock in the network.

III. METHODOLOGIES FOR ENERGY EFFICIENT NOC GENERATION

Figure 2. Network construction using proposed methodologies

In this section, two GA based methodologies, minimum-spanning-tree-first (MSTF) and shortest-paths-first (SPF), for the design of customized energy efficient NoC and the corresponding routing tables for deadlock-free communication are presented. The routing function is implemented as given by Silla et al. [3], with up*/down* and Left-Right routing for escape paths. For both methodologies, the floorplan information and the Core Graph exhibiting traffic characteristics are taken as inputs (refer to Figure 2). Floorplanning can be done based on Manhattan distance using a floorplanner such as B*-Trees [12], assuming over-the-cell routing [17]. In both proposed methodologies, the link length is not allowed to exceed the maximum permitted channel length ($e_{max}$) because of the constraint of physical signaling delay. Moreover, the constraint on maximum permitted node degree ($nd_{max}$) prevents the algorithm from instantiating slow routers with a large number of I/O channels, which would decrease the achievable clock frequency due to the internal routing and scheduling delay of the router.

A. Minimum Spanning Tree First (MSTF) Methodology

In this methodology, a minimum spanning tree (MST) is first generated on the nodes of the Core Graph, using Manhattan distance as the metric and keeping the constraints on $nd_{max}$ and $e_{max}$. The node with the maximum bandwidth requirement is taken as the root of the constructed MST. This MST helps in classifying all the channels of the topology as "up" ("Left") or "down" ("Right"). While keeping the constraints on $nd_{max}$ and $e_{max}$, the topology is further extended by laying the shortest energy path for each traffic characteristic. Due to the constraints on $nd_{max}$ and $e_{max}$, the order in which such shortest energy paths are generated essentially decides the total communication energy requirement of the generated topology. The optimized order of the traffic characteristics of the application is found using a genetic algorithm (refer to Section IV). The routing tables of the routers in the discovered shortest energy path are marked with the tag shortest path. Lastly, the proposed methodology uses the modified Dijkstra's algorithm [14], according to the up*/down* (Left-Right) rule, to find escape routing paths from each node in the shortest energy path to the corresponding destination in the generated NoC, and tags them as up*/down* (Left-Right). When taking a routing decision, the output channels tagged as shortest path are selected with higher priority, and up*/down* (Left-Right) tagged channels are selected only when no output channel corresponding to a shortest path is free.

B. Shortest Path First (SPF) Methodology

SPF is similar to the MSTF methodology, with the exception that in SPF the topology generation is initiated by first finding the shortest energy paths, and the topology is later extended by constructing the MST. As in MSTF, a genetic algorithm is used to find the optimized, energy-efficient order of the traffic characteristics of the application. Since in MSTF the MST is constructed first, it is possible that, for a number of nodes in the topology, a large number of links are links pertaining to the MST. As the maximum number of links emanating from a node is limited to $nd_{max}$, this can increase the hop count of the shortest energy paths generated later, leading to increased communication energy. SPF overcomes this drawback by creating the links pertaining to the shortest energy paths before the links pertaining to the MST. Because the shortest energy paths are generated first in SPF, it is possible that not enough free ports remain to construct the MST in the topology later. In such a case, a minimum number of ports per node needs to be reserved before finding the shortest energy paths. However, experiments showed that if the communication requirements are uniformly distributed in the Core Graph, such problems are rare, if any. Algorithm 1 briefly presents the proposed methodologies.

Algorithm 1: Energy aware application specific NoC generator
Require:
  1. G = Core Graph = {E edges (i.e. traffic characteristics), V vertices}
  2. V = {v_i | v_i is the ith IP core}
  3. E = {e_ij : v_i → v_j with weight bw_ij | v_i (source), v_j (destination) ∈ V}
  4. NoC = {T (topology), R (set of routing tables), S (set of shortest paths)}
  5. TC_Array = {array of traffic characteristics (i.e. ordered set of E)}
  6. nd_max = maximum permitted node degree in the topology T
  7. e_max = maximum permitted length of a link (channel) in topology T
  8. Manhattan distance = ∆ = {d_ij | d_ij = |v_i - v_j|, v_i, v_j ∈ V, d_ij < e_max}
  9. u = node with maximum communication in G
Ensure: Energy aware NoC topology for CG
Procedure Minimum-Spanning-Tree-First()
  1. NoC_EA.T = Φ; NoC_EA.R = Φ; NoC_EA.S = Φ;
  2. Γ = {MST rooted at u as per ∆ and constraints nd_max & e_max}
  3. NoC_EA.T = NoC_EA.T ∪ {Γ}
  4. (NoC_EA, TC_Array) = GeneticAlgo(NoC_EA, Γ)
  5. for each path s_i ∈ {NoC_EA.S}
       N = {set of nodes in path s_i}
       for n_j ∈ N
         NoC_EA.R = NoC_EA.R ∪ {update routing tables in NoC_EA.R for nodes ∈ V in the route followed by the shortest up*/down* (Left-Right) escape path from node n_j to the destination node of path s_i. The routing table entry type tag is set as up*/down* (Left-Right) for these nodes}
       endfor
  6. endfor
Endprocedure
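Step 2 of the Minimum-Spanning-Tree-First procedure builds a tree rooted at u under the nd_max and e_max constraints. The paper does not spell out how this tree is constructed, so the following Python sketch is only one plausible reading: a Prim-style greedy heuristic that grows the tree from the root, always taking the shortest Manhattan-distance link that respects both limits. The function name `constrained_mst`, the `positions` dictionary of tile coordinates, and the convention that each link counts once toward the degree of both of its endpoints are assumptions made for illustration. Degree-constrained spanning tree construction is NP-hard in general, so a greedy pass like this may fail to span all nodes even when a valid tree exists; in that case it raises an error.

```python
import heapq


def manhattan(a, b):
    """Manhattan distance between two (x, y) tile coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def constrained_mst(positions, root, nd_max, e_max):
    """Greedy (Prim-style) spanning tree rooted at `root`.

    positions: dict mapping node id -> (x, y) coordinate (assumed input format)
    nd_max:    maximum number of tree links allowed at any node (assumption:
               each link counts once for both endpoints)
    e_max:     maximum permitted link length; following the prose, a link may
               be as long as e_max but not longer
    Returns a list of (parent, child) links.
    Raises ValueError if the greedy pass cannot reach every node.
    """
    degree = {node: 0 for node in positions}
    in_tree = {root}
    links = []

    # Candidate links from the tree to nodes outside it: (length, from, to).
    heap = [
        (manhattan(positions[root], positions[n]), root, n)
        for n in positions
        if n != root
    ]
    heapq.heapify(heap)

    while heap and len(in_tree) < len(positions):
        length, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale candidate, v was attached through another link
        if length > e_max or degree[u] >= nd_max:
            continue  # violates the length or the node-degree constraint
        links.append((u, v))
        in_tree.add(v)
        degree[u] += 1
        degree[v] += 1
        # The newly attached node can now offer links to the remaining nodes.
        for w in positions:
            if w not in in_tree:
                heapq.heappush(
                    heap, (manhattan(positions[v], positions[w]), v, w)
                )

    if len(in_tree) != len(positions):
        raise ValueError("constraints too tight for this greedy heuristic")
    return links


if __name__ == "__main__":
    # Four tiles on a 2x2 grid, rooted at tile 0 (illustrative data only).
    tiles = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
    for parent, child in constrained_mst(tiles, root=0, nd_max=4, e_max=1):
        print(parent, "->", child)
```

The parent-to-child orientation of the returned links is what would let each channel be classified as "up" or "down" relative to the root, as described for the MSTF methodology.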
Procedure Shortest-Paths-First()
  1. NoC_EA.T = Φ; NoC_EA.R = Φ; NoC_EA.S = Φ; Γ = Φ;
  2. (NoC_EA, TC_Array) = GeneticAlgo(NoC_EA, Γ)
  3. Γ = {MST rooted at u as per ∆ and constraints nd_max & e_max}
  4. NoC_EA.T = NoC_EA.T ∪ {Γ}
  5. for each path s_i ∈ {NoC_EA.S}
       N = {set of nodes in path s_i}
       for n_j ∈ N
         NoC_EA.R = NoC_EA.R ∪ {update routing tables in NoC_EA.R for nodes ∈ V in the route followed by the shortest up*/down* (Left-Right) escape path from node n_j to the destination node of path s_i. The routing table entry type tag is set as up*/down* (Left-Right) for these nodes}
       endfor
  6. endfor
Endprocedure

IV. GENETIC ALGORITHM

A genetic algorithm [15] based heuristic is used to find the best order of the traffic characteristics in which to generate the shortest energy paths in the topology, such that the communication energy requirement of the application is optimized. In the proposed genetic algorithm formulation, each chromosome is represented as an array of genes, with each gene representing a traffic characteristic of the application. The initial population consists of 500 chromosomes, and crossover and mutation are performed on 50% and 40% of the population respectively in each generation. Crossover is achieved by intermixing the traffic characteristics of two chromosomes, whereas mutation is performed by randomly changing the order of the traffic characteristics in a chromosome. The fitness of a chromosome is regarded as high if its cost approaches 0. The fitness function used is as follows.

$$\mathrm{Cost} = E_{c_i} / X$$

where $X$ is the maximum chromosome energy requirement and $E_{c_i}$ is the energy requirement for chromosome $c_i$. It may be noted that the best 10% of chromosomes (referred to as the Best Class) in any generation are directly transferred to the next generation, so as not to degrade the solution between generations.

V. EXPERIMENTAL RESULTS

Multiple Core Graphs with diverse bandwidth requirements of the IP cores were randomly generated using TGFF [11]. Moreover, the NoC simulator IrNIRGAM, an extended version of NIRGAM [16] that supports irregular topologies and escape path routing for avoiding deadlock, was deployed for performance evaluation. IrNIRGAM was run for 10000 clock cycles with the applied packet injection interval to evaluate the network performance under varying traffic load. The router energy consumption is evaluated using the power simulator Orion [18] for 0.18 µm technology. Similarly, the dynamic bit energy consumption for inter-node links ($E_{lbit}$) can be calculated using the equation:

$$E_{lbit} = \frac{1}{2} \times \alpha \times C_{phy} \times V_{DD}^{2}$$

where $\alpha$ is the average probability of a 1 to 0 or 0 to 1 transition between two successive samples in the stream for a specific bit ($\alpha = 0.5$ assuming the data stream to be purely random), $C_{phy}$ is the physical capacitance of the inter-node wire and $V_{DD}$ is the supply voltage.

A. SPF and MSTF with Random Benchmarks

The performance of the proposed SPF and MSTF methodologies was compared on IrNIRGAM with varying packet injection interval. Figure 3 shows performance results averaged over 50 generated energy efficient irregular topologies, generated based on the up*/down* routing function. Constraints of $nd_{max} = 4$ and $e_{max}$ equal to 1.5 times the maximum length of the core/node among all the cores in the NoC were observed. For SPF, the total dynamic communication energy consumption was on average 18.5% lower than for the MSTF methodology. Moreover, a reduction in latency (in the range of 7.5 clocks to 10 clocks) was observed for comparatively similar throughput.

Figure 3. Performance comparison with varying packet injection interval of (a) dynamic communication energy consumption (in pico joules) and (b) average flit latency (in clock cycles) of the proposed MSTF and SPF methodology averaged over 50 generated energy efficient irregular topologies with number of cores varying from 16 to 81

B. SPF and Regular NoC with Random Benchmarks

Figure 4 shows the performance comparison of SPF with 2D-Mesh for equivalent sized tiles and according to the application's traffic characteristics requirement. $nd_{max}$ was taken as 4 and $e_{max}$ as 2 times the length of the core/node. In comparison to 2D-Mesh with XY and OE routing respectively, SPF with up*/down* (Left-Right) routing shows a reduction in average flit latency in the range of 10 (9.4) clocks to 20.9 (18.4) clocks and 13.8 (13.2) clocks to 76 (69) clocks, and a reduction in average per flit communication energy in the range of 18.8% (18.5%) to 29.2% (25.8%) and 25.2% (24.6%) to 54.7% (53%); values in parentheses are for Left-Right routing. In most cases SPF with up*/down* routing was found to perform better.

C. SPF and Regular NoC with Intelligent Mapping

The proposed SPF methodology was compared with the intelligent energy aware mapping technique proposed in [9] for equivalent tile sizes and application to core mapping. Figure 5 shows reduction in average flit latency in the range of 1.7
clocks to 5 clocks and 7.5 clocks to 20.4 clocks, and reduction in average per flit communication energy in the range of 1.6% to 10.9% and 17% to 37%, for the SPF methodology at equivalent throughput in comparison to the 2D-Mesh with XY and OE routing respectively.

Figure 4. SPF (up*/down* & Left-Right routing) performance comparison with 2D-Mesh (XY & OE routing) averaged over 50 generated energy efficient irregular topologies with number of cores varying from 16 to 81: (a) average flit latency (in clock cycles) and (b) average communication energy consumption per flit (in pico joules)
Panels: (a) Average per flit latency and throughput (in flits); (b) Average communication energy per flit

Figure 5. SPF and 2D-Mesh performance comparison for intelligent application to core mapping averaged over 50 generated energy efficient irregular topologies with number of cores varying from 16 to 81: (a) average flit latency (in clock cycles) and (b) average communication energy consumption per flit (in pico joules)
Panels: (a) Average per flit latency and throughput (in flits); (b) Average communication energy per flit

VI. CONCLUSION AND FUTURE WORK

In this paper, the energy efficient customized irregular topology generation problem for NoC was addressed. Although up*/down* and Left-Right routing were used as escape paths for deadlock prevention in the proposed methodologies, the presented methodologies can be adapted to any topology-agnostic routing algorithm for which generic routing rules based on turn prohibition can be laid down. It is believed that the combined treatment of routing and topology generation offers a huge potential for optimization in future application-specific NoC architectures.

REFERENCES

[1] W. J. Dally, B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks," in IEEE Proceedings of the 38th Design Automation Conference (DAC), pp. 684–689, 2001.
[2] S. Kumar, A. Jantsch, J.-P. Soininen, M. Forsell, M. Millberg, J. Oberg, K. Tiensyrja, and A. Hemani, "A Network on Chip Architecture and Design Methodology," in Proceedings of VLSI Annual Symposium (ISVLSI 2002), pp. 105–112, 2002.
[3] F. Silla, J. Duato, "High-Performance Routing in Networks of Workstations with Irregular Topology," in IEEE Transactions on Parallel and Distributed Systems, vol. 11, pp. 699–719, July 2000.
[4] C. Glass, L. Ni, "The Turn Model for Adaptive Routing," in Proceedings of the 19th International Symposium on Computer Architecture, pp. 278–287, May 1992.
[5] M. D. Schroeder et al., "Autonet: A High-Speed Self-Configuring Local Area Network Using Point-to-Point Links," Journal on Selected Areas in Communications, 9, 1991.
[6] A. Jouraku, A. Funahashi, H. Amano, M. Koibuchi, "L-turn routing: An Adaptive Routing in Irregular Networks," in Proceedings of the International Conference on Parallel Processing, pp. 374–383, Sep. 2001.
[7] U. Ogras, J. Hu, R. Marculescu, "Key research problems in NoC design: a holistic perspective," in IEEE CODES+ISSS, pp. 69–74, 2005.
[8] S. Murali, G. De Micheli, "SUNMAP: A Tool for Automatic Topology Selection and Generation for NoCs," in Proceedings of DAC, 2004.
[9] J. Hu, R. Marculescu, "Energy-Aware Mapping for Tile-based NOC Architectures Under Performance Constraints," in ASP-DAC 2003, Jan. 2003.
[10] J. Hu, R. Marculescu, "Energy- and performance-aware mapping for regular NoC architectures," in IEEE Trans. on CAD of Integrated Circuits and Systems, 24(4), April 2005.
[11] R. P. Dick, D. L. Rhodes, W. Wolf, "TGFF: task graphs for free," in Proceedings of the International Workshop on Hardware/Software Codesign, March 1998.
[12] Y. C. Chang, Y. W. Chang, G. M. Wu, S. W. Wu, "B*-Trees: A New Representation for Non-Slicing Floorplans," in Proceedings of the 37th Design Automation Conference, pp. 458–463, 2000.
[13] L. Natvig, "High-level Architectural Simulation of the Torus Routing Chip," in Proceedings of the International Verilog HDL Conference, California, pp. 48–55, Mar. 1997.
[14] T. Cormen, C. Leiserson, R. Rivest, Introduction to Algorithms, Prentice Hall International, 1990.
[15] A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing, Springer-Verlag, Berlin, Heidelberg, 2003.
[16] L. Jain, B. M. Al-Hashimi, M. S. Gaur, V. Laxmi, A. Narayanan, "NIRGAM: A Simulator for NoC Interconnect Routing and Application Modelling," DATE 2007, 2007.
[17] K. Srinivasan, K. S. Chatha, "Layout Aware Design of Mesh based NoC Architectures," in Proceedings of the 4th International Conference on Hardware Software Codesign and System Synthesis, Seoul, Korea, pp. 136–141, 2006.
[18] H-S Wang et al., "Orion: A Power-Performance Simulator for Interconnection Network," in Proc. International Symposium on Microarchitecture, Nov. 2002.
[19] K. Srinivasan et al., "An Automated Technique for Topology and Route Generation of Application Specific On-Chip Interconnection Networks," in Proc. ICCAD 2005.