SlideShare uma empresa Scribd logo
1 de 5
Baixar para ler offline
International Journal of Innovative Research in Information Security (IJIRIS) ISSN:2349-7017(O)
Volume 1 Issue 4 (October 2014) ISSN:2349-7009(P)
www.ijiris.com
_________________________________________________________________________________________________
© 2014, IJIRIS- All Rights Reserved Page - 1
Coarse Grain Reconfigurable Floating Point Unit
K.Moorthi *
Dr.J.Raja B.Ramya
ECE & Adhiparasakthi Engg. College ECE & Adhiparasakthi Engg. College ECE & Aksheya College of Engg.
Abstract— Today the most commonly used system architectures in data processing can be divided into three
categories, general purpose processors, application specific architectures and reconfigurable architectures.
Application specific architectures are efficient and give good performance, but are inflexible. Recently reconfigurable
systems have drawn increasing attention due to their combination of flexibility and efficiency. Re-configurable
architectures limit their flexibility to a particular algorithm. This paper introduces approaches to mapping point
arithmetic. After presenting an optimal formulation using applications onto CGRAs supporting both integer and
floating point unit.High-level design entry tools are essential for reconfigurable systems, especially coarse-grained
reconfigurable architectures.Coarse-grained reconfigurable architectures have drawn increasing attention due to
their performance and flexibility. However, their applications have been restricted to domains based on integer
arithmetic since typical CGRAs support only integer arithmetic or logical operations. In this project we introduce an
approach to map applications onto CGRAs supporting floating point addition. The increase in requirements for more
flexibility and higher performance in embedded systems design, reconfigurable computing is becoming more and
more popular.
Keywords— High-level synthesis, CGRA, FPGA, Floating point unit , Reconfigurable architecture.
I. INTRODUCTION
Various coarse-grained reconfigurable architectures (CGRAs) have been proposed in recent years [1]–[3], with
different target domains of applications and different tradeoffs between flexibility and performance. Typically, they
consist is not easy to map an application onto the reconfigurable array host mostly reduced instruction set computing
(RISC)] processor. The computation intensive kernels of the applications—typically loop—are mapped to the
reconfigurable array while the remaining code is executed by the processor. However, it of a reconfigurable array of
processing elements (PEs) and abecause of the high complexity of the problem that requires compiling the application on
a dynamically reconfigurable parallel architecture, with additional complexity of dealing with complex routing resources.
The problem of mapping an application onto a CGRA to minimize the number of resources giving best performance has
been shown to be NP-complete [16].Few automatic mapping/compiling/synthesis tools have been developed to exploit
the parallelism found in the applications and extensive computation resources of CGRAs. Some researchers [1], [4] have
used structure-based or graphical user interface based design tools to manually generate a mapping, which would have a
difficulty in handling big designs. Some researchers [5], [6] have only focused on instruction-level parallelism, failing to
fully utilize the resources in CGRAs, which is possible by exploiting loop-level parallelism. Some researchers [7], [8],
[17] have introduced a compiler to exploit the parallelism in the CGRA provided by the abundant resources. However,
their approaches use shared registers to solve the mapping problem. While these shared registers explicitly during the
mapping process. registers can be eliminated if routing resources are considered simplify the mapping process, theycan
increase the critical path delay or latency. We show in this paper that the shared More recently, routing-aware mapping
algorithms have been introduced [9]–[11]. However, they rely on step-by-step approaches, where scheduling, binding,
and routing algorithms are performed sequentially. Thus, it tends to fall into a local integer type application domains,
whereas ours extends the routing at the same time, thereby generating better optimized solutions. It also explicitly
considers incorporating Steiner points for more efficient routing (some previous incorporate Steiner points although it is
not clear whether they approaches use a kind of maze routing algorithm that can are actually incorporated). In [7], they
have also presented a unified approach based on the simulated annealing algorithm. However, it takes too much time to
get a solution. few minutes. Furthermore, all previous mapping/compiling/synthesis tools have been restricted to
coverage to floating pointtype application domains.the approach in [7] takes hundreds of minutes whereas our optimum.
Our unified approach considers scheduling and binding.
II. COARSE GRAIN RECONFIGURABLE ARCHITECTURE
CGRAs consist of an array of a large number of function units (FUs) interconnected by a mesh style network. Register
files are distributed throughout the CGRAs to hold temporary values and are accessible only by a subset of FUs. The
Fig.1. shows the coarse grain architecture, the FUs can execute common word-level operations, including addition,
subtraction, and multiplication. In contrast to FPGAs, CGRAs have short reconfiguration times, low delay characteristics,
and low power consumption as they are constructed from standard cell implementations. Thus, gate-level re-
configurability is sacrificed, but the result is a large increase in hardware efficiency.
International Journal of Innovative Research in Information Security (IJIRIS) ISSN:2349-7017(O)
Volume 1 Issue 4 (October 2014) ISSN:2349-7009(P)
www.ijiris.com
_________________________________________________________________________________________________
© 2014, IJIRIS- All Rights Reserved Page - 2
Fig.1. Reconfigurable Hardware
A good compiler is essential for exploiting the abundance of computing resources available on a GRA. However,
sparse connectivity and distributed register files present difficult challenges to the scheduling phase of a compiler. The
sparse connectivity puts the burden of routing operands from producers to consumers on the compiler. The Fig 2(a)
shows the mesh connected processing elements the traditional schedulers that just assign an FU and time to every
operation in the program are unsuitable because they do not take routability into consideration. Operand values must be
explicitly routed between producing and consuming FUs. Further, dedicated routing resources are not provided. Rather,
an FU can serve either as a compute resource or as a routing resource at a given time. A compiler scheduler must thus
manage the computation and flow of operands across the array to effectively map applications onto CGRAs. Innermost
loop of an application is selected to be executed on CGRA. With the abundance of computation resources, CGRA can
efficiently execute compute intensive part of the application. Scheduling difficulties lie in the limited connectivity of
CGRA. Without careful scheduling, some values can be stuck in nodes where they cannot be routed any further.
Preprocessing is performed on dataflow graph of the target application. It includes identifying recurrence cycles, SSA
conversion, and height calculation. Operations are then sorted by their heights and scheduling algorithm is applied to the
group of operations with the same height. To guarantee the routing of values over the CGRA interconnect, scheduling
space is skewed in a way that slots on the left side of CGRA are utilized first. Since slots on the left are utilized
beforehand, operands can be easily routed to the right side of CGRA. An affinity graph which is constructed for each
group of operations that have same height in DFG. The idea here is that placing operations with common consumers
close to each other. By placing operations with common consumers close to each other, routing cost can be reduced when
their consumers are placed later. Affinity values are calculated for every pair of operations in the same height by looking
at how far their common consumers are located in DFG. In an affinity graph, nodes represent operations and edges
represent affinity values. High affinity values between two operations indicate that they have common consumers in a
short distance in DFG. In this step, operations are placed on CGRA in a way that affinity values between operations are
minimized. Scheduling of operations is converted into a graph embedding problem where an affinity graph which is used
in 3-D scheduling space minimizing total edge length. In the example on the left, nodes are placed in the scheduling
space so that total edge length is minimized, giving more weight to edges with high affinity values. When scheduling
operations in the next height, routing cost can be effectively reduced since producers of the same consumer are already
placed close to each other. After getting an optimal placement of operations in one height, the scheduling space is
updated the unused slot can be used in later time slots. Then, scheduler proceeds to next group of operations.
A. Architecture Extension for Floating-Point Operations
A pair of PEs in the PE array is able to compute floating-point operations according to its configuration [21]. While a
PE in the pair computes the mantissa part of the floating point operations, the other PE handles the exponent part. Since
the data path of a PE is 16 bits wide, the floating-point operations do not support the single precision IEEE-754 standard,
but support only reduced mantissa of 15 bits. However, experiments show that the precision is good enough for hand-
held embedded systems [21]. Since adjacent PEs are paired for floating-point operations [Fig. 2(c)], the total number of
floating-point units that can but PE array as shown in Fig. 2(a) and (b). The PE array has enough interconnections among
the PEs so that it makes use of the interconnections for the data exchange between the PEs required in floating-point
operations.Mapping a floating-point operation on the PE array with integer operations may take many layers of cache. If
a kernel consists of a multitude of floating-point operations, then causing costly fetch of additional context words from
the main mapping on the array easily runs out of the cache layers, memory. Instead of using multiple cache layers to
International Journal of Innovative Research in Information Security (IJIRIS) ISSN:2349-7017(O)
Volume 1 Issue 4 (October 2014) ISSN:2349-7009(P)
www.ijiris.com
_________________________________________________________________________________________________
© 2014, IJIRIS- All Rights Reserved Page - 3
perform such a complex operation, we add some control logic to the PEs so that the operation can be completed in
multiple cycles cache logic control.
The layers multiple requiring without be constructed is half the number of integer PEs in the can be implemented
with a small finite state machine (FSM) that controls the PE’s existing datapath for a fixed number of cycles. B.
Architecture Extension for Floating-Point Operations. A pair of PEs in the PE array is able to compute floating-point
operations according to its configuration. While a PE in the pair computes the mantissa part of the floating point
operations, the other PE handles the exponent part. Since the data path of a PE is 16 bits wide, the floating-point
operations do not support the single precision IEEE- To make the RTL implementation flexible, the VHDL generic
operator was used in order to control implemented features such as supported operations, number of pipeline registers,
word lengths, etc. A cross-bar interconnect provides a connection from every processing elements output to every other
processing element input. Partial interconnects, such as nearest neighbour, reduce the interconnect cost. The slice stage is
used to split a word operand into sub-word operands and do sign extension to the operational word length. This is used to
support wide accesses to memory and split operands to different processing elements which do operations in parallel. In
the experimental studies, memories are 32 bits wide and slicing to 16 bits is supported also mapping.
Fig.2 Mesh architecture
B. Processing Elements
To make the RTL implementation flexible, the VHDL generic operator was used in order to control implemented
features such as supported operations, number of pipeline registers, word lengths, etc. Hence, instantiated PEs can be
made a heterogeneous set by configuring these parameters prior to synthesis shows an example of the processing
element. The stages marked at the top of the figure are controlled through a set of software controlled 32-bit wide
configuration registers. These registers are written by software in order to route the datapath and configure the
ALUoperation. Only PEs used in the computation need to be configured. What follows is a brief summary of the pipeline
stages of the developed processing elements. The interconnect stage is used to select source operands from other
processing elements, memory system, or FIFO. It is implemented with multiplexers and prior to synthesis, a setup and
speed as compared to fine-grained functional blocks. This is because DSP computations in general are characterized by
arithmetic operations on multi-bitwords of data. Hence, fine-grained logic blocks, such as 4-input look-up tables, give a
large overhead when implementing word-level arithmetic To target reconfigurable architectures for the DSP domain,
coarse-grained functional blocks have been proposed and proven more efficient in terms of area. The processing elements
developed for Cpac are customized ALUs that support a set of different operations. In addition, post- and preprocessing
steps such as data slicing, scaling, and truncation is included as these operations are commonly used in fixed-point data
paths.
The operation stage supports a set of fundamental arithmetic and logic operations. In particular, basic operators used
in standard codecs from ITU were studied in order to find suitable arithmetic operations. The basic operators are
currently used in the standards for G723.1, G729, GSM enhanced full-rate, and adaptive multi-rate speech codec and use
16 and 32 bits numbers. These operators are sufficient to build computational structures such as radix-2 FFT butterfly,
complex multiplication, dual and quad multiply and accumulate, accumulated mean squared error, sum of absolute
International Journal of Innovative Research in Information Security (IJIRIS) ISSN:2349-7017(O)
Volume 1 Issue 4 (October 2014) ISSN:2349-7009(P)
www.ijiris.com
_________________________________________________________________________________________________
© 2014, IJIRIS- All Rights Reserved Page - 4
differences, and basic vector operations. The shift stage is used to scale the result after multiplication, addition, or
subtraction to avoid overflow. As many processing elements are envisioned to use shifting by a fixed set of numbers,
implementation of the shifter can be selected either as partial shifter, or a barrel shifter that supports all shift
operations.The saturation stage is used to saturate the result from the shifter (R4) or from the operation (R3). Saturation
values are software controlled through the registers min.
Fig 3. Data path resource.
The floating point input and output was discussed. The floating point conversion using IEEE floating point
conversion is used. The Xilinx 9.1 and modelsim 6.1e student edition has been used for the simulation.The floating point
conversion is used to convert from hexadecimal number to floating point number. The input is shown in the Fig 3.1
which is a single input floating point number.The floating point number is converted to hexadecimal number for the input
to the floating point adder by the use of IEEE floating point conversion software. The 64 bit floating point output
which is to be converted to a floating point number by the use of IEEE application software. The floating point
conversion unit round off the value, then it will manipulate the hexadecimal value. The hexadecimal value is applied in
the coding and the output will be in the hexadecimal. The output is then converted to floating point number by the IEEE
floating point conversion unit.
Fig 4. Floating point output
III. CONCLUSIONS
In this paper, we presented a slow but optimal approach and fast heuristic approaches to mapping of applications
from multiple domains onto a CGRA supporting floating since it gives better result than spanning tree floating point
operations. In particular, we considered Steiner routing. After presenting an ILP formulation for an optimal solution, we
presented a fast heuristic approach based on HLS techniques that performs loop unrolling and pipelining which achieves
drastic performance improvement. For ran-heuristic approach considering Steiner points finds 97% of domly generated
test examples, we showed that the proposed the optimal solutions within a few seconds.
International Journal of Innovative Research in Information Security (IJIRIS) ISSN:2349-7017(O)
Volume 1 Issue 4 (October 2014) ISSN:2349-7009(P)
www.ijiris.com
_________________________________________________________________________________________________
© 2014, IJIRIS- All Rights Reserved Page - 5
Experiments on only implementation. shown that our approach targeting a CGRA achieves up to119.9 times
performance improvement compared to software various benchmark examples and 3-D graphics.
REFERENCES
[1]M. C. Filho, “Morphosys: An integrated reconfigurable system for data-parallel and computation-intensive
applications,” IEEE Trans. Comput.,vol. 49, no. 5, pp. 465–481, May 2000.
[2] PACT XPP Technologies [Online]. Available: http://www.pactxpp.com
[3] B. Mei, S. Vernalde, D. Verkest, H. D. Man, and R. Lauwereins, “ADRES: An architecture with tightly coupled
VLIW processor and coarse-grained reconfigurable matrix,” in Proc. FPLA, 2003, pp. 61–70.
[4] Chameleon Systems, Inc. [Online]. Available: http://www.chameleon- “DRESC: A retargetable compiler for coarse-
grained reconfigurable architectures,” in Proc. ICFPT, Dec. 2002, pp. 166–173.
[5] T. J. Callahan and J. Wawrzynek, “Instruction-level parallelism for systems.com [1] H. Singh, M.-H. Lee, G. Lu, F. J.
Kurdahi, N. Bagherzadeh, and E.
[6] W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S.reconfigurable computing,” in Proc. IWFPL,
1998, pp. 248–257.
[7] B. Mei, S. Vernalde, D. Verkest, H. D. Man, and R. Lauwereins,P. Amarasinghe, “Space-time scheduling of
instruction level parallelismon a RAW machine,” in Proc. ASPLOSV, 1998, pp. 46–57.
[8] T. Toi, N. Nakamura, Y. Kato, T. Awashima, K. Wakabayashi, and L. Jing, “High-level synthesis challenges and
solutions for a dynamicallyreconfigurable processor,” in Proc. ICCAD, Nov. 2006, pp. 702–708.
[9] H. Park, K. Fan, S. A. Mahlke, T. Oh, H. Kim, and H. Kim, “Edgecentric modulo scheduling for coarse-grained
reconfigurable architec-tures,” in Proc. PACT, Oct. 2008, pp. 166–176.
[10] S. Friedman, A. Carroll, B. V. Essen, B. Ylvisaker, C. Ebeling, and S. Hauck, “SPR: An architecture-adaptive
CGRA mapping tool,” in Proc.FPGA, 2009, pp. 191–200.based spatial mapping algorithm for coarse-grained
reconfigurable
[11] J. Yoon, A. Shrivastava, S. Park, M. Ahn, and Y. Paek, “A graph draw-architecture,” IEEE Trans. Very Large Scale
Integr. Syst., vol. 17, no.11, pp. 1565–1578, Jun. 2008.
[12] Y. Ahn, K. Han, G. Lee, H. Song, J. Yoo, and K. Choi, “SoCDAL:System-on-chip design accelerator,” ACM Trans.
Des. Automat. Electron.Syst., vol. 13, no. 1, pp. 171–176, Jan. 2008.
[13] Y. Kim, M. Kiemb, C. Park, J. Jung, and K. Choi, “Resource sharing specific optimization,” in Proc. DATE, Mar.
2005, pp. 12–17. and pipelining in coarse-grained reconfigurable architecture for domain.
[14] Y. Kim, I. Park, K. Choi, and Y. Paek, “Power-conscious configurationcache structure and code mapping for coarse-
grained reconfigurable architecture,” in Proc. ISLPED, Oct. 2006, pp. 310–315.
[15] The SUIF Compiler System [Online]. Available: http://suif.stanford.edu
[16] C. O. Shields, Jr., “Area efficient layouts of binary trees in grids,” Ph.D.
[17] G. Lee, S. Lee, and K. Choi, “Automatic mapping of application to dissertation, Dept. Comput. Sci., Univ. Texas,
Dallas, 2001. techniques,” in Proc. ISOCC, Nov. 2008, pp. 395–398. coarse-grained reconfigurable architecture based
on high-level synthesis
[18] K. Han and J. Kim, “Quantum-inspired evolutionary algorithms with a new termination criterion, Hε gate, and two
phase scheme,” IEEE Trans. Evol. Computat., vol. 8, no. 2, pp. 156–169, Apr. 2004.
[19] GNU Linear Programming Kit [Online]. Available: http://www.gnu.org/
[20] G. Lee, S. Lee, K. Choi, and N. Dutt, “Routing-aware application mapping considering Steiner point for coarse-
grained reconfigurable algorithms,” in Proc. FPL, 2005, pp. 317–322.
[21] M. Jo, V. K. P. Arava, H. Yang, and K. Choi, “Implementation of floating-point operations for 3-D graphics on a
coarse-grained reconfigurable architecture,” in Proc. IEEE-SOCC, Sep. 2007, pp. 127–130.
[22] Z. Baidas, A. D. Brown, and A. C. Williams, “Floating-point behavioral synthesis,” IEEE Trans. Comput.-Aided
Des. Integr. Circuits Syst., vol.20, no. 7, pp. 828–839, Jul. 2001.

Mais conteúdo relacionado

Mais procurados

A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow application
iaemedu
 
Power consumption prediction in cloud data center using machine learning
Power consumption prediction in cloud data center using machine learningPower consumption prediction in cloud data center using machine learning
Power consumption prediction in cloud data center using machine learning
IJECEIAES
 

Mais procurados (19)

A MULTI-OBJECTIVE PERSPECTIVE FOR OPERATOR SCHEDULING USING FINEGRAINED DVS A...
A MULTI-OBJECTIVE PERSPECTIVE FOR OPERATOR SCHEDULING USING FINEGRAINED DVS A...A MULTI-OBJECTIVE PERSPECTIVE FOR OPERATOR SCHEDULING USING FINEGRAINED DVS A...
A MULTI-OBJECTIVE PERSPECTIVE FOR OPERATOR SCHEDULING USING FINEGRAINED DVS A...
 
Bulk transfer scheduling and path reservations in research networks
Bulk transfer scheduling and path reservations in research networksBulk transfer scheduling and path reservations in research networks
Bulk transfer scheduling and path reservations in research networks
 
IRJET- A Statistical Approach Towards Energy Saving in Cloud Computing
IRJET-  	  A Statistical Approach Towards Energy Saving in Cloud ComputingIRJET-  	  A Statistical Approach Towards Energy Saving in Cloud Computing
IRJET- A Statistical Approach Towards Energy Saving in Cloud Computing
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow application
 
A HIGH SPEED LOW POWER CAM AND TCAM WITH A PARITY BIT AND POWER GATED ML SENSING
A HIGH SPEED LOW POWER CAM AND TCAM WITH A PARITY BIT AND POWER GATED ML SENSINGA HIGH SPEED LOW POWER CAM AND TCAM WITH A PARITY BIT AND POWER GATED ML SENSING
A HIGH SPEED LOW POWER CAM AND TCAM WITH A PARITY BIT AND POWER GATED ML SENSING
 
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
 
J41046368
J41046368J41046368
J41046368
 
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
 
Data mining
Data miningData mining
Data mining
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
 
Practical Parallel Hypergraph Algorithms | PPoPP ’20
Practical Parallel Hypergraph Algorithms | PPoPP ’20Practical Parallel Hypergraph Algorithms | PPoPP ’20
Practical Parallel Hypergraph Algorithms | PPoPP ’20
 
Parallel Processing Technique for Time Efficient Matrix Multiplication
Parallel Processing Technique for Time Efficient Matrix MultiplicationParallel Processing Technique for Time Efficient Matrix Multiplication
Parallel Processing Technique for Time Efficient Matrix Multiplication
 
E132833
E132833E132833
E132833
 
Remote Sensing IEEE 2015 Projects
Remote Sensing IEEE 2015 ProjectsRemote Sensing IEEE 2015 Projects
Remote Sensing IEEE 2015 Projects
 
A Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud ComputingA Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud Computing
 
Power consumption prediction in cloud data center using machine learning
Power consumption prediction in cloud data center using machine learningPower consumption prediction in cloud data center using machine learning
Power consumption prediction in cloud data center using machine learning
 
CONFIGURABLE TASK MAPPING FOR MULTIPLE OBJECTIVES IN MACRO-PROGRAMMING OF WIR...
CONFIGURABLE TASK MAPPING FOR MULTIPLE OBJECTIVES IN MACRO-PROGRAMMING OF WIR...CONFIGURABLE TASK MAPPING FOR MULTIPLE OBJECTIVES IN MACRO-PROGRAMMING OF WIR...
CONFIGURABLE TASK MAPPING FOR MULTIPLE OBJECTIVES IN MACRO-PROGRAMMING OF WIR...
 
IJSRED-V2I5P45
IJSRED-V2I5P45IJSRED-V2I5P45
IJSRED-V2I5P45
 

Destaque

Destaque (12)

Modernisation of Avionics and C.N.S. Facilities
Modernisation of Avionics and C.N.S. FacilitiesModernisation of Avionics and C.N.S. Facilities
Modernisation of Avionics and C.N.S. Facilities
 
Eurotec
EurotecEurotec
Eurotec
 
куаныш аманжолов+спорт товары+идея
куаныш аманжолов+спорт товары+идеякуаныш аманжолов+спорт товары+идея
куаныш аманжолов+спорт товары+идея
 
Brace Yourself
Brace YourselfBrace Yourself
Brace Yourself
 
The Analysis of Marketing Personnel Performance Appraisal Problems of Small a...
The Analysis of Marketing Personnel Performance Appraisal Problems of Small a...The Analysis of Marketing Personnel Performance Appraisal Problems of Small a...
The Analysis of Marketing Personnel Performance Appraisal Problems of Small a...
 
A Pulse on the Ecosystem: Where & Why I'd Invest
A Pulse on the Ecosystem:  Where & Why I'd InvestA Pulse on the Ecosystem:  Where & Why I'd Invest
A Pulse on the Ecosystem: Where & Why I'd Invest
 
LinkedIn Series D Opportunity - Investment Recommendation
LinkedIn Series D Opportunity - Investment RecommendationLinkedIn Series D Opportunity - Investment Recommendation
LinkedIn Series D Opportunity - Investment Recommendation
 
Investor readiness: 99 questions from investors by Startups.be
Investor readiness: 99 questions from investors by Startups.beInvestor readiness: 99 questions from investors by Startups.be
Investor readiness: 99 questions from investors by Startups.be
 
Investor readiness: Startup valuation by Startups.be
Investor readiness: Startup valuation by Startups.beInvestor readiness: Startup valuation by Startups.be
Investor readiness: Startup valuation by Startups.be
 
Study of Effect of Rotor Speed, Combing-Roll Speed and Type of Recycled Waste...
Study of Effect of Rotor Speed, Combing-Roll Speed and Type of Recycled Waste...Study of Effect of Rotor Speed, Combing-Roll Speed and Type of Recycled Waste...
Study of Effect of Rotor Speed, Combing-Roll Speed and Type of Recycled Waste...
 
DKUNKEL RESUME
DKUNKEL RESUMEDKUNKEL RESUME
DKUNKEL RESUME
 
The Analysis of the Distribution Mode of Rookie Network
The Analysis of the Distribution Mode of Rookie NetworkThe Analysis of the Distribution Mode of Rookie Network
The Analysis of the Distribution Mode of Rookie Network
 

Semelhante a Coarse Grain Reconfigurable Floating Point Unit

OPTIMIZED RESOURCE PROVISIONING METHOD FOR COMPUTATIONAL GRID
OPTIMIZED RESOURCE PROVISIONING METHOD FOR COMPUTATIONAL GRID OPTIMIZED RESOURCE PROVISIONING METHOD FOR COMPUTATIONAL GRID
OPTIMIZED RESOURCE PROVISIONING METHOD FOR COMPUTATIONAL GRID
ijgca
 
Optimized Resource Provisioning Method for Computational Grid
Optimized Resource Provisioning Method for Computational GridOptimized Resource Provisioning Method for Computational Grid
Optimized Resource Provisioning Method for Computational Grid
ijgca
 
Sharing of cluster resources among multiple Workflow Applications
Sharing of cluster resources among multiple Workflow ApplicationsSharing of cluster resources among multiple Workflow Applications
Sharing of cluster resources among multiple Workflow Applications
ijcsit
 

Semelhante a Coarse Grain Reconfigurable Floating Point Unit (20)

Programming Modes and Performance of Raspberry-Pi Clusters
Programming Modes and Performance of Raspberry-Pi ClustersProgramming Modes and Performance of Raspberry-Pi Clusters
Programming Modes and Performance of Raspberry-Pi Clusters
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 
Energy-Efficient Task Scheduling in Cloud Environment
Energy-Efficient Task Scheduling in Cloud EnvironmentEnergy-Efficient Task Scheduling in Cloud Environment
Energy-Efficient Task Scheduling in Cloud Environment
 
Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...
Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...
Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...
 
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
 
Distributed Feature Selection for Efficient Economic Big Data Analysis
Distributed Feature Selection for Efficient Economic Big Data AnalysisDistributed Feature Selection for Efficient Economic Big Data Analysis
Distributed Feature Selection for Efficient Economic Big Data Analysis
 
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDMOD...
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDMOD...PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDMOD...
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDMOD...
 
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDMOD...
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDMOD...PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDMOD...
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDMOD...
 
OPTIMIZED RESOURCE PROVISIONING METHOD FOR COMPUTATIONAL GRID
OPTIMIZED RESOURCE PROVISIONING METHOD FOR COMPUTATIONAL GRID OPTIMIZED RESOURCE PROVISIONING METHOD FOR COMPUTATIONAL GRID
OPTIMIZED RESOURCE PROVISIONING METHOD FOR COMPUTATIONAL GRID
 
Optimized Resource Provisioning Method for Computational Grid
Optimized Resource Provisioning Method for Computational GridOptimized Resource Provisioning Method for Computational Grid
Optimized Resource Provisioning Method for Computational Grid
 
Ieeepro techno solutions 2014 ieee java project - deadline based resource p...
Ieeepro techno solutions   2014 ieee java project - deadline based resource p...Ieeepro techno solutions   2014 ieee java project - deadline based resource p...
Ieeepro techno solutions 2014 ieee java project - deadline based resource p...
 
Ieeepro techno solutions 2014 ieee dotnet project - deadline based resource...
Ieeepro techno solutions   2014 ieee dotnet project - deadline based resource...Ieeepro techno solutions   2014 ieee dotnet project - deadline based resource...
Ieeepro techno solutions 2014 ieee dotnet project - deadline based resource...
 
Ieeepro techno solutions 2014 ieee dotnet project - deadline based resource...
Ieeepro techno solutions   2014 ieee dotnet project - deadline based resource...Ieeepro techno solutions   2014 ieee dotnet project - deadline based resource...
Ieeepro techno solutions 2014 ieee dotnet project - deadline based resource...
 
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
 
High Dimensionality Structures Selection for Efficient Economic Big data usin...
High Dimensionality Structures Selection for Efficient Economic Big data usin...High Dimensionality Structures Selection for Efficient Economic Big data usin...
High Dimensionality Structures Selection for Efficient Economic Big data usin...
 
Service Request Scheduling in Cloud Computing using Meta-Heuristic Technique:...
Service Request Scheduling in Cloud Computing using Meta-Heuristic Technique:...Service Request Scheduling in Cloud Computing using Meta-Heuristic Technique:...
Service Request Scheduling in Cloud Computing using Meta-Heuristic Technique:...
 
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...
 
Sharing of cluster resources among multiple Workflow Applications
Sharing of cluster resources among multiple Workflow ApplicationsSharing of cluster resources among multiple Workflow Applications
Sharing of cluster resources among multiple Workflow Applications
 
Evolutionary Multi-Goal Workflow Progress in Shade
Evolutionary  Multi-Goal Workflow Progress in ShadeEvolutionary  Multi-Goal Workflow Progress in Shade
Evolutionary Multi-Goal Workflow Progress in Shade
 
Qo s aware scientific application scheduling algorithm in cloud environment
Qo s aware scientific application scheduling algorithm in cloud environmentQo s aware scientific application scheduling algorithm in cloud environment
Qo s aware scientific application scheduling algorithm in cloud environment
 

Último

Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
instagramfab782445
 
ab-initio-training basics and architecture
ab-initio-training basics and architectureab-initio-training basics and architecture
ab-initio-training basics and architecture
saipriyacoool
 
Simple Conference Style Presentation by Slidesgo.pptx
Simple Conference Style Presentation by Slidesgo.pptxSimple Conference Style Presentation by Slidesgo.pptx
Simple Conference Style Presentation by Slidesgo.pptx
balqisyamutia
 
Abortion pills in Kuwait 🚚+966505195917 but home delivery available in Kuwait...
Abortion pills in Kuwait 🚚+966505195917 but home delivery available in Kuwait...Abortion pills in Kuwait 🚚+966505195917 but home delivery available in Kuwait...
Abortion pills in Kuwait 🚚+966505195917 but home delivery available in Kuwait...
drmarathore
 
Jual Obat Aborsi Bandung ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan ...
Jual Obat Aborsi Bandung ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan ...Jual Obat Aborsi Bandung ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan ...
Jual Obat Aborsi Bandung ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan ...
ZurliaSoop
 
一比一原版(ANU毕业证书)澳大利亚国立大学毕业证原件一模一样
一比一原版(ANU毕业证书)澳大利亚国立大学毕业证原件一模一样一比一原版(ANU毕业证书)澳大利亚国立大学毕业证原件一模一样
一比一原版(ANU毕业证书)澳大利亚国立大学毕业证原件一模一样
yhavx
 
Resume all my skills and educations and achievement
Resume all my skills and educations and  achievement Resume all my skills and educations and  achievement
Resume all my skills and educations and achievement
210303105569
 
Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Minimalist Orange Portfolio by Slidesgo.pptx
Minimalist Orange Portfolio by Slidesgo.pptxMinimalist Orange Portfolio by Slidesgo.pptx
Minimalist Orange Portfolio by Slidesgo.pptx
balqisyamutia
 
怎样办理巴斯大学毕业证(Bath毕业证书)成绩单留信认证
怎样办理巴斯大学毕业证(Bath毕业证书)成绩单留信认证怎样办理巴斯大学毕业证(Bath毕业证书)成绩单留信认证
怎样办理巴斯大学毕业证(Bath毕业证书)成绩单留信认证
eeanqy
 
Design-System - FinTech - Isadora Agency
Design-System - FinTech - Isadora AgencyDesign-System - FinTech - Isadora Agency
Design-System - FinTech - Isadora Agency
Isadora Agency
 
Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...
nirzagarg
 
一比一定(购)西悉尼大学毕业证(WSU毕业证)成绩单学位证
一比一定(购)西悉尼大学毕业证(WSU毕业证)成绩单学位证一比一定(购)西悉尼大学毕业证(WSU毕业证)成绩单学位证
一比一定(购)西悉尼大学毕业证(WSU毕业证)成绩单学位证
eqaqen
 

Último (20)

Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
 
Hackathon evaluation template_latest_uploadpdf
Hackathon evaluation template_latest_uploadpdfHackathon evaluation template_latest_uploadpdf
Hackathon evaluation template_latest_uploadpdf
 
ab-initio-training basics and architecture
ab-initio-training basics and architectureab-initio-training basics and architecture
ab-initio-training basics and architecture
 
Pondicherry Escorts Service Girl ^ 9332606886, WhatsApp Anytime Pondicherry
Pondicherry Escorts Service Girl ^ 9332606886, WhatsApp Anytime PondicherryPondicherry Escorts Service Girl ^ 9332606886, WhatsApp Anytime Pondicherry
Pondicherry Escorts Service Girl ^ 9332606886, WhatsApp Anytime Pondicherry
 
Simple Conference Style Presentation by Slidesgo.pptx
Simple Conference Style Presentation by Slidesgo.pptxSimple Conference Style Presentation by Slidesgo.pptx
Simple Conference Style Presentation by Slidesgo.pptx
 
Abortion pills in Kuwait 🚚+966505195917 but home delivery available in Kuwait...
Abortion pills in Kuwait 🚚+966505195917 but home delivery available in Kuwait...Abortion pills in Kuwait 🚚+966505195917 but home delivery available in Kuwait...
Abortion pills in Kuwait 🚚+966505195917 but home delivery available in Kuwait...
 
Jual Obat Aborsi Bandung ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan ...
Jual Obat Aborsi Bandung ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan ...Jual Obat Aborsi Bandung ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan ...
Jual Obat Aborsi Bandung ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan ...
 
一比一原版(ANU毕业证书)澳大利亚国立大学毕业证原件一模一样
一比一原版(ANU毕业证书)澳大利亚国立大学毕业证原件一模一样一比一原版(ANU毕业证书)澳大利亚国立大学毕业证原件一模一样
一比一原版(ANU毕业证书)澳大利亚国立大学毕业证原件一模一样
 
Resume all my skills and educations and achievement
Resume all my skills and educations and  achievement Resume all my skills and educations and  achievement
Resume all my skills and educations and achievement
 
High Profile Escorts Nerul WhatsApp +91-9930687706, Best Service
High Profile Escorts Nerul WhatsApp +91-9930687706, Best ServiceHigh Profile Escorts Nerul WhatsApp +91-9930687706, Best Service
High Profile Escorts Nerul WhatsApp +91-9930687706, Best Service
 
Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...
 
Minimalist Orange Portfolio by Slidesgo.pptx
Minimalist Orange Portfolio by Slidesgo.pptxMinimalist Orange Portfolio by Slidesgo.pptx
Minimalist Orange Portfolio by Slidesgo.pptx
 
Eye-Catching Web Design Crafting User Interfaces .docx
Eye-Catching Web Design Crafting User Interfaces .docxEye-Catching Web Design Crafting User Interfaces .docx
Eye-Catching Web Design Crafting User Interfaces .docx
 
Gamestore case study UI UX by Amgad Ibrahim
Gamestore case study UI UX by Amgad IbrahimGamestore case study UI UX by Amgad Ibrahim
Gamestore case study UI UX by Amgad Ibrahim
 
怎样办理巴斯大学毕业证(Bath毕业证书)成绩单留信认证
怎样办理巴斯大学毕业证(Bath毕业证书)成绩单留信认证怎样办理巴斯大学毕业证(Bath毕业证书)成绩单留信认证
怎样办理巴斯大学毕业证(Bath毕业证书)成绩单留信认证
 
Design-System - FinTech - Isadora Agency
Design-System - FinTech - Isadora AgencyDesign-System - FinTech - Isadora Agency
Design-System - FinTech - Isadora Agency
 
Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...
 
Jordan_Amanda_DMBS202404_PB1_2024-04.pdf
Jordan_Amanda_DMBS202404_PB1_2024-04.pdfJordan_Amanda_DMBS202404_PB1_2024-04.pdf
Jordan_Amanda_DMBS202404_PB1_2024-04.pdf
 
一比一定(购)西悉尼大学毕业证(WSU毕业证)成绩单学位证
一比一定(购)西悉尼大学毕业证(WSU毕业证)成绩单学位证一比一定(购)西悉尼大学毕业证(WSU毕业证)成绩单学位证
一比一定(购)西悉尼大学毕业证(WSU毕业证)成绩单学位证
 
Call Girls Jalaun Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Jalaun Just Call 8617370543 Top Class Call Girl Service AvailableCall Girls Jalaun Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Jalaun Just Call 8617370543 Top Class Call Girl Service Available
 

Coarse Grain Reconfigurable Floating Point Unit

  • 1. International Journal of Innovative Research in Information Security (IJIRIS) ISSN:2349-7017(O) Volume 1 Issue 4 (October 2014) ISSN:2349-7009(P) www.ijiris.com _________________________________________________________________________________________________ © 2014, IJIRIS- All Rights Reserved Page - 1 Coarse Grain Reconfigurable Floating Point Unit K.Moorthi * Dr.J.Raja B.Ramya ECE & Adhiparasakthi Engg. College ECE & Adhiparasakthi Engg. College ECE & Aksheya College of Engg. Abstract— Today the most commonly used system architectures in data processing can be divided into three categories, general purpose processors, application specific architectures and reconfigurable architectures. Application specific architectures are efficient and give good performance, but are inflexible. Recently reconfigurable systems have drawn increasing attention due to their combination of flexibility and efficiency. Re-configurable architectures limit their flexibility to a particular algorithm. This paper introduces approaches to mapping point arithmetic. After presenting an optimal formulation using applications onto CGRAs supporting both integer and floating point unit.High-level design entry tools are essential for reconfigurable systems, especially coarse-grained reconfigurable architectures.Coarse-grained reconfigurable architectures have drawn increasing attention due to their performance and flexibility. However, their applications have been restricted to domains based on integer arithmetic since typical CGRAs support only integer arithmetic or logical operations. In this project we introduce an approach to map applications onto CGRAs supporting floating point addition. The increase in requirements for more flexibility and higher performance in embedded systems design, reconfigurable computing is becoming more and more popular. Keywords— High-level synthesis, CGRA, FPGA, Floating point unit , Reconfigurable architecture. I. INTRODUCTION Various coarse-grained reconfigurable architectures (CGRAs) have been proposed in recent years [1]–[3], with different target domains of applications and different tradeoffs between flexibility and performance. Typically, they consist is not easy to map an application onto the reconfigurable array host mostly reduced instruction set computing (RISC)] processor. The computation intensive kernels of the applications—typically loop—are mapped to the reconfigurable array while the remaining code is executed by the processor. However, it of a reconfigurable array of processing elements (PEs) and abecause of the high complexity of the problem that requires compiling the application on a dynamically reconfigurable parallel architecture, with additional complexity of dealing with complex routing resources. The problem of mapping an application onto a CGRA to minimize the number of resources giving best performance has been shown to be NP-complete [16].Few automatic mapping/compiling/synthesis tools have been developed to exploit the parallelism found in the applications and extensive computation resources of CGRAs. Some researchers [1], [4] have used structure-based or graphical user interface based design tools to manually generate a mapping, which would have a difficulty in handling big designs. Some researchers [5], [6] have only focused on instruction-level parallelism, failing to fully utilize the resources in CGRAs, which is possible by exploiting loop-level parallelism. Some researchers [7], [8], [17] have introduced a compiler to exploit the parallelism in the CGRA provided by the abundant resources. However, their approaches use shared registers to solve the mapping problem. While these shared registers explicitly during the mapping process. registers can be eliminated if routing resources are considered simplify the mapping process, theycan increase the critical path delay or latency. We show in this paper that the shared More recently, routing-aware mapping algorithms have been introduced [9]–[11]. However, they rely on step-by-step approaches, where scheduling, binding, and routing algorithms are performed sequentially. Thus, it tends to fall into a local integer type application domains, whereas ours extends the routing at the same time, thereby generating better optimized solutions. It also explicitly considers incorporating Steiner points for more efficient routing (some previous incorporate Steiner points although it is not clear whether they approaches use a kind of maze routing algorithm that can are actually incorporated). In [7], they have also presented a unified approach based on the simulated annealing algorithm. However, it takes too much time to get a solution. few minutes. Furthermore, all previous mapping/compiling/synthesis tools have been restricted to coverage to floating pointtype application domains.the approach in [7] takes hundreds of minutes whereas our optimum. Our unified approach considers scheduling and binding. II. COARSE GRAIN RECONFIGURABLE ARCHITECTURE CGRAs consist of an array of a large number of function units (FUs) interconnected by a mesh style network. Register files are distributed throughout the CGRAs to hold temporary values and are accessible only by a subset of FUs. The Fig.1. shows the coarse grain architecture, the FUs can execute common word-level operations, including addition, subtraction, and multiplication. In contrast to FPGAs, CGRAs have short reconfiguration times, low delay characteristics, and low power consumption as they are constructed from standard cell implementations. Thus, gate-level re- configurability is sacrificed, but the result is a large increase in hardware efficiency.
  • 2. International Journal of Innovative Research in Information Security (IJIRIS) ISSN:2349-7017(O) Volume 1 Issue 4 (October 2014) ISSN:2349-7009(P) www.ijiris.com _________________________________________________________________________________________________ © 2014, IJIRIS- All Rights Reserved Page - 2 Fig.1. Reconfigurable Hardware A good compiler is essential for exploiting the abundance of computing resources available on a GRA. However, sparse connectivity and distributed register files present difficult challenges to the scheduling phase of a compiler. The sparse connectivity puts the burden of routing operands from producers to consumers on the compiler. The Fig 2(a) shows the mesh connected processing elements the traditional schedulers that just assign an FU and time to every operation in the program are unsuitable because they do not take routability into consideration. Operand values must be explicitly routed between producing and consuming FUs. Further, dedicated routing resources are not provided. Rather, an FU can serve either as a compute resource or as a routing resource at a given time. A compiler scheduler must thus manage the computation and flow of operands across the array to effectively map applications onto CGRAs. Innermost loop of an application is selected to be executed on CGRA. With the abundance of computation resources, CGRA can efficiently execute compute intensive part of the application. Scheduling difficulties lie in the limited connectivity of CGRA. Without careful scheduling, some values can be stuck in nodes where they cannot be routed any further. Preprocessing is performed on dataflow graph of the target application. It includes identifying recurrence cycles, SSA conversion, and height calculation. Operations are then sorted by their heights and scheduling algorithm is applied to the group of operations with the same height. To guarantee the routing of values over the CGRA interconnect, scheduling space is skewed in a way that slots on the left side of CGRA are utilized first. Since slots on the left are utilized beforehand, operands can be easily routed to the right side of CGRA. An affinity graph which is constructed for each group of operations that have same height in DFG. The idea here is that placing operations with common consumers close to each other. By placing operations with common consumers close to each other, routing cost can be reduced when their consumers are placed later. Affinity values are calculated for every pair of operations in the same height by looking at how far their common consumers are located in DFG. In an affinity graph, nodes represent operations and edges represent affinity values. High affinity values between two operations indicate that they have common consumers in a short distance in DFG. In this step, operations are placed on CGRA in a way that affinity values between operations are minimized. Scheduling of operations is converted into a graph embedding problem where an affinity graph which is used in 3-D scheduling space minimizing total edge length. In the example on the left, nodes are placed in the scheduling space so that total edge length is minimized, giving more weight to edges with high affinity values. When scheduling operations in the next height, routing cost can be effectively reduced since producers of the same consumer are already placed close to each other. After getting an optimal placement of operations in one height, the scheduling space is updated the unused slot can be used in later time slots. Then, scheduler proceeds to next group of operations. A. Architecture Extension for Floating-Point Operations A pair of PEs in the PE array is able to compute floating-point operations according to its configuration [21]. While a PE in the pair computes the mantissa part of the floating point operations, the other PE handles the exponent part. Since the data path of a PE is 16 bits wide, the floating-point operations do not support the single precision IEEE-754 standard, but support only reduced mantissa of 15 bits. However, experiments show that the precision is good enough for hand- held embedded systems [21]. Since adjacent PEs are paired for floating-point operations [Fig. 2(c)], the total number of floating-point units that can but PE array as shown in Fig. 2(a) and (b). The PE array has enough interconnections among the PEs so that it makes use of the interconnections for the data exchange between the PEs required in floating-point operations.Mapping a floating-point operation on the PE array with integer operations may take many layers of cache. If a kernel consists of a multitude of floating-point operations, then causing costly fetch of additional context words from the main mapping on the array easily runs out of the cache layers, memory. Instead of using multiple cache layers to
  • 3. International Journal of Innovative Research in Information Security (IJIRIS) ISSN:2349-7017(O) Volume 1 Issue 4 (October 2014) ISSN:2349-7009(P) www.ijiris.com _________________________________________________________________________________________________ © 2014, IJIRIS- All Rights Reserved Page - 3 perform such a complex operation, we add some control logic to the PEs so that the operation can be completed in multiple cycles cache logic control. The layers multiple requiring without be constructed is half the number of integer PEs in the can be implemented with a small finite state machine (FSM) that controls the PE’s existing datapath for a fixed number of cycles. B. Architecture Extension for Floating-Point Operations. A pair of PEs in the PE array is able to compute floating-point operations according to its configuration. While a PE in the pair computes the mantissa part of the floating point operations, the other PE handles the exponent part. Since the data path of a PE is 16 bits wide, the floating-point operations do not support the single precision IEEE- To make the RTL implementation flexible, the VHDL generic operator was used in order to control implemented features such as supported operations, number of pipeline registers, word lengths, etc. A cross-bar interconnect provides a connection from every processing elements output to every other processing element input. Partial interconnects, such as nearest neighbour, reduce the interconnect cost. The slice stage is used to split a word operand into sub-word operands and do sign extension to the operational word length. This is used to support wide accesses to memory and split operands to different processing elements which do operations in parallel. In the experimental studies, memories are 32 bits wide and slicing to 16 bits is supported also mapping. Fig.2 Mesh architecture B. Processing Elements To make the RTL implementation flexible, the VHDL generic operator was used in order to control implemented features such as supported operations, number of pipeline registers, word lengths, etc. Hence, instantiated PEs can be made a heterogeneous set by configuring these parameters prior to synthesis shows an example of the processing element. The stages marked at the top of the figure are controlled through a set of software controlled 32-bit wide configuration registers. These registers are written by software in order to route the datapath and configure the ALUoperation. Only PEs used in the computation need to be configured. What follows is a brief summary of the pipeline stages of the developed processing elements. The interconnect stage is used to select source operands from other processing elements, memory system, or FIFO. It is implemented with multiplexers and prior to synthesis, a setup and speed as compared to fine-grained functional blocks. This is because DSP computations in general are characterized by arithmetic operations on multi-bitwords of data. Hence, fine-grained logic blocks, such as 4-input look-up tables, give a large overhead when implementing word-level arithmetic To target reconfigurable architectures for the DSP domain, coarse-grained functional blocks have been proposed and proven more efficient in terms of area. The processing elements developed for Cpac are customized ALUs that support a set of different operations. In addition, post- and preprocessing steps such as data slicing, scaling, and truncation is included as these operations are commonly used in fixed-point data paths. The operation stage supports a set of fundamental arithmetic and logic operations. In particular, basic operators used in standard codecs from ITU were studied in order to find suitable arithmetic operations. The basic operators are currently used in the standards for G723.1, G729, GSM enhanced full-rate, and adaptive multi-rate speech codec and use 16 and 32 bits numbers. These operators are sufficient to build computational structures such as radix-2 FFT butterfly, complex multiplication, dual and quad multiply and accumulate, accumulated mean squared error, sum of absolute
  • 4. International Journal of Innovative Research in Information Security (IJIRIS) ISSN:2349-7017(O) Volume 1 Issue 4 (October 2014) ISSN:2349-7009(P) www.ijiris.com _________________________________________________________________________________________________ © 2014, IJIRIS- All Rights Reserved Page - 4 differences, and basic vector operations. The shift stage is used to scale the result after multiplication, addition, or subtraction to avoid overflow. As many processing elements are envisioned to use shifting by a fixed set of numbers, implementation of the shifter can be selected either as partial shifter, or a barrel shifter that supports all shift operations.The saturation stage is used to saturate the result from the shifter (R4) or from the operation (R3). Saturation values are software controlled through the registers min. Fig 3. Data path resource. The floating point input and output was discussed. The floating point conversion using IEEE floating point conversion is used. The Xilinx 9.1 and modelsim 6.1e student edition has been used for the simulation.The floating point conversion is used to convert from hexadecimal number to floating point number. The input is shown in the Fig 3.1 which is a single input floating point number.The floating point number is converted to hexadecimal number for the input to the floating point adder by the use of IEEE floating point conversion software. The 64 bit floating point output which is to be converted to a floating point number by the use of IEEE application software. The floating point conversion unit round off the value, then it will manipulate the hexadecimal value. The hexadecimal value is applied in the coding and the output will be in the hexadecimal. The output is then converted to floating point number by the IEEE floating point conversion unit. Fig 4. Floating point output III. CONCLUSIONS In this paper, we presented a slow but optimal approach and fast heuristic approaches to mapping of applications from multiple domains onto a CGRA supporting floating since it gives better result than spanning tree floating point operations. In particular, we considered Steiner routing. After presenting an ILP formulation for an optimal solution, we presented a fast heuristic approach based on HLS techniques that performs loop unrolling and pipelining which achieves drastic performance improvement. For ran-heuristic approach considering Steiner points finds 97% of domly generated test examples, we showed that the proposed the optimal solutions within a few seconds.
  • 5. International Journal of Innovative Research in Information Security (IJIRIS) ISSN:2349-7017(O) Volume 1 Issue 4 (October 2014) ISSN:2349-7009(P) www.ijiris.com _________________________________________________________________________________________________ © 2014, IJIRIS- All Rights Reserved Page - 5 Experiments on only implementation. shown that our approach targeting a CGRA achieves up to119.9 times performance improvement compared to software various benchmark examples and 3-D graphics. REFERENCES [1]M. C. Filho, “Morphosys: An integrated reconfigurable system for data-parallel and computation-intensive applications,” IEEE Trans. Comput.,vol. 49, no. 5, pp. 465–481, May 2000. [2] PACT XPP Technologies [Online]. Available: http://www.pactxpp.com [3] B. Mei, S. Vernalde, D. Verkest, H. D. Man, and R. Lauwereins, “ADRES: An architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix,” in Proc. FPLA, 2003, pp. 61–70. [4] Chameleon Systems, Inc. [Online]. Available: http://www.chameleon- “DRESC: A retargetable compiler for coarse- grained reconfigurable architectures,” in Proc. ICFPT, Dec. 2002, pp. 166–173. [5] T. J. Callahan and J. Wawrzynek, “Instruction-level parallelism for systems.com [1] H. Singh, M.-H. Lee, G. Lu, F. J. Kurdahi, N. Bagherzadeh, and E. [6] W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S.reconfigurable computing,” in Proc. IWFPL, 1998, pp. 248–257. [7] B. Mei, S. Vernalde, D. Verkest, H. D. Man, and R. Lauwereins,P. Amarasinghe, “Space-time scheduling of instruction level parallelismon a RAW machine,” in Proc. ASPLOSV, 1998, pp. 46–57. [8] T. Toi, N. Nakamura, Y. Kato, T. Awashima, K. Wakabayashi, and L. Jing, “High-level synthesis challenges and solutions for a dynamicallyreconfigurable processor,” in Proc. ICCAD, Nov. 2006, pp. 702–708. [9] H. Park, K. Fan, S. A. Mahlke, T. Oh, H. Kim, and H. Kim, “Edgecentric modulo scheduling for coarse-grained reconfigurable architec-tures,” in Proc. PACT, Oct. 2008, pp. 166–176. [10] S. Friedman, A. Carroll, B. V. Essen, B. Ylvisaker, C. Ebeling, and S. Hauck, “SPR: An architecture-adaptive CGRA mapping tool,” in Proc.FPGA, 2009, pp. 191–200.based spatial mapping algorithm for coarse-grained reconfigurable [11] J. Yoon, A. Shrivastava, S. Park, M. Ahn, and Y. Paek, “A graph draw-architecture,” IEEE Trans. Very Large Scale Integr. Syst., vol. 17, no.11, pp. 1565–1578, Jun. 2008. [12] Y. Ahn, K. Han, G. Lee, H. Song, J. Yoo, and K. Choi, “SoCDAL:System-on-chip design accelerator,” ACM Trans. Des. Automat. Electron.Syst., vol. 13, no. 1, pp. 171–176, Jan. 2008. [13] Y. Kim, M. Kiemb, C. Park, J. Jung, and K. Choi, “Resource sharing specific optimization,” in Proc. DATE, Mar. 2005, pp. 12–17. and pipelining in coarse-grained reconfigurable architecture for domain. [14] Y. Kim, I. Park, K. Choi, and Y. Paek, “Power-conscious configurationcache structure and code mapping for coarse- grained reconfigurable architecture,” in Proc. ISLPED, Oct. 2006, pp. 310–315. [15] The SUIF Compiler System [Online]. Available: http://suif.stanford.edu [16] C. O. Shields, Jr., “Area efficient layouts of binary trees in grids,” Ph.D. [17] G. Lee, S. Lee, and K. Choi, “Automatic mapping of application to dissertation, Dept. Comput. Sci., Univ. Texas, Dallas, 2001. techniques,” in Proc. ISOCC, Nov. 2008, pp. 395–398. coarse-grained reconfigurable architecture based on high-level synthesis [18] K. Han and J. Kim, “Quantum-inspired evolutionary algorithms with a new termination criterion, Hε gate, and two phase scheme,” IEEE Trans. Evol. Computat., vol. 8, no. 2, pp. 156–169, Apr. 2004. [19] GNU Linear Programming Kit [Online]. Available: http://www.gnu.org/ [20] G. Lee, S. Lee, K. Choi, and N. Dutt, “Routing-aware application mapping considering Steiner point for coarse- grained reconfigurable algorithms,” in Proc. FPL, 2005, pp. 317–322. [21] M. Jo, V. K. P. Arava, H. Yang, and K. Choi, “Implementation of floating-point operations for 3-D graphics on a coarse-grained reconfigurable architecture,” in Proc. IEEE-SOCC, Sep. 2007, pp. 127–130. [22] Z. Baidas, A. D. Brown, and A. C. Williams, “Floating-point behavioral synthesis,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.20, no. 7, pp. 828–839, Jul. 2001.