SlideShare uma empresa Scribd logo
1 de 4
Baixar para ler offline
K. Umapathy, D. Rajaveerappa / International Journal of Engineering Research and
                  Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 5, September- October 2012, pp.293-296
    Area Efficient 128-point FFT Processor using Mixed Radix 4-2
                       for OFDM Applications
                           K. UMAPATHY1 D. RAJAVEERAPPA2
                                    1
                                     Research Scholar, JNT University, Anantapur, India
                2
                    Professor, Department of ECE, Loyola Institute of Technology, Chennai, India

Abstract
        This paper presents a single-chip 128-             is the radix number of the FFT decomposition. In
point FFT processor based on the cached-memory             FFT, each stage requires the reading and writing of
architecture (CMA) with the resource Mixed                 all N data words. So higher-radix decomposition is
Radix 4-2 computation element. The 4-2-stage               required to increase the FFT execution time. The
CMA, including a pair of single-port SRAMs, is             proposed CMA performs a high-radix computation
introduced to increase the execution time of the 2-        with low-power and high-performance.
dimensional FFT’s. Using the above techniques,                      Figure 1 shows the block diagram of system
we have designed an FFT processor core which               using the CMA. It is similar to single-memory
integrates 5,52,000 transistors within an area of          architecture except for a small cache memory
2.8 x 2.8 mm2 with CMOS 0.35μm triple-layer-               comprising a pair of registers between the
metal process technology. This processor can               computation element and the main memory. This pair
execute a 128-point, 36-bit-complex fixed-point            of registers enables hiding the memory access cycle
data format, 1-dimensonal FFT in 23.2μsec and 2            behind the computation cycle shown in Figure 2. This
dimensional FFT in only 23.8 msec at 133 MHz               tightly-coupled processor-cache pair increases the
clock operation.                                           effective bandwidth to the main memory during the
                                                           mixed radix-4-2 execution thereby avoiding the
Keywords:      FFT, CMA (cached               memory       processor to access the main memory often.
architecture), OFDM, Mixed Radix4-2                        The advances in semiconductor technology have not
                                                           only enabled the performance and integration of FFT
1. Introduction                                            processors to increase steadily, but also led to the
         The Fast Fourier transform (FFT)                  requirement of the higher-radix computation.
transmutes a set of data from the time domain to the       Assuming that the computation element has one
frequency domain or vice versa. Since it saves a           mixed radix 4-2 execution unit, then we have the
considerable amount of computation time over the           following equations-
conventional discrete Fourier transform (DFT)                      Memory Access Cycle = r2           (1)
method the FFT is widely used in various signal                    Execution Cycle = r [logr N] / 2   (2)
processing applications. The earlier implementations
of the FFT algorithms have been mainly used for                   Radix                                        Register
running software on general-purpose computers. But
                                                                                          MUX
                                                                  4-2
with the increasing demands for high-speed signal                                                              Cache
processing applications, the efficient hardware
                                                                  Computation
implementation of FFT processors has become an                    Element                                      memory
important key to develop many future generations of
advanced multi-dimensional digital signal and image            Figure 1. Block diagram of the System using
processing systems.                                                               CMA.
         This paper describes a small-area, high-
performance 128-point FFT processor using both the
cached-memory architecture (CMA) and the double
buffer structure. The CMA significantly reduces the
hardware resource of the computation element when
compared to single-memory or dual-memory
architecture, even if the radix number is high. A high-
performance, multi-dimensional FFT is achieved
using the 2-stage CMA based on the double buffer
structure.

2. Cached-Memory Architecture                              Figure 2. Operation sequence for each register. It
        Since FFT makes frequent accesses to data          is required to keep a balance between the memory
in memory, it can be calculated in O (logr N) stages,      access cycle and the execution cycle to achieve
where N is the number of point of the transform and r      100% efficiency.

                                                                                                   293 | P a g e
K. Umapathy, D. Rajaveerappa / International Journal of Engineering Research and
                   Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                     Vol. 2, Issue 5, September- October 2012, pp.293-296
Equations (1) and (2) show that the higher the radix                 Figure 4 shows the block diagram of the
number, the smaller will be the ratio of the execution      RM-R4-2CE with a pair of registers. In conventional
cycle to the memory access cycle. This shows that           FFT processors the deep pipeline architecture is
with increasing the number of radix number, it takes        adopted. But in this RM-R4-2CE, every pass is
more time for execution than accessing the memory           processed cyclically among the processor and the
and this causes the imbalance as shown in Figure 2.         cache. Hence none of the additional pipeline registers
To solve this problem, multiple computation units           and complex controllers is required. Thus this simple
with multi-datapath are required. Hence we have             architecture avoids any processor stalls or memory
developed the resource-saving multidatapath                 access bottleneck caused by the data dependence.
computation element without any interconnection             The RM-R4-2CE has resource-saving routing
bottleneck caused by large crossbar, bus, or network        switches for transmitting the output signals to the
structure connected to partitioned memories.                appropriate registers for the following pass
                                                            processing. When the ―select‖ signal shown in Figure
3. Mixed Radix 4-2 Computation Element                      3 is ―0‖, the 1st and 3rd pass configuration will be
          In the Decimation-in-time (DIT) Mixed             realized on RM-R4-2CE, and if ―1‖, it is configured
Radix-4-2 FFT algorithm, an 8-word transformation           for the 2nd pass.
can be done by a 4-row x 3-column Mixed Radix-4-2
butterfly operations shown in Figure 3(a). Since the
butterfly execution unit, which is composed of a
complex multiplier, adder and subtractor, is huge, it
is very important to reuse the computation resources
in order to shrink the size of the chip. To achieve this,
we have developed the resource-saving Mixed Radix
4-2 computation element (RM-R4-2CE).
          Since each column is composed of 4
independent butterflies and data flows in a consistent
pattern in the 8-word groups throughout the entire
128-word transform, this symmetry makes it possible
to divide the normal 8-word computation into 3
passes as shown in Figure 3(b). Each pass is
processed by a single RM-R4-2CE consisting of
Mixed Radix 4-2 butterfly execution units. Using the
RM-R4-2CE, all the communication can be easily
sorted and stored in the register closer to the
following butterfly unit by changing the address of
the data stored in the cache one after the other. In this
manner, RM-R4-2CE can reduce the processor area
down to 1/3 and half of the interconnection area
without any delay penalties.
                                                            Figure 4. Block diagram of RM-R4-2CE with a pair
                                                                                 of registers.
                                                                      According to the effect of multidatapath, the
                                                            execution cycle becomes non- critical. Hence we
                                                            have implemented the complex multiplier in the
                                                            butterfly execution unit as small as possible. Table 1
                                                            and 2 shows the evaluation of the gate count of both
                                                            the multiplier and the adder. From these results, we
                                                            have decided to use the array multiplier and the ripple
                                                            carry adder, which can reduce the area of the
                                                            complex multiplier, comprising 4 multipliers, 1 adder
                                                            and 1 subtractor by about 39% when compared to the
                                                            one which uses the modified Booth-2 encoded
                                                            Wallace tree multiplier and the carry look ahead
                                                            adder as shown in Table 3.


 Figure 3. Dataflow diagram of the 8-word group
            (a) Radix 2 decomposition
            (b) Computation element.


                                                                                                   294 | P a g e
K. Umapathy, D. Rajaveerappa / International Journal of Engineering Research and
                   Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                     Vol. 2, Issue 5, September- October 2012, pp.293-296
Table 1. Evaluation of the multiplier                      combination of the CMA, RM-R4-2CE and double
  Type of multiplier       Number          Gate delay      buffer structure is extremely well suited for the
  (18-bit width, 2’s        of gates        [nsec]         proposed FFT processor.
    complement)
Array                        1,962            24.8         4. Chip Design & Results
Wallace Tree                 2,145             9.7                   Figure 5 shows the die plot of the specially
Modified Booth-2             3,019             6.3         designed 2-stage CMA FFT processor using RM-R4-
encoded       Wallace                                      2CE. This processor has been designed with 0.35μm,
Tree                                                       3-layer-metal CMOS technology and occupies an
                                                           area of only 7.84 mm2 which contains 1,92,000
Table 2. Evaluation of the adder                           transistors for logic and 3,60,000 transistors for
    Type of adder           Number         Gate delay      SRAM in 3 blocks. While 2 of these blocks are
  (18-bit width, 2’s        of gates        [nsec]         assigned for data, the other one stores the coefficients
    complement)                                            of the twiddle factor.
Ripple Carry                                   7.5
                               95
Carry Look Ahead              128
                                               1.6

Table 3. Evaluation of the complex multiplier
 Combination of the         Number       Gate delay
      multiplier            of gates        [nsec]
    and the adder
                                                           Figure 5. Die plot of the proposed FFT processor.
Array & Ripple Carry         8,038           32.3
                                                           Table 4 shows the key features of the 2-stage CMA
                                                           FFT processor in comparison with the previous
Modified   Booth-2           13,000            7.9
                                                           works. This table shows that our proposed FFT
encoded    Wallace
                                                           processor is superior in terms of the data path width,
Tree & Carry Look
                                                           the 2-dimensional FFT calculation speed, and
Ahead
                                                           especially the area of the silicon even though the
                                                           difference in technology is taken into account. In this
                                                           table, we assume constant throughput in the
          From the simulation waveforms of the cache       performance estimation. Moreover, the twiddle factor
register, data are serially read from the main memory      is stored not in the ROM but in the coefficients of
to the register, processed in parallel, and written back   SRAM. This makes it possible to perform the
to the main memory serially. During execution, data        adaptive number of point FFT, which means more
is circulated 3 times around the RM-R4-2CE and the         than 128-point FFT’s.
register of one side. Then the 3-stage pass processing
                                                           Table 4. Key features of the proposed 128-point
is finished within the same time as that of 8-word         FFT processor
data read/write time. This clearly indicates that the
                                                            Parameters     Proposed        Spiffee         NTT
RM-R4-2CE converts the data properly without any
                                                                              FFT                        LSI Lab
delay penalties in spite of the decrease in the number
                                                           Technology                       0.7μm
of butterfly elements down to 1/3 or the interconnect                       0.35μm                        0.8μm
                                                                                        (Lpoly=0.6μm)
resources down to one half. It suggests that the
                                                           Number of        Logic :
proposed RM-R4-2CE is well suited for the FFT
                                                           Transistors        192k           460k          380k
calculations based on the CMA.
                                                                            SRAM :
         On the other way, the processor must be                              360k
stalled and forced to wait for data in the original                          Total :
                                                                              552k
CMA without the double buffer structure. This is
important for the high-speed and small-area 2-             Data path 36x8-bit               20-bit        24-bit
dimensional FFT processor where a series of 1-             Width              fixed      fixed point     floating
dimensional FFTs must be executed. The dual-port                             point
SRAM is sometimes used to increase the data path           Area            7.84 mm2        49 mm2          134
width by pipeline FFT processors. Since dual-port                                                          mm2
SRAM occupies about double the area in comparison
to the single-port SRAM, it is difficult to adopt the      Clock Freq.     133MHz          173MHz          40MHz
double buffer structure. However the CMA with RM-
R4-2CE can achieve 100% efficiency even with the           128-point       23.2μsec         15μsec         54μsec
single-port SRAM configuration. This shows that the        FFT


                                                                                                     295 | P a g e
K. Umapathy, D. Rajaveerappa / International Journal of Engineering Research and
                Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                  Vol. 2, Issue 5, September- October 2012, pp.293-296
5. Conclusion                                    [8]  3GPP TS 36.201 V8.3.0 LTE Physical
     A single-chip 128-point FFT processor is                   Layer—General Description,E-UTRA, Mar.
presented based on the CMA with the resource-                   2009.
saving Mixed Radix-4-2 computation element. The 2-       [9]    E. Oran Brigham, The fast Fourier transform
stage CMA, including a pair of single-port SRAMs,               and its applications, Prentice-Hall, 1988.
is also introduced to speed up the execution time of     [10]   Se Ho Park et.ul. ‖A 2048 complex point
the 2-dimensional FFT’s. Using above techniques,                FFT architecture for digital audio
we have designed an FFT processor core which                    broadcasting system‖, In Proc. ISCAS 2000.
integrates 5,52,000 transistors in area of 2.8 x 2.8     [11]   Se Ho Park e t d . ―Sequential design of a
mm2 with CMOS 0.35μm triple layer metal process                 8192 complex point FFT in OFDM
technology. This processor can execute a 128-point,             receiver‖, In Pvoc. AP-ASIC, 1999.
fixed-point 36-bit complex data format, 1-dimensonal
FFT in 23.2 μsec and 2-dimensional FFT in only 23.8
msec at 133MHz operation. With this FFT core, we
can develop many future generations of advanced
multi-dimensional Digital signal and Image
processing systems onto the system chip.

References
  [1]   M. K. A. Aziz, P. N. Fletcher, and A. R.
        Nix, ―Performance analysis of IEEE
        802.11n solutions combining MIMO
        architectures with iterative decoding and
        sub-optimal ML detection via MMSE and
        zero forcing GIS solutions,‖ IEEE Wireless
        Communications         and       Networking
        Conference,2004, vol. 3, pp. 1451–1456,
        Mar. 2004.
  [2]   A. Ghosh and D. R. Wolter, ―Broadband
        wireless access with WiMax/802.16: current
        performance benchmarks and future
        potential,‖      IEEE       Communications
        Magazine, vol. 43, no. 2, pp. 129–136, Feb.
        2005.
  [3]   S. Yoshizawa, K. Nishi, and Y. Miyanaga,
        ―Reconfigurable two- dimensional pipeline
        FFT processor in OFDM cognitive radio
        systems,‖in IEEE International Symposium
        on Circuits and Systems, (ISCAS ’08),May
        2008, pp. 1248–1251.
  [4]   L. G. Johnson, ―Conflict free memory
        addressing for dedicated FFT hardware,‖
        IEEE Trans. Circuits Syst. II, Analog Digit.
        Signal rocess.,vol. 39, no. 5, pp. 312–316,
        May 1992.
  [5]   J. A. Hidalgo, J. Lopez, F. Arguello, and E.
        L.Zapata,―Area-efficient architecture for fast
        Fourier transform,‖ IEEE Trans. Circuits
        Syst. II,Analog Digit. Signal Process., vol.
        46, no. 2, pp. 187–193, Feb. 1999.
  [6]   B. G. Jo and M. H. Sunwoo, ―New
        continuous-flow mixed-radix (CFMR) FFT
        processor using novel in-place strategy,‖
        IEEE Trans. Circuits yst. I,Reg. Papers, vol.
        52, no. 5, pp. 911–919, May 2005.
  [7]   Z.-X. Yang, Y.-P. Hu, C.-Y. Pan, and L.
        Yang, ―Design of a 3780-point IFFT
        processor for TDS-OFDM,‖ IEEE Trans.
        Broadcast., vol. 48, no. 1,pp. 57–61, Mar.
        2002.

                                                                                            296 | P a g e

Mais conteúdo relacionado

Mais procurados

MULTIPLE CHOICE QUESTIONS ON COMMUNICATION PROTOCOL ENGINEERING
MULTIPLE CHOICE QUESTIONS ON COMMUNICATION PROTOCOL ENGINEERINGMULTIPLE CHOICE QUESTIONS ON COMMUNICATION PROTOCOL ENGINEERING
MULTIPLE CHOICE QUESTIONS ON COMMUNICATION PROTOCOL ENGINEERINGvtunotesbysree
 
MULTIPLE CHOICE QUESTIONS WITH ANSWERS ON NETWORK MANAGEMENT SYSTEMS
MULTIPLE CHOICE QUESTIONS WITH ANSWERS ON NETWORK MANAGEMENT SYSTEMSMULTIPLE CHOICE QUESTIONS WITH ANSWERS ON NETWORK MANAGEMENT SYSTEMS
MULTIPLE CHOICE QUESTIONS WITH ANSWERS ON NETWORK MANAGEMENT SYSTEMSvtunotesbysree
 
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORMDUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORMVLSICS Design
 
Unit 3-pipelining & vector processing
Unit 3-pipelining & vector processingUnit 3-pipelining & vector processing
Unit 3-pipelining & vector processingvishal choudhary
 
Ternary content addressable memory for longest prefix matching based on rando...
Ternary content addressable memory for longest prefix matching based on rando...Ternary content addressable memory for longest prefix matching based on rando...
Ternary content addressable memory for longest prefix matching based on rando...TELKOMNIKA JOURNAL
 
High performance nb-ldpc decoder with reduction of message exchange
High performance nb-ldpc decoder with reduction of message exchange High performance nb-ldpc decoder with reduction of message exchange
High performance nb-ldpc decoder with reduction of message exchange Ieee Xpert
 
A high performance fir filter architecture for fixed and reconfigurable appli...
A high performance fir filter architecture for fixed and reconfigurable appli...A high performance fir filter architecture for fixed and reconfigurable appli...
A high performance fir filter architecture for fixed and reconfigurable appli...Ieee Xpert
 
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFTAn Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFTIRJET Journal
 
Using Many-Core Processors to Improve the Performance of Space Computing Plat...
Using Many-Core Processors to Improve the Performance of Space Computing Plat...Using Many-Core Processors to Improve the Performance of Space Computing Plat...
Using Many-Core Processors to Improve the Performance of Space Computing Plat...Fisnik Kraja
 
Area and Speed Efficient Reversible Fused Radix-2 FFT Unit using 4:3 Compressor
Area and Speed Efficient Reversible Fused Radix-2 FFT Unit using 4:3 CompressorArea and Speed Efficient Reversible Fused Radix-2 FFT Unit using 4:3 Compressor
Area and Speed Efficient Reversible Fused Radix-2 FFT Unit using 4:3 Compressoridescitation
 
Final Project Report
Final Project ReportFinal Project Report
Final Project ReportRiddhi Shah
 
Low Power Architecture for JPEG2000
Low Power Architecture for JPEG2000Low Power Architecture for JPEG2000
Low Power Architecture for JPEG2000Rahul Jain
 
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...Instruction Level Parallelism Compiler optimization Techniques Anna Universit...
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...Dr.K. Thirunadana Sikamani
 

Mais procurados (20)

427 432
427 432427 432
427 432
 
MULTIPLE CHOICE QUESTIONS ON COMMUNICATION PROTOCOL ENGINEERING
MULTIPLE CHOICE QUESTIONS ON COMMUNICATION PROTOCOL ENGINEERINGMULTIPLE CHOICE QUESTIONS ON COMMUNICATION PROTOCOL ENGINEERING
MULTIPLE CHOICE QUESTIONS ON COMMUNICATION PROTOCOL ENGINEERING
 
MULTIPLE CHOICE QUESTIONS WITH ANSWERS ON NETWORK MANAGEMENT SYSTEMS
MULTIPLE CHOICE QUESTIONS WITH ANSWERS ON NETWORK MANAGEMENT SYSTEMSMULTIPLE CHOICE QUESTIONS WITH ANSWERS ON NETWORK MANAGEMENT SYSTEMS
MULTIPLE CHOICE QUESTIONS WITH ANSWERS ON NETWORK MANAGEMENT SYSTEMS
 
Aw4102359364
Aw4102359364Aw4102359364
Aw4102359364
 
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORMDUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
 
Unit 3-pipelining & vector processing
Unit 3-pipelining & vector processingUnit 3-pipelining & vector processing
Unit 3-pipelining & vector processing
 
J0166875
J0166875J0166875
J0166875
 
Cg24551555
Cg24551555Cg24551555
Cg24551555
 
Ek35775781
Ek35775781Ek35775781
Ek35775781
 
Ternary content addressable memory for longest prefix matching based on rando...
Ternary content addressable memory for longest prefix matching based on rando...Ternary content addressable memory for longest prefix matching based on rando...
Ternary content addressable memory for longest prefix matching based on rando...
 
High performance nb-ldpc decoder with reduction of message exchange
High performance nb-ldpc decoder with reduction of message exchange High performance nb-ldpc decoder with reduction of message exchange
High performance nb-ldpc decoder with reduction of message exchange
 
A high performance fir filter architecture for fixed and reconfigurable appli...
A high performance fir filter architecture for fixed and reconfigurable appli...A high performance fir filter architecture for fixed and reconfigurable appli...
A high performance fir filter architecture for fixed and reconfigurable appli...
 
Solution(1)
Solution(1)Solution(1)
Solution(1)
 
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFTAn Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFT
 
Using Many-Core Processors to Improve the Performance of Space Computing Plat...
Using Many-Core Processors to Improve the Performance of Space Computing Plat...Using Many-Core Processors to Improve the Performance of Space Computing Plat...
Using Many-Core Processors to Improve the Performance of Space Computing Plat...
 
Unit 2
Unit 2Unit 2
Unit 2
 
Area and Speed Efficient Reversible Fused Radix-2 FFT Unit using 4:3 Compressor
Area and Speed Efficient Reversible Fused Radix-2 FFT Unit using 4:3 CompressorArea and Speed Efficient Reversible Fused Radix-2 FFT Unit using 4:3 Compressor
Area and Speed Efficient Reversible Fused Radix-2 FFT Unit using 4:3 Compressor
 
Final Project Report
Final Project ReportFinal Project Report
Final Project Report
 
Low Power Architecture for JPEG2000
Low Power Architecture for JPEG2000Low Power Architecture for JPEG2000
Low Power Architecture for JPEG2000
 
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...Instruction Level Parallelism Compiler optimization Techniques Anna Universit...
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...
 

Destaque (20)

Cj25509515
Cj25509515Cj25509515
Cj25509515
 
By25450453
By25450453By25450453
By25450453
 
Bs25412419
Bs25412419Bs25412419
Bs25412419
 
Bn25384390
Bn25384390Bn25384390
Bn25384390
 
Du25727731
Du25727731Du25727731
Du25727731
 
Cn25528534
Cn25528534Cn25528534
Cn25528534
 
Ba25315321
Ba25315321Ba25315321
Ba25315321
 
Bi25356363
Bi25356363Bi25356363
Bi25356363
 
Bg25346351
Bg25346351Bg25346351
Bg25346351
 
Dz25764770
Dz25764770Dz25764770
Dz25764770
 
Ei25822827
Ei25822827Ei25822827
Ei25822827
 
Er25885892
Er25885892Er25885892
Er25885892
 
Bv31491493
Bv31491493Bv31491493
Bv31491493
 
Bq31464466
Bq31464466Bq31464466
Bq31464466
 
Ck31577580
Ck31577580Ck31577580
Ck31577580
 
Duvidas frequentes gobull
Duvidas frequentes gobullDuvidas frequentes gobull
Duvidas frequentes gobull
 
A Day at Work With Keira's Daddy - A Flat Stanley Story
A Day at Work With Keira's Daddy - A Flat Stanley StoryA Day at Work With Keira's Daddy - A Flat Stanley Story
A Day at Work With Keira's Daddy - A Flat Stanley Story
 
Música,artes e movimento
Música,artes e movimentoMúsica,artes e movimento
Música,artes e movimento
 
Pietra grey
Pietra greyPietra grey
Pietra grey
 
2013.12 simone martini
2013.12 simone martini2013.12 simone martini
2013.12 simone martini
 

Semelhante a Aw25293296

High Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic MultiplierHigh Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic MultiplierIJERA Editor
 
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd Iaetsd
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
FPGA Implementation of Viterbi Decoder using Hybrid Trace Back and Register E...
FPGA Implementation of Viterbi Decoder using Hybrid Trace Back and Register E...FPGA Implementation of Viterbi Decoder using Hybrid Trace Back and Register E...
FPGA Implementation of Viterbi Decoder using Hybrid Trace Back and Register E...ijsrd.com
 
Implementation of High Speed OFDM Transceiver using FPGA
Implementation of High Speed OFDM Transceiver using FPGAImplementation of High Speed OFDM Transceiver using FPGA
Implementation of High Speed OFDM Transceiver using FPGAMangaiK4
 
Analysis of Women Harassment inVillages Using CETD Matrix Modal
Analysis of Women Harassment inVillages Using CETD Matrix ModalAnalysis of Women Harassment inVillages Using CETD Matrix Modal
Analysis of Women Harassment inVillages Using CETD Matrix ModalMangaiK4
 
Design of Scalable FFT architecture for Advanced Wireless Communication Stand...
Design of Scalable FFT architecture for Advanced Wireless Communication Stand...Design of Scalable FFT architecture for Advanced Wireless Communication Stand...
Design of Scalable FFT architecture for Advanced Wireless Communication Stand...IOSRJECE
 
Iaetsd pipelined parallel fft architecture through folding transformation
Iaetsd pipelined parallel fft architecture through folding transformationIaetsd pipelined parallel fft architecture through folding transformation
Iaetsd pipelined parallel fft architecture through folding transformationIaetsd Iaetsd
 
Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...
Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...
Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...IOSR Journals
 
HIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFTHIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFTAM Publications
 
IRJET-Error Detection and Correction using Turbo Codes
IRJET-Error Detection and Correction using Turbo CodesIRJET-Error Detection and Correction using Turbo Codes
IRJET-Error Detection and Correction using Turbo CodesIRJET Journal
 
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSIIRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSIIRJET Journal
 
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...IOSR Journals
 
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...IOSR Journals
 

Semelhante a Aw25293296 (20)

Design Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless Application
Design Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless ApplicationDesign Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless Application
Design Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless Application
 
H344250
H344250H344250
H344250
 
High Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic MultiplierHigh Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic Multiplier
 
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization and
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
FPGA Implementation of Viterbi Decoder using Hybrid Trace Back and Register E...
FPGA Implementation of Viterbi Decoder using Hybrid Trace Back and Register E...FPGA Implementation of Viterbi Decoder using Hybrid Trace Back and Register E...
FPGA Implementation of Viterbi Decoder using Hybrid Trace Back and Register E...
 
Implementation of High Speed OFDM Transceiver using FPGA
Implementation of High Speed OFDM Transceiver using FPGAImplementation of High Speed OFDM Transceiver using FPGA
Implementation of High Speed OFDM Transceiver using FPGA
 
Analysis of Women Harassment inVillages Using CETD Matrix Modal
Analysis of Women Harassment inVillages Using CETD Matrix ModalAnalysis of Women Harassment inVillages Using CETD Matrix Modal
Analysis of Women Harassment inVillages Using CETD Matrix Modal
 
My paper
My paperMy paper
My paper
 
Design of Scalable FFT architecture for Advanced Wireless Communication Stand...
Design of Scalable FFT architecture for Advanced Wireless Communication Stand...Design of Scalable FFT architecture for Advanced Wireless Communication Stand...
Design of Scalable FFT architecture for Advanced Wireless Communication Stand...
 
Iaetsd pipelined parallel fft architecture through folding transformation
Iaetsd pipelined parallel fft architecture through folding transformationIaetsd pipelined parallel fft architecture through folding transformation
Iaetsd pipelined parallel fft architecture through folding transformation
 
Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...
Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...
Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...
 
Hv2514131415
Hv2514131415Hv2514131415
Hv2514131415
 
Hv2514131415
Hv2514131415Hv2514131415
Hv2514131415
 
HIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFTHIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFT
 
IRJET-Error Detection and Correction using Turbo Codes
IRJET-Error Detection and Correction using Turbo CodesIRJET-Error Detection and Correction using Turbo Codes
IRJET-Error Detection and Correction using Turbo Codes
 
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSIIRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
 
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
 
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
 

Aw25293296

  • 1. K. Umapathy, D. Rajaveerappa / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.293-296 Area Efficient 128-point FFT Processor using Mixed Radix 4-2 for OFDM Applications K. UMAPATHY1 D. RAJAVEERAPPA2 1 Research Scholar, JNT University, Anantapur, India 2 Professor, Department of ECE, Loyola Institute of Technology, Chennai, India Abstract This paper presents a single-chip 128- is the radix number of the FFT decomposition. In point FFT processor based on the cached-memory FFT, each stage requires the reading and writing of architecture (CMA) with the resource Mixed all N data words. So higher-radix decomposition is Radix 4-2 computation element. The 4-2-stage required to increase the FFT execution time. The CMA, including a pair of single-port SRAMs, is proposed CMA performs a high-radix computation introduced to increase the execution time of the 2- with low-power and high-performance. dimensional FFT’s. Using the above techniques, Figure 1 shows the block diagram of system we have designed an FFT processor core which using the CMA. It is similar to single-memory integrates 5,52,000 transistors within an area of architecture except for a small cache memory 2.8 x 2.8 mm2 with CMOS 0.35μm triple-layer- comprising a pair of registers between the metal process technology. This processor can computation element and the main memory. This pair execute a 128-point, 36-bit-complex fixed-point of registers enables hiding the memory access cycle data format, 1-dimensonal FFT in 23.2μsec and 2 behind the computation cycle shown in Figure 2. This dimensional FFT in only 23.8 msec at 133 MHz tightly-coupled processor-cache pair increases the clock operation. effective bandwidth to the main memory during the mixed radix-4-2 execution thereby avoiding the Keywords: FFT, CMA (cached memory processor to access the main memory often. architecture), OFDM, Mixed Radix4-2 The advances in semiconductor technology have not only enabled the performance and integration of FFT 1. Introduction processors to increase steadily, but also led to the The Fast Fourier transform (FFT) requirement of the higher-radix computation. transmutes a set of data from the time domain to the Assuming that the computation element has one frequency domain or vice versa. Since it saves a mixed radix 4-2 execution unit, then we have the considerable amount of computation time over the following equations- conventional discrete Fourier transform (DFT) Memory Access Cycle = r2 (1) method the FFT is widely used in various signal Execution Cycle = r [logr N] / 2 (2) processing applications. The earlier implementations of the FFT algorithms have been mainly used for Radix Register running software on general-purpose computers. But MUX 4-2 with the increasing demands for high-speed signal Cache processing applications, the efficient hardware Computation implementation of FFT processors has become an Element memory important key to develop many future generations of advanced multi-dimensional digital signal and image Figure 1. Block diagram of the System using processing systems. CMA. This paper describes a small-area, high- performance 128-point FFT processor using both the cached-memory architecture (CMA) and the double buffer structure. The CMA significantly reduces the hardware resource of the computation element when compared to single-memory or dual-memory architecture, even if the radix number is high. A high- performance, multi-dimensional FFT is achieved using the 2-stage CMA based on the double buffer structure. 2. Cached-Memory Architecture Figure 2. Operation sequence for each register. It Since FFT makes frequent accesses to data is required to keep a balance between the memory in memory, it can be calculated in O (logr N) stages, access cycle and the execution cycle to achieve where N is the number of point of the transform and r 100% efficiency. 293 | P a g e
  • 2. K. Umapathy, D. Rajaveerappa / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.293-296 Equations (1) and (2) show that the higher the radix Figure 4 shows the block diagram of the number, the smaller will be the ratio of the execution RM-R4-2CE with a pair of registers. In conventional cycle to the memory access cycle. This shows that FFT processors the deep pipeline architecture is with increasing the number of radix number, it takes adopted. But in this RM-R4-2CE, every pass is more time for execution than accessing the memory processed cyclically among the processor and the and this causes the imbalance as shown in Figure 2. cache. Hence none of the additional pipeline registers To solve this problem, multiple computation units and complex controllers is required. Thus this simple with multi-datapath are required. Hence we have architecture avoids any processor stalls or memory developed the resource-saving multidatapath access bottleneck caused by the data dependence. computation element without any interconnection The RM-R4-2CE has resource-saving routing bottleneck caused by large crossbar, bus, or network switches for transmitting the output signals to the structure connected to partitioned memories. appropriate registers for the following pass processing. When the ―select‖ signal shown in Figure 3. Mixed Radix 4-2 Computation Element 3 is ―0‖, the 1st and 3rd pass configuration will be In the Decimation-in-time (DIT) Mixed realized on RM-R4-2CE, and if ―1‖, it is configured Radix-4-2 FFT algorithm, an 8-word transformation for the 2nd pass. can be done by a 4-row x 3-column Mixed Radix-4-2 butterfly operations shown in Figure 3(a). Since the butterfly execution unit, which is composed of a complex multiplier, adder and subtractor, is huge, it is very important to reuse the computation resources in order to shrink the size of the chip. To achieve this, we have developed the resource-saving Mixed Radix 4-2 computation element (RM-R4-2CE). Since each column is composed of 4 independent butterflies and data flows in a consistent pattern in the 8-word groups throughout the entire 128-word transform, this symmetry makes it possible to divide the normal 8-word computation into 3 passes as shown in Figure 3(b). Each pass is processed by a single RM-R4-2CE consisting of Mixed Radix 4-2 butterfly execution units. Using the RM-R4-2CE, all the communication can be easily sorted and stored in the register closer to the following butterfly unit by changing the address of the data stored in the cache one after the other. In this manner, RM-R4-2CE can reduce the processor area down to 1/3 and half of the interconnection area without any delay penalties. Figure 4. Block diagram of RM-R4-2CE with a pair of registers. According to the effect of multidatapath, the execution cycle becomes non- critical. Hence we have implemented the complex multiplier in the butterfly execution unit as small as possible. Table 1 and 2 shows the evaluation of the gate count of both the multiplier and the adder. From these results, we have decided to use the array multiplier and the ripple carry adder, which can reduce the area of the complex multiplier, comprising 4 multipliers, 1 adder and 1 subtractor by about 39% when compared to the one which uses the modified Booth-2 encoded Wallace tree multiplier and the carry look ahead adder as shown in Table 3. Figure 3. Dataflow diagram of the 8-word group (a) Radix 2 decomposition (b) Computation element. 294 | P a g e
  • 3. K. Umapathy, D. Rajaveerappa / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.293-296 Table 1. Evaluation of the multiplier combination of the CMA, RM-R4-2CE and double Type of multiplier Number Gate delay buffer structure is extremely well suited for the (18-bit width, 2’s of gates [nsec] proposed FFT processor. complement) Array 1,962 24.8 4. Chip Design & Results Wallace Tree 2,145 9.7 Figure 5 shows the die plot of the specially Modified Booth-2 3,019 6.3 designed 2-stage CMA FFT processor using RM-R4- encoded Wallace 2CE. This processor has been designed with 0.35μm, Tree 3-layer-metal CMOS technology and occupies an area of only 7.84 mm2 which contains 1,92,000 Table 2. Evaluation of the adder transistors for logic and 3,60,000 transistors for Type of adder Number Gate delay SRAM in 3 blocks. While 2 of these blocks are (18-bit width, 2’s of gates [nsec] assigned for data, the other one stores the coefficients complement) of the twiddle factor. Ripple Carry 7.5 95 Carry Look Ahead 128 1.6 Table 3. Evaluation of the complex multiplier Combination of the Number Gate delay multiplier of gates [nsec] and the adder Figure 5. Die plot of the proposed FFT processor. Array & Ripple Carry 8,038 32.3 Table 4 shows the key features of the 2-stage CMA FFT processor in comparison with the previous Modified Booth-2 13,000 7.9 works. This table shows that our proposed FFT encoded Wallace processor is superior in terms of the data path width, Tree & Carry Look the 2-dimensional FFT calculation speed, and Ahead especially the area of the silicon even though the difference in technology is taken into account. In this table, we assume constant throughput in the From the simulation waveforms of the cache performance estimation. Moreover, the twiddle factor register, data are serially read from the main memory is stored not in the ROM but in the coefficients of to the register, processed in parallel, and written back SRAM. This makes it possible to perform the to the main memory serially. During execution, data adaptive number of point FFT, which means more is circulated 3 times around the RM-R4-2CE and the than 128-point FFT’s. register of one side. Then the 3-stage pass processing Table 4. Key features of the proposed 128-point is finished within the same time as that of 8-word FFT processor data read/write time. This clearly indicates that the Parameters Proposed Spiffee NTT RM-R4-2CE converts the data properly without any FFT LSI Lab delay penalties in spite of the decrease in the number Technology 0.7μm of butterfly elements down to 1/3 or the interconnect 0.35μm 0.8μm (Lpoly=0.6μm) resources down to one half. It suggests that the Number of Logic : proposed RM-R4-2CE is well suited for the FFT Transistors 192k 460k 380k calculations based on the CMA. SRAM : On the other way, the processor must be 360k stalled and forced to wait for data in the original Total : 552k CMA without the double buffer structure. This is important for the high-speed and small-area 2- Data path 36x8-bit 20-bit 24-bit dimensional FFT processor where a series of 1- Width fixed fixed point floating dimensional FFTs must be executed. The dual-port point SRAM is sometimes used to increase the data path Area 7.84 mm2 49 mm2 134 width by pipeline FFT processors. Since dual-port mm2 SRAM occupies about double the area in comparison to the single-port SRAM, it is difficult to adopt the Clock Freq. 133MHz 173MHz 40MHz double buffer structure. However the CMA with RM- R4-2CE can achieve 100% efficiency even with the 128-point 23.2μsec 15μsec 54μsec single-port SRAM configuration. This shows that the FFT 295 | P a g e
  • 4. K. Umapathy, D. Rajaveerappa / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.293-296 5. Conclusion [8] 3GPP TS 36.201 V8.3.0 LTE Physical A single-chip 128-point FFT processor is Layer—General Description,E-UTRA, Mar. presented based on the CMA with the resource- 2009. saving Mixed Radix-4-2 computation element. The 2- [9] E. Oran Brigham, The fast Fourier transform stage CMA, including a pair of single-port SRAMs, and its applications, Prentice-Hall, 1988. is also introduced to speed up the execution time of [10] Se Ho Park et.ul. ‖A 2048 complex point the 2-dimensional FFT’s. Using above techniques, FFT architecture for digital audio we have designed an FFT processor core which broadcasting system‖, In Proc. ISCAS 2000. integrates 5,52,000 transistors in area of 2.8 x 2.8 [11] Se Ho Park e t d . ―Sequential design of a mm2 with CMOS 0.35μm triple layer metal process 8192 complex point FFT in OFDM technology. This processor can execute a 128-point, receiver‖, In Pvoc. AP-ASIC, 1999. fixed-point 36-bit complex data format, 1-dimensonal FFT in 23.2 μsec and 2-dimensional FFT in only 23.8 msec at 133MHz operation. With this FFT core, we can develop many future generations of advanced multi-dimensional Digital signal and Image processing systems onto the system chip. References [1] M. K. A. Aziz, P. N. Fletcher, and A. R. Nix, ―Performance analysis of IEEE 802.11n solutions combining MIMO architectures with iterative decoding and sub-optimal ML detection via MMSE and zero forcing GIS solutions,‖ IEEE Wireless Communications and Networking Conference,2004, vol. 3, pp. 1451–1456, Mar. 2004. [2] A. Ghosh and D. R. Wolter, ―Broadband wireless access with WiMax/802.16: current performance benchmarks and future potential,‖ IEEE Communications Magazine, vol. 43, no. 2, pp. 129–136, Feb. 2005. [3] S. Yoshizawa, K. Nishi, and Y. Miyanaga, ―Reconfigurable two- dimensional pipeline FFT processor in OFDM cognitive radio systems,‖in IEEE International Symposium on Circuits and Systems, (ISCAS ’08),May 2008, pp. 1248–1251. [4] L. G. Johnson, ―Conflict free memory addressing for dedicated FFT hardware,‖ IEEE Trans. Circuits Syst. II, Analog Digit. Signal rocess.,vol. 39, no. 5, pp. 312–316, May 1992. [5] J. A. Hidalgo, J. Lopez, F. Arguello, and E. L.Zapata,―Area-efficient architecture for fast Fourier transform,‖ IEEE Trans. Circuits Syst. II,Analog Digit. Signal Process., vol. 46, no. 2, pp. 187–193, Feb. 1999. [6] B. G. Jo and M. H. Sunwoo, ―New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy,‖ IEEE Trans. Circuits yst. I,Reg. Papers, vol. 52, no. 5, pp. 911–919, May 2005. [7] Z.-X. Yang, Y.-P. Hu, C.-Y. Pan, and L. Yang, ―Design of a 3780-point IFFT processor for TDS-OFDM,‖ IEEE Trans. Broadcast., vol. 48, no. 1,pp. 57–61, Mar. 2002. 296 | P a g e