A Combined SDC-SDF Architecture For Normal I/O Pipelined Radix-4 FFT
S.MAGESHKUMAR,
Department of ECE,
Asan Memorial College Of Engineering And Technology,
Sakthikumaran06@gmail.com.
Abstract
We present an efficient combined single-path delay commutator and single-path delay feedback (SDC-SDF) radix-4 pipelined fast Fourier transform (FFT) architecture, which includes SDC stages and one SDF stage. The SDC processing engine is proposed to achieve 100% hardware resource utilization by sharing the common arithmetic resources, including both adders and multipliers, in a time-multiplexed approach. Thus, the required number of complex multipliers is reduced compared with the other radix-4 SDC-SDF architectures. In addition, the proposed architecture requires a roughly minimum number of complex adders and a complex delay memory of 4N+3.0.
Index Terms: Fast Fourier transform (FFT), pipelined architecture, single-path delay commutator processing element (SDC PE).
Introduction
The fast Fourier transform (FFT) has played a significant role in the digital signal processing field, especially in advanced communication systems such as orthogonal frequency division multiplexing (OFDM) and asymmetric digital subscriber line. All these systems require that the FFT computation be high throughput and low latency. Therefore, designing a high-performance FFT circuit is an efficient solution to the abovementioned problems. In particular, pipelined FFT architectures have mainly been adopted to address these difficulties due to their attractive properties, such as small chip area, high throughput, and low power consumption.
To the best of our knowledge, two types of pipelined FFT architecture can be found in the literature: delay feedback (DF) and delay commutator (DC). Further, according to the number of input data stream paths, they can be classified into multipath (M) and single-path (S) architectures. The two classifications form four kinds of pipelined FFT architecture. Multipath (M) architectures are often adopted when the throughput requirement is beyond the theoretical limit that a single-path architecture can offer at a given clock frequency. However, they require concurrent read (write) operations for the multipath input (output) data. Therefore, single-path (S) architectures could be appropriate in some cases when the system cannot ensure concurrent operations. However, their arithmetic utilization is relatively low compared with the 100% utilization of the existing MDF/MDC architectures. The proposed single-path architecture can also achieve 100% multiplier utilization by reordering the inner data sequence.
For a single input data stream, the conventional radix-4 SDF FFT architecture requires complex adders and complex multipliers whose counts depend on N, the FFT size. Both Chang [11] and Liu et al. [12] present novel SDC architectures that reduce the complex adders by 50% by reordering the inner data sequence. However, the utilization of the corresponding complex multipliers still remains 50% for both architectures. We therefore study whether the complex multiplier unit can be modified to achieve 100% utilization.
In the radix-4 FFT architecture, there is a common observation that one half of the data (the sum part of the butterfly operation) does not involve complex multiplication (W) at all, while the other half (the difference part) does involve complex multiplication (W). Hence there is an opportunity to reduce the arithmetic resource of the conventional complex multiplier by a factor of 2, leading to 100% utilization. It is ideal for every two consecutive complex input data to contain one complex number that needs to execute complex multiplication. If so, we can minimize the reordering memory requirement while achieving the above objective of reducing the arithmetic resource of the complex multipliers by 50%.
Fortunately, the improved SDC architecture can produce the sum and the corresponding difference results of a butterfly operation in two consecutive cycles. The sum part is passed directly to the next stage, while the difference part needs to execute complex multiplication before passing to the next stage. Therefore, the SDC architecture is ideal for our efficient pipelined radix-4 FFT architecture. However, the SDF architecture does not meet the above constraint well, since the sums of all butterflies in the stage are produced first, followed by the corresponding differences.
In this brief, we present an efficient combined SDC-SDF radix-4 pipelined architecture, which includes SDC stages, one SDF stage, and one bit reverser. The SDC processing engine (SDC PE) in each SDC stage achieves 100% hardware utilization of both adders and multipliers. We include the SDF stage to reorder the data sequence, and then the delay memory of the bit reverser is reduced to N/4. The proposed architecture produces the same normal-order output.
REVIEW OF PIPELINED FFT ARCHITECTURE
A. FFT review of radix-2:
Let us consider the computation of the $N = 2^v$ point DFT by the divide-and-conquer approach. We split the N-point input data sequence $x(n)$ into two N/2-point data sequences $f_1(n)$ and $f_2(n)$, corresponding to the even-numbered and odd-numbered samples of $x(n)$, respectively, that is,
$$f_1(n) = x(2n),$$
$$f_2(n) = x(2n+1), \quad n = 0, 1, \ldots, N/2 - 1.$$
Thus $f_1(n)$ and $f_2(n)$ are obtained by decimating $x(n)$ by a factor of 2, and hence the resulting FFT algorithm is called a decimation-in-time algorithm.
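Now the N-point DFT can be expressed in terms of the DFTs of the decimated sequences as follows. The N-point DFT is defined by
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, 2, \ldots, N-1,$$
where $x(n)$ is the input data, $W_N^{nk} = e^{-j 2\pi nk/N}$ is the coefficient (twiddle factor), and N is any integer power of two. Separating the even-numbered and odd-numbered samples gives
$$X(k) = \sum_{m=0}^{N/2-1} x(2m)\, W_N^{2mk} + \sum_{m=0}^{N/2-1} x(2m+1)\, W_N^{(2m+1)k}.$$
But $W_N^2 = W_{N/2}$. With this substitution, the equation can be expressed as
$$X(k) = \sum_{m=0}^{N/2-1} f_1(m)\, W_{N/2}^{mk} + W_N^k \sum_{m=0}^{N/2-1} f_2(m)\, W_{N/2}^{mk} = F_1(k) + W_N^k F_2(k), \quad k = 0, 1, \ldots, N-1,$$
where $F_1(k)$ and $F_2(k)$ are the N/2-point DFTs of the sequences $f_1(m) = x(2m)$ and $f_2(m) = x(2m+1)$, respectively.
Since $F_1(k)$ and $F_2(k)$ are periodic with period N/2, we have $F_1(k+N/2) = F_1(k)$ and $F_2(k+N/2) = F_2(k)$. In addition, the factor $W_N^{k+N/2} = -W_N^k$. Hence the equations may be expressed as
$$X(k) = F_1(k) + W_N^k F_2(k), \quad k = 0, 1, \ldots, N/2 - 1,$$
$$X(k + N/2) = F_1(k) - W_N^k F_2(k), \quad k = 0, 1, \ldots, N/2 - 1.$$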
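The two butterfly relations above can be checked numerically. The following is a minimal Python sketch, not part of the original design: the function names `dft` and `dit_step` are illustrative, and the N/2-point DFTs are computed directly rather than recursively so that only a single decimation step is exercised.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X(k) = sum_n x(n) * W_N^(nk), with W_N = exp(-j*2*pi/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points) for n in range(n_points))
        for k in range(n_points)
    ]


def dit_step(x):
    """One decimation-in-time step: build an N-point DFT from two N/2-point DFTs."""
    n_points = len(x)
    half = n_points // 2
    f1 = dft(x[0::2])  # F1(k): DFT of the even-numbered samples
    f2 = dft(x[1::2])  # F2(k): DFT of the odd-numbered samples
    out = [0j] * n_points
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n_points) * f2[k]  # W_N^k * F2(k)
        out[k] = f1[k] + t          # X(k)       = F1(k) + W_N^k F2(k)
        out[k + half] = f1[k] - t   # X(k + N/2) = F1(k) - W_N^k F2(k)
    return out


x = [complex(n, 0) for n in range(8)]
max_err = max(abs(a - b) for a, b in zip(dft(x), dit_step(x)))
print(max_err < 1e-9)
```

Applying `dit_step` recursively to the two half-length transforms, instead of calling `dft`, would give the full radix-2 decimation-in-time FFT.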
We observe that the direct computation of $F_1(k)$ requires $(N/2)^2$ complex multiplications. The same applies to the computation of $F_2(k)$. Furthermore, there are N/2 additional complex multiplications required to compute $W_N^k F_2(k)$. Hence the computation of $X(k)$ requires $2(N/2)^2 + N/2 = N^2/2 + N/2$ complex multiplications. This first step results in a reduction of the number of multiplications from $N^2$ to $N^2/2 + N/2$, which is about a factor of 2 for large N.
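By computing N/4-point DFTs, we would obtain the N/2-point DFTs $F_1(k)$ and $F_2(k)$ from the relations
$$F_1(k) = \mathcal{F}\{f_1(2n)\} + W_{N/2}^k\, \mathcal{F}\{f_1(2n+1)\},$$
$$F_1(k + N/4) = \mathcal{F}\{f_1(2n)\} - W_{N/2}^k\, \mathcal{F}\{f_1(2n+1)\},$$
$$F_2(k) = \mathcal{F}\{f_2(2n)\} + W_{N/2}^k\, \mathcal{F}\{f_2(2n+1)\},$$
$$F_2(k + N/4) = \mathcal{F}\{f_2(2n)\} - W_{N/2}^k\, \mathcal{F}\{f_2(2n+1)\},$$
with $k = 0, 1, \ldots, N/4 - 1$ and $n = 0, 1, \ldots, N/4 - 1$ in each relation, where $f_1(n)$ and $f_2(n)$ are the even- and odd-numbered subsequences of the input and $\mathcal{F}\{\cdot\}$ denotes the N/4-point DFT evaluated at index k.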
The decimation of the data sequence can be repeated again and again until the resulting sequences are reduced to one-point sequences. For $N = 2^v$, this decimation can be performed $v = \log_2 N$ times. Thus the total number of complex multiplications is reduced to $(N/2)\log_2 N$. The number of complex additions is $N \log_2 N$.
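To make the savings concrete, the operation counts quoted above can be tabulated for a few FFT sizes. This is a small illustrative Python sketch; the helper names are assumptions introduced here, and the counts are the textbook figures from the text (trivial multiplications by 1 are not excluded).

```python
def direct_mults(n):
    """Complex multiplications for the direct N-point DFT: N^2."""
    return n * n


def one_step_mults(n):
    """After one decimation step: 2*(N/2)^2 + N/2 = N^2/2 + N/2."""
    return 2 * (n // 2) ** 2 + n // 2


def fft_mults(n):
    """Full radix-2 FFT: (N/2) * log2(N), for N a power of two."""
    v = n.bit_length() - 1  # exact log2 for a power of two
    return (n // 2) * v


def fft_adds(n):
    """Full radix-2 FFT: N * log2(N) complex additions."""
    v = n.bit_length() - 1
    return n * v


for n in (16, 64, 256, 1024):
    print(n, direct_mults(n), one_step_mults(n), fft_mults(n), fft_adds(n))
```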
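Another important radix-2 FFT algorithm, called the decimation-in-frequency algorithm, is also obtained by using the divide-and-conquer approach. To derive the algorithm, we begin by splitting the DFT formula into two summations, one of which involves the sum over the first N/2 data points and the other the sum over the last N/2 data points. Thus we obtain
$$X(k) = \sum_{n=0}^{N/2-1} x(n)\, W_N^{nk} + \sum_{n=N/2}^{N-1} x(n)\, W_N^{nk}$$
$$= \sum_{n=0}^{N/2-1} x(n)\, W_N^{nk} + W_N^{Nk/2} \sum_{n=0}^{N/2-1} x(n + N/2)\, W_N^{nk}.$$
Since $W_N^{Nk/2} = (-1)^k$,
$$X(k) = \sum_{n=0}^{N/2-1} \left[ x(n) + (-1)^k x(n + N/2) \right] W_N^{nk}.$$
Now, let us split (decimate) $X(k)$ into the even- and odd-numbered samples. Thus we obtain
$$X(2k) = \sum_{n=0}^{N/2-1} \left[ x(n) + x(n + N/2) \right] W_{N/2}^{nk}, \quad k = 0, 1, \ldots, N/2 - 1,$$
$$X(2k+1) = \sum_{n=0}^{N/2-1} \left\{ \left[ x(n) - x(n + N/2) \right] W_N^{n} \right\} W_{N/2}^{nk}, \quad k = 0, 1, \ldots, N/2 - 1,$$
where we have used the fact that $W_N^2 = W_{N/2}$.
The computational procedure above can be repeated through decimation of the N/2-point DFTs $X(2k)$ and $X(2k+1)$. The entire process involves $v = \log_2 N$ stages of decimation. Consequently, the computation of the N-point DFT via the decimation-in-frequency FFT requires $(N/2)\log_2 N$ complex multiplications and $N \log_2 N$ complex additions, just as in the decimation-in-time algorithm.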
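The decimation-in-frequency recursion can be sketched in a few lines of Python. This is an illustrative software model only, not the pipelined hardware: `dif_fft_radix2` and `naive_dft` are names introduced here, and the recursion writes the even and odd outputs back in interleaved positions so the result is already in normal order.

```python
import cmath


def naive_dft(x):
    """Reference O(N^2) DFT used only to check the recursion."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]


def dif_fft_radix2(x):
    """Radix-2 decimation-in-frequency FFT for len(x) a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Sum part: no twiddle multiplication is needed.
    sums = [x[i] + x[i + half] for i in range(half)]
    # Difference part: multiplied by the twiddle factor W_N^n.
    diffs = [
        (x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
        for i in range(half)
    ]
    out = [0j] * n
    out[0::2] = dif_fft_radix2(sums)   # X(2k)
    out[1::2] = dif_fft_radix2(diffs)  # X(2k+1)
    return out


x = [complex(n, -n) for n in range(16)]
err = max(abs(a - b) for a, b in zip(naive_dft(x), dif_fft_radix2(x)))
print(err < 1e-9)
```

Note that only the `diffs` branch carries a twiddle multiplication, which is the property the SDC processing engine exploits.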
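B. FFT review of radix-4:
When the number of data points N in the DFT is a power of 4 (i.e., $N = 4^v$), we can, of course, always use a radix-2 algorithm for the computation. However, for this case it is more efficient computationally to employ a radix-4 FFT algorithm. For our purposes, let us derive the radix-4 decimation-in-frequency algorithm by breaking the N-point DFT formula into four smaller DFTs. We have
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}$$
$$= \sum_{n=0}^{N/4-1} x(n)\, W_N^{nk} + \sum_{n=N/4}^{N/2-1} x(n)\, W_N^{nk} + \sum_{n=N/2}^{3N/4-1} x(n)\, W_N^{nk} + \sum_{n=3N/4}^{N-1} x(n)\, W_N^{nk}$$
$$= \sum_{n=0}^{N/4-1} x(n)\, W_N^{nk} + W_N^{Nk/4} \sum_{n=0}^{N/4-1} x(n + N/4)\, W_N^{nk} + W_N^{Nk/2} \sum_{n=0}^{N/4-1} x(n + N/2)\, W_N^{nk} + W_N^{3Nk/4} \sum_{n=0}^{N/4-1} x(n + 3N/4)\, W_N^{nk}.$$
From the definition of the twiddle factor, we have
$$W_N^{Nk/4} = (-j)^k, \quad W_N^{Nk/2} = (-1)^k, \quad W_N^{3Nk/4} = (j)^k.$$
Thus
$$X(k) = \sum_{n=0}^{N/4-1} \left[ x(n) + (-j)^k x(n + N/4) + (-1)^k x(n + N/2) + (j)^k x(n + 3N/4) \right] W_N^{nk}.$$
The relation is not an N/4-point DFT because the twiddle factor depends on N and not on N/4. To convert it into an N/4-point DFT, we subdivide the DFT sequence into four N/4-point subsequences, $X(4k)$, $X(4k+1)$, $X(4k+2)$, and $X(4k+3)$, $k = 0, 1, \ldots, N/4 - 1$. Thus we obtain the radix-4 decimation-in-frequency DFT as
$$X(4k) = \sum_{n=0}^{N/4-1} \left[ x(n) + x(n + N/4) + x(n + N/2) + x(n + 3N/4) \right] W_N^{0}\, W_{N/4}^{nk},$$
$$X(4k+1) = \sum_{n=0}^{N/4-1} \left[ x(n) - j x(n + N/4) - x(n + N/2) + j x(n + 3N/4) \right] W_N^{n}\, W_{N/4}^{nk},$$
$$X(4k+2) = \sum_{n=0}^{N/4-1} \left[ x(n) - x(n + N/4) + x(n + N/2) - x(n + 3N/4) \right] W_N^{2n}\, W_{N/4}^{nk},$$
$$X(4k+3) = \sum_{n=0}^{N/4-1} \left[ x(n) + j x(n + N/4) - x(n + N/2) - j x(n + 3N/4) \right] W_N^{3n}\, W_{N/4}^{nk},$$
where we have used the property $W_N^{4nk} = W_{N/4}^{nk}$. Note that the input to each N/4-point DFT is a linear combination of four signal samples scaled by a twiddle factor. This procedure is repeated v times, where $v = \log_4 N$.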
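The four radix-4 relations translate directly into a recursive routine. The sketch below is an illustrative Python model under the assumption that the length is a power of 4; the names `dif_fft_radix4` and `naive_dft` are introduced here and do not describe the proposed hardware.

```python
import cmath


def naive_dft(x):
    """Reference O(N^2) DFT used only to check the recursion."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]


def dif_fft_radix4(x):
    """Radix-4 decimation-in-frequency FFT for len(x) a power of four."""
    n = len(x)
    if n == 1:
        return list(x)
    q = n // 4
    branches = [[], [], [], []]
    for i in range(q):
        a, b, c, d = x[i], x[i + q], x[i + 2 * q], x[i + 3 * q]
        w1 = cmath.exp(-2j * cmath.pi * i / n)  # W_N^n
        w2 = w1 * w1                            # W_N^(2n)
        w3 = w2 * w1                            # W_N^(3n)
        branches[0].append(a + b + c + d)                    # feeds X(4k)
        branches[1].append((a - 1j * b - c + 1j * d) * w1)   # feeds X(4k+1)
        branches[2].append((a - b + c - d) * w2)             # feeds X(4k+2)
        branches[3].append((a + 1j * b - c - 1j * d) * w3)   # feeds X(4k+3)
    out = [0j] * n
    for r in range(4):
        out[r::4] = dif_fft_radix4(branches[r])
    return out


x = [complex(n % 5, 2 - n) for n in range(16)]
err = max(abs(a - b) for a, b in zip(naive_dft(x), dif_fft_radix4(x)))
print(err < 1e-9)
```

For a 16-point input the recursion depth is two, matching $v = \log_4 16 = 2$.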
C. Pipelining of radix-4 FFT:
Assume that the input data enter the FFT circuit serially in a continuous flow. As the data shift from one stage to the next, a higher hardware utilization of the adders and multipliers is needed. When designing the FFT hardware, the data rate in every stage must therefore be taken into account.
III. COMBINED SDC-SDF RADIX-4 PIPELINED FFT
For a single input data stream, we propose an efficient combined SDC-SDF radix-4 pipelined FFT architecture. The proposed SDC PE structure can reduce the complex multipliers by 50%.
A. Proposed FFT architecture
The proposed FFT architecture consists of one pre-stage, N/4-1 SDC stages, one post-stage, one SDF stage, and one bit reverser. The pre-stage shuffles the complex input data into a new sequence that consists of each real part followed by the corresponding imaginary part. The corresponding post-stage shuffles the new sequence back to the complex format. The SDC stage t (t = 1, 2, ..., N/4-1) contains an SDC PE, which can achieve 100% arithmetic resource utilization of both complex adders and complex multipliers. The last stage, the SDF stage, is identical to the radix-4 SDF and contains a complex adder and a complex subtractor. In the bit reverser, the data with an even index are written into memory in normal order, and they are then retrieved from memory in bit-reversed order, while the ones with an odd index are written in bit-reversed order. Finally, the even data are retrieved in normal order. Thus, the bit reverser requires only N/4 data buffers.
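The index mapping that a bit reverser applies can be illustrated with a short Python sketch. It shows only the bit-reversed permutation itself, assuming a power-of-two length; it does not model the N/4-buffer memory scheme described above, and the function names are introduced here for illustration.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Bit-reversed index sequence for n a power of two."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


def to_normal_order(data):
    """Reorder data that arrives in bit-reversed order back to normal order."""
    order = bit_reversed_order(len(data))
    out = [None] * len(data)
    for position, source_index in enumerate(order):
        out[source_index] = data[position]
    return out


print(bit_reversed_order(8))
```

Because bit reversal is its own inverse, applying the permutation twice returns the original order, which is why the same addressing logic can serve both the write and the read side of a reordering buffer.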
The complex input data at cycle m are (m_r, m_i), where m_r and m_i (m = 0, 1, 2, ..., 15) represent the real and imaginary parts, respectively. We only include the pre-stage, SDC stages 1, 2, and 3, and the post-stage, since the SDF stage has the same sequence as the post-stage except for the 8-cycle delay, and the bit reverser, with an 8-cycle delay over the SDF stage, produces the normal output sequence.
B. Single-path DC processing engine:
The SDC PE consists of a data commutator, a real add/sub unit, and an optimum complex multiplier unit. In order to minimize the arithmetic resource of the SDC PE, the most significant factor is to maximize the arithmetic resource utilization by reordering the data sequences of the above three units.
In stage t, the data commutator shuffles its input data (Node-A) to generate a new data sequence (Node-B), whose index difference is N/2^t, where t is the index of the stage. The new data sequence (Node-B) is critical to the real add/sub unit, which contains one real adder and one real subtracter.
For the optimum complex multiplier unit, its output data sequence (Node-E) should be the same as its input data sequence (Node-C). If so, its output sequence (Node-E), which is also the output sequence of SDC stage t, can become the direct input data sequence (Node-A) of SDC stage t+1.
C. Optimum complex multiplier unit:
It contains four multiplexers (M0, M4, M2, M3), a 3.0-word memory (G0, G1, G2, G3), four real multipliers, two real adders, and two real subtractors. All signals follow the same path. When the input signal (complex and real data) is sent from one stage to the next, it is spread into four signals according to the radix, and these are split into two halves of real and imaginary parts. The real and imaginary parts of the first half are added, and those of the second half are subtracted. The two halves are again sent through the same path until they reach a buffer, which stores the multiple signals. As in orthogonal frequency division multiplexing, every group of four stages is sent to a shift register, and the data travel through the pipeline until the whole data set occupies the SDF stage. The delay feedback path transfers the whole data set from input to output, with data from the output stage fed back to the input stage. This process defines a systolic architecture that consists of processing elements.
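As a point of reference for the arithmetic inside such a unit, a complex product can be formed from four real multiplications plus one real subtraction and one real addition. The Python sketch below shows only that arithmetic identity; it is an assumption-laden illustration that does not model the multiplexers, the memory words, the time multiplexing, or the exact adder and subtractor count listed above, and the function name is hypothetical.

```python
def complex_multiply_4mult(ar, ai, wr, wi):
    """(ar + j*ai) * (wr + j*wi) using four real multiplications."""
    p0 = ar * wr  # real x real
    p1 = ai * wi  # imag x imag
    p2 = ar * wi  # real x imag
    p3 = ai * wr  # imag x real
    real = p0 - p1  # one real subtraction
    imag = p2 + p3  # one real addition
    return real, imag


# Integer operands keep the check exact.
print(complex_multiply_4mult(1, 2, 3, 4))
```

Since only the difference part of each butterfly needs this product, a multiplier unit fed with alternating sum and difference data can in principle keep its real multipliers busy in every cycle, which is the utilization argument made in the text.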
DATA OUTPUT ORDER OF THE PROPOSED PIPELINED ARCHITECTURE FROM PRE-STAGE TO STAGE N/4-1 OF A 16-POINT FFT
| Cycles | Digital input | 1st stage | 2nd stage |
|---|---|---|---|
| 0 | 0000 | 12r,0i | 0r,0i |
| 0 | 0001 | 12r,0i | 0r,0i |
| 0 | 0010 | 14r,0i | 0r,0i |
| 0 | 0011 | 14r,0i | 0r,0i |
| 0 | 0100 | 16r,0i | 0r,0i |
| 1 | 0101 | 16r,0i | 0r,8.65i |
| 2 | 0110 | 18r,0i | 0r,0i |
| 3 | 0111 | 18r,0i | 0r,-9.738i |
| 0 | 1000 | -12r,0i | 0r,0i |
| 2 | 1001 | -4r,0i | 0r,0i |
| 4 | 1010 | -12r,0i | 12r,0i |
| 6 | 1011 | -4r,0i | 0r,0i |
| 0 | 1100 | -12r,0i | 0r,0i |
| 3 | 1101 | -4r,0i | 0r,2.164i |
| 6 | 1110 | -12r,0i | 0r,0i |
| 9 | 1111 | -4r,0i | 0r,-0.496i |
DATA SEQUENCE FROM PRE-STAGE TO BIT REVERSER
| Cycles | Digital input | 1st stage | 2nd stage | Twiddle factor | Bit reverser |
|---|---|---|---|---|---|
| 0 | 0000 | 12r,0i | 0r,0i | *0 | 0 |
| 0 | 0001 | 12r,0i | 0r,0i | *0 | 0 |
| 0 | 0010 | 14r,0i | 0r,0i | *0 | 0 |
| 0 | 0011 | 14r,0i | 0r,0i | *0 | 0 |
| 0 | 0100 | 16r,0i | 0r,0i | *0 | 0 |
| 1 | 0101 | 16r,0i | 0r,8.656i | *0 | 0 |
| 2 | 0110 | 18r,0i | 0r,0i | *0 | 0 |
| 3 | 0111 | 18r,0i | 0r,-9.738i | *0 | 0 |
| 0 | 1000 | -12r,0i | 0r,0i | *0 | 0 |
| 2 | 1001 | -4r,0i | 0r,0i | *0 | 0 |
| 4 | 1010 | -12r,0i | 12r,0i | *0 | 0 |
| 6 | 1011 | -4r,0i | 0r,0i | *0 | 0 |
| 0 | 1100 | -12r,0i | 0r,0i | *0 | 0 |
| 3 | 1101 | -4r,0i | 0r,2.164i | *0 | 0 |
| 6 | 1110 | -12r,0i | 0r,0i | *0 | 0 |
| 9 | 1111 | -4r,0i | 0r,-0.496i | *0 | 0 |
HARDWARE RESOURCE COMPARISON FOR THE VARIOUS PIPELINED FFT ARCHITECTURES
| Architecture | Internal memory | Overall memory | Adder | General multiplier (utilization) | Constant multiplier | Throughput | Latency | Critical path delay |
|---|---|---|---|---|---|---|---|---|
| R4 SDF | N/4-1 | 4N/4-1 | | (50%) | NIL | 4/N | 4N-1 | |
| R4 SDC | 4N/4-4 | 16N/4-4 | | (50%) | NIL | 4/N | N | |
| CHANG | 3.0N | 4N | | (50%) | NIL | 4/N | 4N | |
| LIU | 3.0N+4X | 4N+4X | | (50%) | NIL | 4/N | 4N+4X | |
| | N/4-1 | 4N/4-1 | | (75%) | NIL | 4/N | 4N-1 | |
| PROPOSED | 3.0N+3.0X | 3.0N+3.0X | | (100%) | NIL | 4/N | 4N+ | |
COMPARISON OF TRANSISTOR REQUIREMENT AND LATENCY
| Architecture | Components | Transistors | Latency | Transistors x Latency |
|---|---|---|---|---|
| CHANG | 1024 16-bit SRAMs; 32 16-bit adders; 28 16-bit multipliers | 230748 (135%) | 512 | 118142976 (133%) |
| LIU | 1048 16-bit SRAMs; 32 16-bit adders; 28 16-bit multipliers | 233052 (136%) | 524 | 122119248 (138%) |
| | 1022 16-bit SRAMs; 32 16-bit adders; 12 16-bit multipliers | 167138 (98%) | 511 | 85407518 (96%) |
| | 1192 16-bit SRAMs; 22 16-bit adders; 12 16-bit multipliers | 175378 (103%) | 591 | 103648398 (117%) |
| R2^3 SDF | 1022 16-bit SRAMs; 37.6 16-bit adders; 11.2 16-bit multipliers | 163614 (96%) | 511 | 83606754 (94%) |
| R2^4 SDF | 1048 16-bit SRAMs; 35.6 16-bit adders; 7.2 16-bit multipliers | 145992 (85%) | 511 | 74601912 (84%) |
| PROPOSED | 1045 16-bit SRAMs; 25 16-bit adders; 14 16-bit multipliers | 171087 (100%) | 519 | 88794153 (100%) |
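The last column of the comparison is the product of the transistor count and the latency, and the percentages are normalized to the proposed design. The following Python sketch reproduces that arithmetic for the named rows only; the data structure and function name are introduced here for illustration, and the numbers are copied from the table.

```python
# (architecture, transistors, latency) taken from the named rows of the table
ROWS = [
    ("CHANG", 230748, 512),
    ("LIU", 233052, 524),
    ("R2^3 SDF", 163614, 511),
    ("R2^4 SDF", 145992, 511),
    ("PROPOSED", 171087, 519),
]


def cost_table(rows, baseline="PROPOSED"):
    """Return {name: (product, transistor %, product %)} relative to the baseline row."""
    base_transistors, base_latency = next(
        (t, lat) for name, t, lat in rows if name == baseline
    )
    base_product = base_transistors * base_latency
    table = {}
    for name, transistors, latency in rows:
        product = transistors * latency
        table[name] = (
            product,
            round(100 * transistors / base_transistors),
            round(100 * product / base_product),
        )
    return table


costs = cost_table(ROWS)
for name, (product, pct_t, pct_p) in costs.items():
    print(name, product, f"{pct_t}%", f"{pct_p}%")
```

Using the product as a figure of merit penalizes designs that save area only by adding latency, which is why a design with fewer transistors can still score close to the proposed one.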
AREA AND PERFORMANCE OF THE PROPOSED FFT ARCHITECTURE FOR 16 BITS
| FFT size | LUTs | FFs | DSPs | BRAMs | Freq (MHz) | Latency (ns) |
|---|---|---|---|---|---|---|
| 16 | 672 | 522 | 4 | 0 | 322 | 140 |
| 64 | 1110 | 752 | 8 | 0 | 303 | 498 |
| 256 | 1733 | 1073 | 12 | 0 | 297 | 1834 |
| 1024 | 2804 | 1589 | 16 | 3 | 298 | 7028 |
| 4096 | 8391 | 2780 | 20 | 4 | 295 | 27975 |
ANALYSIS OF THE SIGNAL FLOW OF THE RADIX-4 DIF FFT
* Consider the signal flow from one stage to the next via the butterfly diagram.
* Add and subtract the real and imaginary parts within each group of four.
* When the twiddle factor is taken into account, the value becomes complex.
* The real and imaginary parts obtained in every stage are multiplied by the complex value. The whole signal can then be stored in a buffer, routed through the multiplexers, and passed through the pipelined structure, where shift registers are used. Every channel contains four sets of signals, which pass to the SDF path, and every channel is then used for bit reversal.
CONCLUSION
We propose a combined SDC-SDF pipelined FFT architecture that produces the output data in normal order. The proposed SDC PE reduces the complex multipliers by 50% compared with the other radix-4 DIF FFT designs. Therefore, the proposed FFT architecture is very attractive for single-path pipelined radix-4 FFT processors with the input and output sequences in normal order.
REFERENCES
[1] L. J. Cimini, "Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing," IEEE Trans. Commun., vol. 33, no. 7, pp. 665-675, Jul. 1985.
[2] J. M. Cioffi, The Communications Handbook. Boca Raton, FL, USA: CRC Press, 1997.
[3] Y. W. Lin, H. Y. Liu, and C. Y. Lee, "A 1-GS/s FFT/IFFT processor for UWB applications," IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1726-1735, Aug. 2005.
[4] C. Cheng and K. K. Parhi, "High throughput VLSI architecture for FFT computation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 10, pp. 339-344, Oct. 2007.
[5] S. N. Tang, J. W. Tsai, and T. Y. Chang, "A 2.4-GS/s FFT processor for OFDM-based WPAN applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 6, pp. 451-455, Jun. 2010.
[6] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1975, pp. 604-609.
[7] E. H. Wold and A. M. Despain, "Pipelined and parallel-pipelined FFT processors for VLSI implementation," IEEE Trans. Comput., vol. C-33, no. 5, pp. 414-426, May 1984.
[8] T. Sansaloni, A. Perez-Pascual, V. Torres, and J. Valls, "Efficient pipelined FFT processors for WLAN MIMO-OFDM systems," Electron. Lett., vol. 41, no. 19, pp. 1043-1044, Sep. 2005.
[9] A. M. Despain, "Fourier transform computers using CORDIC iterations," IEEE Trans. Comput., vol. C-23, no. 10, pp. 993-1001, Oct. 1974.
[10] N. H. E. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective. Boston, MA, USA: Addison-Wesley, 2005.

More Related Content

What's hot

64 point fft chip
64 point fft chip64 point fft chip
64 point fft chipShalyJ
 
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoid
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoidFourier transforms & fft algorithm (paul heckbert, 1998) by tantanoid
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoidXavier Davias
 
Cooperative underlay cognitive radio assisted NOMA: secondary network improve...
Cooperative underlay cognitive radio assisted NOMA: secondary network improve...Cooperative underlay cognitive radio assisted NOMA: secondary network improve...
Cooperative underlay cognitive radio assisted NOMA: secondary network improve...TELKOMNIKA JOURNAL
 
Fast fourier transform
Fast fourier transformFast fourier transform
Fast fourier transformAshraf Khan
 
Hybrid protocol for wireless EH network over weibull fading channel: performa...
Hybrid protocol for wireless EH network over weibull fading channel: performa...Hybrid protocol for wireless EH network over weibull fading channel: performa...
Hybrid protocol for wireless EH network over weibull fading channel: performa...IJECEIAES
 
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Cemal Ardil
 
Resourceful fast dht algorithm for vlsi implementation by split radix algorithm
Resourceful fast dht algorithm for vlsi implementation by split radix algorithmResourceful fast dht algorithm for vlsi implementation by split radix algorithm
Resourceful fast dht algorithm for vlsi implementation by split radix algorithmeSAT Publishing House
 
IRJET - Distributed Arithmetic Method for Complex Multiplication
IRJET -  	  Distributed Arithmetic Method for Complex MultiplicationIRJET -  	  Distributed Arithmetic Method for Complex Multiplication
IRJET - Distributed Arithmetic Method for Complex MultiplicationIRJET Journal
 
fft using labview
fft using labviewfft using labview
fft using labviewkiranrockz
 
Local Vibrational Modes
Local Vibrational ModesLocal Vibrational Modes
Local Vibrational ModesDani Setiawan
 
NetSim Technology Library- Propagation models
NetSim Technology Library- Propagation modelsNetSim Technology Library- Propagation models
NetSim Technology Library- Propagation modelsVishal Sharma
 
hankel_norm approximation_fir_ ijc
hankel_norm approximation_fir_ ijchankel_norm approximation_fir_ ijc
hankel_norm approximation_fir_ ijcVasilis Tsoulkas
 
Dft and its applications
Dft and its applicationsDft and its applications
Dft and its applicationsAgam Goel
 

What's hot (19)

Daubechies wavelets
Daubechies waveletsDaubechies wavelets
Daubechies wavelets
 
Dft,fft,windowing
Dft,fft,windowingDft,fft,windowing
Dft,fft,windowing
 
64 point fft chip
64 point fft chip64 point fft chip
64 point fft chip
 
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoid
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoidFourier transforms & fft algorithm (paul heckbert, 1998) by tantanoid
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoid
 
Dif fft
Dif fftDif fft
Dif fft
 
Cooperative underlay cognitive radio assisted NOMA: secondary network improve...
Cooperative underlay cognitive radio assisted NOMA: secondary network improve...Cooperative underlay cognitive radio assisted NOMA: secondary network improve...
Cooperative underlay cognitive radio assisted NOMA: secondary network improve...
 
Fast fourier transform
Fast fourier transformFast fourier transform
Fast fourier transform
 
Hybrid protocol for wireless EH network over weibull fading channel: performa...
Hybrid protocol for wireless EH network over weibull fading channel: performa...Hybrid protocol for wireless EH network over weibull fading channel: performa...
Hybrid protocol for wireless EH network over weibull fading channel: performa...
 
Lecture set 5
Lecture set 5Lecture set 5
Lecture set 5
 
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
 
Resourceful fast dht algorithm for vlsi implementation by split radix algorithm
Resourceful fast dht algorithm for vlsi implementation by split radix algorithmResourceful fast dht algorithm for vlsi implementation by split radix algorithm
Resourceful fast dht algorithm for vlsi implementation by split radix algorithm
 
IRJET - Distributed Arithmetic Method for Complex Multiplication
IRJET -  	  Distributed Arithmetic Method for Complex MultiplicationIRJET -  	  Distributed Arithmetic Method for Complex Multiplication
IRJET - Distributed Arithmetic Method for Complex Multiplication
 
fft using labview
fft using labviewfft using labview
fft using labview
 
Local Vibrational Modes
Local Vibrational ModesLocal Vibrational Modes
Local Vibrational Modes
 
NetSim Technology Library- Propagation models
NetSim Technology Library- Propagation modelsNetSim Technology Library- Propagation models
NetSim Technology Library- Propagation models
 
hankel_norm approximation_fir_ ijc
hankel_norm approximation_fir_ ijchankel_norm approximation_fir_ ijc
hankel_norm approximation_fir_ ijc
 
Properties of dft
Properties of dftProperties of dft
Properties of dft
 
Dft and its applications
Dft and its applicationsDft and its applications
Dft and its applications
 
Fft
FftFft
Fft
 

Similar to A combined sdc

IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT
IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT
IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT VLSICS Design
 
Fast Fourier Transform (FFT) Algorithms in DSP
Fast Fourier Transform (FFT) Algorithms in DSPFast Fourier Transform (FFT) Algorithms in DSP
Fast Fourier Transform (FFT) Algorithms in DSProykousik2020
 
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd Iaetsd
 
Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...TELKOMNIKA JOURNAL
 
lec08_computation_of_DFT.pdf
lec08_computation_of_DFT.pdflec08_computation_of_DFT.pdf
lec08_computation_of_DFT.pdfshannlevia123
 
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...cscpconf
 
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...csandit
 
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...ijma
 
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...IRJET Journal
 
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET - Design and Implementation of FFT using Compressor with XOR Gate TopologyIRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET - Design and Implementation of FFT using Compressor with XOR Gate TopologyIRJET Journal
 
HIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFTHIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFTAM Publications
 
Fast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-IIFast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-IIijeei-iaes
 
3 f3 3_fast_ fourier_transform
3 f3 3_fast_ fourier_transform3 f3 3_fast_ fourier_transform
3 f3 3_fast_ fourier_transformWiw Miu
 

Similar to A combined sdc (20)

IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT
IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT
IMPLEMENTATION OF SDC - SDF ARCHITECTURE FOR RADIX-4 FFT
 
Fast Fourier Transform (FFT) Algorithms in DSP
Fast Fourier Transform (FFT) Algorithms in DSPFast Fourier Transform (FFT) Algorithms in DSP
Fast Fourier Transform (FFT) Algorithms in DSP
 
G010233540
G010233540G010233540
G010233540
 
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd fpga implementation of cordic algorithm for pipelined fft realization and
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization and
 
B1030610
B1030610B1030610
B1030610
 
Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...
 
lec08_computation_of_DFT.pdf
lec08_computation_of_DFT.pdflec08_computation_of_DFT.pdf
lec08_computation_of_DFT.pdf
 
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
 
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
 
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...
 
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
 
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET - Design and Implementation of FFT using Compressor with XOR Gate TopologyIRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
 
DFT.pptx
DFT.pptxDFT.pptx
DFT.pptx
 
HIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFTHIGH PERFORMANCE SPLIT RADIX FFT
HIGH PERFORMANCE SPLIT RADIX FFT
 
Fast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-IIFast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-II
 
Design Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless Application
Design Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless ApplicationDesign Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless Application
Design Radix-4 64-Point Pipeline FFT/IFFT Processor for Wireless Application
 
Aw4102359364
Aw4102359364Aw4102359364
Aw4102359364
 
Res701 research methodology fft1
Res701 research methodology fft1Res701 research methodology fft1
Res701 research methodology fft1
 
3 f3 3_fast_ fourier_transform
3 f3 3_fast_ fourier_transform3 f3 3_fast_ fourier_transform
3 f3 3_fast_ fourier_transform
 
Fft ppt
Fft pptFft ppt
Fft ppt
 

A combined sdc

  • 1. A Combined SDC-SDF Architecture For Normal I/O Pipelined Radix-4 FFT S.MAGESHKUMAR, Department of ECE, Asan Memorial College Of Engineering And Technology, Sakthikumaran06@gmail.com. Abstract We present an efficient combined single-path delay commutator and multi-path delay feedback (SDC- SDF) radix-4 pipelined fast fourier transform architecture.which includes SDC stages,and one SDF stages .The SDC processing engine is proposed to achieve 100% hardware resource utilization by sharing the common arithmetic resource in the time- multiplexed approach,including both adders and multipliers is reduced to compared with for the other radix-4 SDC-SDF architecture .in addition the proposed architecture requires roughly minimum number of complex adders and complex delay memory 4N+3.0 . Intex Terms-Fast Fourier Transform (FFT),pipelined architecture ,single path delay communicator processing elements (SDC PF). Introduction Fast Fourier Transform(FFT) has played a significant role in digital signal processing field,especially in the advanced communication systems,such as orthogonal frequency multiplexing
  • 2. (OFDM) and asymmetric digital subscriber line (ADSL). All these systems require the FFT computation to have high throughput and low latency. Designing a high-performance FFT circuit is therefore an efficient way to meet these requirements. In particular, pipelined FFT architectures have mainly been adopted because of their attractive properties, such as small chip area, high throughput, and low power consumption.
    To the best of our knowledge, there are two types of pipelined FFT architecture: delay feedback (DF) and delay commutator (DC). Further, according to the number of input data stream paths, they can be classified into multipath (M) and single-path (S) architectures. The two classifications form four kinds of pipelined FFT architecture: SDF, SDC, MDF, and MDC. The multipath architectures are often adopted when the throughput requirement is beyond the theoretical limit that the single-path architecture can offer at a given clock frequency. However, they require concurrent read (write) operations for the multipath input (output) data. A single-path (S) architecture can therefore be more appropriate when the system cannot ensure concurrent operations. However, its arithmetic utilization is relatively low compared with the 100% utilization of the existing MDF/MDC architectures. Existing designs also achieve 100% multiplier utilization by reordering the inner data sequence.
    For a single input data stream, the conventional radix-4 SDF FFT architecture requires complex adders and complex multipliers in numbers that depend on the FFT size N. Both Chang [11] and Liu et al. [12] present novel SDC architectures that reduce the number of complex adders by 50% by reordering the inner data sequence. However, the utilization of the corresponding complex multipliers remains 50% for both
  • 3. architectures. We therefore study whether the complex multiplier unit can be modified to achieve 100% utilization.
    In the radix-4 FFT architecture, there is a common observation that one half of the data (the sum part of the butterfly operation) does not involve complex multiplication at all, because its twiddle factor is trivial, while the other half (the difference part) does involve complex multiplication by a twiddle factor. Hence there is an opportunity to reduce the arithmetic resources of the conventional complex multiplier by a factor of 2, leading to 100% utilization. Ideally, every two consecutive complex input data contain one complex number that needs complex multiplication. If so, we can minimize the reordering memory requirement while still reducing the arithmetic resources of the complex multipliers by 50%.
    Fortunately, the improved SDC architecture can produce the sum and the corresponding difference of a butterfly operation in two consecutive cycles. The sum part is passed directly to the next stage, while the difference part undergoes complex multiplication before being passed to the next stage. The SDC architecture is therefore ideal for our efficient pipelined radix-4 FFT architecture. The SDF architecture, however, does not meet the above constraint well, since the sums of all butterflies in a stage are produced first, followed by the corresponding differences.
    In this brief, we present an efficient combined SDC-SDF radix-4 pipelined architecture, which includes SDC stages, one SDF stage, and one bit reverser. The SDC processing engine (SDC PE) in each SDC stage achieves 100% hardware utilization of both adders and multipliers. We include
  • 4. the SDF stage to reorder the data sequence, and the delay memory of the bit reverser is then reduced to N/4. The proposed architecture produces its output in normal order.
    REVIEW OF PIPELINED FFT ARCHITECTURE
    A. FFT review of radix-2:
    Let us consider the computation of the $N = 2^v$ point DFT by the divide-and-conquer approach. We split the N-point data sequence into two N/2-point data sequences $f_1(n)$ and $f_2(n)$, corresponding to the even-numbered and odd-numbered samples of $x(n)$, respectively, that is,
    $$f_1(n) = x(2n), \qquad f_2(n) = x(2n+1), \qquad n = 0, 1, \ldots, N/2 - 1.$$
    Thus $f_1(n)$ and $f_2(n)$ are obtained by decimating $x(n)$ by a factor of 2, and hence the resulting FFT algorithm is called a decimation-in-time algorithm.
    Now the N-point DFT can be expressed in terms of the DFTs of the decimated sequences as follows. The N-point DFT is defined by
    $$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, 2, \ldots, N-1,$$
    where $x(n)$ is the input data, $W_N^{nk} = e^{-j 2\pi nk/N}$ is the coefficient, and N is any integer power of two. Splitting the sum into even-indexed and odd-indexed terms gives
    $$X(k) = \sum_{m=0}^{N/2-1} x(2m)\, W_N^{2mk} + \sum_{m=0}^{N/2-1} x(2m+1)\, W_N^{k(2m+1)}.$$
    But $W_N^2 = W_{N/2}$. With this substitution, the equation can be expressed as
    $$X(k) = \sum_{m=0}^{N/2-1} f_1(m)\, W_{N/2}^{km} + W_N^k \sum_{m=0}^{N/2-1} f_2(m)\, W_{N/2}^{km} = F_1(k) + W_N^k F_2(k), \qquad k = 0, 1, \ldots, N-1,$$
  • 5. where $F_1(k)$ and $F_2(k)$ are the N/2-point DFTs of the sequences $f_1(m)$ and $f_2(m)$, respectively.
    Since $F_1(k)$ and $F_2(k)$ are periodic with period N/2, we have $F_1(k+N/2) = F_1(k)$ and $F_2(k+N/2) = F_2(k)$. In addition, the factor $W_N^{k+N/2} = -W_N^k$. Hence the equations may be expressed as
    $$X(k) = F_1(k) + W_N^k F_2(k), \qquad k = 0, 1, \ldots, N/2 - 1,$$
    $$X(k + N/2) = F_1(k) - W_N^k F_2(k), \qquad k = 0, 1, \ldots, N/2 - 1.$$
    We observe that the direct computation of $F_1(k)$ requires $(N/2)^2$ complex multiplications. The same applies to the computation of $F_2(k)$. Furthermore, there are N/2 additional complex multiplications required to compute $W_N^k F_2(k)$. Hence the computation of $X(k)$ requires $2(N/2)^2 + N/2 = N^2/2 + N/2$ complex multiplications. This first step reduces the number of multiplications from $N^2$ to $N^2/2 + N/2$, which is about a factor of 2 for large N.
    Decimating $f_1(n)$ and $f_2(n)$ in the same way and computing N/4-point DFTs, we obtain the N/2-point DFTs $F_1(k)$ and $F_2(k)$ from the relations below, where $\mathcal{F}\{\cdot\}$ denotes the N/4-point DFT, $k = 0, 1, \ldots, N/4 - 1$, and $n = 0, 1, \ldots, N/4 - 1$:
    $$F_1(k) = \mathcal{F}\{f_1(2n)\} + W_{N/2}^k\, \mathcal{F}\{f_1(2n+1)\},$$
    $$F_1(k + N/4) = \mathcal{F}\{f_1(2n)\} - W_{N/2}^k\, \mathcal{F}\{f_1(2n+1)\},$$
    $$F_2(k) = \mathcal{F}\{f_2(2n)\} + W_{N/2}^k\, \mathcal{F}\{f_2(2n+1)\},$$
    $$F_2(k + N/4) = \mathcal{F}\{f_2(2n)\} - W_{N/2}^k\, \mathcal{F}\{f_2(2n+1)\}.$$
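    The first decimation-in-time step can be checked numerically. The following is a minimal Python sketch, not part of the original brief: the function names `dft`, `dit_step`, and `multiplication_counts` are hypothetical, and the direct DFT is used only as a reference. It confirms that combining the two half-length DFTs with the twiddle factor reproduces the full DFT, and it evaluates the multiplication counts $N^2$ and $N^2/2 + N/2$ quoted in the text.

```python
import cmath


def dft(x):
    """Direct N-point DFT: X(k) = sum_n x(n) * W_N^(nk), W_N = exp(-2j*pi/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def dit_step(x):
    """One decimation-in-time step (illustrative helper, even length assumed).

    Splits x into even and odd samples, takes their half-length DFTs
    F1 and F2, and recombines them as
        X(k)       = F1(k) + W_N^k * F2(k)
        X(k + N/2) = F1(k) - W_N^k * F2(k)
    """
    n_points = len(x)
    half = n_points // 2
    f1 = dft(x[0::2])  # DFT of even-numbered samples
    f2 = dft(x[1::2])  # DFT of odd-numbered samples
    out = [0j] * n_points
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n_points) * f2[k]
        out[k] = f1[k] + t
        out[k + half] = f1[k] - t
    return out


def multiplication_counts(n_points):
    """Complex multiplications: direct DFT vs. after one decimation step."""
    direct = n_points ** 2
    after_one_step = 2 * (n_points // 2) ** 2 + n_points // 2
    return direct, after_one_step


x = [complex(n, -n) for n in range(8)]
max_error = max(abs(a - b) for a, b in zip(dft(x), dit_step(x)))
print(max_error < 1e-9)
print(multiplication_counts(8))
```

    For N = 8 the counts are 64 and 36, which matches the "about a factor of 2" reduction described above. Applying the same step recursively is what leads to the full radix-2 algorithm.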
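  • 6. The decimation of the data sequence can be repeated again and again until the resulting sequences are reduced to one-point sequences. For $N = 2^v$, this decimation can be performed $v = \log_2 N$ times. Thus the total number of complex multiplications is reduced to $(N/2)\log_2 N$, and the number of complex additions is $N \log_2 N$.
    Another important radix-2 FFT algorithm, called the decimation-in-frequency algorithm, is also obtained by the divide-and-conquer approach. To derive the algorithm, we begin by splitting the DFT formula into two summations, one over the first N/2 data points and the other over the last N/2 data points. Thus we obtain
    $$X(k) = \sum_{n=0}^{N/2-1} x(n)\, W_N^{kn} + \sum_{n=N/2}^{N-1} x(n)\, W_N^{kn} = \sum_{n=0}^{N/2-1} x(n)\, W_N^{kn} + W_N^{Nk/2} \sum_{n=0}^{N/2-1} x(n + N/2)\, W_N^{kn}.$$
    Since $W_N^{kN/2} = (-1)^k$,
    $$X(k) = \sum_{n=0}^{N/2-1} \left[ x(n) + (-1)^k x(n + N/2) \right] W_N^{kn}.$$
    Now, let us split (decimate) $X(k)$ into the even- and odd-numbered samples. Thus we obtain, for $k = 0, 1, \ldots, N/2 - 1$,
    $$X(2k) = \sum_{n=0}^{N/2-1} \left[ x(n) + x(n + N/2) \right] W_{N/2}^{kn},$$
    $$X(2k+1) = \sum_{n=0}^{N/2-1} \left\{ \left[ x(n) - x(n + N/2) \right] W_N^{n} \right\} W_{N/2}^{kn},$$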
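  • 7. where we have used the fact that $W_N^2 = W_{N/2}$.
    The computational procedure above can be repeated through decimation of the N/2-point DFTs $X(2k)$ and $X(2k+1)$. The entire process involves $v = \log_2 N$ stages of decimation. Consequently, the computation of the N-point DFT via the decimation-in-frequency FFT requires $(N/2)\log_2 N$ complex multiplications and $N \log_2 N$ complex additions, just as in the decimation-in-time algorithm.
    B. FFT review of radix-4:
    When the number of data points N in the DFT is a power of 4 (i.e., $N = 4^v$), we can, of course, always use a radix-2 algorithm for the computation. However, for this case it is computationally more efficient to employ a radix-4 FFT algorithm.
    For our purposes, let us derive the radix-4 decimation-in-frequency algorithm by breaking the N-point DFT formula into four smaller DFTs. We have
    $$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn} = \sum_{n=0}^{N/4-1} x(n)\, W_N^{kn} + \sum_{n=N/4}^{N/2-1} x(n)\, W_N^{kn} + \sum_{n=N/2}^{3N/4-1} x(n)\, W_N^{kn} + \sum_{n=3N/4}^{N-1} x(n)\, W_N^{kn}$$
    $$= \sum_{n=0}^{N/4-1} x(n)\, W_N^{kn} + W_N^{Nk/4} \sum_{n=0}^{N/4-1} x(n + N/4)\, W_N^{kn} + W_N^{Nk/2} \sum_{n=0}^{N/4-1} x(n + N/2)\, W_N^{kn} + W_N^{3Nk/4} \sum_{n=0}^{N/4-1} x(n + 3N/4)\, W_N^{kn}.$$
    From the definition of the twiddle factor, we have
    $$W_N^{kN/4} = (-j)^k, \qquad W_N^{kN/2} = (-1)^k, \qquad W_N^{3kN/4} = (j)^k.$$
    Thus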
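  • 8. $$X(k) = \sum_{n=0}^{N/4-1} \left[ x(n) + (-j)^k x(n + N/4) + (-1)^k x(n + N/2) + (j)^k x(n + 3N/4) \right] W_N^{nk}.$$
    This relation is not an N/4-point DFT because the twiddle factor depends on N and not on N/4. To convert it into an N/4-point DFT, we subdivide the DFT sequence into four N/4-point subsequences, $X(4k)$, $X(4k+1)$, $X(4k+2)$, and $X(4k+3)$, $k = 0, 1, \ldots, N/4 - 1$. Thus we obtain the radix-4 decimation-in-frequency DFT as
    $$X(4k) = \sum_{n=0}^{N/4-1} \left[ x(n) + x(n + N/4) + x(n + N/2) + x(n + 3N/4) \right] W_N^{0}\, W_{N/4}^{kn},$$
    $$X(4k+1) = \sum_{n=0}^{N/4-1} \left[ x(n) - j\,x(n + N/4) - x(n + N/2) + j\,x(n + 3N/4) \right] W_N^{n}\, W_{N/4}^{kn},$$
    $$X(4k+2) = \sum_{n=0}^{N/4-1} \left[ x(n) - x(n + N/4) + x(n + N/2) - x(n + 3N/4) \right] W_N^{2n}\, W_{N/4}^{kn},$$
    $$X(4k+3) = \sum_{n=0}^{N/4-1} \left[ x(n) + j\,x(n + N/4) - x(n + N/2) - j\,x(n + 3N/4) \right] W_N^{3n}\, W_{N/4}^{kn},$$
    where we have used the property $W_N^{4kn} = W_{N/4}^{kn}$. Note that the input to each N/4-point DFT is a linear combination of four signal samples scaled by a twiddle factor. This procedure is repeated v times, where $v = \log_4 N$.
    C. Pipelining of radix-4 FFT:
    Assuming that the input data enter the FFT circuit serially in a continuous flow, the data shifting from one stage to the next require a higher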
  • 9. hardware utilization of adders and multipliers. When designing the FFT hardware, we therefore consider the data rate at every stage.
    III. COMBINED SDC-SDF RADIX-4 PIPELINED FFT
    For a single input data stream, we propose an efficient combined SDC-SDF radix-4 pipelined FFT architecture, in which the proposed SDC PE structure reduces the complex multipliers by 50%.
    A. Proposed FFT architecture
    The proposed FFT architecture consists of one pre-stage, N/4 - 1 SDC stages, one post-stage, one SDF stage, and one bit reverser. The pre-stage shuffles the complex input data into a new sequence in which each real part is followed by the corresponding imaginary part. The corresponding post-stage shuffles the new sequence back to the complex format. SDC stage t (t = 1, 2, ..., N/4 - 1) contains an SDC PE, which achieves 100% arithmetic resource utilization of both complex adders and complex multipliers. The last stage, the SDF stage, is identical to the radix-4 SDF and contains a complex adder and a complex subtracter. In the bit reverser, the data with an even index are written into memory in normal order and are then retrieved from memory in bit-reversed order, while the data with an odd index are written in bit-reversed order. Finally, the odd-indexed data are retrieved in normal order. Thus the bit reverser requires only N/4 data buffers.
    The complex input data at cycle m are (m_r, m_i), where m_r and m_i (m = 0, 1, 2, ..., 15) represent the real and imaginary parts, respectively. We show only the pre-stage, SDC stages 1, 2, and 3, and the post-stage, since the SDF stage has the same sequence as the post-stage except for the 8-cycle
  • 10. delay, and the bit reverser, with an 8-cycle delay over the SDF stage, produces the normal output sequence.
    B. Single-path DC processing engine:
    The SDC PE consists of a data commutator, a real add/sub unit, and an optimum complex multiplier unit. To minimize the arithmetic resources of the SDC PE, the most significant factor is to maximize the arithmetic resource utilization by reordering the data sequences of these three units.
    In stage t, the data commutator shuffles its input data (Node-A) to generate a new data sequence (Node-B) whose index difference is N/2^t, where t is the index of the stage. The new data sequence (Node-B) is critical to the real add/sub unit, which contains one real adder and one real subtracter.
    For the optimum complex multiplier unit, the output data sequence (Node-E) should be the same as the input data sequence (Node-C). If so, its output sequence (Node-E), which is also the output sequence of SDC stage t, can become the direct input data sequence (Node-A) of SDC stage t+1.
    C. Optimum complex multiplier unit:
    It contains 4 multiplexers (M0, M1, M2, M3), a 3.0-word memory (G0, G1, G2, G3), 4 real multipliers, 2 real adders, and 2 real subtracters. The input signals (complex and real data) travel along the same path. As the signals pass from one stage to the next, they are split into the four radix-4 branches, and each is further split into two halves of real and imaginary parts. The real and imaginary parts of the first half are added, and those of the second half are subtracted. The two halves are then sent along the same path until they reach a buffer, which is used for storing the multiple
  • 11. signals. In an orthogonal frequency-division multiplexing system, the data from every four stages are sent to a shift register and pass through the pipeline, after which all of the data enter the SDF stage. The delay feedback path carries the data from input to output and returns data from the output stage to the input stage. This process defines a systolic architecture consisting of processing elements.
    DATA OUTPUT ORDER OF THE PROPOSED PIPELINED ARCHITECTURE FROM PRE-STAGE TO STAGE N/4-1 OF A 16-POINT FFT
    Cycles | Digital input | 1st stage | 2nd stage
    0 | 0000 | 12r,0i | 0r,0i
    0 | 0001 | 12r,0i | 0r,0i
    0 | 0010 | 14r,0i | 0r,0i
    0 | 0011 | 14r,0i | 0r,0i
    0 | 0100 | 16r,0i | 0r,0i
    1 | 0101 | 16r,0i | 0r,8.65i
    2 | 0110 | 18r,0i | 0r,0i
    3 | 0111 | 18r,0i | 0r,-9.738i
    0 | 1000 | -12r,0i | 0r,0i
    2 | 1001 | -4r,0i | 0r,0i
    4 | 1010 | -12r,0i | 12r,0i
    6 | 1011 | -4r,0i | 0r,0i
    0 | 1100 | -12r,0i | 0r,0i
    3 | 1101 | -4r,0i | 0r,2.164i
    6 | 1110 | -12r,0i | 0r,0i
    9 | 1111 | -4r,0i | 0r,-0.496i
    DATA SEQUENCE FROM PRE-STAGE TO BIT REVERSER
    Cycles | Digital input | 1st stage | 2nd stage | Twiddle factor | Bit reverser
    0 | 0000 | 12r,0i | 0r,0i | *0 | 0
  • 12. 0 | 0001 | 12r,0i | 0r,0i | *0 | 0
    0 | 0010 | 14r,0i | 0r,0i | *0 | 0
    0 | 0011 | 14r,0i | 0r,0i | *0 | 0
    0 | 0100 | 16r,0i | 0r,0i | *0 | 0
    1 | 0101 | 16r,0i | 0r,8.656i | *0 | 0
    2 | 0110 | 18r,0i | 0r,0i | *0 | 0
    3 | 0111 | 18r,0i | 0r,-9.738i | *0 | 0
    0 | 1000 | -12r,0i | 0r,0i | *0 | 0
    2 | 1001 | -4r,0i | 0r,0i | *0 | 0
    4 | 1010 | -12r,0i | 12r,0i | *0 | 0
    6 | 1011 | -4r,0i | 0r,0i | *0 | 0
    0 | 1100 | -12r,0i | 0r,0i | *0 | 0
    3 | 1101 | -4r,0i | 0r,2.164i | *0 | 0
    6 | 1110 | -12r,0i | 0r,0i | *0 | 0
    9 | 1111 | -4r,0i | 0r,-0.496i | *0 | 0
    HARDWARE RESOURCE COMPARISON FOR THE VARIOUS PIPELINED FFT ARCHITECTURES
    Architecture | Internal memory | Overall memory | Adders | General multipliers (utilization) | Constant multipliers | Throughput | Latency | Critical path delay
    R4 SDF | N/4-1 | 4N/4-1 | ... | ... (50%) | NIL | 4/N | 4N-1 | ... + ... + ...
    R4 SDC | 4N/4-4 | 16N/4-4 | ... | ... (50%) | NIL | 4/N | N | ... + ... + ...
    CHANG | 3.0N | 4N | ... | ... (50%) | NIL | 4/N | 4N | ... + ... + ...
    LIU | 3.0N + 4X | 4N + 4X | ... | ... (50%) | NIL | 4/N | 4N + 4X | ... + ... + ...
    ... | N/4-1 | 4N/4-1 | ... | ... (75%) | NIL | 4/N | 4N-1 | ... + ... + ...
    PROPOSED | 3.0N + 3.0X | 3.0N + 3.0X | ... | ... (100%) | NIL | 4/N | 4N + ... | ... + ... + ...
    COMPARISON OF TRANSISTOR REQUIREMENT AND LATENCY
    Architecture | Components | Transistors | Latency | Transistors x latency
  • 13. CHANG | 1024 16-bit SRAMs, 32 16-bit adders, 28 16-bit multipliers | 230748 (135%) | 512 | 118142976 (133%)
    LIU | 1048 16-bit SRAMs, 32 16-bit adders, 28 16-bit multipliers | 233052 (136%) | 524 | 122119248 (138%)
    ... | 1022 16-bit SRAMs, 32 16-bit adders, 12 16-bit multipliers | 167138 (98%) | 511 | 85407518 (96%)
    ... | 1192 16-bit SRAMs, 22 16-bit adders, 12 16-bit multipliers | 175378 (103%) | 591 | 103648398 (117%)
    R2^3 SDF | 1022 16-bit SRAMs, 37.6 16-bit adders, 11.2 16-bit multipliers | 163614 (96%) | 511 | 83606754 (94%)
    R2^4 SDF | 1048 16-bit SRAMs, 35.6 16-bit adders, 7.2 16-bit multipliers | 145992 (85%) | 511 | 74601912 (84%)
    PROPOSED | 1045 16-bit SRAMs, 25 16-bit adders, 14 16-bit multipliers | 171087 (100%) | 519 | 88794153 (100%)
    AREA AND PERFORMANCE OF THE PROPOSED FFT ARCHITECTURE FOR 16 BITS
    FFT size | LUTs | FFs | DSPs | BRAMs | Freq (MHz) | Latency (ns)
    16 | 672 | 522 | 4 | 0 | 322 | 140
    64 | 1110 | 752 | 8 | 0 | 303 | 498
    256 | 1733 | 1073 | 12 | 0 | 297 | 1834
    1024 | 2804 | 1589 | 16 | 3 | 298 | 7028
    4096 | 8391 | 2780 | 20 | 4 | 295 | 27975
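    The percentages in the transistor and latency comparison can be reproduced from the listed counts. The short Python sketch below is an illustration, not part of the original brief. It uses only the rows whose architecture names survive in the table, and the names `ROWS` and `relative_costs` are hypothetical. It assumes that the last column is the product of the transistor count and the latency and that every percentage is taken relative to the proposed design.

```python
# (architecture, transistor count, latency) copied from the comparison table.
ROWS = [
    ("CHANG", 230748, 512),
    ("LIU", 233052, 524),
    ("R2^3 SDF", 163614, 511),
    ("R2^4 SDF", 145992, 511),
    ("PROPOSED", 171087, 519),
]


def relative_costs(rows, baseline="PROPOSED"):
    """Return {name: (product, transistor %, product %)} relative to baseline."""
    base = next(row for row in rows if row[0] == baseline)
    base_product = base[1] * base[2]
    result = {}
    for name, transistors, latency in rows:
        product = transistors * latency
        result[name] = (
            product,
            round(100 * transistors / base[1]),
            round(100 * product / base_product),
        )
    return result


for name, (product, t_pct, p_pct) in relative_costs(ROWS).items():
    print(f"{name:9s} {product:>10d} {t_pct:4d}% {p_pct:4d}%")
```

    Under these assumptions the computed products and percentages agree with the tabulated values, for example 118142976, 135%, and 133% for the CHANG row.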
  • 14. ANALYSING THE SIGNAL FLOW OF THE RADIX-4 DIF FFT
    * Consider the signal flow from one stage to the next through the butterfly diagram.
    * Add and subtract the real and imaginary parts within each group of four.
    * Where a twiddle factor is applied, it is a complex value.
    * At every stage, the real and imaginary parts are multiplied by the complex twiddle value. The resulting signal is stored in a buffer, passed through the multiplexers, and sent through the pipelined structure, which acts as a shift register. Each channel carries a set of four signals. These signals pass to the SDF path and then, channel by channel, to the bit reverser.
    CONCLUSION
    We propose a combined SDC-SDF pipelined FFT architecture that produces its output data in normal order. The proposed SDC PE reduces the complex multipliers by 50% compared with other radix-4 DIF FFT designs. The proposed FFT architecture is therefore very attractive for single-path pipelined radix-4 FFT processors with input and output sequences in normal order.
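    The radix-4 decimation-in-frequency signal flow can also be checked in software. The following Python sketch is an illustration only and does not model the pipelined SDC-SDF hardware: it is a plain recursive implementation of the four-way butterfly with twiddle factors $W_N^{0}$, $W_N^{n}$, $W_N^{2n}$, and $W_N^{3n}$, compared against a direct DFT. The function names `dft` and `radix4_dif_fft` are hypothetical, and the input length is assumed to be a power of 4.

```python
import cmath


def dft(x):
    """Direct DFT used as a reference."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def is_power_of_four(n):
    """True for 1, 4, 16, 64, ..."""
    if n < 1:
        return False
    while n % 4 == 0:
        n //= 4
    return n == 1


def radix4_dif_fft(x):
    """Recursive radix-4 decimation-in-frequency FFT (illustrative sketch)."""
    n_points = len(x)
    if not is_power_of_four(n_points):
        raise ValueError("length must be a power of 4")
    if n_points == 1:
        return list(x)
    quarter = n_points // 4
    branches = [[], [], [], []]
    for n in range(quarter):
        a = x[n]
        b = x[n + quarter]
        c = x[n + 2 * quarter]
        d = x[n + 3 * quarter]
        w = cmath.exp(-2j * cmath.pi * n / n_points)  # W_N^n
        # Four butterfly outputs, each scaled by its twiddle factor.
        branches[0].append(a + b + c + d)                      # feeds X(4k)
        branches[1].append((a - 1j * b - c + 1j * d) * w)      # feeds X(4k+1)
        branches[2].append((a - b + c - d) * w ** 2)           # feeds X(4k+2)
        branches[3].append((a + 1j * b - c - 1j * d) * w ** 3)  # feeds X(4k+3)
    out = [0j] * n_points
    for r in range(4):
        sub = radix4_dif_fft(branches[r])  # N/4-point DFT of branch r
        for k in range(quarter):
            out[4 * k + r] = sub[k]
    return out


x = [complex(n % 5, 2 - n) for n in range(16)]
max_error = max(abs(a - b) for a, b in zip(dft(x), radix4_dif_fft(x)))
print(max_error < 1e-9)
```

    Because the recursion writes each branch result to index 4k + r, this sketch returns its output in natural order. A hardware pipeline would instead need a reordering step such as the bit reverser described in the architecture.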
  • 15. REFERENCES
    [1] L. J. Cimini, "Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing," IEEE Trans. Commun., vol. 33, no. 7, pp. 665-675, Jul. 1985.
    [2] J. M. Cioffi, The Communications Handbook. Boca Raton, FL, USA: CRC Press, 1997.
    [3] Y. W. Lin, H. Y. Liu, and C. Y. Lee, "A 1-GS/s FFT/IFFT processor for UWB applications," IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1726-1735, Aug. 2005.
    [4] C. Cheng and K. K. Parhi, "High-throughput VLSI architecture for FFT computation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 10, pp. 339-344, Oct. 2007.
    [5] S. N. Tang, J. W. Tsai, and T. Y. Chang, "A 2.4-GS/s FFT processor for OFDM-based WPAN applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 6, pp. 451-455, Jun. 2010.
    [6] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1975, pp. 604-609.
    [7] E. H. Wold and A. M. Despain, "Pipeline and parallel-pipeline FFT processors for VLSI implementations," IEEE Trans. Comput., vol. C-33, no. 5, pp. 414-426, May 1984.
    [8] T. Sansaloni, A. Perez-Pascual, V. Torres, and J. Valls, "Efficient pipeline FFT processors for WLAN MIMO-OFDM systems," Electron. Lett., vol. 41, no. 19, pp. 1043-1044, Sep. 2005.
    [9] A. M. Despain, "Fourier transform computers using CORDIC iterations,"
  • 16. IEEE Trans. Comput., vol. C-23, no. 10, pp. 993-1001, Oct. 1974.
    [10] N. H. E. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective. Boston, MA, USA: Addison-Wesley, 2005.