International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 7
ISSN 1828-6003 July 2013
Manuscript received and revised June 2013, accepted July 2013 Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved
Memory Based Hardware Efficient Implementation of FIR Filters
K. G. Shanthi, N. Nagarajan
Abstract – Finite impulse response (FIR) digital filters are key components used in many digital
signal processing (DSP) systems because of their linear phase, stability, fewer finite precision
errors and regular structure. The real-time realization of FIR filters with low hardware
requirements and low latency has become critical with increasing developments in very large
scale integration (VLSI) technology. The objective of this paper is to explore the current trends in the
development of algorithms and architectures for memory-based realization of FIR filters that are
mainly concerned with reducing the overall area-delay-power complexities. The purpose of this
study is to compare these architectures based on ROM size, delay and throughput. The results
presented here should assist researchers in the field of digital signal processing in selecting the best
architecture for an application based on its requirements. New algorithms and architectures need to
be developed to design area-delay-power-efficient FIR filters for various demanding DSP
applications.
Keywords: Finite Impulse Response Filter, Field Programmable Gate Arrays (FPGA), Application
Specific Integrated Circuit (ASIC), Distributed Arithmetic (DA), Lookup Table (LUT)
Nomenclature
y[n] The FIR Filter Output
N Order of the Filter
Ci Constant coefficients
Xi Input data
B Input Word length
I. Introduction
Digital signal processing (DSP) is playing a vital role
in the significant advancements of digital technology
taking place currently around the world. Digital
communication, speech and image data compression,
speech recognition, spectral estimation and analysis,
adaptive filtering applications, wired and wireless
communication, multimedia systems, biomedical
instrumentation, satellite and aerospace control, remote
sensing are the major areas where DSP has created a
major impact [1].
The increased daily use of digital technology has led
to the development of improved algorithms and
architectures to design the DSP systems with less power
dissipation, higher speed performance and less area
complexity. Several architectural solutions have been
proposed to minimize the arithmetic complexity of the
algorithms in order to reduce the overall area-delay-power
complexity [2]. The finite impulse response (FIR)
filter is a basic tool in many DSP applications.
Digital filters modify signal characteristics
in the time or frequency domain and are used in many DSP
systems to perform signal preconditioning, anti-aliasing,
band selection, interpolation, low-pass filtering, etc. [1].
Traditionally, the design methods were mainly
focused on multiplier-based architectures to implement
the Multiply-and-Accumulate (MAC) blocks that
constitute the central piece in FIR filters and several DSP
functions. These multipliers consume most of the
resources of the system and account for most of the
computation time. The number of multiply-and-accumulate
operations required per filter output increases
with the filter order, which makes real-time
implementation of these filters a challenging task.
A discrete-time linear finite impulse response (FIR)
filter generates the output y[n] as a sum of delayed and
scaled input samples x[n]. An N-tap FIR digital filter is
represented as:

\[ y[n] = \sum_{i=0}^{N-1} c[i]\, x[n-i] \tag{1} \]
where y[n] is the FIR filter output, c[i] represents the
filter coefficients, x[n-i] is the input data and n is the time
index starting from 0. A direct implementation of Eq. (1)
requires N Multiply-and-Accumulate blocks, which is
expensive in terms of area and speed.
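As a minimal illustrative sketch (not taken from the original paper), Eq. (1) can be evaluated directly in Python. The function name fir_direct and the assumption that samples before n = 0 are zero are ours; each pass of the inner loop corresponds to one MAC operation, so N of them are needed per output sample.

```python
def fir_direct(c, x):
    """Direct-form evaluation of Eq. (1): y[n] = sum_{i=0}^{N-1} c[i] * x[n-i]."""
    n_taps = len(c)
    y = []
    for n in range(len(x)):
        acc = 0
        for i in range(n_taps):
            if n - i >= 0:              # assumption: x[n] = 0 for n < 0
                acc += c[i] * x[n - i]  # one multiply-and-accumulate (MAC)
        y.append(acc)
    return y


coeffs = [1, 2, 3, 4]          # hypothetical 4-tap filter
samples = [1, 0, 0, 0, 2]      # an impulse followed by a second sample
print(fir_direct(coeffs, samples))  # prints [1, 2, 3, 4, 2]
```

The impulse at n = 0 reproduces the coefficients, which is a quick sanity check for any of the multiplier-less realizations discussed later.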
To resolve this problem, many multiplier-less
architectures have been proposed in recent years. They are
broadly classified into two basic categories according to
how they manipulate the filter coefficients for the
multiply operation. The first type of multiplier-less
technique is the conversion-based approach and the
second type is the memory-based implementation approach.
Over the past decade, there has been a growing
trend to implement DSP functions in Field
Programmable Gate Arrays (FPGAs) rather than on
application-specific integrated circuits (ASICs) and DSP
chips.
Implementation on ASICs is often not preferred because of
high development costs and time-to-market factors, while the
sequential-execution architecture of programmable DSP
processors prevents them from achieving the desired
performance. In this context, the FPGA platform provides a
very attractive solution that balances high flexibility,
the option to reconfigure, time-to-market, cost and
performance [3].
This paper is organized as follows. Section II gives a
brief overview of conversion-based multiplier-less
FIR filters. Section III explores the
algorithmic aspects and architectural approaches of
memory-based FIR filters and gives an in-depth review of FIR
filters based on DA. Finally, the conclusion is presented
in Section IV.
II. Conversion-Based Multiplier-Less
Implementation of FIR Filters
In this approach the coefficients are transformed to
other numeric representations so that the multiplications
are implemented with adders/subtractors and shifters. A
coefficient C in n-bit signed-digit representation can be
written as:

\[ C = \sum_{i=0}^{n-1} b_i\, 2^{i} \tag{2} \]

where b_i is taken from the set {-1, 0, 1}.
The representation that has the minimum number of non-zero digits
and no consecutive non-zero digits is known as the
canonic signed-digit (CSD) representation [2]. Since in
shift-and-add multiplication each non-zero digit represents
an addition (or subtraction), CSD requires significantly
fewer adders than the plain binary representation.
Multipliers [4] in a filter whose coefficients are
expressed in canonic signed-digit code are realized with
wired shifters, adders and subtractors.
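The following Python sketch illustrates the idea under stated assumptions. It uses a standard recoding rule (a digit is +1 when the remaining value is 1 modulo 4 and -1 when it is 3 modulo 4), which is one common way to obtain the CSD digits b_i of Eq. (2); it is not claimed to be the procedure of any cited work. The names to_csd and csd_multiply are hypothetical.

```python
def to_csd(value):
    """Return the CSD digits b_i of an integer, least significant digit first."""
    digits = []
    while value != 0:
        if value & 1:
            d = 2 - (value & 3)   # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= d            # the remainder is now divisible by 4
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the coefficient using only shifts, adds and subtracts."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift     # one adder
        elif d == -1:
            acc -= x << shift     # one subtractor
    return acc


digits = to_csd(23)               # 23 = 10111 in binary (four non-zero bits)
print(digits)                     # prints [-1, 0, 0, -1, 0, 1], i.e. 32 - 8 - 1
print(csd_multiply(5, digits))    # prints 115
```

For the coefficient 23 the binary form has four non-zero bits while the CSD form has three, so one adder is saved in the shift-and-add multiplier.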
Common subexpression elimination (CSE) is a
numerical transformation of the constant multiplications
that can lead to efficient hardware implementations in
terms of area, power and speed [5]-[8]. Subexpression
elimination can only be performed on constant
multiplications that operate on a common variable. It is
the process of examining the shift-and-add
implementations of the constant multiplications and finding
the redundant operations.
Once the redundancies are found, these operations can
be performed once and shared among the constant
multiplications so that the number of adders and shifters
needed for the implementation is minimized. CSE
techniques thus attempt to minimize the number of
additions in the multiplier block by reusing terms. These
terms can be canonic signed digit (CSD) [5], minimal
signed digit (MSD), or all signed digit (ASD) [7].
Malcolm D. Macleod and Andrew G. Dempster
introduced a new CSE algorithm that searches a
bounded number of minimal signed digit (MSD)
representations [8]. Douglas L. Maskell, Jussipekka
Leiwo and Jagdish C. Patra [9] reduced both the
coefficient word length and the number of non-zero bits
in the filter coefficients so that the adder step is
minimized, which reduces the hardware
complexity of linear-phase FIR digital filters.
III. Algorithms and Architectures
for Memory Based FIR Filters
The memory-based approach uses
memories (RAMs, ROMs) or look-up tables (LUTs)
that store pre-computed values, which are read out in place of
a multiplication operation. With the advancements in
VLSI technology, semiconductor memory has
become cheaper, faster and more efficient in terms of power
dissipation.
Memory-based FIR filters are consequently gaining
substantial popularity in the DSP environment.
These filters offer high throughput and reduced
latency, since the memory-access time is usually
much shorter than the multiplication time. They
also have much lower dynamic power consumption because of the
minimal switching activity involved in obtaining the
output product or inner-product values by memory read
operations. There are two types of memory-based FIR
filters: the direct memory-based
implementation of FIR filters [10], and the implementation
based on distributed arithmetic (DA).
III.1. Direct-Memory-Based FIR Filters
In direct-memory-based implementations [10], the
multiplications of input values with the fixed coefficients
are replaced by a ROM or look-up table (LUT) that
contains the pre-computed product values for all possible
values of the input samples. Let X be an input word to be
multiplied with a W-bit fixed coefficient C. If X is
assumed to be an unsigned binary number of word-length
N, there are 2^N possible values of X, and hence there are
2^N possible values of the product Y = C*X. Therefore a direct
memory-based implementation of the multiplication
requires a memory unit of 2^N words to be used as an LUT
consisting of the pre-computed product values corresponding
to all possible values of X, as shown in Fig. 1. The
product C*Xi is stored at the memory location whose
address is the same as the binary value of Xi, for 0 ≤ i ≤ 2^N - 1,
such that if the N-bit binary value of Xi is used as the address for
the memory unit, then the corresponding product value is
read out from the memory. However, the size of the ROM
increases exponentially with the input word length.
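[Figure 1: block diagram of a ROM with 2^N words; the N-bit input X is the address and the (N+W)-bit product Y = C*X is the output.]
Fig. 1. Structure of Direct-memory-based multiplier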
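A minimal Python sketch of the direct-memory-based multiplier follows, with a list standing in for the ROM. Here n_bits plays the role of the input word length N of this subsection, and the names build_product_lut and lut_multiply are illustrative assumptions.

```python
def build_product_lut(c, n_bits):
    """Pre-compute C*X for every unsigned n_bits-wide input X (2^N words)."""
    return [c * x for x in range(1 << n_bits)]


def lut_multiply(lut, x):
    """A 'multiplication' is a single memory read: X is used as the address."""
    return lut[x]


lut = build_product_lut(13, 4)     # hypothetical coefficient C = 13, N = 4
print(len(lut))                    # prints 16, i.e. 2^4 words
print(lut_multiply(lut, 11))       # prints 143
```

Increasing n_bits by one doubles the list length, which is the exponential growth of ROM size noted in the text.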
A direct implementation of equation (1) requires N
number of multiplications where N represents the tap
length. Each of the multipliers which involve the
multiplications of input values with the fixed coefficients
can be replaced by a ROM or LUT, where each of the
LUTs contains the pre-computed product values for all
possible values of input samples.
A systolic system consists of a set of interconnected
cells, each capable of performing some simple operation
[2], [11].
Systolic designs are very efficient for hardware
implementation of computation-intensive DSP
applications because of the features like simplicity,
regularity and modularity of structure.
They also achieve a high throughput rate by using
pipelining or parallel processing or both. The systolic
array for an FIR filter of order N is shown in Fig. 2. It
consists of N processing elements (PEs), where each PE
performs one MAC operation during a cycle period.
Several algorithms and architectures have been suggested
for systolization of FIR filters [12], [13].
Fig. 2. Structure of a linear systolic array for an N-tap FIR filter
The average computation time and the latency of
direct-memory-based implementations are high for large
transform lengths, and therefore several novel algorithms
have been proposed in the last few years to decompose
the sinusoidal transforms into a number of
circular convolution or convolution-like structures of
smaller convolution lengths [14]–[18].
These decompositions have improved the
throughput performance with a substantial reduction of
hardware and computational latency. A concurrent
recursive algorithm has been derived for the computation of the FIR
filter and mapped to a two-dimensional systolic
structure for reduced-latency direct-ROM-based
realization of large-order filters [19].
A new approach to LUT design, referred to as the
odd-multiple-storage (OMS) scheme, has been presented in which only
the odd multiples of the fixed coefficient need to
be stored, so that the memory size is reduced to half at
the cost of some increase in combinational circuit
complexity [20]. With the antisymmetric product coding
(APC) approach, the LUT size can also be reduced to
half by recoding the product words as
antisymmetric pairs [21]. Two new approaches have been
suggested for designing the LUT for LUT-multiplier-based
implementation, in which the memory size is reduced
to nearly half of that of the conventional approach [22].
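The principle behind odd-multiple storage can be sketched as follows. This is only a behavioural illustration, assuming that an even multiple is recovered by left-shifting the stored odd multiple and that a zero input is handled as a special case; it is not the circuit of [20], and the names build_oms_lut and oms_multiply are hypothetical.

```python
def build_oms_lut(c, n_bits):
    """Store only the odd multiples C*1, C*3, C*5, ... (2^(N-1) words)."""
    return [c * odd for odd in range(1, 1 << n_bits, 2)]


def oms_multiply(lut, x):
    """Recover C*x from the odd-multiple table with a left shift."""
    if x == 0:
        return 0                   # assumption: zero input bypasses the LUT
    shift = 0
    while (x & 1) == 0:            # strip trailing zeros: x = odd * 2^shift
        x >>= 1
        shift += 1
    return lut[x >> 1] << shift    # the odd value o is stored at index (o - 1) / 2


lut = build_oms_lut(13, 4)
print(len(lut))                    # prints 8, half of the 16 words of the full LUT
print(oms_multiply(lut, 12))       # prints 156, read as (13 * 3) << 2
```

The shift and the address mapping correspond to the additional combinational logic that the scheme trades for the halved memory.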
III.2. FIR Filters Based on Distributed Arithmetic (DA)
The main operations required for DA-based
computation of inner product are a sequence of lookup
table accesses followed by shift-accumulation operations
of the LUT output to obtain the desired result. DA-based
computation is well suited for FPGA realization, because
the LUT as well as the shift-add operations, can be
efficiently mapped to the LUT-based FPGA logic
structures.
DA is a bit-serial operation that implements a series of
fixed-point MAC operations in a fixed number of steps,
regardless of the number of terms to be calculated. DA is
often preferred since it eliminates the need for hardware
multipliers and is capable of implementing large filters
with very high throughput. Croisier et al. proposed
the DA algorithm for digital filter implementations in
1973 [23]. The first detailed discussion of DA was given
by Abraham Peled and Bede Liu in 1974 at the Arden
House Workshop on Digital Signal Processing [24].
S. A. White [25] discussed an organization to form the
inner product of a pair of data vectors, gave a
criterion for minimizing the ROM size, and proposed
modifications to increase the speed by employing
techniques such as bit pairing or partitioning the input
words into the most significant half and the least significant
half, thereby introducing parallelism in the computation.
III.2.1. Conventional DA approach
Consider the inner product of two N-point vectors C
and X given by:

\[ y[n] = \sum_{i=0}^{N-1} c_i\, x_i \tag{3} \]
where c_i represents the constant coefficients and x_i is the
input data, which may change from time to time. Let the
input samples be coded as B-bit 2's
complement binary numbers such that |x_i| < 1. The input
sample is given by:

\[ x_i = -x_{i,0} + \sum_{j=1}^{B-1} x_{i,j}\, 2^{-j} \tag{4} \]
where x_{i,j} ∈ {0, 1}, x_{i,0} is the sign bit and x_{i,B-1} is the least
significant bit (LSB). Substituting (4) in (3), the
output can be expressed as:

\[ y[n] = \sum_{i=0}^{N-1} c_i \left[ -x_{i,0} + \sum_{j=1}^{B-1} x_{i,j}\, 2^{-j} \right] \tag{5} \]

\[ y[n] = -\sum_{i=0}^{N-1} c_i\, x_{i,0} + \sum_{j=1}^{B-1} \left[ \sum_{i=0}^{N-1} c_i\, x_{i,j} \right] 2^{-j} \tag{6} \]
For a given set of c_i (i = 0, 1, 2, …, N − 1), the terms in
the brackets may take one of 2^N possible values, which can
be precomputed and stored in an LUT. Each of the 2^N possible
combination sums of the c_i can be read out from the ROM using the N-bit
sequence {x_{i,j} for 0 ≤ i ≤ N − 1} as address bits.
These intermediate results are accumulated over B clock
cycles to produce one filter output y[n].
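The bit-serial accumulation of Eq. (6) can be sketched in Python as follows. To keep the arithmetic exact, the sketch assumes that each input is supplied as a B-bit two's-complement integer equal to x_i scaled by 2^(B-1), so the result is y[n] scaled by the same factor; integer bit k then corresponds to the index j = B-1-k of Eq. (4). The function names are illustrative assumptions.

```python
def build_da_lut(c):
    """All 2^N combination sums: entry a is the sum of c[i] over the set bits of a."""
    n = len(c)
    return [sum(c[i] for i in range(n) if (a >> i) & 1) for a in range(1 << n)]


def da_inner_product(c, x_ints, b):
    """Bit-serial evaluation of Eq. (6) on B-bit two's-complement integers."""
    lut = build_da_lut(c)
    acc = 0
    for k in range(b):                    # one LUT access per clock cycle, LSB first
        addr = 0
        for i in range(len(c)):
            addr |= ((x_ints[i] >> k) & 1) << i   # bit k of every input forms the address
        term = lut[addr] << k             # weight 2^k, the scaled form of 2^(-j)
        if k == b - 1:
            acc -= term                   # sign-bit time: the term is subtracted
        else:
            acc += term
    return acc


c = [3, -1, 4, 2]                         # hypothetical coefficients
x = [5, -7, -128, 127]                    # 8-bit two's-complement inputs
print(da_inner_product(c, x, 8))          # prints -236
print(sum(ci * xi for ci, xi in zip(c, x)))  # prints -236 (direct inner product)
```

The loop runs B times regardless of N, while the LUT length is 2^N, which mirrors the trade-off described in the text.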
Fig. 3. LUT-based DA implementation of a 4-tap (N =4) FIR filter
The original LUT-based DA implementation of a 4-tap (N = 4)
FIR filter consists of three units: the shift register
unit, the DA base unit, and the adder/shifter unit.
The LUT contains all 16 possible combination sums
of the filter weights C0, C1, C2, C3. The bank of shift
registers in Fig. 3 stores four consecutive input
samples (x[n-i], i = 0, 1, 2, 3). The concatenation of the
rightmost bits of the shift registers becomes the address
of the LUT. The shift registers are shifted right at every
clock cycle. The corresponding LUT entries are
shifted and accumulated B consecutive times to
generate the output y[n]. The sign bits {x_{i,0}} are the last
bits to arrive. The clock period in which the sign bits all
arrive simultaneously is called the "sign-bit time".
During the sign-bit time the control signal S = 1;
otherwise S = 0.
The time-complexity of FIR filters based on
Distributed Arithmetic is independent of the transform-
size or the number of filter-taps and depends only on the
word-length whereas time-complexity of Direct-memory-
based FIR filters is independent of word-length but
increases linearly with the transform size.
III.2.2. Distributed Arithmetic with Offset Binary Coding
The memory requirement (2^N words) of the DA-based
implementation of an FIR filter increases exponentially
with the filter order N. With the use of offset binary
coding (OBC), the memory size can be reduced by half to
2^(N-1) words [2], [25]. In offset binary coding, an input bit of 0
is interpreted as -1 and an input bit of 1 as +1. Let the
input sample x_i in offset binary coding be represented as:

\[ x_i = \frac{1}{2} \left[ x_i - (-x_i) \right] \tag{7} \]
In 2's-complement notation the negative of Eq. (4) is
written as:

\[ -x_i = -\bar{x}_{i,0} + \sum_{j=1}^{B-1} \bar{x}_{i,j}\, 2^{-j} + 2^{-(B-1)} \tag{8} \]
where the overbar indicates the complement of
a bit. From Eqs. (4) and (8), Eq. (7) can be rewritten
as:

\[ x_i = \frac{1}{2} \left[ -\left( x_{i,0} - \bar{x}_{i,0} \right) + \sum_{j=1}^{B-1} \left( x_{i,j} - \bar{x}_{i,j} \right) 2^{-j} - 2^{-(B-1)} \right] \tag{9} \]
Define d_{i,j}:

\[ d_{i,j} = \begin{cases} x_{i,j} - \bar{x}_{i,j}, & j \neq 0 \\ -\left( x_{i,0} - \bar{x}_{i,0} \right), & j = 0 \end{cases} \tag{10} \]
where d_{i,j} ∈ {-1, 1}. Eq. (9) can be rewritten as:

\[ x_i = \frac{1}{2} \left[ \sum_{j=0}^{B-1} d_{i,j}\, 2^{-j} - 2^{-(B-1)} \right] \tag{11} \]
Using Eq. (11) in Eq. (3):

\[ y[n] = \sum_{i=0}^{N-1} c_i\, \frac{1}{2} \left[ \sum_{j=0}^{B-1} d_{i,j}\, 2^{-j} - 2^{-(B-1)} \right] \tag{12} \]

\[ y[n] = \sum_{j=0}^{B-1} \left[ \sum_{i=0}^{N-1} \frac{1}{2}\, c_i\, d_{i,j} \right] 2^{-j} - \left[ \frac{1}{2} \sum_{i=0}^{N-1} c_i \right] 2^{-(B-1)} \tag{13} \]

\[ y[n] = \sum_{j=0}^{B-1} D_j\, 2^{-j} + D_{initial}\, 2^{-(B-1)} \tag{14} \]

where

\[ D_j = \sum_{i=0}^{N-1} \frac{1}{2}\, c_i\, d_{i,j}, \qquad D_{initial} = -\frac{1}{2} \sum_{i=0}^{N-1} c_i . \]
The OBC scheme is characterized by Eq. (14).
Table I shows the content of the ROM for N = 4. From
Table I, notice that the upper-half and the lower-half
ROM values are mirrored with the sign reversed. Therefore
it is possible to reduce the ROM size by a factor of 2, as
shown in Table II. Fig. 4 shows a typical architecture for
DA-OBC based implementation of a 4-tap (N =4) FIR
filter. The XOR gates are used for address decoding; the
MUX with the constant Dinitial provides the initial value
to the shift accumulator. In Fig. 4, two control signals S1
and S2 are required, where S1 is 1 when j = 0 and 0
otherwise, and S2 is 1 when j = B-1 and 0 otherwise.
TABLE I
CONTENT OF THE ROM WITH DA-OBC
b3 b2 b1 b0    Contents of ROM
0  0  0  0     -(C3 + C2 + C1 + C0)/2
0  0  0  1     -(C3 + C2 + C1 - C0)/2
0  0  1  0     -(C3 + C2 - C1 + C0)/2
0  0  1  1     -(C3 + C2 - C1 - C0)/2
0  1  0  0     -(C3 - C2 + C1 + C0)/2
0  1  0  1     -(C3 - C2 + C1 - C0)/2
0  1  1  0     -(C3 - C2 - C1 + C0)/2
0  1  1  1     -(C3 - C2 - C1 - C0)/2
1  0  0  0      (C3 - C2 - C1 - C0)/2
1  0  0  1      (C3 - C2 - C1 + C0)/2
1  0  1  0      (C3 - C2 + C1 - C0)/2
1  0  1  1      (C3 - C2 + C1 + C0)/2
1  1  0  0      (C3 + C2 - C1 - C0)/2
1  1  0  1      (C3 + C2 - C1 + C0)/2
1  1  1  0      (C3 + C2 + C1 - C0)/2
1  1  1  1      (C3 + C2 + C1 + C0)/2
TABLE II
REDUCED SIZE ROM (2^(N-1)) WITH DA-OBC CODING
FOR 4-TAP (N = 4) FIR FILTER
b2 b1 b0    Contents of ROM
0  0  0     -(C3 + C2 + C1 + C0)/2
0  0  1     -(C3 + C2 + C1 - C0)/2
0  1  0     -(C3 + C2 - C1 + C0)/2
0  1  1     -(C3 + C2 - C1 - C0)/2
1  0  0     -(C3 - C2 + C1 + C0)/2
1  0  1     -(C3 - C2 + C1 - C0)/2
1  1  0     -(C3 - C2 - C1 + C0)/2
1  1  1     -(C3 - C2 - C1 - C0)/2
Fig. 4. DA-OBC based implementation of a 4-tap (N =4) FIR filter
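The OBC scheme of Eq. (14) and the ROM mirroring of Tables I and II can be checked with a short Python sketch. It makes the same scaling assumption as an integer model of Eq. (6): inputs are B-bit two's-complement integers equal to x_i scaled by 2^(B-1), so D_initial carries weight 1. Fractions keep the halved ROM entries exact. The function names, and the choice to model the XOR address decoding as a bitwise inversion followed by a sign change, are illustrative assumptions rather than a description of the exact circuit of Fig. 4.

```python
from fractions import Fraction


def build_obc_rom(c):
    """Full OBC ROM: entry a is (1/2) * sum of +c[i] (bit set) or -c[i] (bit clear)."""
    return [sum(Fraction(ci if (a >> i) & 1 else -ci, 2) for i, ci in enumerate(c))
            for a in range(1 << len(c))]


def obc_lookup(half_rom, addr, n):
    """Read the 2^(N-1)-word ROM; the MSB inverts the address and negates the word."""
    low_mask = (1 << (n - 1)) - 1
    if (addr >> (n - 1)) & 1:
        return -half_rom[(addr & low_mask) ^ low_mask]   # mirrored entry, sign reversed
    return half_rom[addr & low_mask]


def da_obc_inner_product(c, x_ints, b):
    n = len(c)
    half_rom = build_obc_rom(c)[: 1 << (n - 1)]   # upper half only, as in Table II
    acc = -Fraction(sum(c), 2)                    # D_initial
    for k in range(b):                            # LSB first, sign bit last
        addr = 0
        for i in range(n):
            addr |= ((x_ints[i] >> k) & 1) << i
        d = obc_lookup(half_rom, addr, n)
        if k == b - 1:
            d = -d                                # Eq. (10): sign reversed for j = 0
        acc += d * (1 << k)
    return acc


c = [3, -1, 4, 2]
rom = build_obc_rom(c)
print(all(rom[a] == -rom[15 - a] for a in range(16)))   # prints True (mirror property)
print(da_obc_inner_product(c, [5, -7, -128, 127], 8))   # prints -236
```

The mirror check is what justifies storing only the eight words of Table II for N = 4.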
III.2.3. Distributed Arithmetic with Modified Offset
Binary Coding (DA-MOBC)
The DA-MOBC can reduce the LUT size from 2^(N-2) to
as low as 2 by exploiting the observation that if a single
term inside the LUT is relocated outside the LUT,
then the lower half of the LUT is a mirrored version of the
upper half of the LUT with only the signs reversed [26].
From Table II, it can be observed that the ROM values,
except for the C3 term, are mirrored along the line between the 4th
and the 5th rows. Excluding the C3 term, the LUT in Table II
has only 2^(N-2) distinct values, apart from the sign, depending on the input
values. Table III illustrates the new ROM table.
LUT size reduction is achieved with the overhead of
control circuits such as XOR gates, MUX (multiplexers),
and full adders (FA). While the increase in the number of
XOR gates is proportional to the input vector length B,
the complexities of other control circuits (MUX, FA)
increase in proportion to the coefficient word-length as
shown in Fig. 5.
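One step of this reduction can be checked numerically. The sketch below, under the assumption that the relocated term is -C3/2 added outside the table, rebuilds every entry of Table II from the four words of Table III; the names half_sum, build_table3 and mobc_lookup are hypothetical, and the control logic of Fig. 5 is modelled only functionally.

```python
from fractions import Fraction


def half_sum(c, addr):
    """(1/2) * sum of +c[i] where bit i of addr is 1 and -c[i] where it is 0."""
    return sum(Fraction(ci if (addr >> i) & 1 else -ci, 2) for i, ci in enumerate(c))


def build_table3(c):
    """Four-word table for N = 4: rows with b2 = 0 and the C3 term left out."""
    return [half_sum(c[:3], a) for a in range(4)]


def mobc_lookup(table3, c3, addr):
    """Rebuild the Table II entry at the 3-bit address addr."""
    low = addr & 0b11
    if (addr >> 2) & 1:
        inner = -table3[low ^ 0b11]     # mirrored row, sign reversed
    else:
        inner = table3[low]
    return inner - Fraction(c3, 2)      # the C3 term is added outside the LUT


c = [3, -1, 4, 2]                       # C0, C1, C2, C3 (hypothetical values)
table3 = build_table3(c)
table2 = [half_sum(c, a) for a in range(8)]   # b3 = 0, so the C3 sign is always minus
print(all(mobc_lookup(table3, c[3], a) == table2[a] for a in range(8)))  # prints True
```

Applying the same step again to the remaining terms is what allows the LUT to shrink further, at the cost of the extra multiplexers and adders mentioned in the text.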
III.2.4. Distributed Arithmetic Based LUT-Less
Architecture Proposed by Yoo and Anderson
A recursive LUT reduction applied to the original DA
decreases the LUT size by half at every iteration, and
eventually a LUT-less DA architecture can be achieved
[27]. For the LUT of Fig. 3, it can be observed that each entry in the lower half
of the LUT (locations whose addresses have a 1 in the MSB)
equals the corresponding entry in the upper half of the LUT
(locations whose addresses have a 0 in the MSB) plus the C3
term.
Thus, the LUT size can be reduced by a factor of 2 with
an additional 2x1 MUX and a full adder. After several
iterations of the LUT reduction, the final LUT-less DA
architecture for a 4-tap FIR filter is obtained, as shown in
Fig. 6.
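The halving step and its limit can be illustrated with the following Python sketch, which is a functional model only and is not claimed to reproduce the circuit of [27]. The conditional expression stands in for the 2x1 MUX and the addition for the full adder; all function names are assumptions.

```python
def build_da_lut(c):
    """Conventional DA LUT: entry a is the sum of c[i] over the set bits of a."""
    return [sum(ci for i, ci in enumerate(c) if (a >> i) & 1)
            for a in range(1 << len(c))]


def lookup_with_half_lut(half_lut, c_msb, addr, n):
    """One reduction step: half-size LUT plus a MUX and an adder for the MSB term."""
    base = half_lut[addr & ((1 << (n - 1)) - 1)]
    selected = c_msb if (addr >> (n - 1)) & 1 else 0   # 2x1 MUX
    return base + selected                             # adder


def lutless_lookup(c, addr):
    """Reduction applied N times: no LUT remains, only MUXes and adders."""
    total = 0
    for i, ci in enumerate(c):
        total += ci if (addr >> i) & 1 else 0
    return total


c = [3, -1, 4, 2]                      # C0..C3, hypothetical values
lut = build_da_lut(c)
print(all(lookup_with_half_lut(lut[:8], c[3], a, 4) == lut[a] for a in range(16)))  # prints True
print(all(lutless_lookup(c, a) == lut[a] for a in range(16)))                       # prints True
```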
Fig. 5. Block diagram of the LUT-less DA-OBC (DA-MOBC)
for a 4-tap FIR filter
TABLE III
REDUCED SIZE ROM (2^(N-2)) WITH DA-MOBC CODING
FOR 4-TAP (N = 4) FIR FILTER
b2 b1 b0    Contents of ROM
0  0  0     -(C2 + C1 + C0)/2
0  0  1     -(C2 + C1 - C0)/2
0  1  0     -(C2 - C1 + C0)/2
0  1  1     -(C2 - C1 - C0)/2
Fig. 6. LUT-less Architecture for a 4-tap FIR filter proposed
by Yoo and Anderson
III.2.5. On-Line DA-LUT Architecture for FIR Filters
proposed by Eshtawie, Othman
A tri-state buffer and a carry look-ahead adder
(CLA) are the basic digital logic units used to
construct the on-line DA-LUT architecture [28], as
shown in Fig. 7.
A filter coefficient passes to the CLA only if its
buffer enable signal is 1.
Only the needed location contents are calculated,
whereas in the conventional DA technique the contents of locations
that may never be used when processing the input signal
are also computed.
Fig. 7. LUT-less Architecture for a 4-tap FIR filter
with tri-state buffers and CLA adders
TABLE IV
COMPARISON OF VARIOUS ARCHITECTURES FOR A 4-TAP FILTER (N = 4). THE SHIFT REGISTER AND THE ADDER/SHIFTER UNITS ARE NOT
CONSIDERED SINCE THEY ARE COMMON TO ALL STRUCTURES. BC REPRESENTS THE COEFFICIENT WORD LENGTH.
Logic functions   LUT-based DA        DA-OBC         DA-MOBC               LUT-less architecture   On-line DA-LUT
                  (conventional DA)                                        of Yoo & Anderson       architecture
ROM size          2^N x BC            2^(N-1) x BC   (2^(N-2) to 2) x BC   0                       0
XOR gates         0                   N              N-1                   0                       0
2x1 MUX           0                   BC             BC                    N x BC                  0
Adders            0                   0              0                     (N-1) x BC              N-1 CLAs
Tristate buffer   0                   0              0                     0                       N
Adder/Sub         0                   0              N x BC                0                       0
In the DA technique, even if the content of a location is zero it
is fetched and added to the partial sum, whereas in
the on-line LUT no addition occurs when the
calculated content is zero. Hence the execution time for
obtaining the filter output is shortened.
III.2.6. Memory Partitioning and Multiple Memory
Bank Algorithms
The main drawback of the DA-based FIR filter is that as
the filter size increases, the memory size requirements of
the implementation grow exponentially. Memory access
time can be a bottleneck for the speed of the entire system
when the ROM size is very large. A large LUT can be
avoided by partitioning it into smaller LUTs
and combining their outputs with adders.
Several memory-partitioning and multiple-memory-bank
approaches, along with flexible multi-bit data access
mechanisms, have been presented for FIR filtering and
inner-product computation in order to reduce the memory size
of DA-based filters [10], [25], [29]-[32].
The N-tap filter is divided into m smaller filters, each
having k input lines, such that N = m × k; it is assumed
that N is not prime. The total number of clock cycles
required for this implementation is B + log2(m); the
additional second term is the number of clock cycles
required by the adder tree that calculates the sum
of the outputs of the m LUTs. The decrease in throughput
with this implementation is very small compared with the
large LUT that a high-order filter would otherwise require.
Hence Eq. (6) is rewritten as:

\[ y[n] = -\sum_{z=0}^{m-1} \sum_{i=zk}^{(z+1)k-1} c_i\, x_{i,0} + \sum_{j=1}^{B-1} \left[ \sum_{z=0}^{m-1} \sum_{i=zk}^{(z+1)k-1} c_i\, x_{i,j} \right] 2^{-j} \tag{15} \]
For example, a 32-tap DA FIR filter would require a
large LUT with 2^32 entries. This problem can be
overcome by breaking up the LUT into 8 smaller LUT
units, each having 4 input lines.
Hence a single large LUT with 2^32 memory elements
is replaced by 8 LUTs, each having only 2^4 = 16 memory
elements.
Fig. 8 shows the implementation of a 4-tap FIR filter
based on equation (15) for m=2 and k=2.
Fig. 8. Implementation of a 4-tap FIR filter
using memory partitioning with m=k=2
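A functional Python sketch of memory partitioning follows. It assumes, as in an integer model of Eq. (6), that inputs are B-bit two's-complement integers scaled by 2^(B-1), and it models the adder tree simply as a sum of the m LUT outputs; the additional log2(m) clock cycles are therefore not represented. The function names are illustrative assumptions.

```python
def build_da_lut(c):
    """Combination sums of one group of coefficients (2^k words)."""
    return [sum(ci for i, ci in enumerate(c) if (a >> i) & 1)
            for a in range(1 << len(c))]


def partitioned_da(c, x_ints, b, k):
    """DA inner product with the N taps split into m = N/k groups of k taps."""
    n = len(c)
    assert n % k == 0, "N must be a multiple of k"
    m = n // k
    luts = [build_da_lut(c[z * k:(z + 1) * k]) for z in range(m)]
    acc = 0
    for bit in range(b):                       # LSB first, sign bit last
        partial = 0
        for z in range(m):
            addr = 0
            for i in range(k):
                addr |= ((x_ints[z * k + i] >> bit) & 1) << i
            partial += luts[z][addr]           # adder tree over the m LUT outputs
        term = partial << bit
        acc += -term if bit == b - 1 else term
    return acc


c = [3, -1, 4, 2, 5, -6, 7, 1]                 # hypothetical 8-tap filter
x = [5, -7, -128, 127, 0, 1, -1, 64]           # 8-bit two's-complement inputs
y = partitioned_da(c, x, 8, 4)
print(y == sum(ci * xi for ci, xi in zip(c, x)))   # prints True
print(2 * 2 ** 4, 2 ** 8)                          # prints 32 256 (partitioned vs. full LUT words)
```

With m = 2 and k = 4 the two small LUTs hold 32 words in total instead of the 256 words of a single 8-input LUT, which is the saving the example in the text describes on a larger scale.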
TABLE VI
COMPARISON OF VARIOUS REQUIREMENTS WITH AND WITHOUT
MEMORY-PARTITIONING
Memory variant                   No. of address bits   Memory size               Clock cycles required
Without memory partitioning      N                     2^N                       B
(full LUT implementation)
With memory partitioning         N/m = k               m x 2^(N/m) or m x 2^k    B + log2(m)
(ROM decomposition)
5
10
15
20
Full LUT Partitioned
LUT
LUTSize
ClockCycles
Fig. 9. Comparison of a 4-tap FIR filter (N=4) with and without
memory partitioning with m=k=2 with the input word length B=8
III.2.7. Systolic Architectures for DA-Based
Implementation of FIR Filters
Systolic architectures can result in cost effective, high
performance system by exploiting high-level of
concurrency using pipelining or parallel processing or
both [11]. Novel one- and two-dimensional systolic
structures were designed for computation of circular
convolution using distributed arithmetic (DA) that
resulted in less memory and less area-delay complexity
compared with the other DA-based structures for circular
convolution [33].
One- and two-dimensional fully pipelined computing
structures are presented for area-delay-power-efficient
implementation of FIR filter by systolic decomposition
of distributed arithmetic based inner-product
computation [34].
A linear array consisting of a number of processing
elements (PEs) and an output cell is shown in Fig. 10.
Each PE contains a ROM of 2^M words. During a cycle period, each PE
reads the content of its ROM at the location specified by
the input bit vector. The value read
from the ROM is then added to the input available to the
PE from its left, and the sum is
transferred as output to its right, as shown in Figs.
11. The output cell contains a shift register and an
adder. During every cycle period, it shifts the content of its register left by one
position and then adds the available input to the
shifted content.
For high-throughput implementation of FIR filters, a
two-dimensional systolic array is used, as shown in Figs. 12.
FPGA realizations of FIR filters for high-speed and
medium-speed operation using modified distributed arithmetic
architectures were suggested by Jiafeng Xie et al.; they
make use of pipelined registers and a pipelined shift-adder
tree [35].
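The data flow of the linear array can be modelled behaviourally as in the following Python sketch. Several assumptions are made that the text does not state: the inputs are unsigned B-bit integers, bits are fed most significant bit first (consistent with the left shift in the output cell), each PE ROM holds the combination sums of its own M coefficients, and the pipeline skew between PEs is ignored, so the model is functional rather than cycle-accurate. All names are hypothetical.

```python
def build_pe_rom(seg):
    """ROM of one PE: 2^M combination sums of its M coefficients."""
    return [sum(s for i, s in enumerate(seg) if (a >> i) & 1)
            for a in range(1 << len(seg))]


def systolic_da_1d(c, x_uints, b, m_bits):
    """Functional model of a 1-D array of N/M PEs followed by one output cell."""
    n = len(c)
    assert n % m_bits == 0, "N must be a multiple of M"
    n_pe = n // m_bits
    roms = [build_pe_rom(c[p * m_bits:(p + 1) * m_bits]) for p in range(n_pe)]
    register = 0                                   # shift register of the output cell
    for bit in range(b - 1, -1, -1):               # assumption: MSB is processed first
        passed = 0                                 # input to the leftmost PE
        for p in range(n_pe):
            addr = 0
            for i in range(m_bits):
                addr |= ((x_uints[p * m_bits + i] >> bit) & 1) << i
            passed += roms[p][addr]                # PE: ROM read added to input from the left
        register = (register << 1) + passed        # output cell: shift left, then add
    return register


c = [3, -1, 4, 2]                                  # hypothetical coefficients
x = [5, 9, 255, 0]                                 # unsigned 8-bit inputs
y = systolic_da_1d(c, x, 8, 2)
print(y == sum(ci * xi for ci, xi in zip(c, x)))   # prints True
```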
III.2.8. DA Based Architectures for Adaptive FIR
Filtering
Adaptive filtering algorithms are employed in
many hand-held mobile devices for applications such as
echo cancellation, signal de-noising, and channel
equalization. A hardware adaptive filter architecture
for very high throughput LMS adaptive filters using
distributed arithmetic (DA) has been suggested; building
adaptive DA filters requires recalculating the
contents of the LUTs for each adaptation.
By using an auxiliary LUT with special addressing,
the efficiency and throughput of DA adaptive filters can
be of the same order as those of fixed DA filters [36], [37].
A hardware architecture using conjugate
distributed arithmetic (CDA) for high-throughput
implementation of LMS adaptive filters has also been
presented, in which all possible combination sums of the
input signal samples are stored in the LUT and updated at
the arrival of every sample using an efficient update
procedure [36], [38].
Fig. 10. Linear 1-D systolic array for DA-based implementation
of FIR filter
Figs. 11. (a) Function of PE, (b) Function of output cell
of 1-D systolic array
Figs. 12. (a) 2-D systolic array for FIR filter; (b) function of PE; and (c) function of Shift Adder (SA) cell
IV. Conclusion
Recent significant research concerned
with reducing the overall area-delay-power complexity
of memory-based realization of FIR filters has been presented
in this paper. A detailed survey of memory-based
implementation of FIR filters using distributed
arithmetic has also been presented, stating its merits over direct
memory-based implementation of FIR filters.
The main goal of this review is to assist
researchers in the field of digital signal processing in
understanding the available methods and adopting them in
various application environments.
Many algorithms and architectures have been
suggested in the literature to reduce the area and time
complexities of memory-based implementation of FIR
filters, but more efficient algorithms and
architectures still need to be developed to design flexible,
area-delay-power-efficient memory-based FIR filters that
meet the growing requirements of DSP applications.
References
[1] J. G. Proakis and D. G. Manolakis, Digital Signal Processing:
Principles, Algorithms and Applications., NJ: Prentice-Hall, 1996.
[2] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and
Implementation. New York: Wiley, 1999.
[3] G. R. Goslin, “A Guide to Using Field Programmable Gate Arrays
(FPGAs) for Application-Specific Digital Signal Processing
Performance”, XILINX, 1995.
[4] M. Yamada, and A. Nishihara, “High-Speed FIR Digital Filter
with CSD Coefficients Implemented on FPGA”, in Proc. IEEE
Design Automation Conference, 2001, pp. 7-8.
[5] R. I. Hartley, “Subexpression sharing in filters using canonic
signed-digit multipliers,” IEEE Trans. Circuits Syst. II, vol. 43,
no. 10, pp. 677–688, Oct. 1996.
[6] M. Potkonjak, M. B. Srivastava, and A. Chandrakasan, “Multiple
constant multiplications: Efficient and versatile framework and
algorithms for exploring common subexpression elimination,”
IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol.
15, no. 2, pp. 151–165, Feb. 1996.
[7] A. G. Dempster and M. D. Macleod, “Generation of signed-digit
representations for integer multiplication,” IEEE Signal Process.
Lett., vol.11, no. 8, pp. 663–665, Aug. 2004.
[8] M. D. Macleod and A. G. Dempster, “Multiplierless FIR filter
design algorithms,” IEEE Signal Processing Letters, vol. 12, no.
3, pp. 186–189,Mar. 2005.
[9] D. L. Maskell, J. Leiwo, and J. C. Patra, “The
Design of Multiplierless FIR Filters with a Minimum Adder Step
and Reduced Hardware Complexity,” in Proc. 2006 IEEE
International Symposium on Circuits and Systems, p. 4, May
2006.
[10] H.-R. Lee, C.-W. Jen, and C.-M. Liu, “On the design automation
of the memory-based VLSI architectures for FIR filters,” IEEE
Trans. Consumer. Electronics, vol. 39, no. 3, pp. 619–629, Aug.
1993.
[11] H. T. Kung, “Why systolic architectures?,” IEEE Computer, vol.
15,no. 1, pp. 37–45, Jan. 1982.
[12] R.Wyrzykowski and S. Ovramenko, “Flexible systolic
architecture for VLSI FIR filters,” Proc. Inst. Elect. Eng.—
Comput. Digit. Techniques,vol. 139, no. 2, pp. 170–172, Mar.
1992.
[13] B. K. Mohanty and P. K. Meher, “Cost-effective novel flexible
cell-level systolic architecture for high throughput implementation
of 2-D FIR filters,” Proc. Inst. Elect. Eng.—Comput. Digit.
Techniques, vol.143, no. 5, pp. 436–439, Nov. 1996.
[14] D. F. Chiper, “A new systolic array algorithm for memory-based
VLSI array implementation of DCT,” in Proc. Second IEEE
Symp. on Computers and Communications, pp. 297–301,July
1997.
[15] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis,
“Systolic algorithms and a memory-based design approach for a
unified architecture for the computation of
DCT/DST/IDCT/IDST,”IEEE Trans. Circuits Syst-I: Regular
Papers, vol. 52, no. 6, pp. 1125–1137, June 2005.
[16] C. Cheng and K. K. Parhi, “A novel systolic array structure for
DCT,”IEEE Trans. Circuits Syst-II: Express Briefs, vol. 52, no. 7,
pp. 366–369,July 2005.
[17] P. K. Meher, J. C. Patra, and M. N. S. Swamy, “New systolic
algorithm and array architecture for prime-length discrete sine
transform,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 54,
no. 3, pp. 262–266,Mar. 2007.
[18] P. K. Meher and M. N. S. Swamy, “High-throughput memory-
based architecture for DHT using a new convolutional
formulation,” IEEETrans. Circuits Syst. II: Express Briefs, vol.
54, no. 7, pp. 606–610,July 2007.
[19] P. K. Meher, “Low-latency hardware-efficient memory-based
design for large-order FIR digital filters”, Sixth International
Conference on Information, Communications and Signal
Processing(ICICS 2007), Dec. 2007
[20] P. K. Meher, “New approach to LUT implementation and
accumulation for memory-based multiplication,” in Proc. 2009
IEEE Int. Symp.Circuits Syst., ISCAS’09, May 2009, pp. 453–
456.
[21] P. K. Meher, “New look-up-table optimizations for memory-
based multiplication,” in Proc. Int. Symp. Integr. Circuits
(ISIC’09), Dec.2009.
[22] P. K. Meher, “New approach to lookup table design and memory
based realization of FIR digital filter”, IEEE Transactions on
circuit and systems-I, Vol.57, NO.3, March 2010.
[23] A. Croisier, D. J. Esteban, M. E. Levilion, and V. Rizo, “Digital
filter for PCM encoded signals,” U.S. Patent 3 777 130, Dec. 4,
1973.
[24] A. Peled and B. Liu, “A new hardware realization of digital
filters,” IEEE Trans. Acoustic, Speech, Signal Process., vol. 22,
no. 6, pp.456–462, Dec. 1974.
[25] S. A. White, “Applications of the distributed arithmetic to digital
signal processing: A tutorial review,” IEEE ASSP Mag., vol. 6,
no. 3, pp. 5–19,Jul. 1989.
[26] P. Choi, S.-C. Shin, and J.-G. Chung, “Efficient ROM size
reduction for distributed arithmetic,” in Proc. IEEE Int. Symp.
Circuits System (ISCAS), May 2000, vol. 2, pp. 61–64.
[27] H. Yoo and D. V. Anderson, “Hardware-efficient distributed
arithmetic architecture for high-order digital filters,” in Proc.
IEEE Int. Conf. on Acoustics, Speech, Signal Processing
(ICASSP), Mar. 2005, vol. 5, pp. v/125–v/128.
[28] Mohamed A. Eshtawie and Masuri Othman," On-Line DA-LUT
Architecture for High-Speed High-Order Digital FIR Filters”, in
the tenth IEEE international conference on communication
systems, Nov. 2006, Singapore.
[29] C.-F. Chen, “Implementing FIR filters with distributed
arithmetic,” IEEE Trans. Acoustic., Speech, Signal Process., vol.
33, no. 5, pp.1318–1321, Oct. 1985.
[30] K. Nourji and N. Demassieux, “Optimal VLSI architecture for
distributed arithmetic-based algorithms,” in IEEE International
Conference on Acoustics, Speech, and Signal Processing, vol. 2,
Apr. 1994, pp. II/509–II/512.
[31] S.-S. Jeng, H.-C. Lin, and S.-M. Chang, “FPGA implementation
of FIR filter using M-bit parallel distributed arithmetic,” in
Proc.2006,IEEE Int. Symp. Circuits Systems (ISCAS), May 2006,
p. 4.
[32] M. Mehendale, S. D. Sherlekar, and G..Venkatesh “Area-delay
trade-off in distributed arithmetic based implementation of FIR
filters,” in Proc.10th Int. Conf. VLSI Design, Jan. 1997, pp. 124–
129.
[33] P. K. Meher, “Hardware-efficient systolization of DA-based
calculation of finite digital convolution,” IEEE Trans. Circuits
Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 707–711, Aug. 2006.
[34] P. K. Meher, S. Chandrasekaran, and A. Amira, “FPGA
realization of FIR filters by efficient and flexible systolization
using distributed arithmetic,”IEEE Trans. Signal Process., vol. 56,
no. 7, pp. 3009–3017, July 2008.
K. G. Shanthi, N. Nagarajan
Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved International Review on Computers and Software, Vol. 8, N. 7
1726
[35] Jiafeng Xie n, JianjunHe,GuanzhengTan,” FPGA realization of
FIR filters for high-speed and medium-speed by using modified
distributed arithmetic architectures”, Microelectronics Journal 41,
April 2010 pp. 365–370.
[36] S. Haykin, Adaptive Filter Theory, Prentice Hall, Upper Saddle
River, NJ, 2002.
[37] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V.
Anderson, “LMS adaptive filters using distributed arithmetic for
high throughput,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol.
52, no. 7, pp. 1327–1337, July 2005.
[38] Walter Huang, Venkatesh Krishnan, and David V. Anderson,”
Conjugate Distributed Arithmetic Adaptive FIR Filters and their
Hardware Implementation”, MWSCAS '06,pp.295-299, Circuits
and Systems, Volume: 2, 2006.
Authors’ information
K. G. Shanthi (Corresponding author)
completed her B.E. in 1996 at Madras
University, Chennai, and obtained her M.E. in
2005 from the Government College of
Technology, Coimbatore. Her postgraduate
specialization is VLSI Design. Her fields of interest include
the design of FPGA-based VLSI architectures and VLSI
signal processing. She is currently working as
Associate Professor at R.M.K Engineering College, Chennai, and is
pursuing her research in the field of VLSI Design.
Address: Associate Professor, Department of Electronics &
Communication Engg, R.M.K Engineering College, Chennai,
Tamil Nadu, India. PIN code: 601 206.
E-mail: kgs.ece@rmkec.ac.in
N. Nagarajan received his B.Tech and M.E. degrees in Electronics
Engineering at M.I.T Chennai. He received his PhD in the faculty of I.C.E.
from Anna University, Chennai. He is currently working as Principal of
C.I.E.T, Coimbatore. His specializations include optical, wireless
ad hoc and sensor networks.


  • 1. International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 7, ISSN 1828-6003, July 2013. Manuscript received and revised June 2013, accepted July 2013. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Memory Based Hardware Efficient Implementation of FIR Filters

K. G. Shanthi, N. Nagarajan

Abstract – Finite impulse response (FIR) digital filters are key components of many digital signal processing (DSP) systems because of their linear phase, stability, fewer finite-precision errors and regular structure. Real-time realization of FIR filters with low hardware requirements and low latency has become critical with increasing developments in very large scale integration (VLSI) technology. The objective of this paper is to explore the current trends in the development of algorithms and architectures for memory-based realization of FIR filters, which are mainly concerned with reducing the overall area-delay-power complexity. The purpose of this study is to compare these architectures on the basis of ROM size, delay and throughput. The results presented here should help researchers in the field of digital signal processing to select the best architecture for an application based on its requirements. New algorithms and architectures need to be developed to design area-delay-power-efficient FIR filters for various demanding DSP applications.

Keywords: Finite Impulse Response Filter, Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuit (ASIC), Distributed Arithmetic (DA), Lookup Table (LUT)

Nomenclature
y[n]   The FIR filter output
N      Order of the filter
Ci     Constant coefficients
Xi     Input data
B      Input word length

I. Introduction

Digital signal processing (DSP) is playing a vital role in the significant advancements of digital technology currently taking place around the world.
Digital communication, speech and image data compression, speech recognition, spectral estimation and analysis, adaptive filtering, wired and wireless communication, multimedia systems, biomedical instrumentation, satellite and aerospace control, and remote sensing are the major areas where DSP has had a major impact [1]. The increased daily use of digital technology has led to the development of improved algorithms and architectures for DSP systems with lower power dissipation, higher speed and lower area complexity. Several architectural solutions have been proposed to minimize the arithmetic complexity of the algorithms in order to reduce the overall area-delay-power complexity [2].

The finite impulse response (FIR) filter is a basic tool in many DSP applications. Digital filters modify signal characteristics in the time or frequency domain and are used in many DSP systems to perform signal preconditioning, anti-aliasing, band selection, interpolation, low-pass filtering, etc. [1]. Traditionally, design methods focused on multiplier-based architectures to implement the Multiply-and-Accumulate (MAC) blocks that form the central piece of FIR filters and several other DSP functions. These multipliers consume most of the resources of the system and also account for most of the computation time. The number of multiply-and-accumulate operations required per filter output increases with the filter order, so real-time implementation of these filters is a challenging task.

A discrete-time linear FIR filter generates the output y[n] as a sum of delayed and scaled input samples x[n]. An N-tap FIR digital filter is represented as:

$$y[n] = \sum_{i=0}^{N-1} c[i]\, x[n-i] \qquad (1)$$

where y[n] is the FIR filter output, c[i] represents the filter coefficients, x[n-i] is the input data and n is the time index starting from 0. A direct implementation of Eq.
(1) requires N Multiply-and-Accumulate blocks, which is expensive in terms of area and speed. To resolve this problem, many multiplier-less architectures have been proposed in recent years. They are broadly classified into two categories according to how they manipulate the filter coefficients for the multiply operation: the first is the conversion-based approach and the second is the memory-based approach. Over the past decade, there has been a growing trend to implement DSP functions in Field Programmable Gate Arrays (FPGAs) rather than on
  • 2. Application specific integrated circuits (ASICs) and DSP chips. Implementation on ASICs is not preferred because of high development costs and time-to-market factors. The sequential-execution architecture of programmable DSP processors prevents them from achieving the desired performance. In this context, the FPGA platform provides a very attractive solution that balances high flexibility and the option to reconfigure with time-to-market, cost and performance [3].

This paper is organized as follows. Section 2 gives a brief overview of conversion-based multiplier-less FIR filters. Section 3 explores the algorithmic aspects and architectural approaches of memory-based FIR filters and gives an in-depth review of FIR filters based on DA. Finally, the conclusion is presented in Section 4.

II. Conversion-Based Multiplier-Less Implementation of FIR Filters

In this approach the coefficients are transformed to other numeric representations so that the multiplications can be implemented with adders/subtractors and shifters. A coefficient in "n-bit" signed-digit representation can be written as:

$$C = \sum_{i=0}^{n-1} b_i\, 2^{i} \qquad (2)$$

where each $b_i$ is taken from the set {-1, 0, 1}. The representation that has the minimum number of non-zero digits and no consecutive non-zero digits is known as the canonic signed-digit (CSD) representation [2]. Since in shift-and-add multiplication each non-zero digit represents an addition (or subtraction), CSD requires significantly fewer adders than the binary representation. Multipliers [4] in a filter whose coefficients are expressed in canonic signed-digit code are realized with wired shifters, adders and subtractors.
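The CSD idea can be illustrated with a short Python sketch. It is not taken from the paper: the function names `to_csd` and `csd_multiply` are hypothetical, and the recoding rule used here (pick +1 or -1 so that the next digit is forced to zero) is one common way of obtaining digits with no two adjacent non-zero entries.

```python
def to_csd(value: int) -> list[int]:
    """Return signed digits in {-1, 0, 1}, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 -> digit +1, value % 4 == 3 -> digit -1.
            # Either way the remainder becomes a multiple of 4, so the
            # next digit is 0 and no two non-zero digits are adjacent.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x: int, coeff: int) -> int:
    """Multiply x by a constant using only shifts, additions and subtractions."""
    acc = 0
    for shift, digit in enumerate(to_csd(coeff)):
        if digit == 1:
            acc += x << shift   # one adder
        elif digit == -1:
            acc -= x << shift   # one subtractor
    return acc


if __name__ == "__main__":
    print(to_csd(7))            # 7 = 8 - 1, two non-zero digits instead of three
    print(csd_multiply(5, 7))
```

For the constant 7 the binary form 111 needs three non-zero digits, while the recoded form 8 - 1 needs two, which is the adder saving the text refers to.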
Common subexpression elimination (CSE) is a numerical transformation of the constant multiplications that can lead to efficient hardware implementations in terms of area, power and speed [5]-[8]. It can only be performed on constant multiplications that operate on a common variable. CSE examines the shift-and-add implementations of the constant multiplications and finds the redundant operations. Once found, each redundant operation is performed once and shared among the constant multiplications, so that the number of adders and shifters is minimized. CSE techniques thus attempt to minimize the number of additions in the multiplier block by reusing terms. These terms can be canonic signed digit (CSD) [5], minimal signed digit (MSD), or all signed digit (ASD) [7]. Macleod and Dempster introduced a new CSE algorithm that searches a bounded number of MSD representations [8]. Maskell, Leiwo and Patra [9] reduced both the coefficient word length and the number of non-zero bits in the filter coefficients so as to minimize the adder step, which reduced the hardware complexity of linear-phase FIR digital filters.

III. Algorithms and Architectures for Memory Based FIR Filters

The memory-based approach uses memories (RAMs, ROMs) or look-up tables (LUTs) that store pre-computed values, which are read out in place of a multiplication. With advances in VLSI technology, semiconductor memory has become cheaper, faster and more efficient in terms of power dissipation. Memory-based FIR filters are consequently gaining substantial popularity in the DSP environment. These filters offer high throughput and reduced latency, since the memory access time is usually much shorter than the multiplication time.
They also have much lower dynamic power consumption, because obtaining the product or inner-product values by memory read operations involves minimal switching activity. There are two types of memory-based FIR filters: the direct-memory-based implementation [10] and the implementation based on distributed arithmetic (DA).

III.1. Direct-Memory-Based FIR Filters

In direct-memory-based implementations [10], the multiplication of input values by fixed coefficients is replaced by a ROM or look-up table (LUT) that contains the pre-computed product values for all possible values of the input samples. Let X be an input word to be multiplied by a W-bit fixed coefficient C. If X is assumed to be an unsigned binary number of word length N, there are $2^N$ possible values of X, and hence $2^N$ possible values of the product Y = C·X. A direct-memory-based multiplier therefore requires a memory unit of $2^N$ words used as a LUT holding the pre-computed product values for all possible values of X, as shown in Fig. 1. The product C·Xi is stored at the memory location whose address is the binary value of Xi, for $0 \le X_i \le 2^N - 1$, so that when the N-bit binary value of Xi is applied as the address, the corresponding product is read out of the memory. However, the size of the ROM increases exponentially with the input word length.

Fig. 1. Structure of direct-memory-based multiplier (a ROM of $2^N$ words addressed by the N-bit input X and delivering the (N+W)-bit product Y = C·X)
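A minimal Python sketch of the direct-memory-based multiplier follows. It assumes unsigned input samples and integer coefficients; the names `build_product_lut` and `fir_direct_lut` and the parameter `n_bits` (the input word length, called N in the text above) are illustrative choices, not part of the surveyed designs.

```python
def build_product_lut(coeff: int, n_bits: int) -> list[int]:
    """ROM of 2**n_bits words: address x holds the product coeff * x."""
    return [coeff * x for x in range(1 << n_bits)]


def fir_direct_lut(coeffs: list[int], samples: list[int], n_bits: int) -> list[int]:
    """FIR filter in which every multiplier is replaced by a LUT read."""
    luts = [build_product_lut(c, n_bits) for c in coeffs]  # one ROM per tap
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for i, lut in enumerate(luts):
            if n - i >= 0:
                acc += lut[samples[n - i]]  # memory read instead of a multiply
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    lut = build_product_lut(13, 4)
    print(len(lut), lut[5])                      # 16 words for a 4-bit input
    print(fir_direct_lut([1, 2, 3], [1, 0, 2, 3], n_bits=2))
```

Each extra input bit doubles the table length, which is the exponential ROM growth noted in the text.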
  • 3. A direct implementation of Eq. (1) requires N multiplications, where N is the tap length. Each multiplier, which multiplies input values by a fixed coefficient, can be replaced by a ROM or LUT containing the pre-computed product values for all possible values of the input samples.

A systolic system consists of a set of interconnected cells, each capable of performing some simple operation [2], [11]. Systolic designs are very efficient for hardware implementation of computation-intensive DSP applications because of their simplicity, regularity and modularity of structure. They also achieve a high throughput rate by using pipelining, parallel processing, or both. The systolic array for an FIR filter of order N is shown in Fig. 2. It consists of N processing elements (PEs), where each PE performs one MAC operation per cycle period. Several algorithms and architectures have been suggested for systolization of FIR filters [12], [13].

Fig. 2. Structure of a linear systolic array for an N-tap FIR filter

The average computation time and the latency of direct-memory-based implementations are high for large transform lengths. Several novel algorithms have therefore been proposed in the last few years to decompose the sinusoidal transforms into multiple circular-convolution or convolution-like structures of smaller convolution lengths [14]–[18]. These decompositions improve throughput while substantially reducing hardware and computational latency. A concurrent recursive algorithm has been derived for the computation of the FIR filter and mapped onto a two-dimensional systolic structure for reduced-latency direct-ROM-based realization of large-order filters [19].
A new approach to LUT design, referred to as the odd-multiple-storage (OMS) scheme, stores only the odd multiples of the fixed coefficient, which halves the memory size at the cost of some increase in combinational circuit complexity [20]. With the antisymmetric product coding (APC) approach, in which the product words are recoded as antisymmetric pairs, the LUT size can also be halved [21]. Two further approaches have been suggested for designing the LUT of a LUT-multiplier-based implementation, in which the memory size is reduced to nearly half that of the conventional approach [22].

III.2. FIR Filters Based on Distributed Arithmetic (DA)

The main operations required for DA-based computation of an inner product are a sequence of look-up table accesses followed by shift-accumulation of the LUT outputs. DA-based computation is well suited to FPGA realization, because both the LUT and the shift-add operations map efficiently onto LUT-based FPGA logic structures. DA is a bit-serial operation that implements a series of fixed-point MAC operations in a fixed number of steps, regardless of the number of terms to be calculated. DA is often preferred because it eliminates the need for hardware multipliers and can implement large filters with very high throughput.

Croisier et al. proposed the DA algorithm for digital filter implementations in 1973 [23]. The first detailed discussion of DA was given by Abraham Peled and Bede Liu in 1974 at the Arden House Workshop on Digital Signal Processing [24]. S. A. White [25] discussed an organization for forming the inner product of a pair of data vectors, gave a criterion for minimizing the ROM size, and increased the speed with techniques such as bit pairing or partitioning the input words into a most significant half and a least significant half, thereby introducing parallelism into the computation.

III.2.1.
Conventional DA Approach

Consider the inner product of two N-point vectors C and X given by:

$$y[n] = \sum_{i=0}^{N-1} c_i\, x_i \qquad (3)$$

where the $c_i$ are the constant coefficients and the $x_i$ are the input data, which may change from time to time. Let each input sample be coded as a B-bit 2's-complement binary number such that $|x_i| < 1$. The input sample is then given by:

$$x_i = -x_{i0} + \sum_{j=1}^{B-1} x_{ij}\, 2^{-j} \qquad (4)$$

where $x_{ij} \in \{0, 1\}$, $x_{i0}$ is the sign bit and $x_{i,B-1}$ is the least significant bit (LSB). Substituting (4) in (3), the output can be expressed as:

$$y[n] = \sum_{i=0}^{N-1} c_i \left[ -x_{i0} + \sum_{j=1}^{B-1} x_{ij}\, 2^{-j} \right] \qquad (5)$$

$$y[n] = -\sum_{i=0}^{N-1} c_i\, x_{i0} + \sum_{j=1}^{B-1} \left[ \sum_{i=0}^{N-1} c_i\, x_{ij} \right] 2^{-j} \qquad (6)$$

For a given set of $c_i$ (i = 0, 1, 2, …, N − 1), the bracketed term can take only one of $2^N$ possible values, which can be precomputed and stored in a LUT. The required sum of coefficients is read out of the ROM using the N-bit sequence $\{x_{ij},\ 0 \le i \le N-1\}$ as address bits. These intermediate results are accumulated over B clock cycles to produce one filter output y[n].
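The bit-serial accumulation of Eq. (6) can be sketched in Python as follows. To keep the arithmetic exact, the sketch makes an assumption that differs from the text: the samples are B-bit 2's-complement integers rather than fractions with $|x_i| < 1$, so bit plane j carries weight $2^j$ and the sign plane is the most significant one. The names `build_da_lut` and `da_inner_product` are illustrative.

```python
def build_da_lut(coeffs: list[int]) -> list[int]:
    """LUT of 2**N words: address bit i set means coefficient i is included."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs: list[int], samples: list[int], word_bits: int) -> int:
    """Inner product sum(c_i * x_i) using only LUT reads, shifts and adds."""
    lut = build_da_lut(coeffs)
    mask = (1 << word_bits) - 1
    words = [s & mask for s in samples]        # 2's-complement bit patterns
    acc = 0
    for j in range(word_bits):                 # one bit plane per clock cycle
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> j) & 1) << i        # bit j of every sample forms the address
        term = lut[addr] << j
        if j == word_bits - 1:
            acc -= term                        # sign-bit time: subtract
        else:
            acc += term
    return acc


if __name__ == "__main__":
    c = [3, -1, 4, 2]
    x = [5, -3, 7, -8]                         # 4-bit signed samples
    print(da_inner_product(c, x, word_bits=4), sum(a * b for a, b in zip(c, x)))
```

The loop runs `word_bits` times whatever the number of taps, while the LUT has $2^N$ words, which mirrors the trade-off described above.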
  • 4. Fig. 3. LUT-based DA implementation of a 4-tap (N = 4) FIR filter

The original LUT-based DA implementation of a 4-tap (N = 4) FIR filter consists of three units: the shift register unit, the DA base unit, and the adder/shifter unit. The LUT contains all 16 possible combination sums of the filter weights C0, C1, C2, C3. The bank of shift registers in Fig. 3 stores four consecutive input samples (x[n-i], i = 0, 1, 2, 3). The concatenation of the rightmost bits of the shift registers becomes the address of the LUT. The shift registers are shifted right at every clock cycle. The corresponding LUT entries are shifted and accumulated B consecutive times to generate the output y[n]. The sign bits $\{x_{i0}\}$ are the last bits to arrive. The clock period in which all the sign bits arrive simultaneously is called the "sign-bit time". During the sign-bit time the control signal S = 1; otherwise S = 0.

The time complexity of FIR filters based on distributed arithmetic is independent of the transform size or the number of filter taps and depends only on the word length, whereas the time complexity of direct-memory-based FIR filters is independent of the word length but increases linearly with the transform size.

III.2.2. Distributed Arithmetic with Offset Binary Coding

The memory requirement ($2^N$ words) of a DA-based FIR filter increases exponentially with the filter order N. With offset binary coding (OBC) the memory size can be halved to $2^{N-1}$ words [2], [25]. In offset binary coding, an input bit 0 is interpreted as -1 and an input bit 1 as +1. Let the input sample $x_i$ be written as:

$$x_i = \frac{1}{2}\left[ x_i - (-x_i) \right] \qquad (7)$$

In 2's-complement notation the negative of Eq.
(4) is written as:

$$-x_i = -\bar{x}_{i0} + \sum_{j=1}^{B-1} \bar{x}_{ij}\, 2^{-j} + 2^{-(B-1)} \qquad (8)$$

where the overbar indicates the complement of a bit. From Eqs. (4) and (8), Eq. (7) can be rewritten as:

$$x_i = \frac{1}{2}\left[ -(x_{i0} - \bar{x}_{i0}) + \sum_{j=1}^{B-1} (x_{ij} - \bar{x}_{ij})\, 2^{-j} - 2^{-(B-1)} \right] \qquad (9)$$

Define $d_{ij}$ as:

$$d_{ij} = \begin{cases} x_{ij} - \bar{x}_{ij}, & j \neq 0 \\ -(x_{ij} - \bar{x}_{ij}), & j = 0 \end{cases} \qquad (10)$$

where $d_{ij} \in \{-1, 1\}$. Eq. (9) can then be rewritten as:

$$x_i = \frac{1}{2}\left[ \sum_{j=0}^{B-1} d_{ij}\, 2^{-j} - 2^{-(B-1)} \right] \qquad (11)$$

Using Eq. (11) in Eq. (3):

$$y[n] = \sum_{i=0}^{N-1} c_i\, \frac{1}{2}\left[ \sum_{j=0}^{B-1} d_{ij}\, 2^{-j} - 2^{-(B-1)} \right] \qquad (12)$$

$$y[n] = \sum_{j=0}^{B-1} \left( \sum_{i=0}^{N-1} \frac{1}{2} c_i\, d_{ij} \right) 2^{-j} - \left( \frac{1}{2} \sum_{i=0}^{N-1} c_i \right) 2^{-(B-1)} \qquad (13)$$

$$y[n] = \sum_{j=0}^{B-1} D_j\, 2^{-j} + D_{initial}\, 2^{-(B-1)} \qquad (14)$$

where

$$D_j = \sum_{i=0}^{N-1} \frac{1}{2} c_i\, d_{ij}, \qquad D_{initial} = -\frac{1}{2} \sum_{i=0}^{N-1} c_i .$$

The OBC scheme is characterized by Eq. (14). Table I shows the contents of the ROM for N = 4. Note from Table I that the upper-half and lower-half ROM values mirror each other with the sign reversed. It is therefore possible to reduce the ROM size by a factor of 2, as shown in Table II. Fig. 4 shows a typical architecture for a DA-OBC based implementation of a 4-tap (N = 4) FIR filter. The XOR gates are used for address decoding; the MUX with the constant $D_{initial}$ provides the initial value to the shift accumulator. In Fig. 4, two control signals S1 and S2 are required, where S1 is 1 when j = 0 and 0 otherwise, and S2 is 1 when j = B-1 and 0 otherwise.
TABLE I
CONTENT OF THE ROM WITH DA-OBC

b3 b2 b1 b0   Contents of ROM
0  0  0  0    -(C3 + C2 + C1 + C0)/2
0  0  0  1    -(C3 + C2 + C1 - C0)/2
0  0  1  0    -(C3 + C2 - C1 + C0)/2
0  0  1  1    -(C3 + C2 - C1 - C0)/2
0  1  0  0    -(C3 - C2 + C1 + C0)/2
0  1  0  1    -(C3 - C2 + C1 - C0)/2
0  1  1  0    -(C3 - C2 - C1 + C0)/2
0  1  1  1    -(C3 - C2 - C1 - C0)/2
1  0  0  0     (C3 - C2 - C1 - C0)/2
1  0  0  1     (C3 - C2 - C1 + C0)/2
1  0  1  0     (C3 - C2 + C1 - C0)/2
1  0  1  1     (C3 - C2 + C1 + C0)/2
1  1  0  0     (C3 + C2 - C1 - C0)/2
1  1  0  1     (C3 + C2 - C1 + C0)/2
1  1  1  0     (C3 + C2 + C1 - C0)/2
1  1  1  1     (C3 + C2 + C1 + C0)/2
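The mirror property of Table I and the accumulation of Eq. (14) can be checked with a small Python sketch. Several choices here are assumptions made for exactness and are not from the paper: samples are B-bit 2's-complement integers (so the weights are $2^j$ instead of $2^{-j}$), and the ROM stores the doubled values $2D_j$ so that no halves are needed until the final division by 2. The function names are hypothetical.

```python
def build_obc_half_rom(coeffs: list[int]) -> list[int]:
    """2**(N-1) words holding 2*D for addresses whose top bit is 0."""
    n = len(coeffs)
    rom = []
    for addr in range(1 << (n - 1)):
        total = -coeffs[n - 1]                 # top address bit is 0, so d = -1
        for i in range(n - 1):
            total += coeffs[i] if (addr >> i) & 1 else -coeffs[i]
        rom.append(total)
    return rom


def obc_lookup(rom: list[int], addr: int, n: int) -> int:
    """Read the full 2**N table through the half-size ROM."""
    low_mask = (1 << (n - 1)) - 1
    low = addr & low_mask
    if (addr >> (n - 1)) & 1:
        return -rom[low ^ low_mask]            # XOR the address, negate the word
    return rom[low]


def da_obc_inner_product(coeffs: list[int], samples: list[int], word_bits: int) -> int:
    n = len(coeffs)
    rom = build_obc_half_rom(coeffs)
    mask = (1 << word_bits) - 1
    words = [s & mask for s in samples]
    acc = -sum(coeffs)                         # doubled D_initial at LSB weight
    for k in range(word_bits):                 # k = 0 is the LSB plane
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> k) & 1) << i
        term = obc_lookup(rom, addr, n) << k
        acc += -term if k == word_bits - 1 else term   # sign plane is negated
    return acc // 2                            # exact: acc is twice the result


if __name__ == "__main__":
    c, x = [3, -1, 4, 2], [5, -3, 7, -8]
    print(da_obc_inner_product(c, x, 4), sum(a * b for a, b in zip(c, x)))
```

The XOR on the low address bits plays the role of the address-decoding XOR gates mentioned for Fig. 4, and the initial accumulator value corresponds to $D_{initial}$.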
  • 5. TABLE II
REDUCED SIZE ROM ($2^{N-1}$ WORDS) WITH DA-OBC CODING FOR A 4-TAP (N = 4) FIR FILTER

b2 b1 b0   Contents of ROM
0  0  0    -(C3 + C2 + C1 + C0)/2
0  0  1    -(C3 + C2 + C1 - C0)/2
0  1  0    -(C3 + C2 - C1 + C0)/2
0  1  1    -(C3 + C2 - C1 - C0)/2
1  0  0    -(C3 - C2 + C1 + C0)/2
1  0  1    -(C3 - C2 + C1 - C0)/2
1  1  0    -(C3 - C2 - C1 + C0)/2
1  1  1    -(C3 - C2 - C1 - C0)/2

Fig. 4. DA-OBC based implementation of a 4-tap (N = 4) FIR filter

III.2.3. Distributed Arithmetic with Modified Offset Binary Coding (DA-MOBC)

DA-MOBC can reduce the LUT size from $2^{N-2}$ words to as few as 2 by exploiting the following observation: if a single term inside the LUT is relocated outside it, the lower half of the LUT becomes a mirrored version of the upper half with only the signs reversed [26]. From Table II, it can be observed that the ROM values, apart from the C3 term, are mirrored about the line between the 4th and 5th rows. Apart from the C3 term, the LUT in Table II therefore has only $2^{N-2}$ possible values, depending on the input values. Table III shows the new ROM table. The LUT size reduction is achieved at the cost of additional control circuits such as XOR gates, multiplexers (MUX) and full adders (FA). While the number of XOR gates grows in proportion to the input vector length B, the complexity of the other control circuits (MUX, FA) grows in proportion to the coefficient word length, as shown in Fig. 5.

III.2.4. Distributed Arithmetic Based LUT-Less Architecture Proposed by Yoo and Anderson

Applying a recursive LUT reduction to the original DA halves the LUT size at every iteration, so that a LUT-less DA architecture is eventually obtained [27]. From Fig.
3, it can be observed that the lower half of the LUT (locations whose addresses have a 1 in the MSB) equals the upper half of the LUT (locations whose addresses have a 0 in the MSB) plus the C3 term. The LUT size can thus be reduced by a factor of 2 with an additional 2x1 MUX and a full adder. After several iterations of this LUT reduction, the final LUT-less DA architecture for a 4-tap FIR filter is obtained, as shown in Fig. 6.

Fig. 5. Block diagram of the LUT-less DA-OBC (DA-MOBC) for a 4-tap FIR filter

TABLE III
REDUCED SIZE ROM ($2^{N-2}$ WORDS) WITH DA-MOBC CODING FOR A 4-TAP (N = 4) FIR FILTER

b2 b1 b0   Contents of ROM
0  0  0    -(C2 + C1 + C0)/2
0  0  1    -(C2 + C1 - C0)/2
0  1  0    -(C2 - C1 + C0)/2
0  1  1    -(C2 - C1 - C0)/2

Fig. 6. LUT-less architecture for a 4-tap FIR filter proposed by Yoo and Anderson

III.2.5. On-Line DA-LUT Architecture for FIR Filters Proposed by Eshtawie and Othman

The tri-state buffer and the carry look-ahead adder (CLA) are the basic digital logic units used to construct the on-line DA-LUT architecture [28], as shown in Fig. 7. A filter coefficient is passed to the CLA only if its buffer enable signal is 1. Only the location contents that are actually needed are calculated, whereas in the conventional DA technique the contents of locations that may never be used while processing the input signal are also computed.

Fig. 7. LUT-less architecture for a 4-tap FIR filter with tri-state buffers and CLA adders
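The two LUT-less ideas described above can be modelled functionally in Python. This is only a behavioural sketch under assumed integer 2's-complement samples; it does not model the MUX, tri-state buffer or CLA hardware, and all function names are hypothetical.

```python
def build_full_lut(coeffs: list[int]) -> list[int]:
    """Conventional DA LUT, used here only as a reference."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def halved_lut_entry(half_lut: list[int], top_coeff: int, addr: int, n: int) -> int:
    """One reduction step: MSB = 1 entries equal the MSB = 0 entry plus the top coefficient."""
    base = half_lut[addr & ((1 << (n - 1)) - 1)]
    return base + top_coeff if (addr >> (n - 1)) & 1 else base


def lutless_entry(coeffs: list[int], addr: int) -> int:
    """Fully reduced form: each address bit gates its own coefficient."""
    total = 0
    for i, c in enumerate(coeffs):
        if (addr >> i) & 1:          # acts like a 2x1 MUX select or buffer enable
            total += c
    return total


def lutless_da(coeffs: list[int], samples: list[int], word_bits: int) -> int:
    mask = (1 << word_bits) - 1
    words = [s & mask for s in samples]
    acc = 0
    for j in range(word_bits):
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> j) & 1) << i
        if addr == 0:
            continue                 # nothing enabled: no addition for this bit plane
        term = lutless_entry(coeffs, addr) << j
        acc += -term if j == word_bits - 1 else term
    return acc


if __name__ == "__main__":
    c = [3, -1, 4, 2]
    full = build_full_lut(c)
    print(all(halved_lut_entry(full[:8], c[3], a, 4) == full[a] for a in range(16)))
    print(lutless_da(c, [5, -3, 7, -8], 4))
```

The `continue` branch reflects the on-line scheme's saving when no coefficient is enabled, while `halved_lut_entry` shows a single halving step of the recursive reduction.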
  • 6. TABLE IV
COMPARISON OF VARIOUS ARCHITECTURES FOR A 4-TAP FILTER (N = 4). THE SHIFT REGISTER AND THE ADDER/SHIFTER UNITS ARE NOT CONSIDERED SINCE THEY ARE COMMON TO ALL STRUCTURES. BC REPRESENTS THE COEFFICIENT WORD LENGTH.

Logic functions   LUT-based DA (conventional DA)   DA-OBC         DA-MOBC               LUT-less architecture of Yoo & Anderson   On-line DA-LUT architecture
ROM size          2^N x BC                         2^(N-1) x BC   (2^(N-2) to 2) x BC   0                                         0
XOR gates         0                                N              N-1                   0                                         0
2x1 MUX           0                                BC             BC                    N x BC                                    0
Adders            0                                0              0                     N-1 x BC                                  N-1 CLAs
Tristate buffer   0                                0              0                     0                                         N
Adder/Sub         0                                0              N x BC                0                                         0

In the DA technique, a location's content is fetched and added to the partial sum even if it is zero, whereas in the on-line LUT no addition takes place when the calculated content is zero. The execution time for obtaining the filter output is therefore reduced.

III.2.6. Memory Partitioning and Multiple Memory Bank Algorithms

The main drawback of the DA-based FIR filter is that the memory size required grows exponentially with the filter size. Memory access time can become a bottleneck for the speed of the entire system when the ROM is very large. A large LUT can be avoided by partitioning the circuit into smaller LUTs and combining their outputs with adders. Several memory-partitioning and multiple-memory-bank approaches, along with flexible multi-bit data access mechanisms, have been presented for FIR filtering and inner-product computation in order to reduce the memory size of DA-based filters [10], [25], [29]-[32]. The N-tap filter is divided into m smaller filters, each having k input lines, such that N = m × k; it is assumed that N is not prime.
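The partitioning just described can be sketched in Python as follows. The sketch assumes integer 2's-complement samples and sums the m LUT outputs with a plain Python addition in place of a hardware adder tree; the names `partitioned_da` and `rom_words` are illustrative only.

```python
def build_lut(coeffs: list[int]) -> list[int]:
    """DA LUT for one group of k coefficients (2**k words)."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def rom_words(n: int, k: int) -> int:
    """Total ROM words when an n-tap filter uses m = n // k LUTs of k inputs each."""
    return (n // k) * (1 << k)


def partitioned_da(coeffs: list[int], samples: list[int], word_bits: int, k: int) -> int:
    n = len(coeffs)
    if n % k != 0:
        raise ValueError("k must divide the number of taps")
    luts = [build_lut(coeffs[z:z + k]) for z in range(0, n, k)]   # m small LUTs
    mask = (1 << word_bits) - 1
    words = [s & mask for s in samples]
    acc = 0
    for j in range(word_bits):
        plane_sum = 0
        for z, lut in enumerate(luts):
            addr = 0
            for t in range(k):
                addr |= ((words[z * k + t] >> j) & 1) << t
            plane_sum += lut[addr]             # stands in for the adder tree
        term = plane_sum << j
        acc += -term if j == word_bits - 1 else term   # sign plane is subtracted
    return acc


if __name__ == "__main__":
    c = [3, -1, 4, 2, -5, 6, 1, -2]
    x = [5, -3, 7, -8, 2, -1, 0, 4]
    print(partitioned_da(c, x, word_bits=4, k=4), sum(a * b for a, b in zip(c, x)))
    print(rom_words(8, 8), rom_words(8, 4))    # single LUT versus two 4-input LUTs
```

For the 8-tap example the single LUT needs 256 words while two 4-input LUTs need 32 in total, at the cost of the extra additions that combine their outputs.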
The total number of clock cycles required for this implementation is $B + \log_2(m)$; the second term is the number of clock cycles needed by the adder tree that sums the outputs of the m LUTs. The resulting loss of throughput is very small compared with the cost of the large LUT required for a high-order filter. Hence Eq. (6) is rewritten as:

$$y[n] = -\sum_{z=0}^{m-1} \sum_{i=zk}^{(z+1)k-1} c_i\, x_{i0} + \sum_{j=1}^{B-1} \left[ \sum_{z=0}^{m-1} \sum_{i=zk}^{(z+1)k-1} c_i\, x_{ij} \right] 2^{-j} \qquad (15)$$

For example, a 32-tap DA FIR filter would require a large LUT with $2^{32}$ entries. This problem can be overcome by breaking up the LUT into 8 smaller LUT units, each having 4 input lines. A single large LUT with $2^{32}$ memory elements is thus replaced by 8 LUTs, each having only $2^4 = 16$ memory elements. Fig. 8 shows the implementation of a 4-tap FIR filter based on Eq. (15) for m = 2 and k = 2.

Fig. 8. Implementation of a 4-tap FIR filter using memory partitioning with m = k = 2

TABLE VI
COMPARISON OF VARIOUS REQUIREMENTS WITH AND WITHOUT MEMORY PARTITIONING

Memory variants                                          No. of address bits   Memory size                   Clock cycles required
Without memory partitioning (full LUT implementation)    N                     2^N                           B
With memory partitioning (ROM decomposition)             k = N/m               m x 2^(N/m), i.e. m x 2^k     B + log2(m)

Fig. 9. Comparison of a 4-tap FIR filter (N = 4) with and without memory partitioning with m = k = 2 and input word length B = 8 (bar chart of LUT size and clock cycles for the full LUT and the partitioned LUT)

III.2.7. Systolic Architectures for DA-Based Implementation of FIR Filters

Systolic architectures can yield cost-effective, high-performance systems by exploiting a high level of concurrency through pipelining, parallel processing, or both [11].
Novel one- and two-dimensional systolic structures were designed for computation of circular convolution using distributed arithmetic (DA) that resulted in less memory and less area-delay complexity compared with the other DA-based structures for circular convolution [33]. One- and two-dimensional fully pipelined computing structures are presented for area-delay-power-efficient
  • 7. implementation of FIR filter by systolic decomposition of distributed arithmetic based inner-product computation [34]. A linear array consisting of a number of processing elements (PEs) and an output cell is shown in Fig. 10. Each PE contains a ROM of $2^M$ words. During each cycle period, a PE reads its ROM at the location specified by the input bit vector, adds the value read to the input arriving from its left, and transfers the sum as output to its right, as shown in Fig. 11. The output cell contains a shift register and an adder. During every cycle period it shifts the content of its register left by one position and then adds the available input to the shifted content. For high-throughput implementation of FIR filters, a two-dimensional systolic array is used, as shown in Fig. 12. Jiafeng Xie et al. suggested FPGA realizations of FIR filters for high-speed and medium-speed operation using modified distributed arithmetic architectures that make use of pipelined registers and a pipelined shift-adder tree [35].

III.2.8. DA Based Architectures for Adaptive FIR Filtering

Adaptive filtering DSP algorithms are employed in many hand-held mobile devices for applications such as echo cancellation, signal de-noising and channel equalization. A hardware architecture for very high throughput LMS adaptive filters using distributed arithmetic (DA) has been suggested; building adaptive DA filters requires recalculating the contents of the LUTs at each adaptation. By using an auxiliary LUT with special addressing, the efficiency and throughput of DA adaptive filters can be of the same order as those of fixed DA filters [36], [37].
A new hardware architecture using conjugate distributed arithmetic (CDA) for high-throughput hardware implementations of LMS adaptive filters is presented, in which all possible combination sums of the input signal samples are stored in the LUT and updated at the arrival of every sample using an efficient update procedure [36], [38].

Fig. 10. Linear 1-D systolic array for DA-based implementation of FIR filter

Fig. 11. (a) Function of PE; (b) function of output cell of 1-D systolic array

Fig. 12. (a) 2-D systolic array for FIR filter; (b) function of PE; and (c) function of Shift Adder (SA) cell
  • 8. K. G. Shanthi, N. Nagarajan Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved International Review on Computers and Software, Vol. 8, N. 7 1725 IV. Conclusion The recent significant researches that are concerned with reducing the overall area-delay-power complexities of memory based realization of FIR filters are presented in this paper. A detailed survey of memory-based implementation of FIR filters using Distributed Arithmetic is also presented stating its merits over direct memory-based implementation of FIR filters. The main goal behind this review is to assist the researchers in the field of Digital signal processing to understand the available methods and adopt the same in various application environments. Many algorithms and architectures have been suggested in the literature to reduce the area and time- complexities of memory-based implementation of FIR filters but many more efficient algorithms and architectures need to be developed to design flexible area-delay-power efficient memory based FIR filters to meet the growing requirements of DSP applications. References [1] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications., NJ: Prentice-Hall, 1996. [2] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999. [3] G. R. Goslin, “A Guide to Using Field Programmable Gate Arrays (FPGAs) for Application-Specific Digital Signal Processing Performance”, XILINX, 1995. [4] M. Yamada, and A. Nishihara, “High-Speed FIR Digital Filter with CSD Coefficients Implemented on FPGA”, in Proc. IEEE Design Automation Conference, 2001, pp. 7-8. [5] R. I. Hartley, “Subexpression sharing in filters using canonic signed-digit multipliers,” IEEE Trans. Circuits Syst. II, vol. 43, no. 10, pp. 677–688, Oct. 1996. [6] M. Potkonjak, M. B. Srivastava, and A. 
Chandrakasan, “Multiple constant multiplications: Efficient and versatile framework and algorithms for exploring common subexpression elimination,” IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 15, no. 2, pp. 151–165, Feb. 1996.
[7] A. G. Dempster and M. D. Macleod, “Generation of signed-digit representations for integer multiplication,” IEEE Signal Process. Lett., vol. 11, no. 8, pp. 663–665, Aug. 2004.
[8] M. D. Macleod and A. G. Dempster, “Multiplierless FIR filter design algorithms,” IEEE Signal Process. Lett., vol. 12, no. 3, pp. 186–189, Mar. 2005.
[9] D. L. Maskell, J. Leiwo, and J. C. Patra, “The Design of Multiplierless FIR Filters with a Minimum Adder Step and Reduced Hardware Complexity,” in Proc. 2006 IEEE International Symposium on Circuits and Systems, p. 4, May 2006.
[10] H.-R. Lee, C.-W. Jen, and C.-M. Liu, “On the design automation of the memory-based VLSI architectures for FIR filters,” IEEE Trans. Consumer Electronics, vol. 39, no. 3, pp. 619–629, Aug. 1993.
[11] H. T. Kung, “Why systolic architectures?,” IEEE Computer, vol. 15, no. 1, pp. 37–45, Jan. 1982.
[12] R. Wyrzykowski and S. Ovramenko, “Flexible systolic architecture for VLSI FIR filters,” Proc. Inst. Elect. Eng.-Comput. Digit. Techniques, vol. 139, no. 2, pp. 170–172, Mar. 1992.
[13] B. K. Mohanty and P. K. Meher, “Cost-effective novel flexible cell-level systolic architecture for high throughput implementation of 2-D FIR filters,” Proc. Inst. Elect. Eng.-Comput. Digit. Techniques, vol. 143, no. 5, pp. 436–439, Nov. 1996.
[14] D. F. Chiper, “A new systolic array algorithm for memory-based VLSI array implementation of DCT,” in Proc. Second IEEE Symp. on Computers and Communications, pp. 297–301, July 1997.
[15] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, “Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST,” IEEE Trans. Circuits Syst. I: Regular Papers, vol. 52, no.
6, pp. 1125–1137, June 2005.
[16] C. Cheng and K. K. Parhi, “A novel systolic array structure for DCT,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 52, no. 7, pp. 366–369, July 2005.
[17] P. K. Meher, J. C. Patra, and M. N. S. Swamy, “New systolic algorithm and array architecture for prime-length discrete sine transform,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 54, no. 3, pp. 262–266, Mar. 2007.
[18] P. K. Meher and M. N. S. Swamy, “High-throughput memory-based architecture for DHT using a new convolutional formulation,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 54, no. 7, pp. 606–610, July 2007.
[19] P. K. Meher, “Low-latency hardware-efficient memory-based design for large-order FIR digital filters,” Sixth International Conference on Information, Communications and Signal Processing (ICICS 2007), Dec. 2007.
[20] P. K. Meher, “New approach to LUT implementation and accumulation for memory-based multiplication,” in Proc. 2009 IEEE Int. Symp. Circuits Syst. (ISCAS’09), May 2009, pp. 453–456.
[21] P. K. Meher, “New look-up-table optimizations for memory-based multiplication,” in Proc. Int. Symp. Integr. Circuits (ISIC’09), Dec. 2009.
[22] P. K. Meher, “New approach to lookup table design and memory based realization of FIR digital filter,” IEEE Transactions on Circuits and Systems-I, vol. 57, no. 3, Mar. 2010.
[23] A. Croisier, D. J. Esteban, M. E. Levilion, and V. Rizo, “Digital filter for PCM encoded signals,” U.S. Patent 3 777 130, Dec. 4, 1973.
[24] A. Peled and B. Liu, “A new hardware realization of digital filters,” IEEE Trans. Acoust., Speech, Signal Process., vol. 22, no. 6, pp. 456–462, Dec. 1974.
[25] S. A. White, “Applications of the distributed arithmetic to digital signal processing: A tutorial review,” IEEE ASSP Mag., vol. 6, no. 3, pp. 5–19, Jul. 1989.
[26] P. Choi, S.-C. Shin, and J.-G. Chung, “Efficient ROM size reduction for distributed arithmetic,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2000, vol. 2, pp. 61–64.
[27] H. Yoo and D. V. Anderson, “Hardware-efficient distributed arithmetic architecture for high-order digital filters,” in Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing (ICASSP), Mar. 2005, vol. 5, pp. v/125–v/128.
[28] M. A. Eshtawie and M. Othman, “On-Line DA-LUT Architecture for High-Speed High-Order Digital FIR Filters,” in the Tenth IEEE International Conference on Communication Systems, Nov. 2006, Singapore.
[29] C.-F. Chen, “Implementing FIR filters with distributed arithmetic,” IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 5, pp. 1318–1321, Oct. 1985.
[30] K. Nourji and N. Demassieux, “Optimal VLSI architecture for distributed arithmetic-based algorithms,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Apr. 1994, pp. II/509–II/512.
[31] S.-S. Jeng, H.-C. Lin, and S.-M. Chang, “FPGA implementation of FIR filter using M-bit parallel distributed arithmetic,” in Proc. 2006 IEEE Int. Symp. Circuits Syst. (ISCAS), May 2006, p. 4.
[32] M. Mehendale, S. D. Sherlekar, and G. Venkatesh, “Area-delay trade-off in distributed arithmetic based implementation of FIR filters,” in Proc. 10th Int. Conf. VLSI Design, Jan. 1997, pp. 124–129.
[33] P. K. Meher, “Hardware-efficient systolization of DA-based calculation of finite digital convolution,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 707–711, Aug. 2006.
[34] P. K. Meher, S. Chandrasekaran, and A. Amira, “FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic,” IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–3017, July 2008.
[35] Jiafeng Xie, Jianjun He, and Guanzheng Tan, “FPGA realization of FIR filters for high-speed and medium-speed by using modified distributed arithmetic architectures,” Microelectronics Journal 41, April 2010, pp. 365–370.
[36] S. Haykin, Adaptive Filter Theory. Upper Saddle River, NJ: Prentice Hall, 2002.
[37] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, “LMS adaptive filters using distributed arithmetic for high throughput,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 1327–1337, July 2005.
[38] W. Huang, V. Krishnan, and D. V. Anderson, “Conjugate Distributed Arithmetic Adaptive FIR Filters and their Hardware Implementation,” MWSCAS '06, Circuits and Systems, vol. 2, pp. 295–299, 2006.
Authors’ information
K. G. Shanthi (corresponding author) completed her B.E. in 1996 at Madras University, Chennai, and obtained her M.E. in 2005 from the Government College of Technology, Coimbatore. Her postgraduate major was VLSI Design. Her fields of interest include the design of FPGA-based VLSI architectures and VLSI signal processing. She is currently working as Associate Professor at R.M.K Engineering College, Chennai, and is pursuing her research in the field of VLSI Design. Address: Associate Professor, Department of Electronics & Communication Engg, R.M.K Engineering College, Chennai, Tamil Nadu, India. Pin code: 601 206. E-mail: kgs.ece@rmkec.ac.in
Nagarajan N. received his B.Tech and M.E. degrees in Electronics Engineering at M.I.T, Chennai. He received his PhD in the faculty of I.C.E. from Anna University, Chennai. He is currently working as Principal of C.I.E.T, Coimbatore. His specializations include optical, wireless ad hoc and sensor networks.