Memory Based Hardware Efficient Implementation of FIR Filters
International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 7, ISSN 1828-6003, July 2013
Manuscript received and revised June 2013, accepted July 2013
Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

K. G. Shanthi, N. Nagarajan

Abstract – Finite impulse response (FIR) digital filters are key components of many digital signal processing (DSP) systems because of their linear phase, stability, fewer finite-precision errors and regular structure. Real-time realization of FIR filters with low hardware requirements and low latency has become critical with increasing developments in very large scale integration (VLSI) technology. The objective of this paper is to explore current trends in the development of algorithms and architectures for memory-based realization of FIR filters, which are mainly concerned with reducing the overall area-delay-power complexity. The purpose of this study is to compare these architectures based on ROM size, delay and throughput. The results presented here should help researchers in the field of digital signal processing select the best architecture for an application based on its requirements. New algorithms and architectures need to be developed to design area-delay-power-efficient FIR filters for various demanding DSP applications.

Keywords: Finite Impulse Response Filter, Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuit (ASIC), Distributed Arithmetic (DA), Lookup Table (LUT)

Nomenclature
y[n]   The FIR filter output
N      Order of the filter
Ci     Constant coefficients
Xi     Input data
B      Input word length

I. Introduction

Digital signal processing (DSP) plays a vital role in the significant advancements of digital technology currently taking place around the world.
Digital communication, speech and image data compression, speech recognition, spectral estimation and analysis, adaptive filtering, wired and wireless communication, multimedia systems, biomedical instrumentation, satellite and aerospace control, and remote sensing are the major areas where DSP has had a major impact [1]. The increasing daily use of digital technology has led to the development of improved algorithms and architectures for designing DSP systems with lower power dissipation, higher speed and lower area complexity. Several architectural solutions have been proposed to minimize the arithmetic complexity of the algorithms in order to reduce the overall area-delay-power complexity [2].

The finite impulse response (FIR) filter is a basic tool in many DSP applications. Digital filters modify signal characteristics in the time or frequency domain and are used in many DSP systems to perform signal preconditioning, anti-aliasing, band selection, interpolation, low-pass filtering, etc. [1]. Traditionally, design methods focused mainly on multiplier-based architectures to implement the Multiply-and-Accumulate (MAC) blocks that constitute the central piece of FIR filters and several other DSP functions. These multipliers consume most of the system's resources and account for most of the computation time. The number of MAC operations required per filter output increases with the filter order, which makes real-time implementation of these filters a challenging task.

A discrete-time linear FIR filter generates the output y[n] as a sum of delayed and scaled input samples x[n]. An N-tap FIR digital filter is represented as:

$$ y[n] = \sum_{i=0}^{N-1} c[i]\, x[n-i] \tag{1} $$

where y[n] is the FIR filter output, c[i] represents the filter coefficients, x[n-i] is the input data and n is the time index starting from 0. A direct implementation of Eq.
(1) requires N Multiply-and-Accumulate blocks, which is expensive in terms of area and speed. To resolve this problem, many multiplier-less architectures have been proposed in recent years. They are broadly classified into two basic categories according to how they manipulate the filter coefficients for the multiply operation: the first is the conversion-based approach and the second is the memory-based implementation approach.

Over the past decade, there has been a growing trend to implement DSP functions in Field Programmable Gate Arrays (FPGAs) rather than on
application-specific integrated circuits (ASICs) and DSP chips. Implementation on ASICs is not preferred because of high development costs and time-to-market factors. The sequential-execution architecture of programmable DSP processors prevents them from achieving the desired performance. In this context, the FPGA platform provides a very attractive solution that balances high flexibility, including the option to reconfigure, with time-to-market, cost and performance [3].

This paper is organized as follows. Section II gives a brief overview of conversion-based multiplier-less FIR filters. Section III explores the algorithmic aspects and architectural approaches of memory-based FIR filters and gives an in-depth review of FIR filters based on DA. Finally, the conclusion is presented in Section IV.

II. Conversion-Based Multiplier-Less Implementation of FIR Filters

In this approach the coefficients are transformed to other numeric representations so that the multiplications are implemented with adders/subtractors and shifters. A coefficient in n-bit signed-digit representation can be written as:

$$ C = \sum_{i=0}^{n-1} b_i\, 2^{i} \tag{2} $$

where b_i is taken from the set {-1, 0, 1}. The representation that has the minimum number of non-zero digits and no consecutive non-zero digits is known as the canonic signed-digit (CSD) representation [2]. Since in shift-and-add multiplication each non-zero digit represents an addition (or subtraction), CSD requires significantly fewer adders than the plain binary representation. Multipliers [4] in a filter whose coefficients are expressed in canonic signed-digit code are realized with wired shifters, adders and subtractors.

Common subexpression elimination (CSE) is a numerical transformation of the constant multiplications that can lead to efficient hardware implementations in terms of area, power and speed [5]-[8].
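The canonic signed-digit recoding of Eq. (2) and the resulting shift-and-add multiplication can be illustrated with a minimal Python sketch. The function names `to_csd` and `csd_multiply` are hypothetical and not taken from the paper, and the code assumes integer coefficients; it is an illustration of the idea rather than any specific published design.

```python
def to_csd(value: int) -> list[int]:
    """Return signed digits b_i in {-1, 0, 1}, least significant first,
    such that value == sum(b_i * 2**i) and no two adjacent digits are non-zero."""
    digits = []
    while value != 0:
        if value % 2 == 0:
            digits.append(0)
        else:
            # Pick +1 or -1 so that the remainder is divisible by 4,
            # which forces the next digit to be zero.
            digit = 2 - (value % 4)  # value % 4 == 1 -> +1, value % 4 == 3 -> -1
            digits.append(digit)
            value -= digit
        value //= 2
    return digits


def csd_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the constant encoded in `digits` using only shifts,
    additions and subtractions (one add/subtract per non-zero digit)."""
    result = 0
    for position, digit in enumerate(digits):
        if digit == 1:
            result += x << position
        elif digit == -1:
            result -= x << position
    return result


if __name__ == "__main__":
    # 7 = 8 - 1 needs two non-zero digits instead of three in plain binary.
    print(to_csd(7))
    print(csd_multiply(13, to_csd(7)))
```

In this sketch each non-zero digit costs one adder or subtractor, which is why a representation with fewer non-zero digits maps to less hardware.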
Subexpression elimination can only be performed on constant multiplications that operate on a common variable. It is the process of examining the shift-and-add implementations of the constant multiplications and finding redundant operations. Once the redundancies are found, these operations can be performed once and shared among the constant multiplications, so that the number of adders and shifters needed for the implementation is minimized. CSE techniques thus attempt to minimize the number of additions in the multiplier block by reusing terms. These terms can be canonic signed digit (CSD) [5], minimal signed digit (MSD), or all signed digit (ASD) [7]. In "Multiplierless FIR Filter Design Algorithms", Malcolm D. Macleod and Andrew G. Dempster introduced a new CSE algorithm, which searches a bounded number of minimal signed digit (MSD) representations [8]. Douglas L. Maskell, Jussipekka Leiwo and Jagdish C. Patra [9] reduced both the coefficient word length and the number of non-zero bits in the filter coefficients so that the number of adder steps is minimized, which reduces the hardware complexity of linear-phase FIR digital filters.

III. Algorithms and Architectures for Memory Based FIR Filters

The memory-based approach uses memories (RAMs, ROMs) or Look-Up Tables (LUTs) that store pre-computed values, which are read out in place of a multiplication operation. With advancements in VLSI technology, semiconductor memory has become cheaper, faster and more efficient in terms of power dissipation. Memory-based FIR filters are consequently gaining substantial popularity in the DSP environment. These filters offer high throughput and reduced latency, since the memory-access time is usually much shorter than the multiplication time. They also have much lower dynamic power consumption, because obtaining the product or inner-product values by memory read operations involves minimal switching activity.
There are two types of memory-based FIR filters. One is the direct memory-based implementation of FIR filters [10], while the other is based on distributed arithmetic (DA).

III.1. Direct-Memory-Based FIR Filters

In direct-memory-based implementations [10], the multiplication of input values by fixed coefficients is replaced by a ROM or look-up table (LUT) that contains the pre-computed product values for all possible values of the input samples. Let X be an input word to be multiplied by a W-bit fixed coefficient C. If X is assumed to be an unsigned binary number of word-length N, there are 2^N possible values of X, and hence 2^N possible values of the product Y = C*X. A direct memory-based implementation of the multiplication therefore requires a memory unit of 2^N words, used as a LUT holding the pre-computed product values corresponding to all possible values of X, as shown in Fig. 1. The product C*Xi is stored at the memory location whose address is the binary value of Xi, for 0 ≤ Xi ≤ 2^N - 1, so that when the N-bit binary value of Xi is used as the address of the memory unit, the corresponding product value is read out. However, the size of the ROM increases exponentially with the input word-length.

[Fig. 1. Structure of direct-memory-based multiplier: the N-bit input X addresses a ROM with 2^N words, which outputs the (N+W)-bit product Y = C*X]
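The direct-memory-based multiplier of Fig. 1 can be mimicked in software with a short Python sketch. The helper names `build_product_lut` and `lut_multiply` are hypothetical, and a Python list stands in for the ROM; the sketch only illustrates the addressing scheme and the exponential growth of the table.

```python
def build_product_lut(coefficient: int, input_bits: int) -> list[int]:
    """Pre-compute C * X for every unsigned input word X of `input_bits` bits.
    The list index plays the role of the ROM address."""
    return [coefficient * x for x in range(1 << input_bits)]


def lut_multiply(lut: list[int], x: int) -> int:
    """Replace the multiplication by a single memory read: address = X."""
    return lut[x]


if __name__ == "__main__":
    lut = build_product_lut(coefficient=11, input_bits=4)
    print(len(lut))               # number of ROM words, 2**4
    print(lut_multiply(lut, 9))   # same value as 11 * 9
    # Each extra input bit doubles the table size.
    for bits in (4, 8, 12, 16):
        print(bits, 1 << bits)
```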
A direct implementation of Eq. (1) requires N multiplications, where N is the number of taps. Each multiplier, which multiplies input values by a fixed coefficient, can be replaced by a ROM or LUT that contains the pre-computed product values for all possible values of the input samples.

A systolic system consists of a set of interconnected cells, each capable of performing some simple operation [2], [11]. Systolic designs are very efficient for hardware implementation of computation-intensive DSP applications because of their simplicity, regularity and modularity of structure. They also achieve high throughput by using pipelining, parallel processing, or both. The systolic array for an FIR filter of order N is shown in Fig. 2. It consists of N processing elements (PEs), where each PE performs one MAC operation per cycle period. Several algorithms and architectures have been suggested for systolization of FIR filters [12], [13].

[Fig. 2. Structure of a linear systolic array for an N-tap FIR filter]

The average computation time and the latency of direct-memory-based implementations are high for large transform lengths. Several novel algorithms have therefore been proposed in the last few years to decompose the sinusoidal transforms into multiple circular convolutions or convolution-like structures of smaller convolution lengths [14]–[18]. These decompositions have improved throughput performance and substantially reduced hardware and computational latency. A concurrent recursive algorithm has been derived for the computation of the FIR filter and ported to a two-dimensional systolic structure for reduced-latency direct-ROM-based realization of large-order filters [19].
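The replacement of every tap multiplier of Eq. (1) by its own LUT, described at the start of this subsection, can be sketched in Python as follows. The function name `lut_fir` is hypothetical, inputs are assumed to be unsigned words of `input_bits` bits, and the delay line starts at zero; this is a behavioural illustration, not a model of the systolic structures cited above.

```python
from collections import deque


def lut_fir(samples, coefficients, input_bits):
    """N-tap FIR filter in which each tap reads its product from a LUT."""
    # One table of 2**input_bits pre-computed products per coefficient.
    luts = [[c * x for x in range(1 << input_bits)] for c in coefficients]
    taps = len(coefficients)
    # delay_line[i] holds x[n-i]; it is initialised with zeros.
    delay_line = deque([0] * taps, maxlen=taps)
    outputs = []
    for sample in samples:
        delay_line.appendleft(sample)  # oldest sample drops off the right end
        # Every multiplication is a table read; only additions remain.
        outputs.append(sum(lut[x] for lut, x in zip(luts, delay_line)))
    return outputs


if __name__ == "__main__":
    print(lut_fir([3, 0, 7, 15, 1], [2, -1, 4], input_bits=4))
```

The total storage in this sketch is N tables of 2^input_bits words each, which is the cost that the LUT-reduction schemes discussed next try to lower.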
A new approach to LUT design, referred to as the odd-multiple-storage (OMS) scheme, stores only the odd multiples of the fixed coefficient, so the memory size is halved at the cost of some increase in combinational circuit complexity [20]. The antisymmetric product coding (APC) approach also halves the LUT size by recoding the product words as antisymmetric pairs [21]. Two further approaches to LUT design for LUT-multiplier-based implementation reduce the memory size to nearly half that of the conventional approach [22].

III.2. FIR Filters Based on Distributed Arithmetic (DA)

The main operations required for DA-based computation of an inner product are a sequence of lookup-table accesses followed by shift-accumulation of the LUT outputs. DA-based computation is well suited to FPGA realization because both the LUT and the shift-add operations map efficiently onto LUT-based FPGA logic structures. DA is a bit-serial operation that implements a series of fixed-point MAC operations in a fixed number of steps, regardless of the number of terms to be calculated. DA is often preferred because it eliminates the need for hardware multipliers and can implement large filters with very high throughput.

Croisier et al. proposed the DA algorithm for digital filter implementation in 1973 [23]. The first detailed discussion of DA was given by Abraham Peled and Bede Liu in 1974 at the Arden House Workshop on Digital Signal Processing [24]. S. A. White [25] discussed an organization for forming the inner product of a pair of data vectors, gave a criterion for minimizing the ROM size, and increased the speed with techniques such as bit pairing or partitioning the input words into most significant and least significant halves, thereby introducing parallelism into the computation.

III.2.1. Conventional DA approach

Consider the inner product of two N-point vectors C and X given by:

$$y[n] = \sum_{i=0}^{N-1} c_i x_i \qquad (3)$$

where c_i are the constant coefficients and x_i are the input data, which may change from time to time. Let each input sample be coded as a B-bit 2's complement binary number such that |x_i| < 1. The input sample is given by:

$$x_i = -x_{i0} + \sum_{j=1}^{B-1} x_{ij}\, 2^{-j} \qquad (4)$$

where x_{ij} ∊ {0, 1}, x_{i0} is the sign bit and x_{i,B-1} is the least significant bit (LSB). Substituting (4) in (3), the output can be expressed as:

$$y[n] = \sum_{i=0}^{N-1} c_i \left[ -x_{i0} + \sum_{j=1}^{B-1} x_{ij}\, 2^{-j} \right] \qquad (5)$$

$$y[n] = -\sum_{i=0}^{N-1} c_i x_{i0} + \sum_{j=1}^{B-1} \left[ \sum_{i=0}^{N-1} c_i x_{ij} \right] 2^{-j} \qquad (6)$$

For a given set of c_i (i = 0, 1, 2, …, N − 1), the terms in the brackets may take one of 2^N possible values, which can be precomputed and stored in an LUT. All 2^N possible combination sums of the c_i can then be read out from the ROM using the N-bit sequence {x_{ij}, 0 ≤ i ≤ N − 1} as address bits. These intermediate results are accumulated over B clock cycles to produce one filter output y[n].
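The bit-serial evaluation of Eq. (6) can be sketched in software as follows. This is a minimal illustrative model, not the authors' hardware: the function names (`build_da_lut`, `to_bits`, `da_inner_product`) and the test values are assumptions introduced here, and exact rational arithmetic from Python's `fractions` module stands in for the fixed-point datapath.

```python
from fractions import Fraction


def build_da_lut(coeffs):
    """Return the 2^N-word LUT: entry a is the sum of c_i over the set bits of a."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (a >> i) & 1)
            for a in range(1 << n)]


def to_bits(x, B):
    """Bits [x_0, ..., x_{B-1}] of a B-bit two's-complement fraction (x_0 = sign)."""
    scaled = Fraction(x) * (1 << (B - 1))
    if scaled.denominator != 1 or not -(1 << (B - 1)) <= scaled < (1 << (B - 1)):
        raise ValueError("x is not representable in B bits")
    word = int(scaled) & ((1 << B) - 1)
    return [(word >> (B - 1 - j)) & 1 for j in range(B)]


def da_inner_product(coeffs, xs, B):
    """Bit-serial DA: one LUT read and one shift-accumulate per bit position."""
    lut = build_da_lut(coeffs)
    bits = [to_bits(x, B) for x in xs]
    acc = Fraction(0)
    for j in range(B - 1, -1, -1):            # LSB first, sign bit (j = 0) last
        addr = sum(bits[i][j] << i for i in range(len(coeffs)))
        term = lut[addr]
        # Halving models the right shift; the LUT word is subtracted at sign-bit time.
        acc = acc / 2 + (-term if j == 0 else term)
    return acc


# Hypothetical 4-tap example with B = 4 (values chosen only for illustration).
coeffs = [Fraction(1, 4), Fraction(-1, 2), Fraction(3, 8), Fraction(1, 8)]
xs = [Fraction(3, 8), Fraction(-5, 8), Fraction(1, 2), Fraction(-1, 4)]
y = da_inner_product(coeffs, xs, B=4)
direct = sum(c * x for c, x in zip(coeffs, xs))
```

In this sketch `y` and `direct` agree, and the loop runs exactly B times regardless of the number of taps, which is the property the text attributes to DA.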
K. G. Shanthi,
N. Nagarajan Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved International Review on Computers and Software, Vol. 8, N. 7 1721

Fig. 3. LUT-based DA implementation of a 4-tap (N = 4) FIR filter

The original LUT-based DA implementation of a 4-tap (N = 4) FIR filter consists of three units: the shift register unit, the DA base unit, and the adder/shifter unit. The LUT contains all 16 possible combination sums of the filter weights C0, C1, C2, C3. The bank of shift registers in Fig. 3 stores four consecutive input samples (x[n-i], i = 0, 1, 2, 3). The concatenation of the rightmost bits of the shift registers forms the address of the LUT. The shift registers are shifted right at every clock cycle, and the corresponding LUT entries are shifted and accumulated over B consecutive cycles to generate the output y[n]. The sign bits {x_{i0}} are the last bits to arrive. The clock period in which all the sign bits arrive simultaneously is called the "sign-bit time". During the sign-bit time the control signal S = 1; otherwise S = 0.

The time complexity of FIR filters based on distributed arithmetic is independent of the transform size or the number of filter taps and depends only on the word length, whereas the time complexity of direct-memory-based FIR filters is independent of the word length but increases linearly with the transform size.

III.2.2. Distributed Arithmetic with Offset Binary Coding

The memory requirement (2^N words) of a DA-based FIR filter implementation increases exponentially with the filter order N. With offset binary coding (OBC) the memory size can be halved to 2^(N-1) words [2], [25]. In offset binary coding an input bit 0 is interpreted as -1 and an input bit 1 as +1. Let the input sample x_i be written as:

$$x_i = \frac{1}{2}\left[ x_i - (-x_i) \right] \qquad (7)$$

In 2's-complement notation the negative of Eq. (4) is written as:

$$-x_i = -\bar{x}_{i0} + \sum_{j=1}^{B-1} \bar{x}_{ij}\, 2^{-j} + 2^{-(B-1)} \qquad (8)$$

where the overbar indicates the complement of a bit. From Eqs. (4) and (8), Eq. (7) can be rewritten as:

$$x_i = \frac{1}{2}\left[ -(x_{i0} - \bar{x}_{i0}) + \sum_{j=1}^{B-1} (x_{ij} - \bar{x}_{ij})\, 2^{-j} - 2^{-(B-1)} \right] \qquad (9)$$

Define d_{ij}:

$$d_{ij} = \begin{cases} x_{ij} - \bar{x}_{ij}, & j \neq 0 \\ -(x_{i0} - \bar{x}_{i0}), & j = 0 \end{cases} \qquad (10)$$

where d_{ij} ∊ {-1, 1}. Eq. (9) can be rewritten as:

$$x_i = \frac{1}{2}\left[ \sum_{j=0}^{B-1} d_{ij}\, 2^{-j} - 2^{-(B-1)} \right] \qquad (11)$$

Using Eq. (11) in Eq. (3):

$$y[n] = \sum_{i=0}^{N-1} c_i\, \frac{1}{2}\left[ \sum_{j=0}^{B-1} d_{ij}\, 2^{-j} - 2^{-(B-1)} \right] \qquad (12)$$

$$y[n] = \sum_{j=0}^{B-1} \left( \sum_{i=0}^{N-1} \frac{1}{2} c_i d_{ij} \right) 2^{-j} - \left( \frac{1}{2} \sum_{i=0}^{N-1} c_i \right) 2^{-(B-1)} \qquad (13)$$

$$y[n] = \sum_{j=0}^{B-1} D_j\, 2^{-j} + D_{initial}\, 2^{-(B-1)} \qquad (14)$$

where

$$D_j = \sum_{i=0}^{N-1} \frac{1}{2} c_i d_{ij}, \qquad D_{initial} = -\frac{1}{2} \sum_{i=0}^{N-1} c_i .$$

The OBC scheme is characterized by Eq. (14). Table I shows the content of the ROM for N = 4. From Table I, notice that the upper-half and lower-half ROM values are mirrored with the sign reversed. It is therefore possible to reduce the ROM size by a factor of 2, as shown in Table II. Fig. 4 shows a typical architecture for the DA-OBC based implementation of a 4-tap (N = 4) FIR filter. The XOR gates are used for address decoding; the MUX with the constant D_initial provides the initial value to the shift accumulator. In Fig. 4, two control signals S1 and S2 are required, where S1 is 1 when j = 0 and 0 otherwise, and S2 is 1 when j = B-1 and 0 otherwise.

TABLE I
CONTENT OF THE ROM WITH DA-OBC

b3 b2 b1 b0    Contents of ROM
0  0  0  0     -(C3 + C2 + C1 + C0)/2
0  0  0  1     -(C3 + C2 + C1 - C0)/2
0  0  1  0     -(C3 + C2 - C1 + C0)/2
0  0  1  1     -(C3 + C2 - C1 - C0)/2
0  1  0  0     -(C3 - C2 + C1 + C0)/2
0  1  0  1     -(C3 - C2 + C1 - C0)/2
0  1  1  0     -(C3 - C2 - C1 + C0)/2
0  1  1  1     -(C3 - C2 - C1 - C0)/2
1  0  0  0      (C3 - C2 - C1 - C0)/2
1  0  0  1      (C3 - C2 - C1 + C0)/2
1  0  1  0      (C3 - C2 + C1 - C0)/2
1  0  1  1      (C3 - C2 + C1 + C0)/2
1  1  0  0      (C3 + C2 - C1 - C0)/2
1  1  0  1      (C3 + C2 - C1 + C0)/2
1  1  1  0      (C3 + C2 + C1 - C0)/2
1  1  1  1      (C3 + C2 + C1 + C0)/2
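The halved ROM and Eq. (14) can be checked with a short software sketch. It is only an illustration under stated assumptions: the helper names (`obc_half_rom`, `obc_lookup`, `da_obc`) and the test values are hypothetical, the sum in Eq. (14) is evaluated directly rather than through a cycle-by-cycle shift accumulator, and the XOR-plus-negation lookup is one plausible reading of the address decoding described for Fig. 4.

```python
from fractions import Fraction


def to_bits(x, B):
    """Bits [x_0, ..., x_{B-1}] of a B-bit two's-complement fraction (x_0 = sign)."""
    scaled = Fraction(x) * (1 << (B - 1))
    assert scaled.denominator == 1, "x must be a multiple of 2^-(B-1)"
    word = int(scaled) & ((1 << B) - 1)
    return [(word >> (B - 1 - j)) & 1 for j in range(B)]


def obc_half_rom(coeffs):
    """ROM words for the 2^(N-1) addresses whose MSB is 0 (layout of Table II)."""
    n = len(coeffs)
    rom = []
    for a in range(1 << (n - 1)):
        # Address bit 1 means d = +1, bit 0 means d = -1; the MSB is always 0 here.
        d = [1 if (a >> i) & 1 else -1 for i in range(n)]
        rom.append(Fraction(1, 2) * sum(c * di for c, di in zip(coeffs, d)))
    return rom


def obc_lookup(rom, addr, n):
    """Recover any of the 2^N table values from the half ROM."""
    mask = (1 << (n - 1)) - 1
    if (addr >> (n - 1)) & 1:
        # Complement the low address bits (XOR) and reverse the sign (mirror property).
        return -rom[(addr & mask) ^ mask]
    return rom[addr]


def da_obc(coeffs, xs, B):
    """Evaluate Eq. (14): sum_j D_j 2^-j + D_initial 2^-(B-1)."""
    n = len(coeffs)
    rom = obc_half_rom(coeffs)
    bits = [to_bits(x, B) for x in xs]
    d_initial = -Fraction(1, 2) * sum(coeffs)
    y = d_initial * Fraction(1, 1 << (B - 1))
    for j in range(B):
        addr = sum(bits[i][j] << i for i in range(n))
        d_j = obc_lookup(rom, addr, n)
        if j == 0:
            d_j = -d_j                        # Eq. (10): sign reversed for the sign bit
        y += d_j * Fraction(1, 1 << j)
    return y


# Hypothetical 4-tap example with B = 4 (values chosen only for illustration).
coeffs = [Fraction(1, 4), Fraction(-1, 2), Fraction(3, 8), Fraction(1, 8)]
xs = [Fraction(3, 8), Fraction(-5, 8), Fraction(1, 2), Fraction(-1, 4)]
y_obc = da_obc(coeffs, xs, B=4)
direct = sum(c * x for c, x in zip(coeffs, xs))
```

Here the ROM holds 8 words instead of 16, and `y_obc` matches the direct inner product, which is consistent with the mirror symmetry noted for Table I.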
TABLE II
REDUCED SIZE ROM (2^(N-1)) WITH DA-OBC CODING FOR 4-TAP (N = 4) FIR FILTER

b2 b1 b0    Contents of ROM
0  0  0     -(C3 + C2 + C1 + C0)/2
0  0  1     -(C3 + C2 + C1 - C0)/2
0  1  0     -(C3 + C2 - C1 + C0)/2
0  1  1     -(C3 + C2 - C1 - C0)/2
1  0  0     -(C3 - C2 + C1 + C0)/2
1  0  1     -(C3 - C2 + C1 - C0)/2
1  1  0     -(C3 - C2 - C1 + C0)/2
1  1  1     -(C3 - C2 - C1 - C0)/2

Fig. 4. DA-OBC based implementation of a 4-tap (N = 4) FIR filter

III.2.3. Distributed Arithmetic with Modified Offset Binary Coding (DA-MOBC)

DA-MOBC can reduce the LUT size to anywhere from 2^(N-2) down to as few as 2 words by exploiting the observation that if a single term inside the LUT is relocated outside the LUT, the lower half of the LUT becomes a mirrored version of the upper half with only the signs reversed [26]. From Table II, it can be observed that the ROM values, apart from the C3 term, are mirrored about the line between the 4th and 5th rows. Apart from the C3 term, the LUT in Table II has only 2^(N-2) possible values depending on the input values. Table III illustrates the new ROM table. The LUT size reduction is achieved at the cost of additional control circuits such as XOR gates, multiplexers (MUX), and full adders (FA). While the number of XOR gates increases in proportion to the input vector length B, the complexity of the other control circuits (MUX, FA) increases in proportion to the coefficient word length, as shown in Fig. 5.

III.2.4. Distributed Arithmetic Based LUT-Less Architecture Proposed by Yoo and Anderson

Applying a recursive LUT reduction to the original DA halves the LUT size at every iteration, so that a LUT-less DA architecture can eventually be achieved [27]. From Fig. 3, it can be observed that each entry in the lower half of the LUT (locations whose addresses have a 1 in the MSB) equals the corresponding entry in the upper half of the LUT (locations whose addresses have a 0 in the MSB) plus the C3 term.
Thus, the LUT size can be reduced by a factor of 2 with an additional 2x1 MUX and a full adder. After several iterations of the LUT reduction, the final LUT-less DA architecture for a 4-tap FIR filter is obtained, as shown in Fig. 6.

Fig. 5. Block diagram of the LUT-less DA-OBC (DA-MOBC) for a 4-tap FIR filter

TABLE III
REDUCED SIZE ROM (2^(N-2)) WITH DA-MOBC CODING FOR 4-TAP (N = 4) FIR FILTER

b2 b1 b0    Contents of ROM
0  0  0     -(C2 + C1 + C0)/2
0  0  1     -(C2 + C1 - C0)/2
0  1  0     -(C2 - C1 + C0)/2
0  1  1     -(C2 - C1 - C0)/2

Fig. 6. LUT-less Architecture for a 4-tap FIR filter proposed by Yoo and Anderson

III.2.5. On-Line DA-LUT Architecture for FIR Filters proposed by Eshtawie, Othman

Tri-state buffers and a carry look-ahead adder (CLA) are the basic digital logic units used to construct the on-line DA-LUT architecture [28], as shown in Fig. 7. A filter coefficient passes to the CLA only if its buffer enable signal is 1. Only the contents of the location actually needed are calculated, whereas in the conventional DA technique the contents of locations that may never be used when processing the input signal are also computed.

Fig. 7. LUT-less Architecture for a 4-tap FIR filter with tri-state buffers and CLA adders
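Both LUT-less ideas rest on the fact that a conventional DA LUT word is just the sum of the coefficients selected by the address bits, so it can be produced on demand instead of being stored. The sketch below illustrates this under stated assumptions: `halved_lut_value` models one MUX-and-adder reduction step per recursion level, and `online_lut_value` models coefficients gated by enable bits into an adder. Both names, and the integer test coefficients, are hypothetical and are not taken from [27] or [28].

```python
def full_lut(coeffs):
    """Conventional DA LUT: entry a is the sum of c_i over the set bits of a."""
    return [sum(c for i, c in enumerate(coeffs) if (a >> i) & 1)
            for a in range(1 << len(coeffs))]


def halved_lut_value(coeffs, addr):
    """Recursive reduction: lower-half entry = upper-half entry + C_msb."""
    n = len(coeffs)
    if n == 0:
        return 0                              # nothing left to look up: LUT-less
    msb = (addr >> (n - 1)) & 1
    low = addr & ((1 << (n - 1)) - 1)
    # The 2x1 MUX selects 0 or C_msb; the adder combines it with the smaller table.
    return halved_lut_value(coeffs[:-1], low) + (coeffs[-1] if msb else 0)


def online_lut_value(coeffs, addr):
    """Pass only the enabled coefficients to the adder; count how many passed."""
    value, enabled = 0, 0
    for i, c in enumerate(coeffs):
        if (addr >> i) & 1:                   # buffer enable signal is 1
            value += c
            enabled += 1
    return value, enabled


coeffs = [3, -5, 7, 2]                        # illustrative integer coefficients
lut = full_lut(coeffs)
matches = all(halved_lut_value(coeffs, a) == lut[a] == online_lut_value(coeffs, a)[0]
              for a in range(16))
```

In this model an all-zero address enables no coefficient at all, which mirrors the remark that the on-line scheme performs no addition when the location content is zero.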
TABLE IV
COMPARISON OF VARIOUS ARCHITECTURES FOR A 4-TAP FILTER (N = 4). THE SHIFT REGISTER AND THE ADDER/SHIFTER UNITS ARE NOT CONSIDERED SINCE THEY ARE COMMON TO ALL STRUCTURES. BC REPRESENTS THE COEFFICIENT WORD LENGTH.

Logic functions   LUT-based DA (conventional DA)   DA-OBC          DA-MOBC                LUT-less architecture of Yoo & Anderson   On-line DA-LUT architecture
ROM size          2^N x BC                         2^(N-1) x BC    (2^(N-2) to 2) x BC    0                                         0
XOR gates         0                                N               N-1                    0                                         0
2x1 MUX           0                                BC              BC                     N x BC                                    0
Adders            0                                0               0                      N-1 x BC                                  N-1 CLAs
Tristate buffer   0                                0               0                      0                                         N
Adder/Sub         0                                0               N x BC                 0                                         0

In the DA technique, a location is fetched and added to the partial sum even if its content is zero, whereas in the on-line LUT no addition occurs when the calculated content is zero. Hence the execution time for obtaining the filter output is very short.

III.2.6. Memory Partitioning and Multiple Memory Bank Algorithms

The main drawback of a DA-based FIR filter is that the memory size required by the implementation grows exponentially as the filter size increases. Memory access time can become a bottleneck for the speed of the entire system when the ROM is very large. A large LUT can be avoided by partitioning the circuit into smaller LUTs and combining their outputs with adders. Several memory-partitioning and multiple-memory-bank approaches, along with flexible multi-bit data access mechanisms, have been presented for FIR filtering and inner-product computation in order to reduce the memory size of DA-based filters [10], [25], [29]-[32]. The N-tap filter is divided into m smaller filters, each having k input lines, such that N = m × k; N is assumed not to be prime. The total number of clock cycles required for this implementation is B + log2(m); the additional second term is the number of clock cycles required by the adder tree that sums the outputs of the m LUTs.
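The partitioning N = m × k described above can be sketched as m small LUTs whose outputs are summed before the shift-accumulate step. The function names (`da_partitioned`, `memory_words`, `clock_cycles`) and the test values are assumptions introduced for illustration; rounding log2(m) up for values of m that are not powers of two is also an assumption, since the text only gives B + log2(m).

```python
import math
from fractions import Fraction


def build_lut(coeffs):
    """LUT for one partition: entry a is the sum of c_i over the set bits of a."""
    return [sum(c for i, c in enumerate(coeffs) if (a >> i) & 1)
            for a in range(1 << len(coeffs))]


def to_bits(x, B):
    """Bits [x_0, ..., x_{B-1}] of a B-bit two's-complement fraction (x_0 = sign)."""
    scaled = Fraction(x) * (1 << (B - 1))
    assert scaled.denominator == 1, "x must be a multiple of 2^-(B-1)"
    word = int(scaled) & ((1 << B) - 1)
    return [(word >> (B - 1 - j)) & 1 for j in range(B)]


def da_partitioned(coeffs, xs, B, k):
    """DA with m = N/k LUTs of 2^k words each; their outputs are added per bit."""
    n = len(coeffs)
    if n % k:
        raise ValueError("k must divide the number of taps N")
    m = n // k
    luts = [build_lut(coeffs[z * k:(z + 1) * k]) for z in range(m)]
    bits = [to_bits(x, B) for x in xs]
    acc = Fraction(0)
    for j in range(B - 1, -1, -1):            # LSB first, sign bit last
        term = 0
        for z in range(m):                    # models the adder tree over m LUTs
            addr = sum(bits[z * k + i][j] << i for i in range(k))
            term += luts[z][addr]
        acc = acc / 2 + (-term if j == 0 else term)
    return acc


def memory_words(n, k):
    """Total LUT words with partitioning: m * 2^k, where m = n / k."""
    return (n // k) * (1 << k)


def clock_cycles(B, m):
    """B + log2(m), rounded up here for m that is not a power of two (assumption)."""
    return B + math.ceil(math.log2(m))


coeffs = [Fraction(1, 4), Fraction(-1, 2), Fraction(3, 8), Fraction(1, 8)]
xs = [Fraction(3, 8), Fraction(-5, 8), Fraction(1, 2), Fraction(-1, 4)]
y_part = da_partitioned(coeffs, xs, B=4, k=2)
direct = sum(c * x for c, x in zip(coeffs, xs))
```

For a 4-tap filter with m = k = 2 and B = 8, `memory_words(4, 2)` gives 8 words against 16 for the full LUT, while `clock_cycles(8, 2)` gives 9 cycles against 8, which illustrates the trade-off between memory and latency.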
The decrease in throughput with this implementation is very small compared with the cost of the large LUT that a high-order filter would otherwise require. Hence Eq. (6) is rewritten as:

$$y[n] = -\sum_{z=0}^{m-1} \sum_{i=zk}^{(z+1)k-1} c_i x_{i0} + \sum_{j=1}^{B-1} \left[ \sum_{z=0}^{m-1} \sum_{i=zk}^{(z+1)k-1} c_i x_{ij} \right] 2^{-j} \qquad (15)$$

For example, a 32-tap DA FIR filter would require a large LUT with 2^32 entries. This problem can be overcome by breaking up the LUT into 8 smaller LUT units, each having 4 input lines. A single large LUT with 2^32 memory elements is thus replaced by 8 LUTs, each having only 2^4 = 16 memory elements. Fig. 8 shows the implementation of a 4-tap FIR filter based on Eq. (15) for m = 2 and k = 2.

Fig. 8. Implementation of a 4-tap FIR filter using memory partitioning with m = k = 2

TABLE VI
COMPARISON OF VARIOUS REQUIREMENTS WITH AND WITHOUT MEMORY-PARTITIONING

Memory variants                                           No. of address bits   Memory size                Clock cycles required
Without memory partitioning (full LUT implementation)     N                     2^N                        B
With memory partitioning (ROM decomposition)              k = N/m               m x 2^(N/m) or m x 2^k     B + log2(m)

Fig. 9. Comparison of a 4-tap FIR filter (N = 4) with and without memory partitioning with m = k = 2 and input word length B = 8 (bar chart of LUT size and clock cycles for the full LUT and the partitioned LUT)

III.2.7. Systolic Architectures for DA-Based Implementation of FIR Filters

Systolic architectures can yield cost-effective, high-performance systems by exploiting a high level of concurrency through pipelining, parallel processing, or both [11]. Novel one- and two-dimensional systolic structures were designed for the computation of circular convolution using distributed arithmetic (DA), resulting in less memory and lower area-delay complexity than other DA-based structures for circular convolution [33]. One- and two-dimensional fully pipelined computing structures have been presented for area-delay-power-efficient implementation of FIR filters by systolic decomposition of DA-based inner-product computation [34].

A linear array consisting of a number of processing elements (PEs) and an output cell is shown in Fig. 10. Each PE contains a ROM of 2^M words. During each cycle period, a PE reads its ROM at the location specified by the input bit vector and adds the value read to the input available from its left. The sum is then transferred as output to its right, as shown in Figs. 11. The output cell contains a shift register and an adder. During every cycle period it shifts the content of its register left by one position and then adds the available input to the shifted content. For high-throughput implementation of FIR filters, a two-dimensional systolic array is used, as shown in Figs. 12. FPGA realizations of FIR filters for high-speed and medium-speed operation using modified distributed arithmetic architectures, which make use of pipelined registers and a pipelined shift-adder tree, were suggested by Jiafeng Xie et al. [35].

III.2.8. DA Based Architectures for Adaptive FIR Filtering

Adaptive filtering DSP algorithms are employed in many hand-held mobile devices for applications such as echo cancellation, signal de-noising, and channel equalization. A hardware architecture for very high throughput LMS adaptive filters using distributed arithmetic (DA) has been suggested; building adaptive DA filters requires recalculating the contents of the LUTs for each adaptation. By using an auxiliary LUT with special addressing, the efficiency and throughput of DA adaptive filters can be of the same order as those of fixed DA filters [36], [37].
A hardware architecture using conjugate distributed arithmetic (CDA) has also been presented for high-throughput implementation of LMS adaptive filters, in which all possible combination sums of the input signal samples are stored in the LUT and updated on the arrival of every sample using an efficient update procedure [36], [38].

Fig. 10. Linear 1-D systolic array for DA-based implementation of FIR filter

Figs. 11. (a) Function of PE, (b) Function of output cell of 1-D systolic array

Figs. 12. (a) 2-D systolic array for FIR filter; (b) function of PE; and (c) function of Shift Adder (SA) cell
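The behaviour of the 1-D systolic array of Fig. 10, as described in Section III.2.7, can be approximated by a purely functional model. The sketch below makes several simplifying assumptions that are not in the source: inputs are unsigned B-bit integers (so no sign-bit correction is modelled), bits are processed MSB first to match the left-shifting output cell, each PE serves M consecutive taps, and pipeline timing between PEs is ignored. The names `build_rom` and `systolic_da` and the test values are hypothetical.

```python
def build_rom(coeffs):
    """ROM of 2^M words for one PE: entry a is the sum of c_i over set bits of a."""
    return [sum(c for i, c in enumerate(coeffs) if (a >> i) & 1)
            for a in range(1 << len(coeffs))]


def systolic_da(coeffs, xs, B, M):
    """Functional (not cycle-accurate) model of the linear PE array plus output cell."""
    n = len(coeffs)
    if n % M:
        raise ValueError("M must divide the number of taps")
    roms = [build_rom(coeffs[p:p + M]) for p in range(0, n, M)]
    register = 0                              # shift register of the output cell
    for b in range(B - 1, -1, -1):            # MSB first (assumption)
        partial = 0                           # value entering the leftmost PE
        for p, rom in enumerate(roms):
            # Each PE reads its ROM at the address formed by its M input bits
            # and adds the word to the sum arriving from its left neighbour.
            addr = sum(((xs[p * M + i] >> b) & 1) << i for i in range(M))
            partial += rom[addr]
        # Output cell: shift left by one position, then add the arriving sum.
        register = (register << 1) + partial
    return register


coeffs = [3, -1, 4, 2, -5, 6]                 # illustrative integer coefficients
xs = [5, 0, 7, 2, 1, 3]                       # unsigned 3-bit inputs
y_sys = systolic_da(coeffs, xs, B=3, M=2)
direct = sum(c * x for c, x in zip(coeffs, xs))
```

With three PEs of four ROM words each, the model reproduces the direct inner product for this example; a cycle-accurate model would additionally skew the bit vectors so that each PE works on a different bit position in the same cycle.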
IV. Conclusion

This paper has presented recent significant research aimed at reducing the overall area-delay-power complexity of memory-based realizations of FIR filters. A detailed survey of memory-based implementation of FIR filters using distributed arithmetic has also been presented, stating its merits over direct memory-based implementation. The main goal of this review is to help researchers in digital signal processing understand the available methods and adopt them in various application environments. Many algorithms and architectures have been suggested in the literature to reduce the area and time complexities of memory-based FIR filter implementation, but more efficient algorithms and architectures still need to be developed to design flexible, area-delay-power-efficient memory-based FIR filters that meet the growing requirements of DSP applications.

References

[1] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications. NJ: Prentice-Hall, 1996.
[2] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
[3] G. R. Goslin, "A Guide to Using Field Programmable Gate Arrays (FPGAs) for Application-Specific Digital Signal Processing Performance," XILINX, 1995.
[4] M. Yamada and A. Nishihara, "High-Speed FIR Digital Filter with CSD Coefficients Implemented on FPGA," in Proc. IEEE Design Automation Conference, 2001, pp. 7-8.
[5] R. I. Hartley, "Subexpression sharing in filters using canonic signed-digit multipliers," IEEE Trans. Circuits Syst. II, vol. 43, no. 10, pp. 677–688, Oct. 1996.
[6] M. Potkonjak, M. B. Srivastava, and A. Chandrakasan, "Multiple constant multiplications: Efficient and versatile framework and algorithms for exploring common subexpression elimination," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 15, no. 2, pp. 151–165, Feb. 1996.
[7] A. G. Dempster and M. D. Macleod, "Generation of signed-digit representations for integer multiplication," IEEE Signal Process. Lett., vol. 11, no. 8, pp. 663–665, Aug. 2004.
[8] M. D. Macleod and A. G. Dempster, "Multiplierless FIR filter design algorithms," IEEE Signal Process. Lett., vol. 12, no. 3, pp. 186–189, Mar. 2005.
[9] D. L. Maskell, J. Leiwo, and J. C. Patra, "The Design of Multiplierless FIR Filters with a Minimum Adder Step and Reduced Hardware Complexity," in Proc. 2006 IEEE International Symposium on Circuits and Systems, p. 4, May 2006.
[10] H.-R. Lee, C.-W. Jen, and C.-M. Liu, "On the design automation of the memory-based VLSI architectures for FIR filters," IEEE Trans. Consumer Electronics, vol. 39, no. 3, pp. 619–629, Aug. 1993.
[11] H. T. Kung, "Why systolic architectures?," IEEE Computer, vol. 15, no. 1, pp. 37–45, Jan. 1982.
[12] R. Wyrzykowski and S. Ovramenko, "Flexible systolic architecture for VLSI FIR filters," Proc. Inst. Elect. Eng., Comput. Digit. Techniques, vol. 139, no. 2, pp. 170–172, Mar. 1992.
[13] B. K. Mohanty and P. K. Meher, "Cost-effective novel flexible cell-level systolic architecture for high throughput implementation of 2-D FIR filters," Proc. Inst. Elect. Eng., Comput. Digit. Techniques, vol. 143, no. 5, pp. 436–439, Nov. 1996.
[14] D. F. Chiper, "A new systolic array algorithm for memory-based VLSI array implementation of DCT," in Proc. Second IEEE Symp. on Computers and Communications, pp. 297–301, July 1997.
[15] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 52, no. 6, pp. 1125–1137, June 2005.
[16] C. Cheng and K. K. Parhi, "A novel systolic array structure for DCT," IEEE Trans. Circuits Syst. II: Express Briefs, vol. 52, no. 7, pp. 366–369, July 2005.
[17] P. K. Meher, J. C. Patra, and M. N. S. Swamy, "New systolic algorithm and array architecture for prime-length discrete sine transform," IEEE Trans. Circuits Syst. II: Express Briefs, vol. 54, no. 3, pp. 262–266, Mar. 2007.
[18] P. K. Meher and M. N. S. Swamy, "High-throughput memory-based architecture for DHT using a new convolutional formulation," IEEE Trans. Circuits Syst. II: Express Briefs, vol. 54, no. 7, pp. 606–610, July 2007.
[19] P. K. Meher, "Low-latency hardware-efficient memory-based design for large-order FIR digital filters," Sixth International Conference on Information, Communications and Signal Processing (ICICS 2007), Dec. 2007.
[20] P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication," in Proc. 2009 IEEE Int. Symp. Circuits Syst., ISCAS'09, May 2009, pp. 453–456.
[21] P. K. Meher, "New look-up-table optimizations for memory-based multiplication," in Proc. Int. Symp. Integr. Circuits (ISIC'09), Dec. 2009.
[22] P. K. Meher, "New approach to lookup table design and memory based realization of FIR digital filter," IEEE Trans. Circuits Syst. I, vol. 57, no. 3, Mar. 2010.
[23] A. Croisier, D. J. Esteban, M. E. Levilion, and V. Rizo, "Digital filter for PCM encoded signals," U.S. Patent 3 777 130, Dec. 4, 1973.
[24] A. Peled and B. Liu, "A new hardware realization of digital filters," IEEE Trans. Acoustic, Speech, Signal Process., vol. 22, no. 6, pp. 456–462, Dec. 1974.
[25] S. A. White, "Applications of the distributed arithmetic to digital signal processing: A tutorial review," IEEE ASSP Mag., vol. 6, no. 3, pp. 5–19, Jul. 1989.
[26] P. Choi, S.-C. Shin, and J.-G. Chung, "Efficient ROM size reduction for distributed arithmetic," in Proc. IEEE Int. Symp. Circuits System (ISCAS), May 2000, vol. 2, pp. 61–64.
[27] H. Yoo and D. V. Anderson, "Hardware-efficient distributed arithmetic architecture for high-order digital filters," in Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing (ICASSP), Mar. 2005, vol. 5, pp. v/125–v/128.
[28] M. A. Eshtawie and M. Othman, "On-Line DA-LUT Architecture for High-Speed High-Order Digital FIR Filters," in the tenth IEEE international conference on communication systems, Nov. 2006, Singapore.
[29] C.-F. Chen, "Implementing FIR filters with distributed arithmetic," IEEE Trans. Acoustic., Speech, Signal Process., vol. 33, no. 5, pp. 1318–1321, Oct. 1985.
[30] K. Nourji and N. Demassieux, "Optimal VLSI architecture for distributed arithmetic-based algorithms," in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Apr. 1994, pp. II/509–II/512.
[31] S.-S. Jeng, H.-C. Lin, and S.-M. Chang, "FPGA implementation of FIR filter using M-bit parallel distributed arithmetic," in Proc. 2006 IEEE Int. Symp. Circuits Systems (ISCAS), May 2006, p. 4.
[32] M. Mehendale, S. D. Sherlekar, and G. Venkatesh, "Area-delay trade-off in distributed arithmetic based implementation of FIR filters," in Proc. 10th Int. Conf. VLSI Design, Jan. 1997, pp. 124–129.
[33] P. K. Meher, "Hardware-efficient systolization of DA-based calculation of finite digital convolution," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 707–711, Aug. 2006.
[34] P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–3017, July 2008.
[35] Jiafeng Xie, Jianjun He, and Guanzheng Tan, "FPGA realization of FIR filters for high-speed and medium-speed by using modified distributed arithmetic architectures," Microelectronics Journal 41, April 2010, pp. 365–370.
[36] S. Haykin, Adaptive Filter Theory, Prentice Hall, Upper Saddle River, NJ, 2002.
[37] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, "LMS adaptive filters using distributed arithmetic for high throughput," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 1327–1337, July 2005.
[38] W. Huang, V. Krishnan, and D. V. Anderson, "Conjugate Distributed Arithmetic Adaptive FIR Filters and their Hardware Implementation," MWSCAS '06, pp. 295-299, Circuits and Systems, Volume: 2, 2006.

Authors' information

K. G. Shanthi (Corresponding author) completed her B.E. in 1996 at Madras University, Chennai, and obtained her M.E. in 2005 from the Government College of Technology, Coimbatore. Her postgraduate major was VLSI Design. Her fields of interest include the design of FPGA-based VLSI architectures and VLSI signal processing. She is currently working as Associate Professor at R.M.K Engineering College, Chennai, and is pursuing her research in the field of VLSI Design. Address: Associate Professor, Department of Electronics & Communication Engg, R.M.K Engineering College, Chennai, Tamilnadu, India. Pin code: 601 206. E-mail: kgs.ece@rmkec.ac.in

Nagarajan N. received his B.Tech and M.E. degrees in Electronics Engineering at M.I.T Chennai. He received his PhD in the faculty of I.C.E. from Anna University, Chennai. He is currently working as Principal, C.I.E.T, Coimbatore. His specializations include optical, wireless ad hoc, and sensor networks.