A High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications
LUT Optimization for Distributed Arithmetic-Based
Block Least Mean Square Adaptive Filter
Abstract:
In this paper, we analyze the contents of the lookup tables (LUTs) of a distributed arithmetic
(DA)-based block least mean square (BLMS) adaptive filter (ADF). Based on that analysis, we
propose intra-iteration LUT sharing to reduce its hardware resources, energy consumption, and
iteration period. Compared with the conventional design approach, the proposed LUT optimization
scheme saves 60% of the LUT content for block size 8, and even more for larger block sizes. The
logic size, area, and power consumption of the proposed architecture are analyzed using Xilinx
ISE 14.2.
Enhancement of the project:
Existing System:
A distributed arithmetic (DA)-based design approach has been proposed to derive low-complexity
hardware structures for ADFs. A DA-based ADF uses lookup tables (LUTs) to calculate the filter
output and the weight-increment terms, and these LUTs constitute most of its hardware resources.
One reported DA-based LMS ADF structure uses two separate LUTs: one for the filter output and one
for the weight-increment terms. A few design schemes have been suggested in the recent past for
the efficient realization of LMS ADFs in FPGA.
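The LUT-based computation described above can be illustrated with a minimal bit-serial DA sketch. It makes several assumptions. Inputs are integer samples. Weights are `bits`-wide two's-complement integers. A single 4-input LUT holds the 16 subset sums of four input samples, and in cycle b, bit b of the four weights forms the LUT address. The names `build_da_lut` and `da_inner_product` are hypothetical. This is an illustration of the DA principle, not the hardware design of any of the cited works.

```python
def build_da_lut(x4):
    """Return the 16 subset sums of a 4-element input vector.

    Word `a` is the sum of x4[t] for every bit t that is set in the 4-bit address a.
    """
    return [sum(x4[t] for t in range(4) if (a >> t) & 1) for a in range(16)]


def da_inner_product(x4, w4, bits=8):
    """Compute sum(w4[t] * x4[t]) by bit-serial distributed arithmetic.

    Each weight is treated as a `bits`-wide two's-complement integer. In cycle b,
    bit b of the four weights addresses the LUT; the addressed word is scaled by
    2**b and accumulated, and the sign-bit (MSB) contribution is subtracted.
    """
    lut = build_da_lut(x4)
    acc = 0
    for b in range(bits):
        # Python's >> is arithmetic, so negative weights yield their two's-complement bits.
        addr = sum(((w >> b) & 1) << t for t, w in enumerate(w4))
        term = lut[addr] * (1 << b)
        acc += -term if b == bits - 1 else term
    return acc


print(da_inner_product([3, -1, 4, 2], [5, -7, 2, -128]))  # -226 = 15 + 7 + 8 - 256
```

No multiplier is used. Each cycle performs one LUT read and one shift-accumulate, which is why the LUTs dominate the hardware of a DA-based ADF.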
A DA-based pipelined structure has been proposed for the realization of a delayed LMS ADF with low
adaptation delay. Subsequently, another DA-based design was proposed for LMS ADFs. In that design, a
single LUT performs both filtering and weight updating, and a parallel LUT-update method reduces the
LUT-update time. Carry-save accumulation has been used to further reduce the iteration period of
the DA-based LMS structure. A few DA-based designs have also been proposed for the FPGA
realization of BLMS ADFs, and we have also proposed a DA structure for BLMS ADF. Although many
DA-based designs have been suggested for LMS- and BLMS-based ADFs, we have not found any LUT
optimization scheme in the literature that is specific to the BLMS DA-LUT. In this paper, we analyze
the intra-iteration LUT contents of the DA-based BLMS ADF design to identify redundant LUT words.
These words can be shared to minimize the hardware resources, the number of LUT accesses, the energy
consumption, and the iteration period.
Disadvantages:
The LUT size is large
LUT-update is complex
Proposed System:
Allred et al. identified the LUT redundancy across successive iterations of the DA-based LMS ADF
and, based on that, update only half of the auxiliary LUT contents. However, no LUT optimization
scheme has been proposed to take advantage of the redundant LUT values in the DA-LMS computation.
We observe that, in a DA-based LMS ADF, the redundant LUT values belong to different processing
cycles. They must therefore be stored either in the LUT or outside it, and both options consume
the same amount of resources. As a result, the redundant LUT values of DA-based LMS permit no LUT
optimization beyond reducing the number of LUT words to be updated. In a DA-based BLMS ADF, by
contrast, the redundant LUT values of L successive iterations are created within one processing
cycle, where L is the block size. This makes LUT optimization possible.
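For reference, the block LMS recursion that the DA structure computes can be sketched in floating point. The sketch assumes the textbook BLMS form. All L outputs of a block use the same weight vector, and one update per block accumulates the L error-weighted input vectors with step size `mu`. Some texts scale the step size by 1/L. The function name and argument layout are hypothetical. Rows i and i + 1 of a block use input windows shifted by one sample, and this overlap is where the redundancy within a processing cycle comes from.

```python
def blms_iteration(w, x_hist, d_block, mu):
    """One block iteration of a block LMS adaptive filter.

    w       -- list of N filter weights
    x_hist  -- input history, newest first: x_hist[m] = x(n - m), len >= N + L - 1
    d_block -- desired samples of the block: d_block[i] = d(n - i), i = 0..L-1
    mu      -- step size
    Returns (y_block, e_block, w_next).
    """
    N, L = len(w), len(d_block)
    # All L outputs use the same weights; row i sees the window x(n-i) ... x(n-i-N+1).
    y = [sum(w[t] * x_hist[i + t] for t in range(N)) for i in range(L)]
    e = [d_block[i] - y[i] for i in range(L)]
    # A single weight update per block accumulates the L error-weighted input vectors.
    w_next = [w[t] + mu * sum(e[i] * x_hist[i + t] for i in range(L)) for t in range(N)]
    return y, e, w_next


y, e, w_next = blms_iteration([0, 0], [1, 2, 3], [1, 1], 0.5)
print(w_next)  # [1.5, 2.5]
```

With N = 2, L = 2, zero initial weights, and unit errors, each new weight is 0.5 times the sum of the two input samples that weight sees across the block.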
Conventionally, 16NP LUT words are required to implement the NP LUTs of the LU matrix. For filter
length N = 16 and block size L = 4, 256 LUT words are required to implement the LU matrix (that
is, P = 1). The contents of the LU matrix of the BLMS filter for block size L = 4 are shown in
Fig. 1. The LUT content is represented by the function E(.), which enumerates the 16 possible
sums of combinations of the elements of a 4-element input vector.
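The conventional LU matrix can be modelled to check the word count above. The sketch uses an indexing that is consistent with the caption of Fig. 1 and with the inter-iteration reuse described below. This indexing is our assumption, not the paper's notation. It takes M = N/L columns, L rows, and P = L/4 four-input LUTs per LU row, which gives P = 1 for L = 4 as stated. The argument is taken as s_{ij}^{k,p} = [x(n − i − jL − 4p − t), t = 0..3] with n = kL. `E` follows the description of the enumeration function; `conventional_lu` is a hypothetical helper, and L is assumed to be a multiple of 4.

```python
def E(s):
    """Enumerate the 16 possible sums of combinations of a 4-element vector s."""
    return [sum(s[t] for t in range(4) if (a >> t) & 1) for a in range(16)]


def conventional_lu(x, n, N, L):
    """Contents of every LUT of the LU matrix for the block whose newest sample is x(n).

    x -- callable returning the input sample x(m)
    Returns a dict keyed by (row i, column j, partition p), each holding 16 words.
    """
    M, P = N // L, L // 4  # assumed: M = N/L columns, P = L/4 LUTs per LU row
    lus = {}
    for j in range(M):
        for i in range(L):
            for p in range(P):
                # Hypothetical indexing of the argument s_ij^{k,p}.
                s = [x(n - i - j * L - 4 * p - t) for t in range(4)]
                lus[(i, j, p)] = E(s)
    return lus


lus = conventional_lu(lambda m: m, n=100, N=16, L=4)
print(len(lus), sum(len(words) for words in lus.values()))  # 16 256
```

Every LUT stores all 16 words independently, even though the input vectors of neighbouring rows overlap. That duplication is what the sharing scheme removes.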
Fig. 1. LUT content of the LU matrix of block size L = 4 for four consecutive iterations [kth,
(k + 1)th, (k + 2)th, and (k + 3)th]. Light gray: LUTs of successive iterations with identical
content. Gray: succeeding LUTs with overlapped input vectors. The input argument $s_{i,0}^{k}$
(0 ≤ i ≤ 3) of the first column of LU is defined for the kth-iteration input block
{x(n) → x(n − 3)}, where n = kL and {x(n) → x(n − 3)} denotes the input sequence
{x(n), x(n − 1), x(n − 2), x(n − 3)}.
Intra-iteration LUT Sharing
The LUT content depends on the argument $s_{ij}^{k,p}$ of the LUT enumeration function E, which
does not change during an iteration. We analyze the arguments $s_{ij}^{k,p}$ corresponding to one
column of the LU matrix to find the redundant values in the LUTs of that column.
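The analysis of one column can be reproduced symbolically. In this sketch, each LUT word is represented by the set of sample offsets m (standing for x(n − m)) that it adds. Two words are therefore redundant exactly when their sets coincide. The sketch assumes P = L/4 four-input LUTs per row and a hypothetical indexing in which row i, LUT p of the column is fed x(n − i − 4p − t), t = 0..3. The column offset is dropped because it shifts every index equally. `column_words` is an illustrative name.

```python
def column_words(L):
    """Symbolic LUT words of one LU column (assumed P = L // 4 LUTs per row).

    Each word is the frozenset of offsets m such that x(n - m) is in the sum;
    the empty set is the all-zero word.
    """
    P = L // 4
    words = []
    for i in range(L):          # row of the column
        for p in range(P):      # 4-input LUT within the row
            base = i + 4 * p    # offset of the first sample fed to this LUT
            for a in range(16):  # 4-bit LUT address
                words.append(frozenset(base + t for t in range(4) if (a >> t) & 1))
    return words


for L in (4, 8, 16):
    words = column_words(L)
    distinct = {w for w in words if w}  # distinct non-zero words
    print(L, len(words), len(distinct), 16 * L - 25)
```

Under these assumptions, the number of distinct non-zero words is 39, 103, and 231 for L = 4, 8, and 16. This matches the (16L − 25) register count quoted for the proposed PE. The column needs 2L − 1 distinct samples, and counting the non-empty subsets that fit in a 4-sample window gives 8(2L − 1) − 17 = 16L − 25.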
Inter-iteration LUT Reuse
As shown in Fig. 1, the LUT contents of the first (M − 1) columns of LUs of any given iteration
can be reused by the last (M − 1) columns of LUs during the next iteration, so these columns need
not be updated.
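The reuse rule can be checked with a hypothetical indexing. It assumes that column j of the iteration with newest sample x(n) uses x(n − i − jL − 4p − t), with n = kL and P = L/4. Advancing n by L moves every column one position to the right. As a result, shifting the stored columns and computing one fresh column reproduces the full LU matrix of the next iteration. The helper names are illustrative.

```python
def column_args(x, n, j, L):
    """Argument vectors of LU column j for the block with newest sample x(n).

    Returns rows[i][p] = [x(n - i - j*L - 4*p - t) for t = 0..3] (assumed indexing).
    """
    P = L // 4
    return [[[x(n - i - j * L - 4 * p - t) for t in range(4)] for p in range(P)]
            for i in range(L)]


def advance_iteration(cols, x, n_next, L):
    """Reuse the first M - 1 columns as the last M - 1 columns of the next
    iteration and compute only the new first column."""
    return [column_args(x, n_next, 0, L)] + cols[:-1]


def x(m):
    """Arbitrary test signal."""
    return m * m


N, L, k = 16, 4, 5
M = N // L
cols_k = [column_args(x, k * L, j, L) for j in range(M)]
full_next = [column_args(x, (k + 1) * L, j, L) for j in range(M)]
print(advance_iteration(cols_k, x, (k + 1) * L, L) == full_next)  # True
```

Only one of the M columns changes per iteration. This is the basis for updating a single LU column in every iteration.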
Proposed Design Strategy
The entire LUT content must be available in the same cycle for LUT words to be shared.
Conventional RAM-based LUTs are not suitable for LUT sharing because, in any given cycle, they
allow access to only one (or, for multiported RAM, a few) of the stored LUT values. A
register-based LUT (REG-LUT) is therefore used instead in the proposed DA-based design.
Based on these facts, we have arrived at the following design strategy to derive an area-delay-
power efficient structure for the DA-based BLMS ADF.
1) A register-based shared LUT is used instead of the conventional RAM-based LUT to
exploit intra-iteration LUT sharing.
2) Based on the inter-iteration LUT reuse property of the BLMS ADF, only one of the (N/L)
columns of the LU matrix is updated in every iteration.
3) A fully parallel LUT-update unit generates the update values of one LU column so that
its contents are updated in one cycle.
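Strategies 1) and 3) can be illustrated with a behavioural sketch. The LUT-update unit forms all distinct non-zero words of one column at once from the 2L − 1 input samples of that column. Each row and LUT address of the PE is then wired to one shared register, and address 0 reads the constant zero. The register ordering, the address map, and the function names are hypothetical, because the text does not specify them. The sketch assumes P = L/4 and that row i, LUT p of the column is fed x(n − i − 4p − t), t = 0..3.

```python
def shared_layout(L):
    """Return (subsets, addr_map) for one PE with a shared REG-LUT.

    subsets  -- distinct non-empty sets of sample offsets, one per register
    addr_map -- addr_map[(i, p, a)] = register index read by row i, LUT p at
                4-bit address a, or None for a == 0 (the zero word)
    """
    P = L // 4
    index, subsets, addr_map = {}, [], {}
    for i in range(L):
        for p in range(P):
            for a in range(16):
                key = frozenset(i + 4 * p + t for t in range(4) if (a >> t) & 1)
                if not key:
                    addr_map[(i, p, a)] = None
                    continue
                if key not in index:  # first occurrence gets a register
                    index[key] = len(subsets)
                    subsets.append(key)
                addr_map[(i, p, a)] = index[key]
    return subsets, addr_map


def update_registers(samples, subsets):
    """Compute every register value from samples[m] = x(n - m), m = 0..2L-2.

    Each value depends only on the input samples, not on other registers, so
    in principle a fully parallel update unit can produce all of them in one cycle.
    """
    return [sum(samples[m] for m in s) for s in subsets]


def read_lut(regs, addr_map, i, p, a):
    """LUT word seen by row i, LUT p at 4-bit address a."""
    r = addr_map[(i, p, a)]
    return 0 if r is None else regs[r]


subsets, amap = shared_layout(8)
print(len(subsets), len(amap))  # 103 256
```

For L = 8, the 256 addressable positions (8 rows × 2 LUTs × 16 addresses) are served by 103 registers. Because several rows read the same register in one cycle, a register file fits better here than a RAM that offers one read port.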
At block level, the proposed structure is similar to the conventional DA-based BLMS structure.
However, the internal structures of the LUT-update block and of the processing elements (PEs) of
the DA module differ from those of the conventional structure because the proposed design uses
shared LUTs.
The structure of the DA module of the proposed design is shown in Fig. 2. Each PE of the DA
module uses REG-LUTs instead of the RAM-LUTs of the conventional design in order to exploit the
LUT-sharing property. It requires only (16L − 25) registers instead of the 16PL RAM words required
by the conventional design. The LUT-update unit of the DA module computes the set of (16L − 25)
values that update the LUTs of a PE in one cycle, against the 16 cycles required by the
conventional design.
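The register count can be compared with the conventional RAM-LUT count per PE. The sketch takes (16L − 25) from the text and assumes P = L/4. This assumption is consistent with 16NP = 256 words for N = 16 and L = 4. Under that assumption, the saving for L = 8 is about 59.8%, in line with the 60% figure in the abstract. The function names are hypothetical.

```python
def conventional_words(L):
    """RAM-LUT words in one conventional PE: 16 words x L rows x P LUTs, P = L/4 (assumed)."""
    return 16 * L * (L // 4)


def shared_registers(L):
    """Registers in one PE of the proposed design, as stated in the text."""
    return 16 * L - 25


def lut_saving_percent(L):
    """Percentage of LUT storage removed by intra-iteration sharing."""
    conv = conventional_words(L)
    return 100.0 * (conv - shared_registers(L)) / conv


for L in (4, 8, 16, 32):
    print(f"L={L:2d}: {conventional_words(L):5d} words -> {shared_registers(L):4d} registers, "
          f"saving {lut_saving_percent(L):.1f}%")
```

The saving grows with L because the conventional count grows as 4L², whereas the shared count grows only linearly.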
Fig. 2. Structure of the DA module of the proposed DA-based BLMS ADF of filter length N and block
size L, where N = ML.
Advantages:
Reduced LUT size
Reduced LUT-update complexity
Software implementation:
ModelSim
Xilinx ISE