Fast Convolution
Proposed Algorithm
Efficient DSP Implementation
Results
Conclusion
An Efficient DSP-Based Implementation of a Fast
Convolution Approach with non Uniform Partitioning
Andrea Primavera¹, Stefania Cecchi¹, Laura Romoli¹, Francesco Piazza¹ and Marco Moschetti²
¹ A3lab - DII - Università Politecnica delle Marche - Ancona - ITALY
² Korg Italy - Osimo (AN) - ITALY
5th European DSP in Education and Research Conference, 13th and 14th September, 2012, Amsterdam, Netherlands.
Andrea Primavera An Efficient DSP-Based Implementation of a Fast Convolution Approach with non Uniform Partitioning 1/28
1 Fast Convolution
Introduction
State of the art
2 Proposed Algorithm
3 Efficient DSP Implementation
Target
UPOLS implementation
Memory management
FFT/IFFT operations
Psychoacoustic expedients
Final remarks
4 Results
Case study: artificial reverberator
UPOLS performance
NUPOLS performance
5 Conclusion
Conclusion
Questions
Problem
FIR filtering is probably one of the most recurrent operations in DSP. It is an expensive task, especially when long impulse responses (IRs) must be combined with low I/O latency. The two competing goals are low-latency convolution and minimization of the computational cost.
State of the Art
Over the last 30 years, fast convolution algorithms have been investigated in depth:
• OverLap and Save (OLS), OverLap and Add (OLA).
• Uniform Partitioned OverLap and Save (UPOLS).
• Non Uniform Partitioned OverLap and Save (NUPOLS).
Proposed Solution
We propose an efficient DSP-based real-time implementation of a fast convolution approach with non uniform partitioning (NUPOLS), taking into account:
• The OMAP L137 target platform.
• Efficient partitioning.
• Usage of smart DSP expedients.
• Psychoacoustic improvement.
Time Domain Convolution
Assuming a linear time-invariant system, the linear convolution between the input signal x and the system impulse response h is defined as follows:
$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(t - \tau)\, h(\tau)\, d\tau. \tag{1}$$
For discrete-time signals and an impulse response of finite length N, this becomes:
$$y[n] = x[n] * h[n] = \sum_{m=0}^{N-1} x[n - m]\, h[m]. \tag{2}$$
The convolution is performed directly using equation (2).
LATENCY: Theoretically zero.
COMPUTATIONAL COST: N − 1 additions and N multiplications per output sample.
CONSIDERATIONS: Too expensive for long IRs.
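As a minimal sketch of equation (2), the direct form can be written as a double loop. The function name `fir_direct` is hypothetical, and the input is assumed to be zero before n = 0; each output sample costs up to N multiplications, which is why the approach does not scale to long IRs.

```python
import numpy as np

def fir_direct(x, h):
    """Direct-form convolution: y[n] = sum_{m=0}^{N-1} x[n-m] * h[m]."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = 0.0
        for m in range(len(h)):
            if n - m >= 0:              # x is taken as zero before n = 0
                acc += x[n - m] * h[m]
        y[n] = acc
    return y
```

Every output sample is available as soon as its input sample arrives, which is the "theoretically zero" latency noted above.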
Frequency Domain Convolution
Considering the circular convolution and the corresponding DFT property:
$$y[n] = x[n] \circledast_N h[n] = \sum_{m=0}^{N-1} x[(n - m)_N]\, h[m], \tag{3}$$
$$x[n] \circledast_N h[n] \leftrightarrow X[k]\, H[k], \tag{4}$$
it follows that the convolution can be computed in the frequency domain.
OverLap and Save (OLS)
OLS converts a circular convolution into a linear convolution.
LATENCY: Equal to K samples, with K > N.
COMPUTATIONAL COST: $\frac{2L \log L}{K} + \frac{L}{K}$ complex multiplications (with K a power of 2 and L = 2K for 50% overlap).
CONSIDERATIONS: I/O latency is too high for long IRs.
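The overlap-save idea can be sketched with NumPy's FFT routines. This is an illustrative version, not the authors' DSP code: the name `ols_convolve` is hypothetical, and it assumes the whole IR fits in one block of K samples. Each L = 2K point circular convolution is computed in the frequency domain, and only the last K output points, which are free of wrap-around, are kept.

```python
import numpy as np

def ols_convolve(x, h, K):
    """Overlap-save with block size K and FFT size L = 2K (needs len(h) <= K)."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    assert len(h) <= K, "single-block OLS needs the IR to fit in K samples"
    L = 2 * K
    H = np.fft.rfft(h, L)                         # spectrum of the zero-padded IR
    n_blocks = -(-len(x) // K)                    # ceiling division
    x_pad = np.concatenate([x, np.zeros(n_blocks * K - len(x))])
    prev = np.zeros(K)                            # previous input block (50% overlap)
    y = np.empty(n_blocks * K)
    for b in range(n_blocks):
        cur = x_pad[b * K:(b + 1) * K]
        X = np.fft.rfft(np.concatenate([prev, cur]))
        y_circ = np.fft.irfft(X * H, L)           # circular convolution of length L
        y[b * K:(b + 1) * K] = y_circ[K:]         # last K points are alias-free
        prev = cur
    return y[:len(x)]
```

A full block of K samples has to be collected before it can be processed, which is where the latency of K samples comes from.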
Uniform Partitioned OverLap and Save (UPOLS)
The IR is partitioned into sections of equal size; an OLS is then applied to each sub-filter.
LATENCY: Equal to K samples, with K arbitrarily chosen.
COMPUTATIONAL COST: $\frac{2L \log L}{K} + \frac{LP}{K}$ complex multiplications and $\frac{L(P-1)}{K}$ additions (with K a power of 2, P the number of partitions and L = 2K for 50% overlap).
CONSIDERATIONS: Computational cost higher than OLS.
Non Uniform Partitioned OverLap and Save (NUPOLS)
The IR is partitioned into sections of increasing size, reducing the computational cost with respect to the UPOLS algorithm.
LATENCY: Theoretically zero.
COMPUTATIONAL COST: It depends on the adopted partitioning.
CONSIDERATIONS: It is difficult to find the optimal partitioning.
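To get a feel for the UPOLS cost expressions, the short sketch below simply evaluates them for given K and P. The function names are hypothetical, and the logarithm is assumed to be base 2, which the slides do not state.

```python
import math

def upols_cost(K, P):
    """Evaluate the UPOLS cost expressions with L = 2K.

    Assumption: log is base 2 (not stated in the slides).
    Returns (complex multiplications, additions) as given by the expressions.
    """
    L = 2 * K
    mults = (2 * L * math.log2(L)) / K + (L * P) / K
    adds = (L * (P - 1)) / K
    return mults, adds

def ols_cost(K):
    """OLS expression: the same formula with a single partition (P = 1)."""
    L = 2 * K
    return (2 * L * math.log2(L)) / K + L / K
```

For example, under these assumptions `upols_cost(64, 8)` evaluates to 44 multiplications and 14 additions, while `ols_cost(64)` evaluates to 30. The FFT term stays fixed and the partition term grows linearly with P.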
An efficient DSP-based implementation of a low-latency fast convolution is proposed, based on the NUPOLS algorithm.
[Figure: block diagram of the non uniform partitioned overlap and save algorithm. g(t): impulse response; x(t): input signal; g_i(t): i-th sub-filter.]
[Figure: block diagram of the proposed approach. g(t): impulse response; x(t): input signal; g_i(t): i-th sub-filter.]
• First UPOLS: characterized by a small block size (i.e., 64 samples), which sets the desired input/output latency.
• Second UPOLS: uses a larger frame size, which minimizes the computational cost required to perform the convolution.
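The reason two stages can be combined is that convolution is linear: the IR can be split into a head and a tail, each convolved separately, and the tail result added back with a delay equal to the split offset. The sketch below checks this identity with `np.convolve` standing in for the two UPOLS stages. The split point (the first K2 samples go to the small-block stage) and the function names are assumptions made for illustration, not details taken from the slides.

```python
import numpy as np

def split_ir(g, K2):
    """Split g into a head (first K2 samples) and a tail (the rest).
    The split point K2 is an assumption made for this sketch."""
    g = np.asarray(g, dtype=float)
    return g[:K2], g[K2:]

def two_stage_convolve(x, g, K2):
    """Sum of head and delayed tail convolutions; equals the full convolution."""
    x = np.asarray(x, dtype=float)
    head, tail = split_ir(g, K2)
    y = np.zeros(len(x) + len(g) - 1)
    y_head = np.convolve(x, head)            # stands in for the small-block UPOLS
    y[:len(y_head)] += y_head
    if len(tail):
        y_tail = np.convolve(x, tail)        # stands in for the large-block UPOLS
        y[K2:K2 + len(y_tail)] += y_tail     # tail contribution is delayed by K2
    return y
```

Because the tail contribution is only needed K2 samples later, the second stage can afford a larger block size without affecting the overall latency.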
The real-time implementation of the proposed approach has been carried out on the Texas Instruments OMAP L137 evaluation board.
Hardware features
- Dual-core system-on-chip
- 300 MHz ARM926EJ-S RISC MPU
- 300 MHz C674x VLIW floating-point DSP
- 128 KByte RAM shared memory
- 64 MByte SDRAM
- Enhanced Direct-Memory-Access Controller 3 (EDMA3)
- 2 I/O audio channels
- 32 KByte L1P program RAM/cache (DSP side)
- 32 KByte L1D data RAM/cache (DSP side)
- 256 KByte L2 unified mapped RAM/cache (DSP side)
• Design constraints: sample frequency 48 kHz, latency 64 samples, stereo implementation, floating-point implementation.
• ARM: used to manage the control parameters.
• DSP: used to perform the DSP operations, exploiting its own libraries (i.e., DSPLib) and DMA engine.
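The design constraints translate directly into a time and cycle budget per block. The helper names below are hypothetical; the cycle figure is an upper bound that ignores interrupt, I/O, and memory-stall overhead.

```python
def block_latency_ms(block_size, fs_hz):
    """Duration of one block of samples in milliseconds."""
    return 1000.0 * block_size / fs_hz

def cycles_per_block(block_size, fs_hz, clock_hz):
    """Clock cycles elapsing while one block of samples is collected."""
    return clock_hz * block_size / fs_hz
```

With 64 samples at 48 kHz, one block lasts about 1.33 ms, and a 300 MHz core has at most 400,000 cycles in which to process both channels before the next block arrives.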
The UPOLS algorithm implementation can be summarized considering
three main phases:
• Impulse response partitioning
• Input signal partitioning
• Filtering
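[Figure: UPOLS block diagram. The impulse response h(t) of length N is split into blocks of K samples. Each input block x0, x1, x2, ..., xn goes through an L-point FFT, is multiplied by the filter spectra H1, H2, H3, ..., and the products are summed. The last K points of each L-point IFFT form the output blocks y0, y1, y2, ..., yn.]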
• Impulse response partitioning
- The impulse response h is partitioned into P blocks h_n of length K.
- The filter set H_n is obtained by applying an L-point FFT to each block h_n (with L = 2K, 50% overlap).
- The set of P filters is then stored in a delay line held in the external memory.
- The operation is performed offline using a Matlab script.
• Input signal partitioning
- The input signal x is partitioned into blocks of length K.
- The frequency-domain block X_n is obtained by applying an L-point FFT to the input vector composed of the new frame x_n and the previous frame x_{n−1} (50% overlap).
- The vector X_n is stored in a delay line held in the external memory together with the P − 1 previous blocks.
• Filtering
- The output block Y_n is obtained through the filtering operation:
$$Y_n = \sum_{i=0}^{P-1} X_{n-P+1+i}\, H_{P-1-i} \tag{5}$$
- The time-domain output block y_n is composed of the last K samples of the L-point IFFT of Y_n.
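The three phases can be put together in a short NumPy sketch. It is a reference model under stated assumptions, not the DSP implementation: the name `upols_convolve` is hypothetical, partitions are indexed from 0, the sum of equation (5) is written with j = P − 1 − i so that each stored spectrum X_{n−j} multiplies H_j, and `np.fft.rfft` is used so only K + 1 bins per block are stored.

```python
import numpy as np

def upols_convolve(x, h, K):
    """Uniformly partitioned overlap-save with block size K and FFT size L = 2K."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    L = 2 * K
    P = -(-len(h) // K)                                   # number of partitions
    h_pad = np.concatenate([h, np.zeros(P * K - len(h))])
    # Impulse response partitioning: H[p] is the L-point FFT of block p.
    H = np.array([np.fft.rfft(h_pad[p * K:(p + 1) * K], L) for p in range(P)])
    # Frequency-domain delay line: fdl[j] holds X_{n-j}.
    fdl = np.zeros((P, K + 1), dtype=complex)
    n_blocks = -(-len(x) // K)
    x_pad = np.concatenate([x, np.zeros(n_blocks * K - len(x))])
    prev = np.zeros(K)
    y = np.empty(n_blocks * K)
    for n in range(n_blocks):
        cur = x_pad[n * K:(n + 1) * K]
        fdl = np.roll(fdl, 1, axis=0)                     # age the stored spectra
        fdl[0] = np.fft.rfft(np.concatenate([prev, cur])) # X_n from [x_{n-1}, x_n]
        Y = np.sum(fdl * H, axis=0)                       # Y_n = sum_j X_{n-j} H_j
        y[n * K:(n + 1) * K] = np.fft.irfft(Y, L)[K:]     # keep the last K samples
        prev = cur
    return y[:len(x)]
```

Only one forward FFT and one inverse FFT are needed per block regardless of P; the cost that grows with the IR length is the multiply-accumulate over the P stored spectra.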
Complex multiplications and accesses to data in external memory are the main bottlenecks in fast convolution implementations.
HOW TO SOLVE THESE PROBLEMS?
Adopted Solution
• The NUPOLS algorithm reduces both the number of complex multiplications and the number of memory accesses compared to the UPOLS approach.
• The DMA engine allows transfers from/to external memory to run in parallel with the processing operations.
Parallelization of the transfers from/to external memory (executed by the DMA engine) and the processing operations.
[Figure: kernel used for the UPOLS algorithms, with steps listed in the order shown.
(i) Basic approach: Read Hn (blocking); Read Xn (blocking); Compute Yn.
(ii) Improved approach: Read Hn (blocking); Read Xn+1 (non-blocking); Compute Yn; Read Xn (blocking).]
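The improved kernel follows a prefetch pattern: the transfer of the next block is started without waiting, the current block is processed, and the code blocks only when the prefetched data is actually needed. The sketch below is a host-side analogy of that pattern, with a worker thread standing in for the DMA engine; it does not use the EDMA3 API, and `read_block`, `compute`, and `process_blocks` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def process_blocks(read_block, compute, n_blocks):
    """Overlap the read of block n+1 with the computation on block n."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:        # stands in for the DMA engine
        pending = dma.submit(read_block, 0)               # first read has to be waited for
        for n in range(n_blocks):
            data = pending.result()                       # blocking wait for block n
            if n + 1 < n_blocks:
                pending = dma.submit(read_block, n + 1)   # non-blocking prefetch
            results.append(compute(data))                 # runs while the prefetch proceeds
    return results
```

In the basic kernel every read is waited for before computing, so transfer time and compute time add up; with prefetching the transfer time is hidden whenever it is shorter than the computation.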
The workload required for FFT/IFFT computation can be reduced by taking advantage of the stereo implementation and of the real nature of the audio signal.
FFT Optimization
• Two L-point FFTs/IFFTs of real sequences can be calculated through one FFT/IFFT of a complex sequence.
• The symmetry property of the FFT is exploited. This decreases the number of accesses to the external memory and the number of frequency multiplications from L to (K + 1) for each of the P processed frequency blocks.
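The first point rests on a standard identity: if two real sequences a and b are packed as z = a + jb, their spectra can be separated from Z using conjugate symmetry. The sketch below illustrates the identity with NumPy; `two_real_ffts` is a hypothetical name and the code is not taken from the authors' implementation.

```python
import numpy as np

def two_real_ffts(a, b):
    """FFTs of two equal-length real sequences from a single complex FFT."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    Z = np.fft.fft(a + 1j * b)                 # one complex FFT for both channels
    Z_rev = np.conj(np.roll(Z[::-1], 1))       # conj(Z[(L - k) mod L])
    A = 0.5 * (Z + Z_rev)                      # spectrum of a
    B = -0.5j * (Z - Z_rev)                    # spectrum of b
    return A, B
```

For a stereo signal, a and b would be the left and right frames. The same symmetry explains the second point: for a real L = 2K point frame, bins K + 1 to L − 1 are the conjugates of bins K − 1 down to 1, so only K + 1 bins need to be stored and multiplied.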
Psychoacoustic Optimization
Psychoacoustics makes it possible to reduce the number of complex multiplications and memory accesses.
All the components (frequency bins) above a certain cut-off frequency f_c (e.g., 18 kHz) are left out.
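How much this saves can be estimated by counting the bins at or below the cut-off. The helper below is a hypothetical sketch; it assumes a bin lying exactly on f_c is kept and counts only the non-negative-frequency bins 0..L/2.

```python
def bins_kept(fs_hz, L, fc_hz):
    """Number of bins in 0..L/2 whose frequency is at or below fc_hz."""
    bin_width = fs_hz / L                         # frequency spacing between bins
    return min(L // 2, int(fc_hz // bin_width)) + 1
```

At 48 kHz with L = 128 (K = 64) the bin spacing is 375 Hz, so an 18 kHz cut-off keeps 49 of the 65 bins, roughly a quarter fewer multiplications and memory reads per partition under these assumptions.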
HOW TO PARALLELIZE THE 2 UPOLS?
In a low-latency context, a multithreaded approach does not guarantee high performance on the DSP board.
Adopted Solution
The code has been partitioned manually so as to distribute the FFT/IFFT operations and the complex multiplications of both UPOLSs uniformly throughout the processing.
The manual partitioning aims to distribute the FFT/IFFT operations and the complex multiplications of the larger UPOLS uniformly over the K2/K1 iterations available to meet the processing constraint.

Iteration  Operation            Iteration  Operation
1          Large FFT 3/3        17         MAC Left Channel
2          MUL Left Channel     18         MAC Left Channel
3          MUL Right Channel    19         MAC Right Channel
4          Large IFFT 1/3       20         MAC Right Channel
5          Large IFFT 2/3       21         MAC Right Channel
6          Large IFFT 3/3       22         MAC Right Channel
7          MAC Left Channel     23         MAC Right Channel
8          MAC Left Channel     24         MAC Right Channel
9          MAC Left Channel     25         MAC Right Channel
10         MAC Left Channel     26         MAC Right Channel
11         MAC Left Channel     27         MAC Right Channel
12         MAC Left Channel     28         MAC Right Channel
13         MAC Left Channel     29         MAC Right Channel
14         MAC Left Channel     30         MAC Right Channel
15         MAC Left Channel     31         Large FFT 1/3
16         MAC Left Channel     32         Large FFT 2/3

Distribution of the UPOLS operations in a NUPOLS implementation with K1 = 64 and K2 = 2048.
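The table can be read as a static schedule: the large UPOLS has K2/K1 = 32 small-block iterations in which to finish its work, and each iteration carries one slice of it. The sketch below encodes the table as a lookup; the function name and the dictionary representation are assumptions for illustration only.

```python
K1, K2 = 64, 2048
ITERATIONS = K2 // K1            # 32 small-block iterations per large block

def large_upols_schedule():
    """Map iteration number (1..32) to the large-UPOLS slice run in that iteration."""
    s = {1: "Large FFT 3/3", 2: "MUL Left Channel", 3: "MUL Right Channel"}
    for i, part in zip((4, 5, 6), ("1/3", "2/3", "3/3")):
        s[i] = f"Large IFFT {part}"
    for i in range(7, 19):       # iterations 7..18
        s[i] = "MAC Left Channel"
    for i in range(19, 31):      # iterations 19..30
        s[i] = "MAC Right Channel"
    s[31] = "Large FFT 1/3"
    s[32] = "Large FFT 2/3"
    return s
```

In each small-block iteration the processing loop would run the small UPOLS first and then the slice returned for that iteration, so no single iteration has to absorb a whole large FFT.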
Case Study: Artificial Reverberator
Fast convolution can be employed in many different real-time audio applications. Digital artificial reverberation is the application that best exposes the limits of real-time FIR filtering.
• Convolutions with long IRs are needed to simulate large environments.
• Low input/output latencies are required in musical instruments.
Tests
Several tests have been carried out to evaluate the effectiveness of the proposed approach by comparing the workload required by the UPOLS and NUPOLS implementations.
UPOLS PERFORMANCE
[Figure: workload of the Uniform Partitioned Overlap and Save algorithm (K = 64) as a function of the impulse response length [s]. (a) Classic implementation. (b) Psychoacoustic approach.]
Considerations
• The maximum impulse response length that still guarantees real-time performance is about 0.55 s.
• The approach is not suitable for simulating large reverberating environments in musical instruments.
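To relate the 0.55 s limit to the algorithm's parameters, the IR length can be converted into a number of partitions. The helper name is hypothetical; the sample rate and block size are the design constraints stated earlier in the slides.

```python
def partitions_needed(ir_seconds, fs_hz, K):
    """Number of K-sample partitions needed to cover an IR of the given duration."""
    n_samples = int(round(ir_seconds * fs_hz))
    return -(-n_samples // K)          # ceiling division
```

At 48 kHz, 0.55 s is 26,400 samples, which gives 413 partitions of 64 samples. Each of them has to be read from memory and multiply-accumulated for every 64-sample block, which is the load that saturates the DSP in the uniform case.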
NUPOLS PERFORMANCE
[Figure: workload of the NUPOLS algorithm as a function of the impulse response length [s] for 4 different partitionings: (i) K1 = 64, K2 = 2048; (ii) K1 = 64, K2 = 1024; (iii) K1 = 64, K2 = 512; (iv) optimal partitioning, with regions labelled K2 = 512, K2 = 1024 and K2 = 2048. Mean (a) and max (b) workload for the classic implementation. Mean (c) and max (d) workload using the psychoacoustic approach.]
[Figure: NUPOLS workload as a function of the processing iteration (IR length = 3.164 s). (a) Workload of the NUPOLS. (b) Workload of the small UPOLS (K1 = 64). (c) Workload of the large UPOLS (K2 = 2048).]

Partitioning            Internal Memory Usage
K1 = 64, K2 = 2048      100 kB
K1 = 64, K2 = 1024      50 kB
K1 = 64, K2 = 512       30 kB

Considerations
• Evident improvement in terms of performance with respect to the approach based on uniform partitioning.
• It is possible to perform a stereo convolution with an impulse response of length 6 s using about 50% of the DSP resources.
In conclusion:
• A novel approach to fast convolution has been proposed, based on non uniform partitioning of the impulse response.
• Two UPOLSs with different frame sizes are combined: the UPOLS with the smaller frame size provides the desired input/output latency, while the other UPOLS is exploited to decrease the number of memory accesses and complex multiplications.
• A DSP-based real-time implementation has been developed, and several experiments have been carried out considering digital reverberation as a case study.
QUESTIONS?
Andrea Primavera An Efficient DSP-Based Implementation of a Fast Convolution Approach with non Uniform Partitioning 28/28

Mais conteúdo relacionado

Mais procurados

Implementation Adaptive Noise Canceler
Implementation Adaptive Noise Canceler Implementation Adaptive Noise Canceler
Implementation Adaptive Noise Canceler Akshatha suresh
 
Real-time neural text-to-speech with sequence-to-sequence acoustic model and ...
Real-time neural text-to-speech with sequence-to-sequence acoustic model and ...Real-time neural text-to-speech with sequence-to-sequence acoustic model and ...
Real-time neural text-to-speech with sequence-to-sequence acoustic model and ...Takuma_OKAMOTO
 
Adaptive Noise Cancellation using Multirate Techniques
Adaptive Noise Cancellation using Multirate TechniquesAdaptive Noise Cancellation using Multirate Techniques
An Efficient DSP Based Implementation of a Fast Convolution Approach with non Uniform Partitioning

  • 1. An Efficient DSP-Based Implementation of a Fast Convolution Approach with non Uniform Partitioning. Andrea Primavera (1), Stefania Cecchi (1), Laura Romoli (1), Francesco Piazza (1) and Marco Moschetti (2). (1) A3lab, DII, Università Politecnica delle Marche, Ancona, Italy; (2) Korg Italy, Osimo (AN), Italy. 5th European DSP in Education and Research Conference, 13th and 14th September 2012, Amsterdam, Netherlands. Sections: Fast Convolution, Proposed Algorithm, Efficient DSP Implementation, Results, Conclusion. Andrea Primavera, 1/28.
  • 2. Outline: 1 Fast Convolution (Introduction; State of the art). 2 Proposed Algorithm. 3 Efficient DSP Implementation (Target; UPOLS implementation; Memory management; FFT/IFFT operations; Psychoacoustic expedients; Final remarks). 4 Results (Case study: artificial reverberator; UPOLS performance; NUPOLS performance). 5 Conclusion (Conclusion; Questions).
  • 3. Problem: FIR filtering is probably one of the most recurrent operations in DSP. It is an expensive task, especially for long impulse responses (IRs) and low I/O latency. The two competing requirements are low-latency convolution and computational cost minimization. State of the Art: In the last 30 years, fast convolution algorithms have been deeply investigated: • OverLap and Save (OLS), OverLap and Add (OLA). • Uniform Partitioned OverLap and Save (UPOLS). • Non Uniform Partitioned OverLap and Save (NUPOLS).
  • 5. Proposed Solution: We propose an efficient DSP-based real-time implementation of a fast convolution approach with non uniform partitioning (NUPOLS), taking into account: • The OMAP L137 target platform. • Efficient partitioning. • Usage of smart DSP expedients. • Psychoacoustic improvement.
  • 6. Time Domain Convolution: Assuming a linear time-invariant system, the linear convolution between the input signal x and the system impulse response h is defined as $y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(t - \tau)\, h(\tau)\, d\tau$ (1). For discrete-time signals and an impulse response of finite length N, this becomes $y[n] = x[n] * h[n] = \sum_{m=0}^{N-1} x[n - m]\, h[m]$ (2). The convolution is performed directly using equation (2). LATENCY: Theoretically zero. COMPUTATIONAL COST: N − 1 additions and N multiplications per output sample. CONSIDERATIONS: Too expensive for long IRs.
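Equation (2) can be illustrated with a minimal sketch of direct time-domain FIR filtering. The function name `fir_direct` and the sample values are illustrative assumptions, not part of the original slides; samples before the start of the input are taken as zero.

```python
def fir_direct(x, h):
    """Direct form of equation (2): y[n] = sum_{m=0}^{N-1} x[n-m] * h[m].

    Samples x[k] with k < 0 are treated as zero (assumption for this sketch).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for m in range(len(h)):          # N multiplications per output sample
            if n - m >= 0:
                acc += x[n - m] * h[m]
        y.append(acc)
    return y


x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.5]
print(fir_direct(x, h))  # [1.0, 2.5, 4.0, 5.5]
```

Each output sample is available as soon as its input sample arrives, which is why the latency is theoretically zero, but the inner loop grows linearly with the IR length N.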
  • 8. Frequency Domain Convolution: Considering the circular convolution and the DFT property, $y[n] = x[n] \circledast_N h[n] = \sum_{m=0}^{N-1} x[(n - m)_N]\, h[m]$ (3), $x[n] \circledast_N h[n] \leftrightarrow X[k]\, H[k]$ (4), where $(\cdot)_N$ denotes the index modulo N, it follows that the convolution can be computed in the frequency domain. OverLap and Save (OLS): Allows a circular convolution to be converted into a linear convolution. LATENCY: Equal to K samples with K > N. COMPUTATIONAL COST: $\frac{2L \log L}{K} + \frac{L}{K}$ complex multiplications (with K a power of 2 and L = 2K for 50% overlap). CONSIDERATIONS: I/O latency is too high for long IRs.
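The overlap-and-save idea can be sketched as follows, assuming a block size K that is a power of 2, L = 2K, and an impulse response no longer than K + 1 samples. The radix-2 FFT helper and all function names are written for this illustration only; the slides rely on the DSP's own library for the transforms.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    return [v / len(a) for v in fft(a, inverse=True)]


def overlap_save(x, h, K):
    """Filter x with h (len(h) <= K + 1) using K-sample blocks and L = 2K."""
    L = 2 * K
    H = fft(list(h) + [0.0] * (L - len(h)))
    prev = [0.0] * K                      # previous input block (50% overlap)
    y = []
    for start in range(0, len(x), K):
        block = list(x[start:start + K])
        block += [0.0] * (K - len(block))
        X = fft(prev + block)
        yt = ifft([a * b for a, b in zip(X, H)])
        y.extend(v.real for v in yt[K:])  # keep only the last K points
        prev = block
    return y[:len(x)]


def direct(x, h):
    """Reference: equation (2), truncated to len(x) output samples."""
    return [sum(x[n - m] * h[m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(x))]


x = [0.5 * ((7 * i) % 5) - 1.0 for i in range(19)]
h = [1.0, -0.5, 0.25]
err = max(abs(a - b) for a, b in zip(overlap_save(x, h, 4), direct(x, h)))
print("max deviation from direct convolution:", err)
```

The first K points of each inverse transform are contaminated by circular wrap-around and are discarded, which is what turns the circular convolution of equation (3) into a linear one.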
  • 10. Uniform Partitioned OverLap and Save (UPOLS): The IR is partitioned into sections of equal size; an OLS is then applied to each sub-filter. LATENCY: Equal to K samples with K arbitrarily chosen. COMPUTATIONAL COST: $\frac{2L \log L}{K} + \frac{LP}{K}$ complex multiplications and $\frac{L(P-1)}{K}$ additions (with K a power of 2, P the number of partitions and L = 2K for 50% overlap). CONSIDERATIONS: Computational cost higher than OLS. Non Uniform Partitioned OverLap and Save (NUPOLS): The IR is partitioned into sections of increasing size, reducing the computational cost with respect to the UPOLS algorithm. LATENCY: Theoretically zero. COMPUTATIONAL COST: It depends on the adopted partitioning. CONSIDERATIONS: It is difficult to find the optimal partitioning.
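To make the latency/cost trade-off between OLS and UPOLS concrete, the cost expressions quoted on the slides can be evaluated for one example. This is only a sketch: the logarithm is assumed to be base 2, the 0.5 s impulse response at 48 kHz is an illustrative choice, and the function names are hypothetical.

```python
import math


def next_pow2(n):
    """Smallest power of 2 that is >= n."""
    return 1 << (n - 1).bit_length()


def ols_mults(K):
    """OLS cost expression from the slides, with L = 2K (log base 2 assumed)."""
    L = 2 * K
    return 2 * L * math.log2(L) / K + L / K


def upols_mults(K, P):
    """UPOLS cost expression from the slides, with L = 2K and P partitions."""
    L = 2 * K
    return 2 * L * math.log2(L) / K + L * P / K


fs = 48000                      # sample rate in Hz
N = 24000                       # 0.5 s impulse response (illustrative)

K_ols = next_pow2(N + 1)        # OLS needs K > N
K_upols = 64                    # latency chosen freely
P = math.ceil(N / K_upols)      # number of uniform partitions

print("OLS:   K =", K_ols, "latency [ms] =", 1000 * K_ols / fs,
      "cost =", ols_mults(K_ols))
print("UPOLS: K =", K_upols, "latency [ms] =", 1000 * K_upols / fs,
      "P =", P, "cost =", upols_mults(K_upols, P))
```

Under these assumptions OLS needs a block of 32768 samples (several hundred milliseconds of latency), while UPOLS keeps the latency at 64 samples at the price of a larger cost figure, which is the gap that non uniform partitioning tries to close.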
  • 12. Proposed Algorithm: An efficient DSP-based implementation of a low-latency fast convolution is proposed, based on the NUPOLS algorithm. [Figure: block diagram of the non uniform partitioned overlap and save algorithm. g(t): impulse response; x(t): input signal; g_i(t): i-th sub-filter.]
  • 13. Proposed Algorithm: An efficient DSP-based implementation of a low-latency fast convolution is proposed, based on the NUPOLS algorithm. [Figure: block diagram of the proposed approach. g(t): impulse response; x(t): input signal; g_i(t): i-th sub-filter.] • First UPOLS: characterized by a small block size (i.e., 64 samples), which sets the desired input/output latency. • Second UPOLS: uses a larger frame size to minimize the computational cost required to perform the convolution.
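The reason two convolvers can share one impulse response follows from linearity: convolving with the head of the IR and with the delayed tail, then summing, gives the same result as convolving with the whole IR. The sketch below checks this with direct convolution; the split point and the function names are illustrative assumptions, and the actual implementation on the slides runs each part as a UPOLS with its own block size.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for m, hm in enumerate(h):
            y[n + m] += xn * hm
    return y


def two_stage(x, h, split):
    """Convolve x with h as head (h[:split]) plus tail (h[split:]) delayed by split.

    Requires 0 < split < len(h).
    """
    head, tail = h[:split], h[split:]
    y = [0.0] * (len(x) + len(h) - 1)
    for n, v in enumerate(convolve(x, head)):   # low-latency part
        y[n] += v
    for n, v in enumerate(convolve(x, tail)):   # late part, starts split samples later
        y[n + split] += v
    return y


x = [1.0, -2.0, 3.0, 0.5]
h = [1.0, 0.5, 0.25, -1.0, 2.0, 0.125]
full = convolve(x, h)
split = two_stage(x, h, 2)
print(max(abs(a - b) for a, b in zip(full, split)))
```

Because the tail only contributes after `split` samples, the stage that processes it can afford a larger block size without increasing the overall input/output latency.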
  • 16. Target: The real-time implementation of the proposed approach has been carried out on the Texas Instruments OMAP-L137 evaluation board. Hardware features: dual-core System-on-Chip; 300 MHz ARM926EJ-S RISC MPU; 300 MHz C674x VLIW floating-point DSP; 128 KByte shared RAM; 64 MByte SDRAM; Enhanced Direct Memory Access controller 3 (EDMA3); 2 I/O audio channels; 32 KByte L1P program RAM/cache (DSP side); 32 KByte L1D data RAM/cache (DSP side); 256 KByte L2 unified mapped RAM/cache (DSP side). • Design constraints: sample frequency 48 kHz, latency 64 samples, stereo implementation, floating-point implementation. • ARM: used to manage the control parameters. • DSP: used to perform the DSP operations, exploiting its own libraries (i.e., DSPLib) and DMA engine.
  • 20. UPOLS implementation: The UPOLS algorithm implementation can be summarized in three main phases: • Impulse response partitioning • Input signal partitioning • Filtering. [Figure: UPOLS block diagram. h(t) of length N and x(t) are split into K-sample blocks (x0, x1, x2, ..., xn); each input block goes through an L-point FFT, is multiplied by the filters H1, H2, H3, ..., and the products are accumulated; the last K points of each L-point IFFT form the output blocks y0, y1, y2, ..., yn of y(t).]
  • 21. UPOLS implementation, phase 1, impulse response partitioning: - The impulse response h is partitioned into P blocks h_n of length K. - The filter set H_n is obtained by applying an L-point FFT to each block h_n (with L = 2K, 50% overlap). - The set of P filters is then stored in a delay line held in external memory. - The operation is performed offline using a Matlab script.
  • 22. UPOLS implementation, phase 2, input signal partitioning: - The input signal x is partitioned into blocks of length K. - The frequency-domain block X_n is obtained by applying an L-point FFT to the input vector composed of the previous frame x_{n−1} and the new frame x_n (50% overlap). - The vector X_n is stored in a delay line held in external memory together with the P − 1 previous blocks.
  • 23. UPOLS implementation, phase 3, filtering: - The output block Y_n is obtained through the filtering operation $Y_n = \sum_{i=0}^{P-1} X_{n-P+1+i}\, H_{P-1-i}$ (5). - The time-domain output block y_n is composed of the last K samples of the L-point IFFT of Y_n.
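The three phases can be put together in a minimal sketch of a uniformly partitioned overlap-and-save convolver with a frequency-domain delay line. It assumes K is a power of 2 and L = 2K; the FFT helper, the function names, and the in-memory Python lists (standing in for the delay lines that the slides keep in external memory) are illustrative choices only.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    return [v / len(a) for v in fft(a, inverse=True)]


def upols(x, h, K):
    L = 2 * K
    P = -(-len(h) // K)                       # ceil(len(h) / K) partitions
    # Phase 1: partition the IR and transform each block (done offline).
    H = []
    for p in range(P):
        blk = list(h[p * K:(p + 1) * K])
        H.append(fft(blk + [0.0] * (L - len(blk))))
    fdl = [[0j] * L for _ in range(P)]        # delay line, fdl[0] is the newest X
    prev = [0.0] * K
    y = []
    for start in range(0, len(x), K):
        # Phase 2: transform previous + new frame and push into the delay line.
        blk = list(x[start:start + K])
        blk += [0.0] * (K - len(blk))
        fdl.insert(0, fft(prev + blk))
        fdl.pop()
        # Phase 3: equation (5), i.e. the sum over p of X_{n-p} * H_p.
        Y = [0j] * L
        for Xp, Hp in zip(fdl, H):
            for k in range(L):
                Y[k] += Xp[k] * Hp[k]
        y.extend(v.real for v in ifft(Y)[K:])  # last K samples of the IFFT
        prev = blk
    return y[:len(x)]


def direct(x, h):
    """Reference: equation (2), truncated to len(x) output samples."""
    return [sum(x[n - m] * h[m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(x))]


x = [((3 * i) % 7) / 4.0 - 0.75 for i in range(23)]
h = [1.0, 0.5, -0.25, 0.125, 0.75, -0.5, 0.25, 1.0, -1.0, 0.5]
err = max(abs(a - b) for a, b in zip(upols(x, h, 4), direct(x, h)))
print("max deviation from direct convolution:", err)
```

Only one forward and one inverse transform are needed per block regardless of P; the per-block cost that grows with the IR length is the multiply-accumulate loop over the delay line.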
  • 24. Memory management: Complex multiplications and accesses to data in external memory are the main bottlenecks in a fast convolution implementation. How can these problems be solved? Adopted Solution: • The NUPOLS algorithm reduces both the number of complex multiplications and the number of memory accesses compared to the UPOLS approach. • The DMA engine makes it possible to run transfers from/to external memory in parallel with the processing operations.
  • 26. Memory management: Parallelization of the transfers from/to external memory (executed by the DMA engine) and the processing operations. [Figure: kernel used for the UPOLS algorithms. (i) Basic approach: Read Hn (blocking), Read Xn (blocking), Compute Yn. (ii) Improved approach: Read Hn (blocking), Read Xn+1 (non-blocking), Compute Yn, Read Xn (blocking).]
  • 27. FFT Optimization: The workload required for FFT/IFFT computation can be reduced by taking advantage of the stereo implementation and of the real nature of the audio signal. • Two L-point FFTs/IFFTs of real sequences can be calculated through one FFT/IFFT of a complex sequence. • The symmetry property of the FFT has been exploited. This decreases the number of accesses to external memory and the number of frequency-domain multiplications from L to (K + 1) for each of the P processed frequency blocks.
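The first optimization is a standard DFT identity: packing two real sequences as the real and imaginary parts of one complex sequence and separating the spectra afterwards through conjugate symmetry. The sketch below illustrates it with a small radix-2 FFT written for this example; the function names are assumptions, and the left/right naming simply mirrors the stereo use case.

```python
import cmath


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def two_real_ffts(left, right):
    """Spectra of two real sequences from a single complex FFT."""
    n = len(left)
    Z = fft([complex(l, r) for l, r in zip(left, right)])
    A, B = [], []
    for k in range(n):
        Zc = Z[-k % n].conjugate()        # conj(Z[(n - k) mod n])
        A.append((Z[k] + Zc) / 2)         # spectrum of the left channel
        B.append((Z[k] - Zc) / 2j)        # spectrum of the right channel
    return A, B


left = [1.0, 2.0, 0.0, -1.0, 0.5, 0.25, -0.75, 3.0]
right = [0.0, 1.0, -1.0, 2.0, 0.5, -0.5, 1.5, -2.0]
A, B = two_real_ffts(left, right)
err = max(max(abs(a - b) for a, b in zip(A, fft(left))),
          max(abs(a - b) for a, b in zip(B, fft(right))))
print("max deviation from separate FFTs:", err)
```

The same symmetry explains the second optimization: for a real signal and L = 2K, bins K + 1 to L − 1 are the complex conjugates of bins K − 1 down to 1, so only bins 0 to K (K + 1 values) need to be stored and multiplied.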
  • 29. Psychoacoustic Optimization: Psychoacoustics makes it possible to reduce the number of complex multiplications and memory accesses. All the components (frequency bins) above a certain cut-off frequency fc (e.g., 18 kHz) are left out.
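The saving can be estimated with a short calculation, given that bin k of an L-point FFT sits at k·fs/L Hz. In this sketch the function name is hypothetical, and keeping the bin that falls exactly on fc is an assumption about the boundary that the slides do not specify.

```python
def bins_to_process(K, fs, fc=None):
    """Number of frequency bins multiplied per block, with L = 2K.

    Without a cut-off, bins 0..K are needed for a real signal (K + 1 values).
    With a cut-off fc, only bins at or below fc are kept (boundary assumed).
    """
    L = 2 * K
    if fc is None:
        return K + 1
    return min(K, int(fc * L // fs)) + 1


fs, fc = 48000, 18000
for K in (64, 2048):
    full = bins_to_process(K, fs)
    kept = bins_to_process(K, fs, fc)
    print(K, full, kept)   # 64 65 49, then 2048 2049 1537
```

With the 48 kHz sample rate and the 18 kHz example cut-off, roughly a quarter of the bins drop out of every multiply-accumulate and of every transfer from external memory.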
  • 30. Final remarks: How can the two UPOLS be parallelized? In a low-latency context, a multithreaded approach does not guarantee high performance on the DSP board. Adopted Solution: The code has been partitioned manually, with the aim of distributing the FFT/IFFT operations and the complex multiplications of both UPOLS uniformly throughout the processing.
  • 31. Final remarks: How can the two UPOLS be parallelized? The manual partitioning aims to distribute the FFT/IFFT operations and the complex multiplications of the larger UPOLS uniformly over the K2/K1 iterations available to respect the processing constraint. Distribution of the UPOLS operations in a NUPOLS implementation with K1 = 64 and K2 = 2048:
      Iteration  Operation            Iteration  Operation
      1          Large FFT 3/3        17         MAC Left Channel
      2          MUL Left Channel     18         MAC Left Channel
      3          MUL Right Channel    19         MAC Right Channel
      4          Large IFFT 1/3       20         MAC Right Channel
      5          Large IFFT 2/3       21         MAC Right Channel
      6          Large IFFT 3/3       22         MAC Right Channel
      7          MAC Left Channel     23         MAC Right Channel
      8          MAC Left Channel     24         MAC Right Channel
      9          MAC Left Channel     25         MAC Right Channel
      10         MAC Left Channel     26         MAC Right Channel
      11         MAC Left Channel     27         MAC Right Channel
      12         MAC Left Channel     28         MAC Right Channel
      13         MAC Left Channel     29         MAC Right Channel
      14         MAC Left Channel     30         MAC Right Channel
      15         MAC Left Channel     31         Large FFT 1/3
      16         MAC Left Channel     32         Large FFT 2/3
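The schedule can be expressed as a lookup that the processing loop consults once per small block. This is a sketch only: the task labels are copied from the table, while the list layout, the function name `task_for_block`, and the modulo rotation are assumptions about how such a schedule might be driven.

```python
K1, K2 = 64, 2048
SLOTS = K2 // K1          # 32 small blocks fit in one large block

# One slice of large-UPOLS work per small-block iteration, in table order.
SCHEDULE = (
    ["Large FFT 3/3", "MUL Left Channel", "MUL Right Channel"]
    + [f"Large IFFT {i}/3" for i in (1, 2, 3)]
    + ["MAC Left Channel"] * 12
    + ["MAC Right Channel"] * 12
    + ["Large FFT 1/3", "Large FFT 2/3"]
)


def task_for_block(n):
    """Large-UPOLS slice to run alongside the n-th 64-sample block (n from 0)."""
    return SCHEDULE[n % SLOTS]


for n in (0, 3, 6, 31, 32):
    print(n + 1, task_for_block(n))
```

Spreading the large transform over three slices and the multiply-accumulates over 24 slots keeps the work done in any single 64-sample period bounded, instead of concentrating the whole large-block computation in one iteration.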
  • 32. Case Study: Artificial Reverberator: Fast convolution can be employed in many different real-time audio applications. Digital artificial reverberation is the application that most clearly exposes the limits of real-time FIR filtering. • Convolutions with long IRs are needed to simulate large environments. • Low input/output latencies are required in musical instruments. Tests: Several tests have been carried out to evaluate the effectiveness of the proposed approach, comparing the workload required by the UPOLS and NUPOLS implementations.
  • 33. UPOLS performance: [Figure: workload versus impulse response length (0.1 to 0.5 s) of the Uniform Partitioned Overlap and Save algorithm (K = 64). (a) Classic implementation. (b) Psychoacoustic approach.] Considerations: • The maximum impulse response length is about 0.55 s (guaranteeing real-time performance). • The approach is not suitable for the simulation of large reverberating environments in musical instruments.
  • 34. NUPOLS performance: [Figure: workload versus impulse response length (0 to 5 s) of the NUPOLS algorithm with 4 different partitionings: (i) K1 = 64, K2 = 2048; (ii) K1 = 64, K2 = 1024; (iii) K1 = 64, K2 = 512; (iv) optimal partitioning, with regions labelled K2 = 512, K2 = 1024 and K2 = 2048. Mean (a) and max (b) workload for the classic implementation. Mean (c) and max (d) workload using the psychoacoustic approach.]
  • 35. NUPOLS performance: [Figure: NUPOLS workload as a function of the processing iteration (IR length = 3.164 s). (a) Workload of the NUPOLS, (b) workload of the small UPOLS (K1 = 64), (c) workload of the large UPOLS (K2 = 2048).]
      Partitioning            Internal Memory Usage
      K1 = 64, K2 = 2048      100 kB
      K1 = 64, K2 = 1024      50 kB
      K1 = 64, K2 = 512       30 kB
    Considerations: • Evident improvement in terms of performance with respect to the approach based on uniform partitioning. • It is possible to perform a stereo convolution with an impulse response of length 6 s using about 50% of the DSP resources.
  • 36. Conclusion: • A novel approach for fast convolution computation has been proposed, based on non uniform partitioning of the impulse response. • Two UPOLS stages with two different frame sizes are introduced: the desired input/output latency is obtained through the UPOLS with the smaller frame size, while the other UPOLS is exploited to decrease the number of memory accesses and complex multiplications. • A DSP-based real-time implementation has been developed and several experiments have been carried out, considering digital reverberation as a case study.
  • 37. Questions?