SlideShare uma empresa Scribd logo
1 de 12
Baixar para ler offline
DOI : 10.5121/vlsic.2019.10201 1
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR
SAD COMPUTATION ON FPGA
Jaya Koshta, Kavita Khare and M.K Gupta
Maulana Azad National Institute of Technology, Bhopal
ABSTRACT
Video Compression is very essential to meet the technological demands such as low power, less memory
and fast transfer rate for different range of devices and for various multimedia applications. Video
compression is primarily achieved by Motion Estimation (ME) process in any video encoder which
contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric
in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung
Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder
scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42
% as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using
Virtex 7 FPGA.
KEYWORDS
HEVC, motion estimation, sum of absolute difference, parallel prefix adders, Brent Kung Adder.
1. INTRODUCTION
To meet the technological demands such as low power, less memory and fast transfer rate for a
wide range of applications including the growing demand of High Definition(HD)(1080p) to
Ultra-High Definition(UHD)(4K and 8K), resulted in creation of stronger needs for better video
compression efficiency. HEVC or H.265is a video compression standard designed to substantially
improve coding efficiency when compared to its precedent, the Advanced Video Coding (AVC)
or H.264.HEVC is a block-based video compression designed to support higher resolutions and it
can achieve 50% bit rate saving compared to H.264/MPEG-4 AVC for the same video quality
[1][2].This considerable increase in performance is due to the many enhanced techniques and
methodologies that have been introduced in H.265/HEVC. Some of these enhancements are
included in the motion estimation process, which is one the most complex and time consuming
block in video encoding. The objective of the ME unit is to find the best matched block in the
reference (past/future) frame search window (region of interest), for every block of the current
frame such that there constructed frame contributes to the lowest residual information [3].The
bottleneck for the ME system design lies in the implementation of an appropriate SAD
architecture. For block based ME, the Sum of Absolute Difference(SAD) is the generally used
metric which adds up the absolute differences between corresponding elements in a candidate and
reference block in video frames. However with the increase in the coding block size to 64x64 in
HEVC, compared to 16x16 in H.264/AVC, results in greater complexity in ME process. Thus in
HEVC for ME process, the number of SAD computations vastly increases resulting in increase in
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
2
the processing delay, power consumption and hardware complexity. Thus, a pragmatic strategy to
elevate the performance of H.265/HEVC relies on enhancing the performance of the core
component of the motion estimation engine which is the block matching unit.
Different VLSI architectures for SAD computations are available in the literature where there is a
lot of tradeoffs between the speed, power and area during the hardware implementation. Due to
the increasing demand for the portable devices with low power consumption and high
performance, the circuits are optimized according to the applications. In this paper,
implementation of absolute difference circuit on FPGA is proposed to increase speed
performance and to minimize the amount of occupied resources on FPGA for SAD calculation.
2. RELATED WORK
SAD is one of the most popular techniques used for motion estimation in digital video encoding
systems. There are large number of methods available for SAD computation where there are
tradeoffs between speed, power and area during the hardware implementation. Typically, the
SAD computation consists of first computing the absolute difference between corresponding
pixels in current and reference video frame.
The calculation of SAD from the current and reference block is performed using equation (1),
SAD = -------------(1)
Where,
CB - Current Block,
RB –Reference Block,
N X M - block size of current and reference block,
i,j - two dimensional coordinates of block.
Generally, the SAD operation basically consists of first computing the absolute difference |CB(i,j)
– RB(i,j)| and then summing up these in a multi-input addition.
For absolute difference calculation, one method is to detect the smaller operand in the absolute
difference computation |CB−RB| and to subtract it from the larger operand [4],[5].The other
method comprises of complimenting the smaller of the two numbers and then performing
addition of two numbers followed by plus one to compute the absolute difference[6]. To compute
the absolute difference for SAD, in [7] a novel architecture is optimized for realizing efficient
absolute difference circuits in Virtex-5 FPGA devices which uses the 6-input look-up tables
available within the chosen devices family to maximize speed performance and to minimize the
amount of occupied resources. In [8] an improved architecture for efficiently computing the sum
of absolute differences (SAD) on FPGAs is proposed based on a configurable adder/subtractor
implementation in which each adder input can be negated at runtime. The SAD architecture
proposed in this paper provides a significant resource reduction on current FPGAs. In [9] FPGA
design for fast computing of the minimum SAD is proposed. The hardware unit proposed is
intended to augment a general-purpose core and with the use online arithmetic (OLA) it is
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
3
possible to implement a full 16 X 16 macroblock SAD in a single FPGA device and it permits to
speed up the computation by early truncation of the SAD calculation when the involved candidate
is bigger than the current reference SAD. In [10] pipelined SAD architecture for efficient SAD
calculations is proposed where consecutive pipeline stages performs the addition of absolute
differences to obtain SAD of 8 X 8 block. In [11] Different architectures for binary addition were
proposed for SAD computation on FPGA
Table 1 summarises the above mentioned related work on SAD computation. Review work
motivates for a proposal of high speed compact AD circuits on FPGA for SAD calculation. In this
paper, AD circuit on FPGA is proposed to increase speed performance and to minimize the
amount of occupied resources on FPGA for SAD calculation.
Table 1: Related work on SAD computation
3. PROPOSED ARCHITECTURE
Hardware architecture for computing the absolute difference between corresponding pixels in
current and reference video block is proposed. The method used for AD calculation is as used in
[6] where adder and comparator form the basic component as shown in Figure 1.The 8-bit
comparator compares two numbers and returns the 1’s complement of the smaller number and the
larger number as it is.
8-bit Comparator
8-bit Adder
8-bit Adder
8B’1
Reg
Current Pixel Reference Pixel
Figure 1: AD circuit block diagram
Ref. Architecture Advantages Disadvantages
[7] 6-input look-up tables on
Virtex 5
High speed and compact
Optimized for only
Virtex 5
[8] Configurable adder/subtractor
on Virtex 6
Resource reduction on current
FPGAs
SAD 1X2
implemented
[9] Online arithmetic for 16 X 16
SAD on Virtex 2
High speed Large Area
[10]
Pipelined SAD architecture High speed and low power Large area
[11] Different architectures for
binary addition
Low power
Implemented on
FPGA Spartan
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
4
Proposed Architecture: In this architecture:
 Ripple Carry Adders in AD circuit (Figure 1) are replaced by Brent Kung Adder (Type of
Parallel Prefix Adders-PPA).
 8-bit conventional comparator in Figure 1 is replaced by comparator based on modified 1’s
complement principle and conditional sum adder scheme.
These replacements results in reduction in delay and reduces LUTs count used in the circuit.
PPA:
PPA (Parallel Prefix Adder) circuits use a tree network to reduce the latency to O (log2n)where
‘n’ represents the number of bits.PPA employs 3-structural stages as shown in Figure 2. The first
stage at the top is used for computing generate and propagate signals as given in equation (2) and
(3)exactly as in Carry Look ahead Adder (CLA) where a and b represents binary digits.
g=a.b--------------------------------------------(2)
p=a⊕b------------------------------------------(3)
The carry bits are calculated in the second stage. In this stage, to formulate the operation of the
prefix adders, “prefix operator” which is represented as “.” is used. The “prefix operator” function
has two essential properties to keep the computational operation faster. The first one is called the
associative property and the second one is idempotency property. Using the advantages of these
properties, carry-out can be found at a depth proportional to log2(n) [12]. The final summation is
obtained in the last stage.The associative and idempotency properties of "." operator allow carry
output to be computed in a different number of levels or simply depth. Therefore, various
topologies of prefix adders can be designed which are mentioned in the literature and are
inspiring to VLSI designers because of their minimum depth and delay.
The logical structures used in prefix adders scheme consists of black cell, gray cell and white cell
as defined in Figure 3[13].
Calculation of the carries
This part is parallelizable to reduce time
Pre-calculation of pi ,gi terms
Simple adder to generate the sum
Prefix graphs can
be used to describe
the structure that
performs this part
Straight forward
as in the CLA
adder
Figure 2: Parallel Prefix Adder stages
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
5
The logical structure of black cell can propagate and generate signals while the logical structure
of gray cell can only generate signals. The white cell (buffer)is used for loading the signal out. In
parallel prefix scheme, generate and propagate signals can be grouped in multiple ways to get the
same correct carry signals. Based on different methodologies of grouping these signals, different
prefix architectures can be created. The parallel form of obtaining the carry bit makes PPAs
perform addition arithmetic faster. There are many types of PPA such as Brent Kung, Kogge
Stone, Ladner Fisher, Hans Carlson and Knowles[12].
i : k i : k
i : j i : j
k-1: j k-1: j i : j
Black cell Gray cell Buffer
i : j
p k-1:j
g i:k
g i:j
p i:k
g k-1:j
p i:j
g i:k
p i:k
g k-i:j
g i:j
g i:j g i:j
p i:jp i:j
Both propagate & generate Generate only Different load
Figure 3 Cell definitions for parallel-prefix scheme [10]
Out of many types of PPA, Brent Kung Adder(BKA)is used as for 8-bit implementation it has
less delay compared to other types of PPA[12][13].
In BKA propagate signals and generate signals are combined into groups of two by using the
associative property. The first stage in BKA includes computation of generate and propagate
signals corresponding to each pair of bits in a and bas given by equation (2) and (3). The second
stage ie. prefix carry tree includes computation of carries corresponding to each bit. Equations (4)
and (5)below shows how propagate and generate signals are calculated in BKA,
pi:j = pi:k. pk-1:j--------------------------------------(4)
gi:j = gi:k + (pi:k. gk-1:j)-----------------------------(5)
These propagate and generate signals given in equation (4) and (5) are calculated using black cell,
gray cell and white cell as defined in Figure 3. The last stage ie.post processing stage includes
computation of sum bits which is given by the equation (6).
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
6
si = pi⊕ci------------------------------------------(6)
7 6 5 4 3 2 1 0
Stage 1
Stage 2
Stage 3
Stage 4
Stage 5
7 : 0 6 : 0 5 : 0 4 : 0 3 : 0 2 : 0 1 : 0 0 : 0
Input
Output
c 3 : 2 = g 3 : 2 + p 3 : 3 g 2 : 2
c 4 : 0 = g 4: 4 + p 4 : 4 g 3 : 0
Figure 4 : 8-bit Brent-Kung adder(BKA) tree [13]
Figure 4 shows 8-bit BKA tree adder which has lower delay as compared to Ripple Carry
Adder(RCA).Also, apart from using BKA, a new efficient comparator based on modified 1’s
complement principle and conditional sum adder scheme is used in this architecture as discussed
below.
Modified Comparator:
The proposed architecture apart from using BKA, also uses comparator (based on modified 1’s
complement principle and conditional sum adder scheme). In this modified 1's complement
scheme, if A>B, bit Cout = 1 and if A≤B bit Cout =0 so the only concern is about carry out bit
information as shown in Figure 5.This method always adds a fixed carry after modification, so if
A ≥ B, bit Comp = 1 and if A < B, bit Comp = 0 [15].For realization of this comparator
conditional sum adder scheme has been used that provides a logarithmic increase in speed for
addition[16].
The principle behind this scheme is to generate two sets of outputs for a given group of k bits
operands. Each set includes k sum bits and an outgoing carry. The one set assumes that the
eventual incoming carry will be zero, while the other assumes that it will be one. Depending upon
the incoming carry correct set of outputs (out of the two sets) is selected without waiting for the
carry to further propagate through the k positions as shown in Figure 6.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
7
Y = A – A = 0 1 0 1 0 1 0 0 - 0 1 0 1 0 1 0 0 = (84 = 84)
0 1 0 1 0 1 0 0 = A
1 0 1 0 1 0 1 1 = 1's compliment of A
Z = B – A = 0 1 0 0 0 0 1 1 - 0 1 0 1 0 1 0 0 = (67 < 84)
0 1 0 0 0 0 1 1 = B
1 0 1 0 1 0 1 1 = 1's compliment of A
X = A - B = 0 1 0 1 0 1 0 0 - 0 1 0 0 0 0 1 1 = (84 > 67)
0 1 0 1 0 1 0 0 = A
1 0 1 1 1 1 0 0 = 1's compliment of B
0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1Cout 0
+ 1 = Fixed carry-in bit
1Comp
1 1 1 1 1 1 1 1Cout 0
+ 1 = Fixed carry-in bit
0Comp 1 1 1 0 1 1 1 1
0 0 0 1 0 0 0 0
A = 0 1 0 1 0 1 0 0 = 84
B = 0 1 0 0 0 0 1 1 = 67
Cout 1
+ 1 = Fixed carry-in bit
1Comp 0 0 0 1 0 0 0 0
Status bit Cout Comp
A>B 1 1
A=B 0 1
A<B 0 0
Figure 5 Modified l's complement method for improved comparator design
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
8
k – bit adder
k – bit adder
Multiplexer
1
CinCout
0
k k
k
Figure 6 Conditional sum adder scheme
However, this scheme is generally not applied to long n-bits operand at the beginning of the add
operation, since it will add delay as the carry propagates through all n positions before making the
selection for correct output. Generally, the given n bits are divided into smaller groups to apply
this conditional adder scheme separately. Thus, the serial carry-propagation inside the separate
groups can be done in parallel, reducing the overall execution time. The outputs of the subgroups
are then combined to generate the output of the final output.
A7
B7
A6
B6
A5
B5
A4
B4
A3
B3
A2
B2
B1
A1
B0
A0
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
Comp = 1 ,A>= B,
= 0, A<B
Comp
Stage 0 Stage 1 Stage 2 Stage 3
Figure 7 Modified comparator architecture
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
9
Figure 7 shows comparator architecture using modified 1’s complement and conditional sum
adder design. The 8-bit comparator needs 11(=1+3+7) 2-to-1 multiplexers and eight inverters to
generate complementary values of input B. Originally Carry = AB + AC + BC = AB + (A + B) C
and if C= 0 then Carry =AB or if C=l then Carry = AB + (A+B) =A + B. The sum of MUX gates
of N-bit comparator is,
-----------------------(7)
This architecture reduces delay and also provides significant reduction in resource utilization in
FPGA.
4. SIMULATION AND RESULTS
Simulations of the conventional and proposed architecture for AD circuit implementation have
been carried out using Verilog HDL programming in Xilinx ISE 14.2 platform and implemented
on Virtex7 FPGA. Adequate testing of each design was done to verify correct operation. Figure 8
shows the simulation result of proposed AD circuit.
Figure 8 Simulation result of Absolute Difference Circuit
Figure 9 Graphical representation of delays in proposed architecture with conventional architecture
0
1
2
3
4
5
6
7
Logic
delay(ns)
Routing
delay(ns)
Total
Delay(ns)
Conventional
Proposed
Architecture
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
10
Figure 10 Implementation of proposed AD architecture on FPGA
Architectu
re
Logic
delay(ns)
Routing
delay(ns)
Totalde
lay(ns)
Numbe
r of
slice
LUTs
Convention
al
0.484
(8.0%
logic)
5.673
(92.0%rout
e)
6.157 50
Proposed
Archiecture
0.391
(7.5%
logic)
4.849
(92.5%
route)
5.240 29
Table 2 Comparison of proposed AD architecture architecture with conventional architecture
Figure 9 shows graphical representation of delays in proposed architecture with conventional
architecture and Figure 10 shows implementation and area occupied of proposed AD architecture
on FPGA. Table 2 shows comparison of proposed AD circuit architecture in terms of time delay
and area. Referring to the above table, it can be seen that there is reduction in delay and number
of occupied resources on FPGA in the proposed architecture for AD circuit resulting in improved
performance for SAD computation in motion estimation.
5. CONCLUSIONS
VLSI architecture for SAD computations is proposed in this paper for reducing delay and area
and is implemented on Virtex7 FPGA. The proposed architecture reduces delay and provides
significant reduction in resource utilization in FPGA. Synthesis results shows that proposed
architecture reduces delay by 15% and number of slice LUTs by 42 % as compared to
conventional architecture.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
11
REFERENCES:
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video
coding (HEVC)standard,” IEEE Trans. Circuits Syst. Video Technol., vol.22, no. 12, pp. 1649-1668,
December 2012.
[2] I. Richardson, “HEVC: An introduction to high efficiency video coding,” 2001,
https://www.vcodex.com/h265.html
[3] N. Purnachand, L. N. Alves, and A. Navarro, “Fast Motion Estimation Algorithm for HEVC”, IEEE
International Conference on Consumer Electronics-Berlin (ICCE-Berlin), September 2012.
[4] S.Wong, B. Stougie, and S. Cotofana, “Alternatives in FPGA-based SAD Implementations,” in IEEE
International Conference on Field-Programmable Technology (FPT),IEEE, 2002, pp. 449–452.
[5] S. Vassiliadis, E. A. Hakkennes, J. S. S. M. Wong, and G.G.Pechanek, “The Sum-Absolute-
Difference Motion Estimation Accelerator,” in Euromicro Conference, 1998. Proceedings, 24th.
IEEE, 1998, pp. 559–566.
[6] Ahmed Medhat, Ahmed Shalaby, Mohammed S. Sayed, Maha Elsabrouty and Farhad Mehdipour. “A
Highly Parallel SAD Architecture for Motion Estimation in HEVC Encoder”, Circuits and Systems
(APCCAS), IEEE Asia Pacific Conference, 280 - 283, 2014.
[7] Stefania Perri , Paolo Zicari , Pasquale Corsonello, “Efficient Absolute Difference Circuits in Virtex-
5 FPGAs”, IEEE, 2010.
[8] Martin Kumm, Marco Kleinlein and Peter Zipf, “Efficient Sum of Absolute Difference Computation
on FPGAs”,26thInternational Conference on Field Programmable Logic and Applications.
[9] Joaquin Olivares, Ignacio Benavides and et.al., “Minimum Sum of Absolute Differences
implementation in a single FPGA device”, Dept. of Electro-technics and Electronics, University of
Cordoba, Spain.
[10] P.Jayakrishnan and Harish. M. Kittur, “Pipelined Arch.for Motion Estimation in HEVC Video
Coding”, Indian Journal of Science and Tech,August 2016.
[11] D. V Manjunatha, Pradeep Kumar and R. Karthik, “FPGA Implementation of Sum of Absolute
Difference (SAD) for video applications”, ARPN Journal of Engineering and Applied Sciences, Vol.
12, No. 24, December 2017
[12] Geeta Rani and Sachin Kumar, “Delay analysis of parallel-prefix adders”, International Journal of
Science and Research (IJSR), 3(6):2339-2342, 2014.
[13] Nurdiani Zamhari, Peter Voon, Kuryati Kipli, Kho Lee Chin, Maimun Huja Husin,“Comparison of
Parallel Prefix Adder (PPA)”, Proceedings of the World Congress on Engineering 2012,Vol II.
[14] R. P Brent & H. T. Kung, “A Regular Layout for Parallel Adders”, IEEE Trans. Computers, Vol. C-
31, pp 260-264, 1982.
[15] Shun-Wen Cheng, “A High-Speed Magnitude Comparator with Small Transistor Count”, in
Proceedings of IEEE internationalconference ICECS, 1168 - 1171 Vol.3, Dec 2003.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
12
[16] J.Sklansky, “Conditional-Sum Addition Logic,” IRE Transactions on Electronic Computers, Vol. EC-
9, No. 2, pp. 226-231, June,1960
[17] S. Rehman; R. Young;C. Chatwin;P. Birch, “An FPGA Based Generic Framework for High Speed
Sum of Absolute Difference Implementation”, Europ. Jour Scient. Res., vol.33, no.1, 2009.
[18] Manjunatha, D. V., and G. Sainarayanan. “Low-Power Sum of Absolute Difference Architecture for
Video Coding”, Emerging Research in Electronics, Computer Science and Technology. Springer
India, 2014. 335-341.
[19] LiYufei,Feng Xiubo and Wang Q in, “A High-Performance Low Cost SAD Architecture for Video
Coding”, IEEE Transactions on Consumer Electronics, pp. 535-541, Vol. 53, No. 2, May 2007.
[20] Jarno, Vanne, Eero Aho, Timo D. Hamalainen and Kimmo Kuusilinna, “A High-Performance Sum of
Absolute Difference Implementation Motion Estimation”, IEEE Transactions on Circuits and Systems
for Video Technology, pp. 876-883, Vol. 16, No. 7, 2006.
[21] Elhamzi W., Dubois J., Miteran J “An efficient low-cost FPGA implementation of a configurable
motion estimation for H.264 video coding”, Springer Journal of Real-Time Processing,Vol:9, No:1,
pp. 19–30,2014.
[22] Moorthy T., Ye A, “A scalable architecture for variable block size motion estimation on field-
programmable gate arrays” , IEEE Canadian Conference of Electrical and Computer Engineering
(CCECE),Niagara Falls, May, pp.1303–1308,2008.
[23] Davis P., Sangeetha M. ,“Implementation of Motion Estimation Algorithm for H.265/HEVC”,
International Journal of Advanced Research in Electrical, Electronics
and Instrumentation Engineering. Vol:3,No:3, pp. 122–126,2014.
International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019

Mais conteúdo relacionado

Mais procurados

IRJET- Accuracy Configurable Adder
IRJET- Accuracy Configurable AdderIRJET- Accuracy Configurable Adder
IRJET- Accuracy Configurable AdderIRJET Journal
 
Design and implementation of Parallel Prefix Adders using FPGAs
Design and implementation of Parallel Prefix Adders using FPGAsDesign and implementation of Parallel Prefix Adders using FPGAs
Design and implementation of Parallel Prefix Adders using FPGAsIOSR Journals
 
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFTAn Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFTIRJET Journal
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdl
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdl
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd Iaetsd
 
Per domain power analysis
Per domain power analysisPer domain power analysis
Per domain power analysisArun Joseph
 
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...IRJET Journal
 
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...IRJET Journal
 

Mais procurados (17)

IRJET- Accuracy Configurable Adder
IRJET- Accuracy Configurable AdderIRJET- Accuracy Configurable Adder
IRJET- Accuracy Configurable Adder
 
Id2514581462
Id2514581462Id2514581462
Id2514581462
 
Design and implementation of Parallel Prefix Adders using FPGAs
Design and implementation of Parallel Prefix Adders using FPGAsDesign and implementation of Parallel Prefix Adders using FPGAs
Design and implementation of Parallel Prefix Adders using FPGAs
 
2012
20122012
2012
 
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFTAn Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFT
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Cuda project paper
Cuda project paperCuda project paper
Cuda project paper
 
IJET-V3I1P14
IJET-V3I1P14IJET-V3I1P14
IJET-V3I1P14
 
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdl
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdl
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdl
 
Gs3511851192
Gs3511851192Gs3511851192
Gs3511851192
 
Per domain power analysis
Per domain power analysisPer domain power analysis
Per domain power analysis
 
05725150
0572515005725150
05725150
 
F1074145
F1074145F1074145
F1074145
 
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...
 
G1034853
G1034853G1034853
G1034853
 
SFQ MULTIPLIER
SFQ MULTIPLIERSFQ MULTIPLIER
SFQ MULTIPLIER
 
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...
 

Semelhante a EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA

IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
IRJET -  	  High Speed Inexact Speculative Adder using Carry Look Ahead Adder...IRJET -  	  High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...IRJET Journal
 
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGA
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGAA LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGA
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGAIRJET Journal
 
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET Journal
 
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGA
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGALOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGA
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGAVLSICS Design
 
Design and Implementation of Different types of Carry skip adder
Design and Implementation of Different types of Carry skip adderDesign and Implementation of Different types of Carry skip adder
Design and Implementation of Different types of Carry skip adderIRJET Journal
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...IRJET Journal
 
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODER
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODERHARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODER
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODERcscpconf
 
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...IRJET Journal
 
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORMDUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORMVLSICS Design
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Coarse grained hybrid reconfigurable architecture
Coarse grained hybrid reconfigurable architectureCoarse grained hybrid reconfigurable architecture
Coarse grained hybrid reconfigurable architectureDhiraj Chaudhary
 
Coarse grained hybrid reconfigurable architecture with no c router
Coarse grained hybrid reconfigurable architecture with no c routerCoarse grained hybrid reconfigurable architecture with no c router
Coarse grained hybrid reconfigurable architecture with no c routerDhiraj Chaudhary
 
Coarse grained hybrid reconfigurable architecture with noc router for variabl...
Coarse grained hybrid reconfigurable architecture with noc router for variabl...Coarse grained hybrid reconfigurable architecture with noc router for variabl...
Coarse grained hybrid reconfigurable architecture with noc router for variabl...Dhiraj Chaudhary
 
Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl...
Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl...Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl...
Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl...Dhiraj Chaudhary
 
Run time dynamic partial reconfiguration using microblaze soft core processor...
Run time dynamic partial reconfiguration using microblaze soft core processor...Run time dynamic partial reconfiguration using microblaze soft core processor...
Run time dynamic partial reconfiguration using microblaze soft core processor...eSAT Journals
 
Run time dynamic partial reconfiguration using
Run time dynamic partial reconfiguration usingRun time dynamic partial reconfiguration using
Run time dynamic partial reconfiguration usingeSAT Publishing House
 
Design of efficient reversible floating-point arithmetic unit on field progr...
Design of efficient reversible floating-point arithmetic unit on  field progr...Design of efficient reversible floating-point arithmetic unit on  field progr...
Design of efficient reversible floating-point arithmetic unit on field progr...IJECEIAES
 
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...ijcsit
 

Semelhante a EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA (20)

IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
IRJET -  	  High Speed Inexact Speculative Adder using Carry Look Ahead Adder...IRJET -  	  High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
 
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGA
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGAA LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGA
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGA
 
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
 
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGA
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGALOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGA
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGA
 
Design and Implementation of Different types of Carry skip adder
Design and Implementation of Different types of Carry skip adderDesign and Implementation of Different types of Carry skip adder
Design and Implementation of Different types of Carry skip adder
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
 
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODER
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODERHARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODER
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODER
 
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...
 
C0161018
C0161018C0161018
C0161018
 
C0161018
C0161018C0161018
C0161018
 
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORMDUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Coarse grained hybrid reconfigurable architecture
Coarse grained hybrid reconfigurable architectureCoarse grained hybrid reconfigurable architecture
Coarse grained hybrid reconfigurable architecture
 
Coarse grained hybrid reconfigurable architecture with no c router
Coarse grained hybrid reconfigurable architecture with no c routerCoarse grained hybrid reconfigurable architecture with no c router
Coarse grained hybrid reconfigurable architecture with no c router
 
Coarse grained hybrid reconfigurable architecture with noc router for variabl...
Coarse grained hybrid reconfigurable architecture with noc router for variabl...Coarse grained hybrid reconfigurable architecture with noc router for variabl...
Coarse grained hybrid reconfigurable architecture with noc router for variabl...
 
Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl...
Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl...Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl...
Coarse Grained Hybrid Reconfigurable Architecture with NoC Router for Variabl...
 
Run time dynamic partial reconfiguration using microblaze soft core processor...
Run time dynamic partial reconfiguration using microblaze soft core processor...Run time dynamic partial reconfiguration using microblaze soft core processor...
Run time dynamic partial reconfiguration using microblaze soft core processor...
 
Run time dynamic partial reconfiguration using
Run time dynamic partial reconfiguration usingRun time dynamic partial reconfiguration using
Run time dynamic partial reconfiguration using
 
Design of efficient reversible floating-point arithmetic unit on field progr...
Design of efficient reversible floating-point arithmetic unit on  field progr...Design of efficient reversible floating-point arithmetic unit on  field progr...
Design of efficient reversible floating-point arithmetic unit on field progr...
 
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...
 

Último

US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxsomshekarkn64
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 

Último (20)

US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptx
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 

EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA

  • 1. DOI : 10.5121/vlsic.2019.10201 1 EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA Jaya Koshta, Kavita Khare and M.K Gupta Maulana Azad National Institute of Technology, Bhopal ABSTRACT Video Compression is very essential to meet the technological demands such as low power, less memory and fast transfer rate for different range of devices and for various multimedia applications. Video compression is primarily achieved by Motion Estimation (ME) process in any video encoder which contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42 % as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using Virtex 7 FPGA. KEYWORDS HEVC, motion estimation, sum of absolute difference, parallel prefix adders, Brent Kung Adder. 1. INTRODUCTION To meet the technological demands such as low power, less memory and fast transfer rate for a wide range of applications including the growing demand of High Definition(HD)(1080p) to Ultra-High Definition(UHD)(4K and 8K), resulted in creation of stronger needs for better video compression efficiency. HEVC or H.265is a video compression standard designed to substantially improve coding efficiency when compared to its precedent, the Advanced Video Coding (AVC) or H.264.HEVC is a block-based video compression designed to support higher resolutions and it can achieve 50% bit rate saving compared to H.264/MPEG-4 AVC for the same video quality [1][2].This considerable increase in performance is due to the many enhanced techniques and methodologies that have been introduced in H.265/HEVC. Some of these enhancements are included in the motion estimation process, which is one the most complex and time consuming block in video encoding. The objective of the ME unit is to find the best matched block in the reference (past/future) frame search window (region of interest), for every block of the current frame such that there constructed frame contributes to the lowest residual information [3].The bottleneck for the ME system design lies in the implementation of an appropriate SAD architecture. For block based ME, the Sum of Absolute Difference(SAD) is the generally used metric which adds up the absolute differences between corresponding elements in a candidate and reference block in video frames. However with the increase in the coding block size to 64x64 in HEVC, compared to 16x16 in H.264/AVC, results in greater complexity in ME process. Thus in HEVC for ME process, the number of SAD computations vastly increases resulting in increase in International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 2. 2 the processing delay, power consumption and hardware complexity. Thus, a pragmatic strategy to elevate the performance of H.265/HEVC relies on enhancing the performance of the core component of the motion estimation engine which is the block matching unit. Different VLSI architectures for SAD computations are available in the literature where there is a lot of tradeoffs between the speed, power and area during the hardware implementation. Due to the increasing demand for the portable devices with low power consumption and high performance, the circuits are optimized according to the applications. In this paper, implementation of absolute difference circuit on FPGA is proposed to increase speed performance and to minimize the amount of occupied resources on FPGA for SAD calculation. 2. RELATED WORK SAD is one of the most popular techniques used for motion estimation in digital video encoding systems. There are large number of methods available for SAD computation where there are tradeoffs between speed, power and area during the hardware implementation. Typically, the SAD computation consists of first computing the absolute difference between corresponding pixels in current and reference video frame. The calculation of SAD from the current and reference block is performed using equation (1), SAD = -------------(1) Where, CB - Current Block, RB –Reference Block, N X M - block size of current and reference block, i,j - two dimensional coordinates of block. Generally, the SAD operation basically consists of first computing the absolute difference |CB(i,j) – RB(i,j)| and then summing up these in a multi-input addition. For absolute difference calculation, one method is to detect the smaller operand in the absolute difference computation |CB−RB| and to subtract it from the larger operand [4],[5].The other method comprises of complimenting the smaller of the two numbers and then performing addition of two numbers followed by plus one to compute the absolute difference[6]. To compute the absolute difference for SAD, in [7] a novel architecture is optimized for realizing efficient absolute difference circuits in Virtex-5 FPGA devices which uses the 6-input look-up tables available within the chosen devices family to maximize speed performance and to minimize the amount of occupied resources. In [8] an improved architecture for efficiently computing the sum of absolute differences (SAD) on FPGAs is proposed based on a configurable adder/subtractor implementation in which each adder input can be negated at runtime. The SAD architecture proposed in this paper provides a significant resource reduction on current FPGAs. In [9] FPGA design for fast computing of the minimum SAD is proposed. The hardware unit proposed is intended to augment a general-purpose core and with the use online arithmetic (OLA) it is International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 3. 3 possible to implement a full 16 X 16 macroblock SAD in a single FPGA device and it permits to speed up the computation by early truncation of the SAD calculation when the involved candidate is bigger than the current reference SAD. In [10] pipelined SAD architecture for efficient SAD calculations is proposed where consecutive pipeline stages performs the addition of absolute differences to obtain SAD of 8 X 8 block. In [11] Different architectures for binary addition were proposed for SAD computation on FPGA Table 1 summarises the above mentioned related work on SAD computation. Review work motivates for a proposal of high speed compact AD circuits on FPGA for SAD calculation. In this paper, AD circuit on FPGA is proposed to increase speed performance and to minimize the amount of occupied resources on FPGA for SAD calculation. Table 1: Related work on SAD computation 3. PROPOSED ARCHITECTURE Hardware architecture for computing the absolute difference between corresponding pixels in current and reference video block is proposed. The method used for AD calculation is as used in [6] where adder and comparator form the basic component as shown in Figure 1.The 8-bit comparator compares two numbers and returns the 1’s complement of the smaller number and the larger number as it is. 8-bit Comparator 8-bit Adder 8-bit Adder 8B’1 Reg Current Pixel Reference Pixel Figure 1: AD circuit block diagram Ref. Architecture Advantages Disadvantages [7] 6-input look-up tables on Virtex 5 High speed and compact Optimized for only Virtex 5 [8] Configurable adder/subtractor on Virtex 6 Resource reduction on current FPGAs SAD 1X2 implemented [9] Online arithmetic for 16 X 16 SAD on Virtex 2 High speed Large Area [10] Pipelined SAD architecture High speed and low power Large area [11] Different architectures for binary addition Low power Implemented on FPGA Spartan International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 4. 4 Proposed Architecture: In this architecture:  Ripple Carry Adders in AD circuit (Figure 1) are replaced by Brent Kung Adder (Type of Parallel Prefix Adders-PPA).  8-bit conventional comparator in Figure 1 is replaced by comparator based on modified 1’s complement principle and conditional sum adder scheme. These replacements results in reduction in delay and reduces LUTs count used in the circuit. PPA: PPA (Parallel Prefix Adder) circuits use a tree network to reduce the latency to O (log2n)where ‘n’ represents the number of bits.PPA employs 3-structural stages as shown in Figure 2. The first stage at the top is used for computing generate and propagate signals as given in equation (2) and (3)exactly as in Carry Look ahead Adder (CLA) where a and b represents binary digits. g=a.b--------------------------------------------(2) p=a⊕b------------------------------------------(3) The carry bits are calculated in the second stage. In this stage, to formulate the operation of the prefix adders, “prefix operator” which is represented as “.” is used. The “prefix operator” function has two essential properties to keep the computational operation faster. The first one is called the associative property and the second one is idempotency property. Using the advantages of these properties, carry-out can be found at a depth proportional to log2(n) [12]. The final summation is obtained in the last stage.The associative and idempotency properties of "." operator allow carry output to be computed in a different number of levels or simply depth. Therefore, various topologies of prefix adders can be designed which are mentioned in the literature and are inspiring to VLSI designers because of their minimum depth and delay. The logical structures used in prefix adders scheme consists of black cell, gray cell and white cell as defined in Figure 3[13]. Calculation of the carries This part is parallelizable to reduce time Pre-calculation of pi ,gi terms Simple adder to generate the sum Prefix graphs can be used to describe the structure that performs this part Straight forward as in the CLA adder Figure 2: Parallel Prefix Adder stages International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 5. 5 The logical structure of black cell can propagate and generate signals while the logical structure of gray cell can only generate signals. The white cell (buffer)is used for loading the signal out. In parallel prefix scheme, generate and propagate signals can be grouped in multiple ways to get the same correct carry signals. Based on different methodologies of grouping these signals, different prefix architectures can be created. The parallel form of obtaining the carry bit makes PPAs perform addition arithmetic faster. There are many types of PPA such as Brent Kung, Kogge Stone, Ladner Fisher, Hans Carlson and Knowles[12]. i : k i : k i : j i : j k-1: j k-1: j i : j Black cell Gray cell Buffer i : j p k-1:j g i:k g i:j p i:k g k-1:j p i:j g i:k p i:k g k-i:j g i:j g i:j g i:j p i:jp i:j Both propagate & generate Generate only Different load Figure 3 Cell definitions for parallel-prefix scheme [10] Out of many types of PPA, Brent Kung Adder(BKA)is used as for 8-bit implementation it has less delay compared to other types of PPA[12][13]. In BKA propagate signals and generate signals are combined into groups of two by using the associative property. The first stage in BKA includes computation of generate and propagate signals corresponding to each pair of bits in a and bas given by equation (2) and (3). The second stage ie. prefix carry tree includes computation of carries corresponding to each bit. Equations (4) and (5)below shows how propagate and generate signals are calculated in BKA, pi:j = pi:k. pk-1:j--------------------------------------(4) gi:j = gi:k + (pi:k. gk-1:j)-----------------------------(5) These propagate and generate signals given in equation (4) and (5) are calculated using black cell, gray cell and white cell as defined in Figure 3. The last stage ie.post processing stage includes computation of sum bits which is given by the equation (6). International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 6. 6 si = pi⊕ci------------------------------------------(6) 7 6 5 4 3 2 1 0 Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 7 : 0 6 : 0 5 : 0 4 : 0 3 : 0 2 : 0 1 : 0 0 : 0 Input Output c 3 : 2 = g 3 : 2 + p 3 : 3 g 2 : 2 c 4 : 0 = g 4: 4 + p 4 : 4 g 3 : 0 Figure 4 : 8-bit Brent-Kung adder(BKA) tree [13] Figure 4 shows 8-bit BKA tree adder which has lower delay as compared to Ripple Carry Adder(RCA).Also, apart from using BKA, a new efficient comparator based on modified 1’s complement principle and conditional sum adder scheme is used in this architecture as discussed below. Modified Comparator: The proposed architecture apart from using BKA, also uses comparator (based on modified 1’s complement principle and conditional sum adder scheme). In this modified 1's complement scheme, if A>B, bit Cout = 1 and if A≤B bit Cout =0 so the only concern is about carry out bit information as shown in Figure 5.This method always adds a fixed carry after modification, so if A ≥ B, bit Comp = 1 and if A < B, bit Comp = 0 [15].For realization of this comparator conditional sum adder scheme has been used that provides a logarithmic increase in speed for addition[16]. The principle behind this scheme is to generate two sets of outputs for a given group of k bits operands. Each set includes k sum bits and an outgoing carry. The one set assumes that the eventual incoming carry will be zero, while the other assumes that it will be one. Depending upon the incoming carry correct set of outputs (out of the two sets) is selected without waiting for the carry to further propagate through the k positions as shown in Figure 6. International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 7. 7 Y = A – A = 0 1 0 1 0 1 0 0 - 0 1 0 1 0 1 0 0 = (84 = 84) 0 1 0 1 0 1 0 0 = A 1 0 1 0 1 0 1 1 = 1's compliment of A Z = B – A = 0 1 0 0 0 0 1 1 - 0 1 0 1 0 1 0 0 = (67 < 84) 0 1 0 0 0 0 1 1 = B 1 0 1 0 1 0 1 1 = 1's compliment of A X = A - B = 0 1 0 1 0 1 0 0 - 0 1 0 0 0 0 1 1 = (84 > 67) 0 1 0 1 0 1 0 0 = A 1 0 1 1 1 1 0 0 = 1's compliment of B 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1Cout 0 + 1 = Fixed carry-in bit 1Comp 1 1 1 1 1 1 1 1Cout 0 + 1 = Fixed carry-in bit 0Comp 1 1 1 0 1 1 1 1 0 0 0 1 0 0 0 0 A = 0 1 0 1 0 1 0 0 = 84 B = 0 1 0 0 0 0 1 1 = 67 Cout 1 + 1 = Fixed carry-in bit 1Comp 0 0 0 1 0 0 0 0 Status bit Cout Comp A>B 1 1 A=B 0 1 A<B 0 0 Figure 5 Modified l's complement method for improved comparator design International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 8. 8 k – bit adder k – bit adder Multiplexer 1 CinCout 0 k k k Figure 6 Conditional sum adder scheme However, this scheme is generally not applied to long n-bits operand at the beginning of the add operation, since it will add delay as the carry propagates through all n positions before making the selection for correct output. Generally, the given n bits are divided into smaller groups to apply this conditional adder scheme separately. Thus, the serial carry-propagation inside the separate groups can be done in parallel, reducing the overall execution time. The outputs of the subgroups are then combined to generate the output of the final output. A7 B7 A6 B6 A5 B5 A4 B4 A3 B3 A2 B2 B1 A1 B0 A0 MUX MUX MUX MUX MUX MUX MUX MUX MUX MUX MUX Comp = 1 ,A>= B, = 0, A<B Comp Stage 0 Stage 1 Stage 2 Stage 3 Figure 7 Modified comparator architecture International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 9. 9 Figure 7 shows comparator architecture using modified 1’s complement and conditional sum adder design. The 8-bit comparator needs 11(=1+3+7) 2-to-1 multiplexers and eight inverters to generate complementary values of input B. Originally Carry = AB + AC + BC = AB + (A + B) C and if C= 0 then Carry =AB or if C=l then Carry = AB + (A+B) =A + B. The sum of MUX gates of N-bit comparator is, -----------------------(7) This architecture reduces delay and also provides significant reduction in resource utilization in FPGA. 4. SIMULATION AND RESULTS Simulations of the conventional and proposed architecture for AD circuit implementation have been carried out using Verilog HDL programming in Xilinx ISE 14.2 platform and implemented on Virtex7 FPGA. Adequate testing of each design was done to verify correct operation. Figure 8 shows the simulation result of proposed AD circuit. Figure 8 Simulation result of Absolute Difference Circuit Figure 9 Graphical representation of delays in proposed architecture with conventional architecture 0 1 2 3 4 5 6 7 Logic delay(ns) Routing delay(ns) Total Delay(ns) Conventional Proposed Architecture International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 10. 10 Figure 10 Implementation of proposed AD architecture on FPGA Architectu re Logic delay(ns) Routing delay(ns) Totalde lay(ns) Numbe r of slice LUTs Convention al 0.484 (8.0% logic) 5.673 (92.0%rout e) 6.157 50 Proposed Archiecture 0.391 (7.5% logic) 4.849 (92.5% route) 5.240 29 Table 2 Comparison of proposed AD architecture architecture with conventional architecture Figure 9 shows graphical representation of delays in proposed architecture with conventional architecture and Figure 10 shows implementation and area occupied of proposed AD architecture on FPGA. Table 2 shows comparison of proposed AD circuit architecture in terms of time delay and area. Referring to the above table, it can be seen that there is reduction in delay and number of occupied resources on FPGA in the proposed architecture for AD circuit resulting in improved performance for SAD computation in motion estimation. 5. CONCLUSIONS VLSI architecture for SAD computations is proposed in this paper for reducing delay and area and is implemented on Virtex7 FPGA. The proposed architecture reduces delay and provides significant reduction in resource utilization in FPGA. Synthesis results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42 % as compared to conventional architecture. International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 11. 11 REFERENCES: [1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC)standard,” IEEE Trans. Circuits Syst. Video Technol., vol.22, no. 12, pp. 1649-1668, December 2012. [2] I. Richardson, “HEVC: An introduction to high efficiency video coding,” 2001, https://www.vcodex.com/h265.html [3] N. Purnachand, L. N. Alves, and A. Navarro, “Fast Motion Estimation Algorithm for HEVC”, IEEE International Conference on Consumer Electronics-Berlin (ICCE-Berlin), September 2012. [4] S.Wong, B. Stougie, and S. Cotofana, “Alternatives in FPGA-based SAD Implementations,” in IEEE International Conference on Field-Programmable Technology (FPT),IEEE, 2002, pp. 449–452. [5] S. Vassiliadis, E. A. Hakkennes, J. S. S. M. Wong, and G.G.Pechanek, “The Sum-Absolute- Difference Motion Estimation Accelerator,” in Euromicro Conference, 1998. Proceedings, 24th. IEEE, 1998, pp. 559–566. [6] Ahmed Medhat, Ahmed Shalaby, Mohammed S. Sayed, Maha Elsabrouty and Farhad Mehdipour. “A Highly Parallel SAD Architecture for Motion Estimation in HEVC Encoder”, Circuits and Systems (APCCAS), IEEE Asia Pacific Conference, 280 - 283, 2014. [7] Stefania Perri , Paolo Zicari , Pasquale Corsonello, “Efficient Absolute Difference Circuits in Virtex- 5 FPGAs”, IEEE, 2010. [8] Martin Kumm, Marco Kleinlein and Peter Zipf, “Efficient Sum of Absolute Difference Computation on FPGAs”,26thInternational Conference on Field Programmable Logic and Applications. [9] Joaquin Olivares, Ignacio Benavides and et.al., “Minimum Sum of Absolute Differences implementation in a single FPGA device”, Dept. of Electro-technics and Electronics, University of Cordoba, Spain. [10] P.Jayakrishnan and Harish. M. Kittur, “Pipelined Arch.for Motion Estimation in HEVC Video Coding”, Indian Journal of Science and Tech,August 2016. [11] D. V Manjunatha, Pradeep Kumar and R. Karthik, “FPGA Implementation of Sum of Absolute Difference (SAD) for video applications”, ARPN Journal of Engineering and Applied Sciences, Vol. 12, No. 24, December 2017 [12] Geeta Rani and Sachin Kumar, “Delay analysis of parallel-prefix adders”, International Journal of Science and Research (IJSR), 3(6):2339-2342, 2014. [13] Nurdiani Zamhari, Peter Voon, Kuryati Kipli, Kho Lee Chin, Maimun Huja Husin,“Comparison of Parallel Prefix Adder (PPA)”, Proceedings of the World Congress on Engineering 2012,Vol II. [14] R. P Brent & H. T. Kung, “A Regular Layout for Parallel Adders”, IEEE Trans. Computers, Vol. C- 31, pp 260-264, 1982. [15] Shun-Wen Cheng, “A High-Speed Magnitude Comparator with Small Transistor Count”, in Proceedings of IEEE internationalconference ICECS, 1168 - 1171 Vol.3, Dec 2003. International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019
  • 12. 12 [16] J.Sklansky, “Conditional-Sum Addition Logic,” IRE Transactions on Electronic Computers, Vol. EC- 9, No. 2, pp. 226-231, June,1960 [17] S. Rehman; R. Young;C. Chatwin;P. Birch, “An FPGA Based Generic Framework for High Speed Sum of Absolute Difference Implementation”, Europ. Jour Scient. Res., vol.33, no.1, 2009. [18] Manjunatha, D. V., and G. Sainarayanan. “Low-Power Sum of Absolute Difference Architecture for Video Coding”, Emerging Research in Electronics, Computer Science and Technology. Springer India, 2014. 335-341. [19] LiYufei,Feng Xiubo and Wang Q in, “A High-Performance Low Cost SAD Architecture for Video Coding”, IEEE Transactions on Consumer Electronics, pp. 535-541, Vol. 53, No. 2, May 2007. [20] Jarno, Vanne, Eero Aho, Timo D. Hamalainen and Kimmo Kuusilinna, “A High-Performance Sum of Absolute Difference Implementation Motion Estimation”, IEEE Transactions on Circuits and Systems for Video Technology, pp. 876-883, Vol. 16, No. 7, 2006. [21] Elhamzi W., Dubois J., Miteran J “An efficient low-cost FPGA implementation of a configurable motion estimation for H.264 video coding”, Springer Journal of Real-Time Processing,Vol:9, No:1, pp. 19–30,2014. [22] Moorthy T., Ye A, “A scalable architecture for variable block size motion estimation on field- programmable gate arrays” , IEEE Canadian Conference of Electrical and Computer Engineering (CCECE),Niagara Falls, May, pp.1303–1308,2008. [23] Davis P., Sangeetha M. ,“Implementation of Motion Estimation Algorithm for H.265/HEVC”, International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering. Vol:3,No:3, pp. 122–126,2014. International Journal of VLSI design & Communication Systems (VLSICS) Vol.10, No.2, April 2019