A Low Power Analog Channel Decoder for Ultra Portable Devices in 65 nm Technology
Reza Meraji, John B. Anderson, Henrik Sjöland, Viktor Öwall
Dept. of Electrical and Information Technology, Box 118, Lund University, Sweden
Email: {reza.meraji, john.anderson, henrik.sjoland, viktor.owall}@eit.lth.se
Abstract: This paper presents the architecture and the corresponding simulation results for a digitally interfaced, ultra-low power extended Hamming decoder implemented in analog integrated circuitry. ST's 65nm low power CMOS design library was used to simulate the complete decoder. The decoder includes a serial input digital interface, an analog decoding core and a serial output digital interface. The simulated bit error rate (BER) performance of the decoder is presented and compared to the ideal performance of the Hamming code. Transistor-level simulation results show that an ultra-low power Hamming decoder with a throughput of up to 2.5 Mb/s can be implemented using analog circuitry working in the sub-threshold (sub-VT) region. Its total power consumption is below 40 µW. The decoder consumes less than 16 µW when a lower throughput of 250 kb/s is desired.

I. INTRODUCTION
Error correcting codes (ECCs) have been extensively used in various communication devices to provide better performance for a given level of transmit power. Employing ECC can benefit a communication system thanks to the offered coding gain, but one should also consider the underlying trade-offs in the design process. Decoding algorithms are in general computationally complex. When implemented in hardware, they require a noticeable amount of power to decode the message. Using an ECC block might not be efficient if the power saved by the provided coding gain is instead dissipated by the ECC circuit itself [1]. This is even more crucial in modern portable or distributed wireless communication devices such as medical implants and sensor networks. These devices need to be small and inexpensive, have a reasonably long lifetime, and operate on an extremely limited power budget.

Conventionally, decoding algorithms have been implemented using digital circuitry. However, several recent proposals in the literature point out the advantages of implementing those algorithms in the continuous domain instead. It has been observed that analog computation is much faster and consumes much less power than digital computation. In certain applications, analog computations can achieve the robustness of digital systems while consuming several orders of magnitude less power.

The concept of analog decoding was initially presented in 1998 by Hagenauer [2] and Loeliger [3]. The idea was then pursued by other researchers [4], [5], [6]. They demonstrated the advantages of implementing soft iterative decoding algorithms in analog rather than digital circuitry, in terms of silicon area, speed and power consumption. Initially, simple analog decoders were realized in hardware using bipolar transistors. CMOS devices biased in weak inversion, referred to as the sub-threshold (sub-VT) region, show an exponential I-V behavior similar to that of bipolar transistors. For this reason, in recent years they have been used to successfully implement iterative decoding algorithms in analog circuitry. Compared to digital implementations, gains are obtained in area and power consumption for the same data throughput. As an example, an analog Hamming decoder working in sub-VT was fabricated in 0.18 µm CMOS, and the measurement results are reported in [7] and [8].

Since the introduction of the analog decoding concept in 1998, a handful of successfully operating analog decoders have been implemented and reported in the academic literature. So far, however, they have not found their way into real-world applications. Therefore, a key question in analog decoding is whether it can be applied to real-world applications. A second question is what gains can be expected in speed, area and power consumption compared to digital decoder implementations.

II. SYSTEM CONSIDERATIONS
Analog decoders are envisioned to reduce the total power consumption of the receiver in two ways. First, computation and processing in the analog domain is much less power demanding than in digital implementations. Second, analog decoders can take the analog soft data and directly produce the decoded digital data. In this scheme there is no need for an analog-to-digital converter (ADC), which is normally an essential block in wireless receivers. The reason is that the analog decoder can in theory act as a joint decoder-ADC in the system.

On the other hand, wireless standardization is a costly and time-consuming process. To deploy the available resources efficiently, radio transmitters and receivers are generally expected to conform to industry standards. Consider a bottom-up strategy. Under it, a suitable candidate sub-block in a wireless system is not necessarily the one that offers a significant reduction in power consumption. Rather, it is the one that fits well into the system without demanding drastic changes to the currently agreed and well-developed standards. Real-world applications and industry standards typically require that the decoder is integrated into a digital receiver. There, its input is provided as quantized soft information and its output consists of hard decisions.
978-1-4244-8971-8/10/$26.00 ©2010 IEEE
To the authors' best knowledge, the analog decoders proposed so far mainly concentrate on the decoder core and do not specifically consider the system perspective. To incorporate the analog decoder into a complete receiver, different alternatives should be investigated. There are two main options:
a) apply the analog decoding directly on the received signals, or
b) use it after digital demodulation.
In a), synchronization and symbol detection still have to be performed, most likely in the digital domain. The complete receiver also has to be redesigned compared to a traditional digital receiver. In b), a digital-to-analog converter (DAC) is required before the analog decoder, which introduces overhead. In this paper we consider approach b). The aim is to investigate whether the analog decoder remains a feasible alternative despite the additional complexity and power consumption of the DAC.

The ideal technology for wireless communication would combine the robustness and programmability of digital designs with the power and speed performance of analog circuits. These points motivate the investigation of low power digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) combined with an analog decoding core. These converters are necessary for the analog decoder to interface with the surrounding digital circuitry. They can also eliminate the costly and inefficient storage capacitors that are normally required in fully analog interfaces.
Fig. 1. Generalized Gilbert multiplier network for implementing the sum-product algorithm, shown with the corresponding trellis representation
III. ANALOG DECODING MODEL
In digital implementations, values are represented by digits with limited word-lengths. In analog computing, continuous-time currents and/or voltages represent real values instead. This is intrinsically helpful in soft-decision decoding algorithms, where the strength of the received signals in the coded block plays an important role in decoding the transmitted message. To implement these algorithms in analog, the probabilities of the received bits being 0 or 1 are naturally represented by voltages or currents. These probabilities are also called "soft information". The most commonly used soft decoding algorithms require a significant number of additions and multiplications, so these operations have to be realized in analog circuits.

Implementing adders in analog is straightforward when the data is represented by currents: addition is done by shorting wires together. Thus, addition requires no additional power or dedicated silicon area. A large number of high-precision multipliers in digital VLSI is costly in terms of area and power. In the analog domain, the well-known Gilbert multiplier can instead perform the required multiplications [9]. For proper multiplication, the transistors in a Gilbert multiplier must have an exponential relation between gate-source voltage and drain current. In bipolar transistors, such an exponential relation between collector current and base-emitter voltage readily exists in normal operation. For CMOS transistors, the current-voltage characteristic is quadratic at high drain currents. At low current levels, in the sub-VT operating region, it turns into the desired exponential. In other words, MOS transistors provide the required exponential dependency only when operating in the sub-VT region.

Analog decoders are normally built on a network of Gilbert vector multipliers. Transistors operate slowly in the sub-VT region. Even so, analog decoders can achieve high throughput through a highly parallel network of transistors operating in continuous time. Decoding a coded block starts by loading the soft values from the channel into the network in parallel. The soft data then appear as voltages or currents in the highly connected network of analog multipliers, which is arranged in a certain topology. Assume that the network topology successfully represents a soft iterative decoding algorithm. Feedback loops in the network then make the voltages or currents converge to steady-state levels that correspond to the decoded data. No memory of any kind is needed in this scheme. Therefore, the settling time required for the network to converge is limited only by the intrinsic transistor speed and the parasitic capacitances of the routing.

IV. GENERIC SUM-PRODUCT ALGORITHM
A commonly used iterative decoding algorithm is the sum-product (SP) algorithm. As the name suggests, the decoder's computation is mainly composed of sum-product operations. The basic computation underlying iterative decoding can be expressed as
$$p_z(z) = \gamma \sum_{x} \sum_{y} p_x(x)\, p_y(y)\, f(x, y, z), \qquad (1)$$
where $p_x(x)$ and $p_y(y)$ are probability distributions such that $\sum_x p_x(x) = 1$ and $\sum_y p_y(y) = 1$. The scaling factor $\gamma$ ensures that $\sum_z p_z(z) = 1$. The function $f$ takes only the values 0 or 1. In decoding algorithms, $f$ is conveniently illustrated by a trellis diagram. In a trellis representation, $f(x, y, z) = 1$ if and only if there exists an edge labeled $y$ between the left-hand node $x$ and the right-hand node $z$.

Fig. 1 shows an example of a generic sum-product module built from Gilbert vector multipliers at transistor level, together with the corresponding trellis representation. The inputs of the module are the probabilities of the random variables $x$ and $y$. They are represented by the current vectors $I_{x_i}$, $i \in [0, N]$, and $I_{y_j}$, $j \in [0, M]$, respectively. The vector multiplier generates all possible probability products of the two input variables, labeled by the currents $I_{z_{ij}}$. Each output current is either summed in the connectivity network by shorting the corresponding wires, or discarded by connecting the wire to Vdd if it is not needed in the sum.

A key parameter is the reference current of the primitive blocks. Since probabilities are represented by currents, the network needs a unique reference current corresponding to a probability of 1. All real-valued probabilities are then defined as fractions of this current, the so-called unit current $I_U$. Therefore, the input probability vectors must satisfy $\sum_i I_{x_i} = I_U$ and $\sum_j I_{y_j} = I_U$. The same requirement holds for the output current vector $I_{z_{ij}}$, which is the input to the next block in the network. If some of the partial products are discarded in the multiplier, all currents to the next block must be renormalized so that $\sum_{i,j} I_{z_{ij}} = I_U$. This renormalization corresponds to the scaling factor $\gamma$ in (1).

V. DECODER ARCHITECTURE
Analog decoding circuits compute in parallel and in continuous time. Every block of received coded data must therefore be applied to the decoding core simultaneously. Likewise, the outputs of the computation block are available in parallel. The architecture of the decoder is shown in Fig. 2. It consists of an analog computational core, input and output interfaces, and a digital controller.

Fig. 2. Extended Hamming (8,4) analog decoder architecture

A. Analog Decoding Core
The decoder core is a transistor-level implementation of the SP decoding algorithm on the tail-biting trellis of an extended Hamming code. The decoder for the (8,4) Hamming code receives 8 input samples from the channel in parallel. It produces the 4 information bit estimates in parallel. Every analog input sample represents a soft bit. A soft bit is a function of the probability that the received bit is 1 relative to the probability that it is 0. This function is usually expressed as the log-likelihood ratio (LLR), defined as
$$\mathrm{LLR} = \ln \frac{p(r = 1)}{p(r = 0)} \qquad (2)$$
The probability distribution of a variable $x$ is carried by the currents on a pair of wires. These currents form the vector $(I_{x0}, I_{x1})$, corresponding to $(p(x = 0), p(x = 1))$. A probability of 1 is represented by the unit current $I_U$; thus, $I_{x0} + I_{x1} = I_U$. The unit current must be chosen so that all transistors are biased in, and stay in, the sub-VT region. Note that the integrity of the decoding process depends strongly on the accuracy of the unit currents used throughout the network. Deviations from the desired current values are common in the fabrication process and introduce mismatch between different unit current values. However, it has been shown that the error caused by mismatch in analog decoders has so far been negligible [10].
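The current-mode representation described in Sections IV and V-A can be mimicked numerically. The Python sketch below is an idealized behavioral model only. It is not a circuit simulation and not the authors' design. It converts an LLR as defined in (2) into a current pair summing to $I_U$. It then evaluates the generic sum-product module (1) on current vectors, including the renormalization that plays the role of $\gamma$. The function names, and the use of parity-check and equality constraints as example trellis functions $f$, are assumptions for illustration. Transistor effects such as mismatch are ignored.

```python
import math


def llr_to_currents(llr, i_u):
    """Map an LLR, ln(p(r=1)/p(r=0)) as in (2), to the wire-pair currents
    [I_x0, I_x1] with I_x0 + I_x1 = i_u (hypothetical helper)."""
    # Numerically stable logistic function: p(r=1) = 1 / (1 + exp(-LLR))
    if llr >= 0:
        p1 = 1.0 / (1.0 + math.exp(-llr))
    else:
        e = math.exp(llr)
        p1 = e / (1.0 + e)
    return [i_u * (1.0 - p1), i_u * p1]


def sum_product_module(i_x, i_y, f, n_z, i_u):
    """Idealized current-domain evaluation of (1) (hypothetical helper).

    i_x, i_y: input current vectors, each summing to i_u.
    f(x, y, z): 0/1 trellis indicator; a product I_xi * I_yj that matches no z
    is simply dropped (in the circuit it is dumped to Vdd).
    Returns the output currents, renormalized so they sum to i_u (gamma).
    """
    i_z = [0.0] * n_z
    for x, ix in enumerate(i_x):
        for y, iy in enumerate(i_y):
            # Vector multiplier: p_x(x) * p_y(y) expressed as a current
            product = ix * iy / i_u
            for z in range(n_z):
                if f(x, y, z):
                    i_z[z] += product  # summation = shorting wires together
    total = sum(i_z)
    return [i_u * c / total for c in i_z]  # renormalization, i.e. gamma


def parity(x, y, z):
    """Parity-check constraint z = x XOR y: no product is discarded."""
    return z == (x ^ y)


def equality(x, y, z):
    """Equality constraint x = y = z: the mixed products are discarded."""
    return x == y == z


I_U = 100e-9  # 100 nA, the unit current used in the simulations
i_x = llr_to_currents(2.0, I_U)
i_y = llr_to_currents(-1.0, I_U)
for name, f in (("parity", parity), ("equality", equality)):
    i_z = sum_product_module(i_x, i_y, f, 2, I_U)
    print(name, [f"{c * 1e9:.1f} nA" for c in i_z])
```

With the equality constraint, the output LLR equals the sum of the input LLRs (here $2 + (-1) = 1$). This gives a quick check that discarding the mixed products and renormalizing behaves as the factor $\gamma$ in (1) requires.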
The block diagram of the core is illustrated in Fig. 3 and Fig. 4. It is a direct implementation of the forward-backward algorithm, a special case of the SP algorithm, applied to the tail-biting trellis diagram of the Hamming code. Each trellis section can be realized in analog circuitry as illustrated in Fig. 1.

Fig. 3. Trellis representation of the core modules
Fig. 4. Block diagram of the decoder core

B. Input Interface
An input interface receives the serially incoming quantized digital soft information and temporarily stores it in memory. As soon as all the soft information for a block of 8 coded bits has been received, two steps follow. It is translated into differential electrical currents, and these are applied in parallel to the analog decoding core. To do so, a separate current-steering DAC is required for each quantized value in the received block. Fully analog interfaces use sample-and-hold blocks to store the received values. Compared to such architectures, data storage in this scheme is robust and needs no capacitor. For the Hamming decoder, an array of 8×6-bit registers is enough to store a block of 8 soft values, each quantized to 6 bits. Our preliminary simulations showed that 6-bit quantization is sufficient for the desired bit error rate (BER) performance. In addition, an array of D flip-flops (DFFs) is placed between the registers and the DACs. The data is first clocked into the registers. The DFFs are then clocked simultaneously to transfer the data and hold it at the DAC inputs. New data (i.e. the next block of coded data) can now be clocked into the registers. Because the DFFs hold the DAC input words, the decoding core can work while the new data is clocked in.

Each pair of differential inputs required by the decoding core can be generated by a current-steering DAC with differential output. The DACs are built from arrays of current sources that inject differential current directly into the decoder inputs. The total current from each DAC must match the current level of the decoding core, i.e. the reference current $I_U$. Essentially, each bit in the DAC consists of a number of PMOS current-source transistors and a PMOS differential pair. The outputs of the differential pair are connected to the two differential inputs of the core. In this way the current is always on and is only steered between the core inputs.

C. Output Interface
The output of the decoder core is an analog vector of electrical currents. These currents represent the probabilities of the decoded bits being 0 or 1. An output interface is therefore needed to decide the value of each bit. For this purpose, an array of latched current-mode comparators is used. The comparators are based on a design using a pair of cross-coupled inverters with a flip-flop latch. The design was first introduced in [11], and more details can be found in [12].

Every comparator takes a pair of electrical currents representing the probabilities of the output bit being 0 or 1. If the current representing the probability of 1 is the greater of the two, the comparator output voltage reaches Vdd. When the condition is reversed, the output goes to zero. The output interface thus translates the analog probability currents into digital decided bits. Finally, a parallel-to-serial shift register feeds out the decoded bits serially.

D. Digital Controller
A digital timing circuit provides the required signaling for the whole circuit. The signals from this section manage the serial reception of the quantized soft information and its storage in the embedded registers. Additionally, the controller provides the signaling required to load the stored soft information and sets the decoding time.

VI. SIMULATION RESULTS
ST's 65nm low-leakage high-VT (LL-HVT) CMOS transistor library was used to simulate the analog Hamming decoder architecture. The decoder's performance is evaluated on a wireless link with BPSK modulation and a memoryless AWGN channel.
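The channel model can be approximated at the algorithm level. The Python sketch below illustrates it but is not the authors' simulation flow. It maps bits to BPSK symbols, adds white Gaussian noise, and computes the channel LLR in the sign convention of (2). Hard decisions follow the comparator rule: decide 1 when the LLR is positive. The sketch also gives the theoretical uncoded BPSK curve used for reference in Fig. 5. It assumes unit-energy symbols, the mapping 0 → −1 and 1 → +1, and that the SNR axis is Eb/N0, which the text does not state. All function names are hypothetical.

```python
import math
import random


def uncoded_bpsk_ber(ebn0_db):
    """Theoretical bit error rate of uncoded BPSK on AWGN, Q(sqrt(2 Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))


def bpsk_awgn_llrs(bits, ebn0_db, rng, code_rate=1.0):
    """Transmit bits with BPSK (0 -> -1, 1 -> +1, unit symbol energy) over a
    memoryless AWGN channel and return LLR = ln(p(r=1)/p(r=0)) per sample.

    For this mapping the LLR of a received sample y is 2*y/sigma^2.
    code_rate < 1 (e.g. 0.5 for an (8,4) code) raises the noise so that
    Eb/N0 is measured per information bit.
    """
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma2 = 1.0 / (2.0 * code_rate * ebn0)  # noise variance N0/2 per sample
    sigma = math.sqrt(sigma2)
    return [2.0 * ((2 * b - 1) + rng.gauss(0.0, sigma)) / sigma2 for b in bits]


def hard_decisions(llrs):
    """Decide 1 when p(r=1) > p(r=0), i.e. when the LLR is positive."""
    return [1 if llr > 0 else 0 for llr in llrs]


rng = random.Random(2010)  # fixed seed for repeatability
n_bits = 100_000
bits = [rng.randint(0, 1) for _ in range(n_bits)]
for snr_db in (0, 2, 4, 6):
    decided = hard_decisions(bpsk_awgn_llrs(bits, snr_db, rng))
    ber = sum(b != d for b, d in zip(bits, decided)) / n_bits
    print(f"{snr_db} dB: simulated {ber:.2e}, theory {uncoded_bpsk_ber(snr_db):.2e}")
```

Feeding such LLRs, after 6-bit quantization, to a decoder model would give a coded curve. The sketch stops at the uncoded reference because the quantizer format used in the paper is not specified.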
In the simulations, the reference current was set to $I_U = 100$ nA. This ensures that all transistors in the analog core and in the current-steering DACs operate in the sub-VT region. Fig. 5 shows the BER performance of the decoder obtained from accurate transistor-level simulations. The curve closely follows the ideal performance expected from the extended Hamming decoder. For comparison, the figure also shows the BER of an uncoded system whose signal is corrupted by an AWGN channel.

Fig. 5. Bit error rate performance, 2.5 Mb/s. [Plot: bit error rate probability (10^-4 to 10^-1) versus SNR [dB] (0 to 8) for uncoded BPSK (theoretical), soft-decision extended Hamming (ideal) and the analog extended Hamming decoder.]

Tables I and II summarize the power consumption estimates and the characteristics of the decoder, respectively. The analog circuits and the input DACs use a 1.2 V supply, whereas the digital circuitry operates at 0.8 V. The decoder converges to the 4 decoded bits of a block in less than 2 µs. This allows a maximum decoding speed of 2.5 Mb/s (one 4-bit block every 1.6 µs) without loss in BER performance. The complete decoder consumes only about 40 µW at a throughput of 2.5 Mb/s. At a lower throughput of 250 kb/s, the required power drops to a total of 16 µW. The savings come mostly from the digital circuitry running at lower clock frequencies.

TABLE I
POWER CONSUMPTION OF DIFFERENT SECTIONS OF THE DECODER
Sub-circuit | Power @ 2.5 Mb/s [µW] | Power @ 250 kb/s [µW]
DACs | 5 | <2
Analog decoding core | 6 | 6 (rate independent)
Digital circuitry | 28 | 8
Output comparators | 1 | <1
Total | 40 | 16

TABLE II
ANALOG DECODER CHARACTERISTICS
Technology | 65nm CMOS, LL-HVT
Analog supply voltage | 1.2 V
Digital supply voltage | 0.8 V
Clock frequency | up to 5 MHz, max. coding gain
Decoder throughput | up to 2.5 Mb/s, max. coding gain
Energy per decoded bit | 16 pJ/b @ 2.5 Mb/s
Coding gain @ BER = 10^-3 | 1.5 dB

Table III lists the power consumption of the reported analog Hamming decoders. The table should be interpreted with caution, because power consumption depends heavily on factors such as the chosen technology and decoder type. The required energy per decoded bit (E/b) is also included as an indicator for a crude comparison.

TABLE III
ENERGY COMPARISON FOR ANALOG HAMMING DECODERS
Reference | Tech. CMOS [µm] | $I_U$ | E/b | Pcore [µW] | Ptot [µW]
[7] (sim.) | 0.18 | 1 µA | 640 pJ | N/A | 283
[6] (meas.) | 0.25 | 100 nA | 140 nJ | <5 | 55
[8] (meas.) | 0.18 | 10 µA | 102 pJ | 150 | 229
This paper (sim.) | 0.065 | 100 nA | 16 pJ | 6 | 40

VII. CONCLUSION
In this paper we investigated the possibility of using an analog decoder in the digital domain. Our preliminary results show that the proposed approach could be a viable option for portable devices with a limited power budget. With digital I/O interfaces, the decoder can be used like an ordinary digital decoder, without changes to the receiver architecture. Furthermore, the introduced architecture is quite general and can be applied to more complex analog decoders. Finally, no significant effort was made in this paper to reduce the power consumption of the digital part of the circuit. The power consumption of the digital circuitry could therefore be reduced further, for example by clock gating the input register arrays.

ACKNOWLEDGMENT
The authors would like to thank the Swedish Foundation for Strategic Research (SSF) for funding the Wireless Communication for Ultra Portable Devices at Lund University.

REFERENCES
[1] N. Sadeghi et al., "Analysis of error control code use in ultra-low-power wireless sensor networks," ISCAS, Island of Kos, Greece, 2006.
[2] J. Hagenauer and M. Winklhofer, "The analog decoder," ISIT98, Cambridge, MA, USA, 1998.
[3] H. A. Loeliger, M. Helfenstein, F. Lustenberger, and F. Tarköy, "Iterative sum-product decoding with analog VLSI," ISIT98, Cambridge, MA, USA, 1998.
[4] F. Lustenberger et al., "All analog decoder for a binary (18,9,5) tailbiting trellis code," ESSCIRC, Duisburg, Germany, 1999.
[5] M. Moerz et al., "An analog 0.25 µm BiCMOS tailbiting MAP decoder," ISSCC, San Francisco, CA, USA, 2000.
[6] M. Frey et al., "Two experimental analog decoders," Int. Analog VLSI Workshop, Bordeaux, France, 2005.
[7] N. Nguyen et al., "A 0.8V CMOS analog decoder for an (8,4,4) extended Hamming code," ISCAS, Vancouver, Canada, 2004.
[8] C. Winstead et al., "Low-voltage CMOS circuits for analog iterative decoders," TCAS I: Regular Papers, 2006.
[9] B. Gilbert, "A precise four-quadrant multiplier with subnanosecond response," JSSC, 1968.
[10] F. Lustenberger and H. A. Loeliger, "On mismatch errors in analog-VLSI error correcting decoders," ISCAS, Sydney, Australia, 2001.
[11] S. Yu, "Design and test of error control decoders in analog CMOS," PhD dissertation, Univ. of Utah, Logan, USA, 2004.
[12] C. Winstead, "Analog iterative decoders," PhD dissertation, University of Alberta, Edmonton, AB, Canada, 2005.