A Modulo-Based Architecture for Analog-to-Digital Conversion

Or Ordentlich; Gizem Tabak; Pavan Kumar Hanumolu; Andrew C Singer; Gregory W Wornell

doi:10.1109/jstsp.2018.2863189

. Author manuscript; available in PMC: 2021 Mar 18.

Published in final edited form as: IEEE J Sel Top Signal Process. 2018 Aug 6;12(5):825–840. doi: 10.1109/jstsp.2018.2863189

A Modulo-Based Architecture for Analog-to-Digital Conversion

Or Ordentlich ¹, Gizem Tabak ², Pavan Kumar Hanumolu ³, Andrew C Singer ⁴, Gregory W Wornell ⁵

PMCID: PMC7970709 NIHMSID: NIHMS1508776 PMID: 33747333

Abstract

Systems that capture and process analog signals must first acquire them through an analog-to-digital converter. While subsequent digital processing can remove statistical correlations present in the acquired data, the dynamic range of the converter is typically scaled to match that of the input analog signal. The present paper develops an approach for analog-to-digital conversion that aims at minimizing the number of bits per sample at the output of the converter. This is attained by reducing the dynamic range of the analog signal by performing a modulo operation on its amplitude, and then quantizing the result. While the converter itself is universal and agnostic of the statistics of the signal, the decoder operation on the output of the quantizer can exploit the statistical structure in order to unwrap the modulo folding. The performance of this method is shown to approach information theoretical limits, as captured by the rate-distortion function, in various settings. An architecture for modulo analog-to-digital conversion via ring oscillators is suggested, and its merits are numerically demonstrated.

I. INTRODUCTION

Analog-to-digital converters (ADCs) are an essential component in any device that manipulates analog signals in a digital manner. While digital systems have benefited tremendously from scaling, their analog counterparts have become increasingly challenging. Consequently, it is often the case that the ADC constitutes the main bottleneck in a system, both in terms of power consumption and real estate, and in terms of the quality of the system’s output. Developing more efficient ADCs is therefore of great interest [1], [2].

The quality of an ADC is measured via the tradeoff between various parameters such as power consumption, size, cost of manufacturing, and the distortion between the input signal and its digitally-based representation. For the sake of a unified, technology-independent, discussion, it is convenient to restrict the characterization of an ADC quality to three basic parameters: 1) The number of analog samples per second F_S; 2) The number of “raw” output bits R the ADC produces per sample (before subsequent possible lossless compression); 3) The mean squared error (MSE) distortion D between the input signal and a reconstruction that is based on the output of the ADC.

While different applications may require different tradeoffs between F_S, R and D, it is always desirable to design the ADC such that all three parameters are as small as possible.The focus of this work is on the quantization rate R. For a given sampling frequency F_S, and a given target distortion D, our goal is to design ADCs that use the smallest possible number of raw output bits per sample.

The problem of analog-to-digital conversion can be seen as an instance of the lossy source coding/lossy compression problem [3]–[5], as the output of an ADC is a binary sequence, which represents the analog source. A unique key feature of the analog-to-digital conversion problem is that the encoding of the source is carried out in the analog domain, while the decoding procedure is purely digital. Given the limitations of analog processing, it is therefore generally only practical to exploit the source structure at the decoder. Hence, the type of source coding schemes that are suitable for data conversion, are those that approach fundamental limits without requiring knowledge of the source structure at the encoder. In addition, latency and complexity constraints in data conversion, typically preclude the use of schemes other than those based on scalar quantization.

The input signal to an ADC is often known to have structure that could be exploited to reduce the overall bit rate of its representation, R. In our analysis, it will be convenient to express this structure using a stochastic model for the input. Consequently, throughout the paper, we will model the input to the ADC as a stationary stochastic Gaussian process X(t), whose power spectral density (PSD) encapsulates the assumed structure. More generally, we will sometimes also consider the problem of analog-to-digital conversion of a vector X(t) ={X₁(t),…,X_K(t)} of jointly stationary stochastic Gaussian processes, via K parallel ADCs, the input to each one of them is one of the K processes.

Under such stochastic modeling, rate-distortion theory [3] provides the fundamental lower bound F_s · R > R_X(D) for any ADC (and corresponding decoder) that achieves distortion D, where R_X(D) is the rate-distortion function of the process X(t) in bits per second. In general, achieving the rate-distortion function of a source requires using sophisticated high-dimensional quantizers, whereas analog-to-digital conversion is invariably done via scalar uniform quantizers. Thus, achieving this lower bound with ADCs seems overly optimistic. Nevertheless, as we shall see, approaching the ratedistortion bound, up to some inevitable loss due to the one-dimensional nature of the quantization, is sometimes possible by a simple modification of the scalar uniform quantizer, namely, a modulo ADC, followed by a digital decoder that efficiently exploits the source structure.

Instead of sampling and quantizing the process X(t), a modulo ADC samples and quantizes the process [X(t)] mod Modulo ADC Δ, where the modulo size Δ is a design parameter. See Figure 1. Equivalently, a modulo ADC can be thought of as a standard uniform scalar ADC with step-size δ and an arbitrarily large dynamic range/support, but that outputs only the R least significant bits in the description of each sample, where $2^{R} = \frac{Δ}{δ}$ , such that the encoding rate is R. The benefit of applying the modulo operation on X(t) is in reducing its dynamic range/support, which in turn enables a reduction of the number of bits per sample produced by the ADC, without increasing the quantizer’s step-size. This operation, which corresponds to disregarding coarse information about X(t), will otherwise substantially degrade the source reconstruction. However, by properly accounting for the modulo operation and appropriately choosing its parameter Δ, we can unwrap the modulo operation with high probability using previous samples of X(t) and exploiting the (redundant) structure in the signal.

Fig. 1. — A schematic illustration of the modulo ADC.

Following standard system design methodology, in the performance analysis of a modulo ADC, we distinguish between two events: 1) The no-overload event ${\bar{E}}_{OL}$ where the decoder was able to correctly unwrap the modulo operation. We require the MSE distortion, conditioned on this event, to be at most D; 2) The overload event $E_{OL}$ where the decoder fails in unwrapping the modulo operation. We require the probability of this event $Pr (E_{OL})$ to be small, but do not concern ourselves with the MSE distortion conditioned on the occurrence of this event.

A. Our Contributions

This work further develops the modulo ADC framework in three complementary directions, as specified below.

1). Oversampled Modulo ADC:

We show that a modulo ADC can be used as an alternative to ΣΔ converters. A ΣΔ converter is based on oversampling the input process X(t), i.e., sampling above the Nyquist rate, in conjunction with noise-shaping, which pushes much of the energy of the quantization noise to high frequencies, where there is no signal content. See Figure 2. The noise shaping operation requires incorporating an elaborate mixed signal feedback circuit. In particular, the circuit first generates the quantization noise, which necessitates using not only an ADC, but also an accurately-matched digital-to-analog converter (DAC), and then applies an analog filter. The analog nature of the signal processing makes it challenging to use filters of high-orders, which in turn limits performance.

Fig. 2. — Schematic architecture for oversampled ΣΔ converter. {X_n} is obtained by sampling the process X(t).

We develop an alternative architecture (Section III) that shifts much of the complexity to the decoder, whereas the “encoder” is simply a modulo ADC. See Figure 3. The parameter Δ in the modulo ADC, as well as the coefficients of the prediction filter in Figure 3, depend only on the bandwidth B of the input process X(t) and on its variance σ², and not on the other details of its PSD. Similarly, the MSE distortion between the input process and its reconstruction, depends only on B and σ². Thus, the developed architecture is as agnostic as ΣΔ converters to the statistics of the input process. Furthermore, for a flat-spectrum process, the distortion is within a small gap, due to one-dimensionality of the encoder, from the information theoretic limit.

Fig. 3. — Schematic architecture for oversampled modulo ADC. The same architecture, without the low-pass filter (LPF) is also suitable for modulo ADC for a general stationary process. {X_n} is obtained by sampling the X(t).

2). A Phase-Domain Implementation of Modulo ADC via Ring Oscillators:

We develop a modulo ADC implementation that performs the modulo reduction inherently as part of the analog signal acquisition process. As the phase of a periodic waveform is always measured modulo 2π, a natural class of candidates are ADCs that first convert the input voltage into phase, and then quantize that phase. A notable representative within this class, which has been extensively studied in the literature [6], [7], is the ring oscillator ADC.

Consider a closed-loop cascade of N inverters, where N is an odd number, all controlled with the same voltage V_dd = V_in, see Figure 4. This circuit, which will be described in detail in Section IV, oscillates between 2N states, corresponding to the values (‘low’ or ‘high’, represented by ‘0’ or ‘1’) of each of the N inverters. See Figure 5. The oscillation frequency is controlled by V_dd. Due to the oscillating nature of the circuit, if we sample its state every T_S seconds, we cannot tell how many “state changes” occurred between two consecutive samples, but we are able to determine this number modulo 2N. Thus, by setting V_dd to V_dd(t) = g(X(t)), where X(t) is the analog signal to be converted to a digital one and g(·) is a function to be specified, we obtain a modulo ADC. The input-output relation of this modulo ADC is characterized in Section IV, and depends on the response time of the inverters to change in their input, as a function of V_dd.

Fig. 4. — A schematic illustration of a ring oscillator with N = 5 inverters. The states of all N inverter are measured every T_S seconds.

Fig. 5. — An example of the evolution of the states of the inverters in a ring oscillator.

In practice, the modulo operation realized in this way deviates from the ideal characteristic of Figure 1 in a variety of ways. Accordingly, we perform several numerical experiments to evaluate and optimize the performance of an oversampled ring oscillator modulo ADC, and compare it to the performance of an ideal modulo ADC as well as to a ΣΔ converter. The results demonstrate that despite the non-idealities in the ring oscillator implementation, in some regimes, this architecture holds substantial potential for improvement over existing ADCs.

3). Modulo ADCs for Jointly Stationary Processes:

There is great interest in designing efficient ADCs for applications where the number of sensors/antennas observing a particular process is greater than the number of degrees-of-freedom (per time unit) governing its behavior. Thus, there is a redundancy at the receiver that can be exploited. However, as this redundancy can be spread across time and space, traditional ADC architectures, as well as the modulo ADC architectures described in Section II-A and II-B, are insufficient. In this part of the paper, we show how to address this problem via a natural extension of the modulo ADC framework.

As an example we will consider the problem of wireless communication. It is by now well established that using receivers, as well as transmitters, with multiple antennas, dramatically increases the achievable communication rates over wireless channels [8], [9]. However, adding antennas comes with the price of requiring multiple expensive and power hungry RF chains. For traditional ADC architectures, power and cost scale linearly with the number of receive antennas, which motivates an alternative solution.

It is often the case, that the signals observed by the different receive antennas are highly correlated, in time and in space. As an illustrative example, consider the case where the transmitter has one antenna, whereas the receiver has K > 1 antennas. We can model the signal observed at each of the antennas, after sampling, as

Y_{n}^{k} = h_{n}^{k} * X_{n} + Z_{n}^{k}, k = 1, \dots, K, n = 1, \dots, N,

(1)

where {Xⁿ} is the process emitted by the transmitter, ${h_{n}^{k}}$ is the kth channel impulse response, and ${Z_{n}^{k}}$ are independent additive white Gaussian noise (AWGN) processes.

Since all K output processes ${Y_{n}^{1}}, \dots, {Y_{n}^{K}}$ in (1) are noisy and filtered versions of of the same input process, they will typically be highly correlated. However, this correlation may be spread in time (the n-axis) and in space (the k-axis). As an extreme example, assume {X_n} is an iid process, and the filters simply incur different delays, i.e., $h_{n}^{k} = δ_{n - k}$ for k = 1,…,K. While each individual process ${Y_{k}^{n}}$ is white, and each vector $(Y_{n}^{1}, \dots, Y_{n}^{K}), n = 1, \dots, N$ has a scaled identity covariance matrix, the vector process ${{Y_{n}^{1}}, \dots, {Y_{n}^{K}}}$ is highly correlated. One must therefore jointly process the time and the spatial dimensions in order to exploit this correlation.

This phenomenon, where the signals observed by the different ADCs are highly correlated, is not unique to the wireless communication setup, and appears in many other applications, e.g., multi-array radar. It is, however, taken to the extreme in massive MIMO [10], where the number of antennas at the base station is of the order of tens or even hundreds, while the number of users it supports may be substantially fewer.

In Section VI we develop an architecture that uses modulo ADCs, one for each receive antenna, in order to exploit the space-time correlation of the processes. We develop a lowcomplexity decoding algorithm for unwrapping the modulo operations. This algorithm combines the idea of performing prediction in time, of the quantized vector process from its past, with that of integer-forcing source decoding [11], which is used for exploiting spatial correlations in the prediction error vector. See Figure 6. In the limit of small D, the excessrate of the developed analog-to-digital conversion scheme with respect to the information theoretic lower bound, is shown to reduce to that of the integer-forcing source decoder.

Fig. 6. — Schematic architecture for Modulo ADCs for jointly stationary processes.

B. Related Work

The idea of using modulo ADCs/quantizers for exploiting temporal correlations within the input process X(t) towards reducing the quantization rate R, dates back, at least, to [12], where a quantization scheme, called modulo-PCM, was introduced. A decoding scheme for unwrapping the modulo operation, based on maximum-likelihood sequence detection [13], was further proposed in [12], and a heuristic analysis was performed, based on prediction of X(t) from its past, which shows that modulo-PCM can approach the Shannon lower bound under the high-resolution assumptions. In Section II-A, we develop a more complete analysis of modulo quantization, the details of which are required for the application we discuss in Section III.

The architecture from Figure 3 is based on using a prediction filter at the decoder, as a part of the modulo unwrapping process, as was hinted at in [12] (see also [14]). In agreement with the literature on differential pulse-code modulation (DPCM) at the late 1970s (see e.g. [15]), the authors in [12] proposed to design the prediction filter as the optimal one-step predictor of the unquantized process {Xⁿ} from its past. As shown in [16], this design criterion is sub-optimal, and the “correct” design criterion is to take this filter as the one-step predictor of the quantized process from its past. The difference between the two design criteria is significant for oversampled processes, which are the focus of Section III, whose PSD is zero at high frequencies, as in those frequencies the signal-todistortion ratio is zero, no matter how small the quantization noise is. Our analysis in Section III reveals that designing the modulo size Δ and the prediction filter with respect to a quantized flat-spectrum input process, results in a universal system. This means, that this system attains the same distortion D for all input processes that share the same support for the PSD and the same variance.

The use of modulo ADCs/quantizers was also studied by Boufounos in the context of quantization of oversampled signals [17] (see also [18]). In particular, it is shown in [17] that by randomly embedding a measurement vector in ℝ^K onto an M ≫ K dimensional subspace, and using a modulo ADC for quantizing each of the coordinates of the result, one can attain a distortion that decreases exponentially with the oversampling ratio, with high probability. In Section III we consider a similar setup, where an oversampled analog signal, with oversampling ratio L > 1, i.e. F_s is L times greater than the Nyquist frequency, is digitized by a modulo ADC. In the language of [17], this corresponds to embedding X ∈ ℝ^K to an M = LK dimensional space by zero-padding followed by interpolation, which is indeed a linear operation. We show that for this particular “embedding” not only is the decay of MSE distortion exponential in the oversampling ratio, but the attained distortion is information-theoretically optimal, up to a constant loss, which is explicitly characterized, due to the scalar nature of the quantizer. Moreover, under this “embedding”, a simple low-complexity decoding algorithm exists, whereas for the random projection case studied in [17], no computationally efficient decoding algorithm was given. One advantage, on the other hand, of the approach from [17], is that it is applicable to 1-bit modulo ADCs, whereas the performance of the scheme from Section III typically becomes attractive starting from R ≳ 2 bits per sample, due to reasons that will become clearer in the sequel (see discussion around eq. (12)).

Very recently, Bhandari et al. have addressed the question of what is the minimal sampling rate that allows for exact recovery of a bandlimited finite-energy signal, from its moduloreduced sampled version [19] (see also [20]). They have found that a sufficient condition for correct reconstruction is sampling above the Nyquist rate by a factor of 2πe, regardless of the size of the modulo interval. The analysis in [19] did not take quantization noise into account, which corresponds to R = ∞ and D = 0 in our setup.

The merits of a modulo ADC for distributed analog-to-digital conversion of signals correlated in space, but not in time, were demonstrated in [11]. A low-complexity decoding algorithm, for unwrapping the modulo operation, was proposed and its performance was analyzed. It was demonstrated via numerical experiments that the performance is usually quite close to the information theoretic lower bounds (See also [21]). In Section II-B, we summarize the decoding scheme from [11] and the corresponding performance analysis, as those will be needed in Section VI, where we develop a modulo ADC architecture for analog-to-digital conversion of jointly stationary processes. The decoding algorithm for this setup, as well as its performance analysis, is inspired by the ideas and techniques from Sections II-A and II-B.

As modulo reduction can be viewed as a one dimensional deterministic instance of binning, in a broader sense, modulo quantization is closely related to Wyner-Ziv’s source coding with side information setup and to its channel coding dual, which is the Gel’fand-Pinsker setup [22]. In the latter context, we further note that modulo quantization is widely used for communication over intersymbol interference channels [23], [24]. Recently, Hong and Caire [25] considered modulo ADCs as potential candidates for the front end of receivers in a cloud radio access network (CRAN), employing compute-andforward [26] based protocols.

Note that the although the concept of modulo ADC is reminiscent of folding ADCs [27], an important difference is that unlike the latter, the former does not keep track of the number of folds that occurred and, moreover, its functionality does not depend on this number, i.e., it does not saturate for large inputs. In unwrapping the modulo operation at the decoder, the missing information about number of folds is recovered, and we are able to attain the same D with smaller rate.

Finally, another related line of work, is that of compressed sampling, see, e.g., [28]–[30], where the goal is to design universal and efficient ADCs with a small sampling frequency F_S, under the assumption that the input signal occupies only a small portion of its total bandwidth, but the exact support is unknown.

C. Organization

The rest of the paper is organized as follows. In Section II we formally define the modulo ADC and study its performance for stationary scalar input processes, and for random vectors (spatial correlation). Section III develops the use of oversampled modulo ADCs as a substitute for ΣΔ converters, and analyzes the tradeoffs this architecture achieves. In Section IV we introduce an implementation of modulo ADCs via ring oscillators and establish the corresponding input-output mathematical model. Numerical experiments for evaluating the performance of ring oscillators based oversampled modulo ADCs are performed in Section V. Section VI proposes to use parallel modulo ADCs for digitizing jointly stationary processes. The paper concludes in Section VII.

II. Preliminaries on Ideal Modulo ADC

Let Δ ∈ ℝ⁺ be a positive number, and define the modΔ operation as

[x] \mod Δ ≜ x - Δ ⌊ \frac{x}{Δ} ⌋ \in [0, Δ),

where the floor operation ⌊x⌋ returns the largest integer smaller than or equal to x. By definition, we have that for any x,y ∈ ℝ and Δ > 0

[[x] \mod Δ + y] \mod Δ = [x + y] \mod Δ .

(2)

An R-bit modulo ADC with resolution parameter α, or (R,α) mod-ADC, maps a real input x ∈ ℝ to R bits, by computing

{[x]}_{R, α} ≜ [⌊ α x ⌋] \mod 2^{R} \in {0, 1, \dots, 2^{R} - 1},

and producing a binary representation of it. Note that ⌊αx⌋ is the output of an infinite support scalar quantizer with step size1/_,and [x]R,αis a wrapped version of it. In the sequel we will demonstrate that in various scenarios an appropriately designed decoder can recover ⌊αx⌋ from its wrapped version[x]R,α,with high probability, based on temporal/spatial correlations of the ADCs input signal.

We can write [x]_R,α as

{[x]}_{R, α} = [α x + (⌊ α x ⌋ - α x)] \mod 2^{R} = [α x + z] \mod 2^{R} .

(3)

The term z = ⌊αx⌋ − αx ∈ (−1,0] in (3), is the quantization error of a uniform scalar quantizer ⌊αx⌋, and is clearly a deterministic function of x. Nevertheless, throughout this paper we will model z as additive uniform noise Z ~ Unif((−1,0]) statistically independent of x, such that the (R,α) mod-ADC will be modeled as a stochastic channel with input x and output Y, related as

Y = [α x + Z] \mod 2^{R} .

(4)

The modulo additive noise channel model (4) for an (R,α) mod-ADC can be rigorously justified via the use of subtractive dithers. Specifically, we can use a random variable U ∼ Unif([0,1)), statistically independent of x, which we refer to as a dither, and feed $\tilde{x} = x + U / α$ to the (R,α) mod-ADC instead of feeding x. The output of the modulo ADC in this case will be

{[\tilde{x}]}_{R, α} = [α \tilde{x} + (⌊ α \tilde{x} ⌋ - α \tilde{x})] \mod 2^{R} = [α x + U + (⌊ α x + U ⌋ - (α x + U))] \mod 2^{R} .

Subtracting U from ${[\tilde{x}]}_{R, α}$ and reducing the result modulo 2^R, we obtain

[{[\tilde{x}]}_{R, α} - U] \mod 2^{R} = [[α x + U + (⌊ α x + U ⌋ - (α x + U))] \mod 2^{R} - U] \mod 2^{R} = [α x + (⌊ α x + U ⌋ - (α x + U))] \mod 2^{R},

where the last equality follows from the distributive law of Zmodulo (2). Note that for every x ∈ ℝ, the random variable Z = ⌊αx + U⌋ − (αx + U) is uniformly distributed over (−1,0], and is therefore independent of x [31, Lemma 1]. Thus, with subtractive dithers, the additive noise model (4) is exact. We note that even when dithering is not used, under suitable conditions this model predicts performance quite accurately [32].

Although the modulo operation entails loss of information in general, in many situations it is possible to unwrap it, i.e., reconstruct αx + Z from Y = [αx + Z] mod 2^R with high probability.¹ In particular, let

\tilde{Y} = [Y + \frac{1}{2} 2^{R}] \mod 2^{R} - \frac{1}{2} 2^{R},

(5)

and note that conditioned on the no-overload event

E_{\bar{OL}} ≜ {α x + Z \in [- \frac{1}{2} 2^{R}, \frac{1}{2} 2^{R})},

we have that $\tilde{Y} = α x + Z$ . Thus, if $Pr (E_{\bar{OL}})$ is close to 1, the modulo operation has no effect with high probability. Note that $\Pr (E_{OL}) = \Pr (| α x + Z | > \frac{1}{2} 2^{R})$ is identical to the probability that a standard uniform quantizer with dynamic range (support) 2^R/α is in overload. Thus, when thinking of x as a single observation, it is unclear what the advantages of a modulo ADC are with respect to a traditional uniform ADC. However, as we illustrate below, the modulo ADC allows exploitation of the statistical structure of the acquired signal in a much more efficient manner than the standard ADC.

The following lemma is proved using Chernoff’s bound, and will be useful in the sequel for bounding $Pr (E_{\bar{OL}})$ in various scenarios.

Lemma 1 ( [33, Lemma 4], [34, Theorem 7]): Consider the random variable $Z_{eff} = \sum_{l = 1}^{L} α_{l} Z_{l} + \sum_{k = 1}^{K} β_{k} U_{k}$ where ${Z_{l}}_{l = 1}^{L}$ are iid Gaussian random variables with zero mean and some variance $σ_{z}^{2}, {U_{k}}_{k = 1}^{K}$ are iid random variables, statistically independent of ${Z_{l}}_{l = 1}^{L}$ , uniformly distributed over the interval [−ρ/2,ρ/2) for some ρ > 0,and ${α_{l}}_{l = 1}^{L}$ and ${β_{k}}_{k = 1}^{K}$ are arbitrary real (deterministic) numbers. Let $σ_{eff}^{2} ≜ E (Z_{eff}^{2}) = σ_{z}^{2} \sum_{l = 1}^{L} α_{l}^{2} + \frac{ρ^{2}}{12} \sum_{k = 1}^{K} β_{k}^{2}$ . Then for any τ > 0

\Pr (Z_{eff} > τ) = \Pr (Z_{eff} < - τ) \leq \exp {- \frac{τ^{2}}{2 σ_{eff}^{2}}} .

A. Modulo ADCs for Scalar Stationary Processes

In this subsection we consider the case where an (R,α)mod ADC, as described above, is applied on a scalar stationary process. We develop a corresponding decoder and analyze its performance, including the effects of the choices of α and R.

Let{X_n}be a zero-mean discrete-time stationary Gaussian stochastic process, obtained by sampling a stationary Gaussian process X(t) every T_S seconds. Let

Y_{n} = [α X_{n} + Z_{n}] \mod 2^{R}, n = 1, 2, \dots

be the process obtained by applying an (R,α) mod-ADC on the process {X_n}, where {Z_n} is a Unif((−1,0]) iid noise, and let

V_{n} = α X_{n} + Z_{n}, n = 1, 2, \dots

be its non-folded version. Our goal is to design a decoder that recovers V_n from the outputs of the modulo ADC, {Y_n}, with high probability. To that end, we assume the decoder has access to {V_n−1,…,V_n−p}, an assumption that will be justified in the sequel, and that it knows the auto-covariance function $C_{X} [r] = E [X_{n} X_{n - r}]$ of {Xⁿ}. We apply the following algorithm (See also Figure 3 for a schematic illustration):

Inputs: Y_n,{V_{n − 1},…,V_n−p}, {C_X[r]}, R, α.

Output: Estimates ${\hat{V}}_{n}$ , ${\hat{X}}_{n}$ , for V_n and X_n, respectively.

Algorithm:

1)
Compute the optimal linear MMSE predictor for V_n from its last p samples
${\hat{V}}_{n}^{p} = \sum_{i = 1}^{p} h_{i} \cdot (V_{n - i} + \frac{1}{2}) - \frac{1}{2},$ (6)
where {h_n} is a p-tap prediction filter, computed based on {C_X[r]} and α, and the shift by 1/2 compensates for $E (Z_{n})$ .
2)
Compute
$W_{n} = [Y_{n} - {\hat{V}}_{n}^{p}] \mod 2^{R} {\tilde{W}}_{n} = [W_{n} + \frac{1}{2} 2^{R}] \mod 2^{R} - \frac{1}{2} 2^{R} .$
3)
Output ${\hat{V}}_{n} = {\hat{V}}_{n}^{p} + {\tilde{W}}_{n},$ and ${\hat{X}}_{n} = \frac{{\hat{V}}_{n} + \frac{1}{2}}{α} .$

Remark 1: Note that {h_n} is the p-tap prediction filter for the quantized process {V_n} from its past, rather than for {X_n} from its past. While the loss for using the latter, instead of the former, becomes insignificant when high-resolution assumptions apply, it can be arbitrarily large for over sampled processes, for which high-resolution assumptions never hold [16], [35]. The filter coefficients {h_n} need only be computed once, and can then be used for all times.

The following proposition characterizes the performance of the algorithm above. All logarithms in this paper are taken to base 2, unless stated otherwise.

Proposition 1: Let ${\hat{V}}_{n}^{p}$ , ${\hat{V}}_{n}$ and ${\hat{X}}_{n}$ be as defined in the algorithm above, and let $σ_{p}^{2} = E {(V_{n} - {\hat{V}}_{n}^{p})}^{2}$ . We have that

\Pr (E_{{OL}_{n}}) ≜ \Pr ({\hat{V}}_{n} \neq V_{n}) \leq 2 \exp {- \frac{3}{2} 2^{2 (R - \frac{1}{2} \log (12 σ_{p}^{2}))}},

(7)

and

D = E [{(X_{n} - {\hat{X}}_{n})}^{2} | E_{{\bar{OL}}_{n}}] \leq \frac{1}{12 α^{2} (1 - \Pr (E_{{OL}_{n}}))},

(8)

where the event $E_{{\bar{OL}}_{n}} = {{\hat{V}}_{n} = V_{n}}$ is the complement of the event $E_{{OL}_{n}} = {{\hat{V}}_{n} \neq V_{n}}$ .

Proof: Let $E_{n}^{p} ≜ V_{n} - {\hat{V}}_{n}^{p}$ be the pth order prediction error of the process {V_n}, and note that its variance $σ_{p}^{2} = E {(E_{n}^{p})}^{2}$ is invariant to n due to stationarity. We have that

W_{n} = [Y_{n} - {\hat{V}}_{n}^{p}] \mod 2^{R} = [[V_{n}] \mod 2^{R} - {\hat{V}}_{n}^{p}] \mod 2^{R} = [V_{n} - {\hat{V}}_{n}^{p}] \mod 2^{R} = [E_{n}^{p}] \mod 2^{R},

(9)

where equation (9) follows from the modulo distributive law (2), and constitutes the key advantage of the modulo operation for exploiting temporal correlations. Note that ${\tilde{W}}_{n} \in [- \frac{1}{2} 2^{R}, \frac{1}{2} 2^{R})$ is a cyclically shifted version of W_n ∈ [0,2^R), as in (5). Therefore, conditioned on the event

E_{{\bar{OL}}_{n}} = {| E_{n}^{p} | < \frac{1}{2} 2^{R}}

we have that ${\tilde{W}}_{n} = E_{n}^{p}$ .

Note that $E_{n}^{p}$ is a zero-mean linear combination of statistically independent Gaussian and uniform random variables, such that Lemma 1 applies, and we have that

\Pr (E_{{OL}_{n}}) ≜ \Pr ({\tilde{W}}_{n} \neq E_{n}^{p}) = \Pr (| E_{n}^{P} | > \frac{1}{2} 2^{R}) \leq 2 \exp {- \frac{2^{2 R}}{8 σ_{p}^{2}}} = 2 \exp {- \frac{3}{2} 2^{2 (R - \frac{1}{2} \log (12 σ_{p}^{2}))}},

(10)

Whenever $E_{{\bar{OL}}_{n}}$ occurs, we have that ${\hat{V}}_{n} = V_{n}$ , and consequently

{\hat{X}}_{n} = X_{n} + \frac{Z_{n} + \frac{1}{2}}{α}

and

E [{(X_{n} - {\hat{X}}_{n})}^{2} | E_{{\bar{OL}}_{n}}] = E [{(\frac{Z_{n} + \frac{1}{2}}{α})}^{2} | E_{{\bar{OL}}_{n}}] = \frac{1}{α^{2}} \frac{E {(Z_{n} + 1 / 2)}^{2} - \Pr (E_{{OL}_{n}}) E [{(Z_{n} + 1 / 2)}^{2} | E_{{OL}_{n}}]}{\Pr (E_{{\bar{OL}}_{n}})} \leq \frac{1}{12 α^{2} (1 - \Pr (E_{{OL}_{n}}))} .

(11)

Proposition 1 shows that we can make $\Pr (E_{{OL}_{n}})$ as small as $2 e^{- \frac{3}{2} 2^{2 δ}}$ by choosing

R = \frac{1}{2} \log (12 σ_{p}^{2}) + δ .

(12)

For example, taking δ = 2 bits, results in an overload probability smaller than 10⁻¹⁰. In particular, unless we take a very small δ, we have that $1 - \Pr (E_{{OL}_{n}}) \approx 1$ , and consequently, by Proposition 1, we will have D ≈ 1/12α². Thus, to simplify expressions in the analysis that follows, we assume D ≈ 1/12α². We note the tradeoff in choosing α: on the one hand, increasing α decreases the MSE distortion D, but on the other hand the prediction error variance $σ_{p}^{2}$ of the process V_n = αX_n + Z_n increases with α such that the required rate R for avoiding overload errors increases. Thus, the tradeoff between D and the required quantization rate is controlled through the parameter α. We now turn to characterize the tradeoff the developed scheme achieves.

Let h(A) denote the differential entropy of the random variable A, and h(A|B) the conditional differential entropy of A given the random variable B [5]. Recall that for a stationary Gaussian process {X_n} with PSD S_X(e^jω) we have that [36]

h (X_{n} | X_{n - 1}, \dots) = \frac{1}{2 π} \int_{π}^{π} \frac{1}{2} \log (2 π e S_{X} (e^{j ω})) d ω,

(13)

and in particular h(X_n|X_n−1,…) = −∞ if and only if S_X(e^jω) = 0 over a measurable subset of [−π,π). Shannon’s lower bound [3], states that the number of bits per sample R produced by any quantizer that attains an MSE distortion D must satisfy

R (D) \geq R_{SLB} (D) ≜ h (X_{n} | X_{n - 1}, \dots) - \frac{1}{2} \log (2 π e D) .

It is well-known [3] that for Gaussian processes with finite h(X_n|X_n−1,…), Shannon’s lower bound is asymptotically tight, i.e., lim_D→0 R(D) − R_SLB(D) = 0.

Proposition 2: If h(X_n|X_n−1,…) > −∞, then

\lim_{D \to 0} \lim_{p \to \infty} \frac{1}{2} \log (12 σ_{p}^{2}) = R_{SLB} (D) .

Proof: We can write

\frac{1}{2} \log (12 σ_{p}^{2}) = \frac{1}{2} \log (\frac{\frac{σ_{p}^{2}}{α^{2}}}{\frac{1}{12 α^{2}}}) = \frac{1}{2} \log (\frac{E {(E_{n}^{' p})}^{2}}{D}) .

(14)

where $E_{n}^{' p}$ is the pth order prediction error of the process $X_{n} + Z_{n} / α = X_{n} + \sqrt{D} {\tilde{Z}}_{n}$ , where ${\tilde{Z}}_{n} \sim Unif ([- \sqrt{12}, 0))$ iid.

For a Gaussian process {X_n}, the condition h(X_n|X_n−1,…) > −∞ is equivalent to

\frac{1}{2 π} \int_{- π}^{π} \frac{1}{2} \log (S_{X} (e^{j ω})) d ω > - \infty .

(15)

As a consequence of (15), we have that

\lim_{D \to 0} \frac{1}{2 π} \int_{- π}^{π} \frac{1}{2} \log (2 π e (S_{X} (e^{j ω}) + D)) d ω = h (X_{n} | X_{n - 1}, \dots) .

(16)

By Paley-Wiener’s theorem [37], we have that

\lim_{p \to \infty} E {(E_{n}^{' p})}^{2} = 2^{\frac{1}{2 π} \int_{- π}^{π} \log (S_{X} (e^{j ω}) + D) d ω} .

(17)

Combining (16) and (17), we obtain that

\lim_{D \to 0} \lim_{p \to \infty} E {(E_{n}^{' p})}^{2} = 2^{2 h (X_{n} | X_{n - 1}, \dots) - 2 π e},

for processes with finite entropy rate h(X_n|X_n−1,…). The result now follows by rearranging terms.

For the practically important case where {X_n} is obtained by oversampling the process {X(t)}, which is studied in Section III, the assumption h(X_n|X_n−1,…) > −∞ of Proposition 2 does not hold. Nevertheless, we will show that the modulo ADC achieves performance that is close to the information theoretic limits.

Above, we have assumed that the decoder has access to the non-folded samples {V_n−1,…,V_n−p}. To justify this assumption, an initialization step is needed, where the decoder acquires the first p consecutive samples {V₁,…,V_p}, or estimates of these samples. Once those are obtained, we can apply the algorithm described above, sample-by-sample, and assume the estimate ${\hat{V}}_{n}$ produced by the algorithm at time n is correct, and can be used as an input for the algorithm in the next p steps. All samples V_p+1,…,V_N will be recovered correctly, as long as no overload error occurred within the N −p decoding steps. Thus, by the union bound, we see that the first N−p samples are recovered correctly with probability at least $1 - 2 N e^{- \frac{3}{2} 2^{2 δ}} .$ ²

One conceptually simple way of performing the initialization, i.e., obtaining {V₁,…,V_p} is by using a standard scalar quantizer with high-rate for the first p samples. Although the high power consumption of such a quantizer will have a negligible effect on the total power consumption, due to the fact it is used only for a small fraction of the time, this approach has the disadvantage of having to include two ADCs, a high-rate standard ADC and a modulo ADC withing the system. Alternatively, one can perform the initialization using only an R bit modulo ADC in one of the two following ways:

1)
Increase α gradually until it reaches its final value. For the first sample, α₁ will be chosen such that V₁ = α₁X₁+Z₁ is w.h.p. within the modulo interval, such that no prediction is needed. Next, we can use V₁ in order to predict V₂ = α₂X₂ +Z₂, which allows to use α₂ > α₁ such that the prediction error is still within the modulo interval. Continuing this way, we can keep increasing α until convergence.
2)
We can collect a long vector of outputs from the modulo ADC, say {Y₁,…,Y_K}, K > p, and unwrap the modulo operation via the integer-forcing source coding scheme described in the next subsection. The amount of computations per sample required in this method is greater than that of the “steady state”, i.e., after initialization is complete, but since initialization is rarely performed, the effect on the total complexity is negligible.

B. Modulo ADCs for Random Vectors

In this subsection we consider the case where K identical (R,α)-mod ADCs, as described above, are applied on a random vector, one on each component of the vector. We develop a corresponding decoder and analyze its performance, including the effects of the choices of α and R.

Let $X \sim N (0, Σ)$ be a K-dimensional Gaussian random vector with zero mean and covariance matrix Σ. Let

Y_{k} = [α X_{k} + Z_{k}] \mod 2^{R}, k = 1, \dots K,

be obtained by applying K identical (R,α) mod-ADCs, each applied to a different coordinate of the vector X, where the quantization noises Z_k ∼ Unif((−1,0]), k = 1,…,K, are iid, and let

V_{k} = α X_{k} + Z_{k}, k = 1, \dots K,

be its non-folded version. Our goal is to recover $V ≜ {[V_{1}, \dots, V_{K}]}^{T}$ from the outputs $Y ≜ {[Y_{1}, \dots, Y_{K}]}^{T}$ of the modulo ADCs with high probability.

To that end, we now review a sub-optimal low-complexity decoder, proposed in [11], dubbed the integer-forcing (IF) source decoder, see Figure 7. Let $\frac{1}{2}$ be a K-dimensional vector with all entries equal to $\frac{1}{2}$ , and Ibe the identity matrix. The decoding algorithm works as follows.

Fig. 7. — Schematic architecture for modulo ADC for random vectors.

Inputs: Y, Σ, R, α.

Output: Estimates ${\hat{V}}_{IF}$ , and ${\hat{X}}_{IF}$ , for V and X, respectively.

Algorithm:

1) Solve

A = {[a_{1} | \dots | a_{K}]}^{T} = \underset{\underset{| \bar{A} | \neq 0}{\bar{A} \in ℤ^{K \times K}}}{\arg \min} \max_{k = 1, …, K} \frac{1}{2} \log ({\bar{a}}_{k}^{T} (I + 12 α^{2} Σ) {\bar{a}}_{k}),

(18)

where |A| denotes the absolute value of det(A).

2) For k = 1,…,K, compute

{\bar{g}}_{k} ≜ [a_{k}^{T} (Y + \frac{1}{2})] \mod 2^{R} = [g_{k}] \mod 2^{R}, {\tilde{g}}_{k} ≜ [{\bar{g}}_{k} + \frac{1}{2} 2^{R}] \mod 2^{R} - \frac{1}{2} 2^{R},

(19)

and set $\tilde{g} = {[{\tilde{g}}_{1}, \dots, {\tilde{g}}_{K}]}^{T}$ .

3) Output ${\hat{V}}_{IF} = A^{- 1} \tilde{g}$ , and ${\hat{X}}_{IF} = \frac{{\hat{V}}_{IF}}{α}$ .

Remark 2: The optimization problem (18) requires a computational complexity exponential in K, in general (unless P=NP). However, the problem of finding the optimal integer matrix A, need only be solved once for each covariance matrix Σ and α. Thus, even if the solution to this problem is computationally expensive, its cost is normalized by the number of times this solution is used. In practice, one can apply the LLL algorithm [38] in order to obtain a sub-optimal A with polynomial complexity in K.

The next proposition, adapted from [11, Theorem 2] characterizes the performance of modulo ADCs with the decoder above.

Proposition 3: Let A = [a₁|···|a_K]^T be the matrix found in step 1 of the algorithm above, and define

R_{IFSC} (A) = \max_{k = 1, \dots, K} \frac{1}{2} \log (a_{k}^{T} (I + 12 α^{2} Σ) a_{K}^{T}) .

(20)

We have that

\Pr (E_{OL}) = \Pr ({\hat{V}}_{IF} \neq V) \leq 2 K \exp {- \frac{3}{2} \cdot 2^{2 (R - R_{IFSC} (A))}},

and

D_{k} = E [{(X_{k} - {\hat{X}}_{k, IF})}^{2} | E_{\bar{OL}}] \leq \frac{1}{12 α^{2} (1 - \Pr (E_{OL}))},

for all k = 1,…,K, where the event $E_{\bar{OL}} = {{\hat{V}}_{IF} = V}$ is the complement of the event $E_{OL} = {{\hat{V}}_{IF} \neq V}$ .

The main idea behind the decoder above is the simple observation that for any vector a = [a₁,…,a_k]^T ∈ ℤ^K and any vector h = [h₁,…,h_K]^T ∈ ℝ^K we have that

[\sum_{k = 1}^{K} a_{k} [h_{k}] \mod 2^{R}] \mod 2^{R} = [\sum_{k = 1}^{K} a_{k} h_{k}] \mod 2^{R} .

(21)

Proof: By the identity (21), we have that the quantities ${\bar{g}}_{k}$ , computed in step 2 of the algorithm, satisfy

{\bar{g}}_{k} = [a_{k}^{T} (Y + \frac{1}{2})] \mod 2^{R} = [g_{k}] \mod 2^{R},

where

g_{k} ≜ a_{k}^{T} (V + \frac{1}{2}) .

Furthermore, ${\tilde{g}}_{k} \in [- \frac{1}{2} 2^{R}, \frac{1}{2} 2^{R})$ is merely a cyclically shifted version of ${\bar{g}}_{k} \in [0, 2^{R})$ . Thus, ${\tilde{g}}_{k} = g_{k}$ if and only if $g_{k} \in [- \frac{1}{2} 2^{R}, \frac{1}{2} 2^{R})$ . Consequently, ${\hat{V}}_{IF} \neq V$ if and only if the event

E_{OL} = ⋃_{k = 1}^{K} {| g_{k} | \geq \frac{1}{2} 2^{R}},

occurs. Thus, by the union bound,

\Pr (E_{OL}) = \Pr ({\hat{V}}_{IF} \neq V) \leq \sum_{k = 1}^{K} \Pr (| g_{k} | \geq \frac{1}{2} 2^{R}) .

(22)

The random variable g_k has zero mean, variance $σ_{k}^{2} = a_{k}^{T} (α^{2} Σ + \frac{1}{12} I) a_{k}$ , and satisfies the conditions of Lemma 1. We therefore have that

\Pr (| g_{k} | \geq \frac{1}{2} 2^{R}) \leq 2 \exp {- \frac{2^{2 R}}{8 σ_{k}^{2}}} = 2 \exp {- \frac{3}{2} \cdot 2^{2 (R - \frac{1}{2} \log (12 σ_{k}^{2}))}} = 2 \exp {- \frac{3}{2} \cdot 2^{2 (R - \frac{1}{2} \log (a_{k}^{T} (I + 12 α^{2} Σ) a_{k}))}} .

Substituting this into (22) and recalling the definition of R_IFSC(A), gives

P_{e} \leq 2 K \exp {- \frac{3}{2} \cdot 2^{2 (R - R_{IFSC} (A))}} .

(23)

Conditioned on the event $E_{\bar{OL}}$ , i.e., the event that $E_{OL}$ did not occur, we have that for all k = 1,…,K

D_{k} = E [{(X_{k} - {\hat{X}}_{k, IF})}^{2} | E_{\bar{OL}}] = E [{(\frac{Z_{k} + \frac{1}{2}}{α})}^{2} | E_{\bar{OL}}] \leq \frac{1}{12 α^{2} (1 - \Pr (E_{OL}))},

where the last inequality follows similarly to (11).

As in the previous subsection, we set

R = R_{IFSC} (A) + δ,

(24)

such that

\Pr (E_{OL}) \leq 2 K \exp {- \frac{3}{2} \cdot 2^{2 δ}},

(25)

and set D = 1/12α², which is a good approximation for the upper bound we derived on D_k, provided that δ is not too small. Consequently, we can write

R_{IFSC} (A, D) ≜ \max_{k = 1, \dots, K} \frac{1}{2} \log (a_{k}^{T} (I + \frac{1}{D} Σ) a_{k}) .

(26)

The tradeoff between rate, distortion and error probability achieved by the (R,α) mod-ADC with an integer-forcing decoder is therefore characterized by equations (24), (25), and (26). To put this result in context, we recall the information theoretic benchmark [11]

R_{bench}^{BT} (D) ≜ \frac{1}{2 K} \log | I + \frac{1}{D} Σ |,

that approximates the minimal quantization rate, per quantizer, required by any computationally and delay unlimited system in order to achieve MSE of at most D in the reconstructions of each X_k, k = 1,…,K. Thus,

R_{IFSC} (A, D) - R_{bench}^{BT} (D) = \frac{1}{2} \log (\frac{\max_{k = 1, \dots, K} a_{k}^{T} (I + \frac{1}{D} Σ) a_{k}}{{| I + \frac{1}{D} Σ |}^{\frac{1}{K}}}) .

(27)

It is easy to show that the right handside of (27) is non-negative [11]. However, typically it is possible to find an integer matrix A for which the gap is quite small, and under certain distributions of practical interest on Σ, the cumulative distribution function (CDF) of this gap can be characterized [21]. A comprehensive comparison between R_IFSC(D) and $R_{bench}^{BT} (D)$ was performed in [11], and it was demonstrated that they are usually quite close.

III. OVERSAMPLED MODULO-ADC

In Section II-A we have demonstrated the effectiveness of the modulo ADC architecture for acquiring stochastic processes that are correlated in time. In particular, we have shown that the performance of a modulo ADC depends on the variance of the prediction error of the process {V_n = αX_n + Z_n}, rather than the variance of V_n itself. However, when designing an ADC, it is desirable to impose as few constraints as possible on the signals that will be fed to the ADC. Therefore, assuming that {X_n} is such that {V_n} is highly predictable may be too restrictive.

Nevertheless, recalling that the process {X_n} is obtained by sampling a continuous-time process X(t), we observe that if the sampling rate is higher than Nyquist’s rate, {X_n} will be bandlimited,³ and consequently, {V_n} will be highly predictable no matter what the precise PSD of {X_n} happens to be. In fact, this observation can be viewed as the rationale underlying ΣΔ-conversion. In particular, a ΣΔ-converter is information theoretically equivalent to a differential pulse-code modulator (DPCM) whose input is a bandlimited signal with flat spectrum [35].

While having many advantages, the implementation of ΣΔ converters is more involved than that of traditional scalar uniform quantizers. The main challenge in the design of ΣΔ converters is the need to produce the quantization error, and then apply a filter to this analog signal. A major obstacle is that the generation of the quantization error requires to first quantize the current sample, then apply a digital-to-analog converter (DAC) to produce the analog representation of the quantizer’s output, and finally to subtract this representation from the original sample. See Figure 2. The quantizer and the DAC need to be matched as otherwise the produced quantization error is inaccurate. This, however, turns out to be quite difficult to achieve, unless the quantizer is a simple sign detector (1-bit quantizer).

To circumvent the challenges listed above, we develop an oversampled modulo ADC architecture, as an alternative to ΣΔ-conversion. The only assumptions made on the input process {X(t)}, is that it is bandlimited with maximal frequency at most B, and that its variance is at most σ². The developed universal architecture is as follows. See Figure 3.

Analog-to-digital conversion: The process X(t) is uniformly sampled every T_S = 1/2LB seconds, L > 1, such that the sampling rate is L times above Nyquist’s rate. Each sample of the obtained discrete-time process {X_n} is then discretized using an (R, α) mod-ADC, resulting in the quantized process {Y_n = [αX_n + Z_n] mod 2^R}.

As above, we define the unfolded process {V_n = αX_n + Z_n}. The decoding procedure assumes {V_n−1,…,V_n−p} are given, and computes an estimate for V_n, based on Y_n.

Inputs: Y_n, {V_n−1,…,V_n−p}, σ², L, R, α.

Outputs: Estimates ${\hat{V}}_{n}$ and ${\hat{X}}_{n}$ for V_n and X_n, respectively.

Algorithm: The algorithm is exactly the same as that in Section II-A, with only one difference. Here {C_X[r]} is unknown. Thus, for the computation of the p-tap prediction filter {h_n}, we assume the PSD of {X_n} is

S_{X} (e^{j ω}) = {\begin{array}{l} L σ^{2} & ω \in [- \frac{π}{L}, \frac{π}{L}) \\ 0 & ω \notin [- \frac{π}{L}, \frac{π}{L}) \end{array},

(28)

even though this assumption may, and is most likely to, be wrong.

Final post-processing: After collecting a long sequence of estimates ${{\hat{X}}_{1}, \dots, {\hat{X}}_{N}}$ we apply a non-causal low pass filter

G (e^{j ω}) = {\begin{array}{l} \frac{12 α^{2} L σ^{2}}{1 + 12 α^{2} L σ^{2}} & if ω \in [- \frac{π}{L}, \frac{π}{L}] \\ 0 & if ω \notin [- \frac{π}{L}, \frac{π}{L}] \end{array}

on them, to obtain the sequence ${{\hat{X}}_{1}^{LPF}, \dots, {\hat{X}}_{N}^{LPF}}$ .

The advantages over ΣΔ conversion are clear: the only processing done in the analog domain is sampling and applying a modulo ADC, whereas all filtering operations are done digitally at the decoder.

Proposition 1 provides an upper bound on the error probability $\Pr (E_{{OL}_{n}}) = \Pr ({\hat{V}}_{n} \neq V_{n})$ in terms of $R - \frac{1}{2} \log (12 σ_{p}^{2}) .$ However, Proposition 2, which characterizes the scaling of $\frac{1}{2} \log (12 σ_{p}^{2})$ with D, does not apply here for two reasons. The first is that we use a mismatched prediction filter here, due to the unknown PSD of {X_n}, and the second is that whatever the exact PSD truns out to be, it is assumed to be supported on the frequency interval $[- \frac{π}{L}, \frac{π}{L}]$ , such that h(X_n|X_n−1,…) = −∞, and the high-resolution assumption never holds. Instead, we prove the following.

Proposition 4: Let {X_n} be a zero-mean stationary process with variance $E (X_{n}^{2}) \leq σ^{2}$ and PSD supported inwhere frequency interval $[- \frac{π}{L}, \frac{π}{L}]$ . Let $V_{n} = α X_{n} + Z_{n}$ where Z_n ∼ Unif([−1,0)), and ${\hat{V}}_{n}^{p}$ be as in (6), where {h_n} is the optimal linear MMSE p-tap prediction filter for V_n, from its past samples {V_n−1,…,V_n−p}, designed under the assumption that S_X(e^jω) is as in (28). Then

\lim_{p \to \infty} 12 σ_{p}^{2} \leq {(1 + 12 α^{2} L σ^{2})}^{\frac{1}{L}} .

Proof: Let

S_{\tilde{V}} (e^{j ω}) = {\begin{array}{l} α^{2} L σ^{2} + 1 / 12 & ω \in [- \frac{π}{L}, \frac{π}{L}] \\ 1 / 12 & ω \notin [- \frac{π}{L}, \frac{π}{L}] \end{array},

(29)

and let H_p(e^jω) be the frequency response of the prediction filterlet {h_n}, which is designed with respect to (29). Further, let $H (e^{j ω}) = \lim_{p \to \infty} H_{p} (e^{j ω})$ . By the basic principles of optimal linear MMSE prediction, we have that

S_{\tilde{V}} (e^{j ω}) {| 1 - H (e^{j ω}) |}^{2} = 2^{\frac{1}{2 π} \int_{- π}^{π} \log (S_{\tilde{V}} (e^{j ω})) d ω} .

(30)

Therefore, combining (29) and (30), we see that

{| 1 - H (e^{j ω}) |}^{2} = {\begin{array}{l} {(1 + 12 α^{2} L σ^{2})}^{\frac{1}{L} - 1} & ω \in [- \frac{π}{L}, \frac{π}{L}] \\ {(1 + 12 α^{2} L σ^{2})}^{\frac{1}{L}} & ω \notin [- \frac{π}{L}, \frac{π}{L}] \end{array} .

(31)

Applying this filter on the “actual” process V_n = αX_n + Z_n, whose PSD is

S_{V} (e^{j ω}) = {\begin{array}{l} α^{2} S_{X} (e^{j ω}) + 1 / 12 & ω \in [- \frac{π}{L}, \frac{π}{L}] \\ 1 / 12 & ω \notin [- \frac{π}{L}, \frac{π}{L}] \end{array},

we get

\lim_{p \to \infty} 12 σ_{p}^{2} = \lim_{p \to \infty} 12 E {(V_{n} - {\hat{V}}_{n}^{p})}^{2} = \frac{1}{2 π} \int_{- π}^{π} S_{V} (e^{j ω}) {| 1 - H (e^{j ω}) |}^{2} d ω = \frac{{(1 + 12 α^{2} L σ^{2})}^{\frac{1}{L}}}{2 π} [\int_{ω \notin [- π / L, π / L]} 1 d ω + \int_{- π / L}^{π / L} {(1 + 12 α^{2} L σ^{2})}^{- 1} (1 + 12 α^{2} S_{X} (e^{j ω})) d ω] \leq {(1 + 12 α^{2} L σ^{2})}^{\frac{1}{L}},

(32)

where the last inequality follows from our assumption that $\frac{1}{2 π} \int_{- π / L}^{π / L} S_{X} (e^{j ω}) d ω = E (X_{n}^{2}) \leq σ^{2} .$

It follows from Proposition 1 combined with Proposition 4, that for large p and a quantization rate of roughly

R = δ + \frac{1}{L} \frac{1}{2} \log (1 + 12 α^{2} L σ^{2}),

(33)

the proposed system achieves $\Pr (E_{{OL}_{n}}) \leq 2 \exp {- \frac{3}{2} 2^{2 δ}}$ , for all input processes with bandwidth ≥ B and variance ≥ σ².

After low-pass filtering with G(e^jw), we get by a similar analysis to that done in Section II-A and in [35], that for long enough N such that the discrete Fourier transform (DFT) of N consecutive samples of {X_n} have negligible energy in frequencies above π/L, we have that

D = E [(X_{n} - {\hat{X}}_{n}^{LPF})^{2} | ⋂_{n = 1}^{N} {{\hat{V}}_{n} = V_{n}}] \leq \frac{σ^{2}}{1 + 12 α^{2} L σ^{2}} \frac{1}{1 - \Pr (⋂_{n = 1}^{N} {{\hat{V}}_{n} = V_{n}})} \leq \frac{σ^{2}}{1 + 12 α^{2} L σ^{2}} \frac{1}{1 - N \Pr (E_{{OL}_{n}})} \leq \frac{σ^{2}}{1 + 12 α^{2} L σ^{2}} \frac{1}{1 - 2 N \exp {- \frac{3}{2} 2^{2 δ}}} .

(34)

Thus, for large enough δ such that the total overload probability is small, i.e.,

2 N \exp {- \frac{3}{2} 2^{2 δ}} ≪ 1,

(35)

we have that our system achieves distortion ≈ D with

R = \frac{1}{L} \frac{1}{2} \log (\frac{σ^{2}}{D}) + δ .

(36)

The term $\frac{1}{L} \frac{1}{2} \log (\frac{σ^{2}}{D})$ is the rate-distortion function of a source with PSD as in (28). Thus, up to the loss of δ bits per sample, due to the one dimensional quantizer we are using, whose size is dictated by (35), our system is optimal in the following minimax sense: no system can attain a better tradeoff between R and D simultaneously for all processes with bandwidth at most B and variance at most σ².

The multiplicative increase in quantization rate of the developed system, with respect to the fundamental rate-distortion limit, is $(\frac{1}{2} \log (\frac{σ^{2}}{D}) + L δ) / (\frac{1}{2} \log (\frac{σ^{2}}{D}))$ . If X(t) were sampled at its Nyquist rate, rather than L times above it, standard uniform scalar quantization would have achieved similar overload probability and distortion with only a $(\frac{1}{2} \log (\frac{σ^{2}}{D}) + δ) / (\frac{1}{2} \log (\frac{σ^{2}}{D}))$ multiplicative increase in rate with respect to the fundamental limit. Thus, oversampling combined with the architecture developed here produces a total number of bits per-second which is greater than that required by an ADC operating at the Nyquist rate. The disadvantage of the latter approach is that it requires to use a high-resolution quantizer for each sample, whereas the scheme developed here, allows to reduce the number of quantization bits per sample, at the expanse of an increased sampling rate. Thus, just like ΣΔ conversion, the scheme developed here allows to replace slow but high-resolution ADCs, with fast low-resolution ones.

IV. IMPLEMENTATION VIA RING OSCILLATORS

In this Section we develop an architecture for a circuit implementing a modulo ADC, and provide a mathematical model for its input-output characteristic. Our implementation is essentially based on converting the input voltage into phase, which can naturally only be observed modulo 2π, and then quantizing the phase. To that end, we use ring oscillator ADCs, as described next.

Consider a closed-loop cascade of N inverters, where N is an odd number, all controlled with the same voltage V_dd, see Figure 4. This circuit, which is referred to as a ring oscillator can act as an ADC with sampling period T_s, when V_dd is set to V_in(t) = g(X(t)), where X(t) is the analog signal to be converted to a digital one and g(·) is a function to be specified, and the state (‘0’ or ‘1’, corresponding to ‘low’ or ‘high’) of each inverter is measured every T_s seconds.

It is well known that the time it takes for a non-ideal inverter’s output to respond to a change in its input is a function of V_dd [39], which we denote by Δ(V_dd) > 0. Taking this delay into account, a moment of reflection reveals that at each time instance, exactly one pair of adjacent inverters are at the same state whereas all other pairs of adjacent inverters are at distinct states. Denote by I ∈ {1,…,N} the index of the first inverter within the pair that shares the same state, and denote its state by B ∈ {0,1}, i.e., the adjacent pair of inverters with the same state are inverter I and inverter [I+1] mod N, and their state is B. With this notation, we can uniquely identify the states of all N inverters at time t with the number Q_t = (I_t−1)+N·[I_t+B_t] mod 2 ∈ {0,…,2N−1}. See Figure 5. By sampling the states of all N inverters every T_s seconds, we gain access to the discrete-time process {QnT_s}.

A crucial observation is that the process Q_t cyclically oscillates in increments of +1 modulo 2N. More formally stated, if t′ > t is the earliest time where Q_t′ ≠ Q_t, then

Q_t′ = [Q_t + 1] mod 2N. We designate by V_n the number of increments that occurred in the process {Q_t} within the time interval [nT_S,(n+1)T_s), and define the output of the induced modulo ADC as

Y_{n} ≜ [V_{n}] \mod 2 N = [Q_{(n + 1) T_{s}} - Q_{n T_{s}}] \mod 2 N .

Next, we relate V_n to the process V_in(t). To this end, we make the simplifying assumption that X(t) is constant within each time interval [nT_s,(n + 1)T_s), and consequently, so is V_in(t). This assumption can be made exact by adding a sample-and-hold circuit to the system. Assuming the function Δ(V_dd) is identical for all N inverters, we have that

Q_{n T_{s}} = [⌊ \sum_{k = - \infty}^{n - 1} \frac{T_{s}}{Δ (V_{i n} (k T_{s}))} ⌋] \mod 2 N,

and consequently,

Y_{n} = [[⌊ \sum_{k = - \infty}^{n} \frac{T_{s}}{Δ (V_{i n} (k T_{s}))} ⌋] \mod 2 N - [⌊ \sum_{k = - \infty}^{n - 1} \frac{T_{s}}{Δ (V_{i n} (k T_{s}))} ⌋] \mod 2 N] \mod 2 N = [⌊ \sum_{k = - \infty}^{n} \frac{T_{s}}{Δ (V_{i n} (k T_{s}))} ⌋ - ⌊ \sum_{k = - \infty}^{n - 1} \frac{T_{s}}{Δ (V_{i n} (k T_{s}))} ⌋] \mod 2 N,

where the last equality follows from the modulo distributive law (2). Defining the “quantization error”

Z_{n} = ⌊ \sum_{k = - \infty}^{n} \frac{T_{s}}{Δ (V_{i n} (k T_{s}))} ⌋ - \sum_{k = - \infty}^{n} \frac{T_{s}}{Δ (V_{i n} (k T_{s}))} \in (- 1, 0],

we can write

Y_{n} = [\sum_{k = - \infty}^{n} \frac{T_{s}}{Δ (V_{i n} (k T_{s}))} + Z_{n} - \sum_{k = - \infty}^{n - 1} \frac{T_{s}}{Δ (V_{i n} (k T_{s}))} - Z_{n - 1}] \mod 2 N = [\frac{T_{s}}{Δ (V_{i n} (n T_{s}))} + Z_{n} - Z_{n - 1}] \mod 2 N .

Let us now define the function

f (x) = \frac{1}{Δ (x)},

which corresponds to the oscillation frequency of our circuit, and is dictated by the characteristics of the inverters at hand, and let us also take the function g(·) to be affine, such that V_in(t) = a+bX(t). We further define the discrete time process X_n = X(nT_s), for all n ∈ ℕ. We have therefore obtained the model

Y_{n} = [T_{s} \cdot f (a + b X_{n}) + Z_{n} - Z_{n - 1}] \mod 2 N .

(37)

In general, the quantization noise process {Z_n} is a deterministic function of the process {X_n}. Nevertheless, as in the analysis of the ideal modulo ADC, in the sequel we make the simplifying assumption that it is an iid process with Z_n ~ Unif((−1.0]).

if f(·) were an affine function itself, with an appropriate choice of the parameters a,b we could have induced the model

Y_{n} = [α X_{n} + Z_{n} - Z_{n - 1}] \mod 2^{R},

where R = log(2N), which is identical to the ideal (R,α) mod-ADC, up to the fact that the quantization noise Z_n−Z_n−1 is now a first order moving-average (MA) process rather than a white process. In practice, however, it is difficult to construct inverters for which f(·) is approximately affine within a large range. The effect of nonlinearities of f(·) on the performance of the modulo ADC is numerically studied in the next section.

V. Numerical Experiments

We have conducted numerical simulations for the performance of a ring oscillator based modulo ADC, where the input is an oversampled process, as in Section III. In our simulations, we have assumed that the inverters were produced using a CMOS technology. The corresponding function f(V_in) relating the input voltage to the output frequency of the oscillator, which was introduced in Section IV, is shown in Figure 8, as obtained using a PSpice simulation.

Fig. 8. — The voltage to output frequency function f(V_in).

A. Design of System Parameters

In all our simulations, we have designed the modulo ADC and the corresponding decoder as described in Section III, i.e., under the assumption that the input signal X(t) is a Gaussian stationary process with zero mean and variance σ², whose PSD is flat within the frequency interval [−B, B] and zero outside this interval. The sampling rate is a factor of L < 1 above the Nyquist rate, such that the sampling period is $T_{s} = \frac{1}{2 L B}$ seconds.

Given the oversampling ratio L, the number of inverters N, and the above assumptions on the statistics of X(t), the design of the modulo ADC and its corresponding decoder consists of:

Choosing the shift and scaling parameters a and b for the modulo ADC such that V_in(t) = a + bX(t);
Designing the p-tap prediction filter {h_n} for V_n = T_sf(a + bX_n) + Z_n − Z_n−1 given the past samples {V_n−1,...,V_n−p};
Designing a 2K + 1-tap noncausal smoothing filter {g_n} for estimating X_n from {V_n−k,...,V_n+k}.

The decoding procedure consists of recovering an estimate ${{\hat{V}}_{n}}$ for {V_n} from the modulo ADC’s outputs {Y_n = [T_sf(a + bX_n) + Z_n−Z_n−1] mod 2N}, by applying the decoding procedure described in Section III with the prediction filter{h_n}. Then, the estimate ${{\hat{X}}_{n}}$ is produced by applying the smoothing filter {g_n} to the process ${{\hat{V}}_{n}}$ , which is referred to as final post-processing in Section III. The filters {h_n} and {g_n} are chosen as the MMSE-optimal linear prediction and smoothing filters, respectively. Calculating the coefficients of {h_n} requires knowledge of the second-order statistics of the process {V_n}. This in turn, can be (numerically)calculated from the pair wise distribution of{X_n, X_n−m}, m = 0,…,p, which is fully characterized by our assumption that {X_n} is a Gaussian process with PSD S_X(e_jω)as in (28). Calculating the coefficients of {g_n} requires, in addition, the joint second-order statistics of the processes {X_n, V_n}, which can either be calculated numerically, orvia Buss gang’s Theorem [40].

We apply the developed modulo ADC architecture to processes of length T discrete samples. The parameter sand bare chosen as follows: Let $P_{e} = \Pr (\cup_{t = 1}^{T} {\hat{V}}_{t} \neq V_{t})$ be the block error probability of our decoder, and let ϵ be our target block error probability. For every a and b, we find the filters {h_n} and {g_n} as described above, and compute the corresponding P_e = P_e(a,b) via Monte Carlo simulation for a Gaussian input process with PSD as in (28). Among all (a,b) for which P_e(a,b) < ϵ, we choose the pair that resultsin the smallest MSE distortion $\frac{1}{T} \sum_{t = 1}^{T} E {(X_{t} - {\hat{X}}_{t})}^{2}$ . The target block error probability for all of these tups we consideris ϵ = 10⁻³, and the block length we consider is T = 2¹¹. Roughly,these parameters correspond to allowing aper-sample overload error probability of 10⁻³ · 2⁻¹¹ ≈ 4.89 · 10⁻⁷.

B. Evaluation Method

The system was designed for a bandlimited Gaussian process with a flat PSD. Nevertheless, we would like it to achieve approximately the same MSE distortion and error probability for all bandlimited processes with the same variance, regardless of the PSD within that band. For an ideal modulo ADC and large p, this is indeed the case, as shown in Section III. To test to what extent this remains the case also for the ring oscillator based modulo ADC, we apply our system on two types of processes: 1) A Gaussian process with variance σ² and bandwidth B, whose PSD is flat within this band, for which the system was designed; 2) A sinusoidal waveform, whose frequency is chosen at random, uniformly on [0,B), and whose amplitude is $\sqrt{2 σ^{2}}$ , such that its power is σ².

For each experiment, we also plot the theoretical performance of an ideal (R,α) mod-ADC, as well as those of a first-order ΣΔ (with the optimal 1-tap noise shaping filter) converter, both designed to achieve the same target block error probability for the bandlimited Gaussian stochastic process X(t). Although overload errors have a different effect on ΣΔ converters and modulo ADCs, both systems fail to achieve their target distortions unless those are avoided.

In the ADC literature, it is quite common to measure the performance of a particular ADC for a sinusoidal input. One drawback of this approach is that the deterministic nature of the input signal allows to design the ADC such that overload errors never occur, without significantly increasing its dynamic range above the standard deviation of its input. For stochastic processes, even if Gaussianity is assumed, the dynamic range must be as large as multiple standard deviations of its input, in order to ensure a small overload probability. In our derivations, this is manifested through the rate backoff parameter δ, which dictates the ratio between the quantizer’s dynamic range 2^R and the standard deviation of its input (which in our case is the prediction error processes).

In order to allow a unified presentation of the results for both Gaussian and sinusoidal processes, rather than plotting the rate R_mod-ADC(D) required by the modulo ADC in order to achieve an MSE distortion D with target block error probability ϵ, we plot R_mod-ADC(D) − δ, where

δ = \frac{1}{2} \log (- \frac{2}{3} \ln (\frac{ϵ}{2 T})) .

(38)

This is consistent with traditional converter analyses that separate saturation effects from granularity ones [4], [37]. For our parametersbits T = 2¹¹, ϵ = 10⁻³, (38) evaluates to δ ≈ 1.6717 bits. Note that by (12), δ is the rate backoff required in order to attain block error probability below ϵ by an ideal modulo ADC, when the input process is Gaussian. A similar analysis reveals that the same rate backoff is also required for a ΣΔ converter to attain the same block error probability, under the same assumptions on the input process [35]. Thus, in all figures we also plot RΣΔ(D)−δ rather than RΣΔ(D), where RΣΔ(D) is the rate needed by the ΣΔ converter to attain distortion D with block error probability below ϵ.

C. Results and Discussion

We have performed experiments for the parameters L = 3 and four different values of B: 100Hz, 44.1KHz, 100KHz and 1MHz. The value of σ² is immaterial, as it can be absorbed in the parameter b. The results are depicted in Figures 9a, 9b, 9c and 9d, respectively. The results are based on Monte Carlo simulation, with 10³ independent trials for each point in each figure. No overload errors were observed for the choices of a, b, {hⁿ} and {gⁿ} that correspond to each point in the figures, neither for the Gaussian processes and neither for the sinusoidal processes.

Fig.9 — Performance of ring oscillators based modulo ADC (RO-ADC). We plot SNR vs. quantization rate for a Gaussian process and for a sinusoidal waveform processes with a random frequency, uniformly distributed over [0,B). For comparison we also plot the performance of an ideal (R, α) mod-ADC, as well as those of an ideal first-order ΣΔ converter. For all curves, SNR is defined as σ²/D. The prediction filter has p = 25 taps, whereas the smoothing filter has 2k + 1 taps for k = 22.

In general, the results indicate that the ring oscillator implementation of a modulo ADC is closer to the ideal modulo ADC for small bandwidths B and quantization rates R. In all figures we observe the same trend: for small enough R the curve of the SNR as a function of R for the ring oscillator modulo ADC is parallel to that of the ideal modulo ADC, and has a slope of ≈ 6L= 18dB/bit, in agreement with(36). Then, for large enough R the system’s non-linearities “kick-in” and the slope significantly decreases. Eventually, for large enough R, the first-order ΣΔ converter outperforms the ring oscillator modulo ADC, as can be observed in Figure 9d. Nevertheless, for moderate values of R, even for B = 1MHz, the improvement over the ΣΔ converter can be as large as 17dB.

The trends above are to be expected. Recall that the output of the corresponding modulo ADC is given by (37). If b · σ is small enough, the function f(a + bX_n) resides in a small interval around f(a) with high probability, and is well approximated by the linear function f(a)+bf′(a)X_n. Consequently, the output of the modulo ADC can be well approximated as

Y_{n} \approx [T_{s} b f^{'} (a) X_{n} + Z_{n} - Z_{n - 1} + T_{s} f (a)] \mod 2 N .

Since T_sf(a) is known and can be removed, this is equivalent to a (T_sbf′(a),log(2N)) mod-ADC, albeit with quantization noise Z_n − Z_n−1 rather than Z_n.

Typically, however, in order to get a large gain from using a modulo ADC rather than a standard uniform quantizer, we would like to use an (R,α) mod-ADC with $α \cdot σ ≫ \frac{1}{2} 2^{R}$ . Thus, in order to get a “useful” modulo ADC that is close to ideal, the two conditions (i) b · σ ≪ 1; (ii) T_sf′(a) · b · σ ≫ N; should hold. These two conditions can only be satisfied simultaneously if T_sf′(a) ≫ 1, i.e., when the sampling rate is low, relative to f′(a).

For an ideal (R,α) mod-ADC with a given target overload error probability, as R increases α can also increase, resulting in a smaller distortion. Similarly, for the ring oscillator modulo ADC, the optimal choice of b should, in general, increase with R. For small rates, the optimal value of b is also small, such that the linear approximation for the function f(·) is not too bad. However, as R, and consequently b, increases, the nonlinearities start becoming significant and the slope of the SNR as a function of R becomes smaller.

VI. MODULO ADCS FOR JOINTLY STATIONARY PROCESSES

In this section we develop a scheme that uses K parallel modulo ADCs for digitizing K jointly stationary processes, provide a corresponding low-complexity decoding algorithm, and characterize its performance.

Let ${X_{n}^{1}}, \dots, {X_{n}^{K}}$ be K discrete-time jointly Gaussian stationary random processes, obtained by sampling the jointly Gaussian stationary processes $X_{1} (t), \dots, X_{K} (t)$ every T_s seconds. Let

Y_{n}^{k} = [α X_{n}^{k} + Z_{n}^{k}] \mod 2^{R}, k = 1, \dots, K, n = 1, 2, \dots

be the processes obtained by applying K parallel (R,α) mod-ADCs, on ${X_{n}^{1}}, \dots, {X_{n}^{K}}$ , where the input to the kth modulo ADC is the process ${X_{n}^{k}}$ , and ${Z_{n}^{k}}$ is a Unif((−1,0]) noise, iid in space and in time. Let

V_{n}^{k} = α X_{n}^{k} + Z_{n}^{k}, k = 1, \dots, K, n = 1, 2, \dots

be the non-folded version of $Y_{k}^{n}$ . Let $X_{n} = {[X_{n}^{1}, \dots, X_{n}^{K}]}^{T}$ , and define Y_n, Z_n and V_n similarly. Our goal is to recover the process {V_n} from the outputs of the modulo ADCs with high probability.

To achieve this goal, we employ a two-step procedure, combining the schemes from Section II-A and Section II-B: first we compute a predictor ${\hat{V}}_{n}^{p}$ based on previous p samples {V_n−1,…,V_n−p} whose error is the vector $E_{n}^{p} = V_{n} - {\hat{V}}_{n}^{p}$ . By the same derivation as in Section II-A, we can produce [ $E_{n}^{p}$ ] mod 2^R from Y_n ^and {V_n−1,…,V_n−p}, where the modulo operation applied to a vector is to be understood as reducing each coordinate modulo 2^R. Now, our task is to decode a modulo-folded correlated random vector, which can be done via the integer-forcing decoder described in Section II-B. This relatively simple decoding procedure allows to efficiently exploit both temporal and spatial correlations. Below we describe it in more detail. See Figure 6. For all ℓ,m ∈ {1,…,K}, let $C_{l m} [r] = E (X_{n}^{l} X_{n - r}^{m})$ .

Inputs: Y_n, {V_n−1,…,V_n−p}, {C_ℓm[r]} for all ℓ,m ∈ {1,…,K}, R, α.

Outputs: Estimates ${\hat{V}}_{n}$ and ${\hat{X}}_{n}$ for V_n and X_n, respectively.

Algorithm:

1)
Compute the optimal linear MMSE predictor for V_n from its last p samples
${\hat{V}}_{n}^{p} = \sum_{i = 1}^{p} H_{i} \cdot (V_{n - i} + \frac{1}{2}) - \frac{1}{2},$
where {H_n} is a p-tap matrix prediction filter, H_i ∈ ℝ^K×K, for i = 1,…, p, computed based on {C_ℓm[r]} for all ℓ,m ∈ {1,…,K} and α, and the shift by $\frac{1}{2}$ compensates for $E (Z_{n})$ .
2)
Compute
$W_{n} = [Y_{n} - {\hat{V}}_{n}^{p}] \mod 2^{R},$
where the modulo reduction is to be understood as taken component-wise.
3)
Define the pth order prediction error $E_{n}^{p} ≜ V_{n} - {\hat{V}}_{n}^{p}$ , and compute its covariance matrix $Σ_{p} = E [E_{n}^{p} {(E_{n}^{p})}^{T}]$ based on {Cℓm[r]} for all ℓ,m ∈ {1,…,K} and α. Note that Σp is indeed invariant with respect to n due to stationarity.
4)
Solve
$A = {[a_{1} | \dots | a_{K}]}^{T} = \underset{| \bar{A} | \neq 0}{\underset{\bar{A} \in ℤ^{K \times K}}{argmin}} \max_{k = 1, \dots, K} \frac{1}{2} \log (12 {\bar{a}}_{k}^{T} Σ_{p} {\bar{a}}_{k}) .$ (39)
5)
For k = 1,…,K, compute
${\bar{g}}_{n}^{k} ≜ [a_{k}^{T} W_{n}] \mod 2^{R} {\tilde{g}}_{n}^{k} ≜ [{\bar{g}}_{n}^{k} + \frac{1}{2} 2^{R}] \mod 2^{R} - \frac{1}{2} 2^{R},$
and set ${\tilde{g}}_{n} = {[{\tilde{g}}_{n}^{1}, \dots, {\tilde{g}}_{n}^{k}]}^{T}$ .
6)
Compute
${\hat{E}}_{n}^{p} = A^{- 1} {\tilde{g}}_{n}, {\hat{V}}_{n} = {\hat{V}}_{n}^{p} + {\hat{E}}_{n}^{p}, {\hat{X}}_{n} = \frac{{\hat{V}}_{n} + \frac{1}{2}}{α} .$

Proposition 5: Let A = [a¹|⋯|a^K]^T be the matrix found in step 4 of the algorithm above, and define

R_{IFSC}^{ST} (A) = \max_{k = 1, \dots, K} \frac{1}{2} \log (12 a_{k}^{T} Σ_{p} a_{K}^{T}) .

(40)

We have that

\Pr (E_{{OL}_{n}}) = \Pr ({\hat{V}}_{n} \neq V_{n}) \leq 2 K \exp {- \frac{3}{2} \cdot 2^{2 (R - R_{IFSC}^{ST} (A))}},

and

D_{n}^{k} = E [{(X_{n}^{k} - {\hat{X}}_{n}^{k})}^{2} | E_{{\bar{OL}}_{n}}] \leq \frac{1}{12 α^{2} (1 - \Pr (E_{{OL}_{n}}))},

for all k = 1,…,K, where the event $E_{{\bar{OL}}_{n}} = {{\hat{V}}_{n} = V_{n}}$ is the complement of the event $E_{{OL}_{n}} = {{\hat{V}}_{n} \neq V_{n}}$ .

Proof: We first note that

W_{n} = [Y_{n} - {\hat{V}}_{n}^{p}] \mod 2^{R} = [[V_{n}] \mod 2^{R} - {\hat{V}}_{n}^{p}] \mod 2^{R} = [V_{n} - {\hat{V}}_{n}^{p}] \mod 2^{R} = [E_{n}^{p}] \mod 2^{R},

where the second equality follows from the modulo distributive law (2). By (21), we have that

{\bar{g}}_{n}^{k} ≜ [a_{k}^{T} W_{n}] \mod 2^{R} = [a_{k}^{T} E_{n}^{p}] \mod 2^{R} = [g_{n}^{k}] \mod 2^{R},

where

g_{n}^{k} = a_{k}^{T} E_{n}^{p} .

(41)

Furthermore, ${\tilde{g}}_{n}^{k} \in [- \frac{1}{2} 2^{R}, \frac{1}{2} 2^{R})$ is merely a cyclically shifted version of ${\bar{g}}_{n}^{k} \in [0, 2^{R})$ . Thus, ${\tilde{g}}_{n}^{k} = g_{n}^{k}$ if and only if $g_{n}^{k} \in [- \frac{1}{2} 2^{R}, \frac{1}{2} 2^{R})$ . Consequently, ${\hat{E}}_{n}^{p} \neq E_{n}$ , and therefore ${\hat{V}}_{n} \neq V_{n}$ , if and only if the event

E_{{OL}_{n}} = ⋃_{k = 1}^{K} {| g_{n}^{k} | \geq \frac{1}{2} 2^{R}},

occurs. Now, repeating the same steps from the proof of Proposition 3, we arrive at the claimed bounds.

Using Shannon’s lower bound, and applying similar arguments as in [41], one can show that any quantization scheme for the source {X_n} that produces R bits/sample/coordinate and attains $E (X_{n}^{k} - {\hat{X}}_{n}^{k})^{2} \leq D, k = 1, \dots, K, n = 1, \dots$ , must have $R \geq \frac{1}{K} h (X_{n} | X_{n - 1}, \dots) - \frac{1}{2} \log (2 π e D)$ . Let $E_{n}^{p *} = X_{n} - {\hat{X}}_{n}^{p}$ , where $X_{n}^{p}$ is the optimal pth order MMSE (linear) predictor of X_n from {X_n−1,…,X_n−p}, and let $Σ_{p}^{*} = E [E_{n}^{p *} {(E_{n}^{p *})}^{T}]$ . We have that

h (X_{n} | X_{n - 1}, \dots, X_{n - p}) = h (E_{n}^{p *} | X_{n - 1}, \dots, X_{n - p}) \overset{(a)}{=} h (E_{n}^{p *}) \overset{(b)}{=} \frac{1}{2} \log ({(2 π e)}^{K} | Σ_{p}^{*} |),

where (a) follows from the orthogonality principle of MMSE estimation [37], and (b) from the fact that $E_{n}^{p *}$ is a Gaussian random vector [5]. Thus, for any quantization scheme we must have

R (D) \geq R_{SLB} (D) ≜ \frac{1}{2} \log (\frac{\lim_{p \to \infty} {| Σ_{p}^{*} |}^{\frac{1}{k}}}{D}) .

Similarly to previous subsections, we set D = 1/12α², which is a good approximation for $D_{k}^{n}, k = 1, …, K$ , provided that $δ = R - R_{IFSC}^{ST} (A)$ is not too small. The rate required by our scheme, as given in Proposition 5, depends on 12Σ_p, which corresponds to the prediction error covariance of the process ${\tilde{X}}_{n} = \sqrt{12 α^{2}} X_{n} + {\tilde{Z}}_{n} = \frac{1}{\sqrt{D}} (X_{n} + \sqrt{D} {\tilde{Z}}_{n})$ , where ${\tilde{Z}}_{n} = \sqrt{12} Z_{n}$ is a random vector with unit variance iid entries. Let ${\tilde{Σ}}_{p}$ be the pth order prediction error covariance of the process $X_{n} + \sqrt{D} {\tilde{Z}}_{n}$ . We can rewrite the rate required by our scheme as

R_{IFSC}^{ST} (A, D) ≜ \frac{1}{2} \log (\frac{\max_{k = 1, \dots, K} a_{k}^{T} {\tilde{Σ}}_{p} a_{k}}{D}) .

Now, noting that if h(X_n|X_n−1,…) > −∞, we have that ${\tilde{Σ}}_{p} \to Σ_{p}^{*}$ as D → 0, we obtain the following proposition.

Proposition 6: Assume h(X_n|X_n−1,…⁾ ≥ −∞, and let $Σ^{*} ≜ \lim_{p \to \infty} Σ_{p}^{*}$ . We have that

\lim_{D \to 0} \lim_{p \to \infty} R_{IFSC}^{ST} (A, D) - R_{SLB} (D) = \frac{1}{2} \log (\frac{\max_{k = 1, \dots, K} a_{k}^{T} Σ^{*} a_{k}}{{| Σ^{*} |}^{\frac{1}{K}}}) .

(42)

Thus, in the high-resolution regime, when taking large enough p, the gap between $R_{IFSC}^{ST} (A, D)$ and the information theoretic lower bound is dictated by the loss of IFSC for a source whose covariance vector is Σ∗. The right hand side of (42) is non-negative [11], but is typically quite small. To illustrate this, we generate two correlated processes ${X_{n}^{1}}$ and ${X_{n}^{2}}$ as follows: let ${W_{n}^{1}}, {W_{n}^{2}}, {W_{n}^{3}}$ be three iid $N (0, 1)$ random processes. Let $X_{n}^{1} = \sum_{i = 0}^{L - 1} h_{i} W_{n - i}^{3} + W_{n}^{1}$ , and $X_{n}^{2} = \sum_{i = 0}^{L - 1} g_{i} W_{n - i}^{3} + W_{n}^{2}$ , where {h_n} and {g_n} are two filters, each with L taps. Clearly, when the filters have sufficiently strong taps the process ${X_{n}} = {[{X_{n}^{1}}, {X_{n}^{2}}]}^{T}$ will be highly correlated in time and in space. In Figure 10 we plot the average rate required by the developed scheme, as well as RSLB (D),and the rate required by a standard ADC that ignores spatial and temporal correlations entirely, denoted R_naive(D), with respect to an iid $N (0, 100)$ distribution on the 2L taps of {h_n} and {g_n}. In the simulations performed, we took L = 5 and p = 24.

Fig. 10. — Comparison between the average quantization rates $R_{IFSC}^{ST} (D)$ , R_SLB(D), and R_naive(D). The setup is that of quantizing vector of stationary processes ${X_{n}^{1}}, {X_{n}^{2}}$ described in the end of Section VI, with L = 5 and _{p = 24}.

VII. Conclusions and Outlook

We have studied the modulo ADC architecture as an alternative approach for analog-to-digital conversion. The modulo ADC allows exploitation of the statistical structure of the input process digitally at the decoder without requiring the ADC to adapt itself to the input statistics. We have demonstrated the effectiveness of oversampled modulo ADCs as a simple substitute to ΣΔ converters, allowing an increase in the filter’s order far beyond that which is possible in current ΣΔ converters, since for modulo ADC filtering is done digitally. Moreover, we have shown that, when used for digitizing jointly stationary processes, parallel modulo ADCs can efficiently exploit both temporal and spatial correlations.

An implementation of modulo ADCs via ring oscillators was developed, and the corresponding input-output function for the obtained modulo ADC was characterized in terms of the delay–V_dd profile of the inverters that construct the ring oscillator. We have then numerically studied the performance this implementation can attain for oversampled input processes, and compared it to those of ΣΔ converters.

There are several important challenges for future research. Perhaps most important is building a modulo ADC chip protomeasured from an actual (PSpice model of a) ring oscillatortype. Although our simulations are based on the function f(·) device, a hardware implementation is needed to fully assess the benefits of modulo ADCs. Furthermore, we would like to see whether it is possible to construct inverters with more favorable properties for ring oscillator-based modulo ADCs. In particular, we would like them to have a larger range where they are well approximated by an affine function. Another interesting avenue for future research is finding functions g(·) that can be implemented in the analog domain, such that the composition of function f ◦ g = f(g(·)) is more linear.

ACKNOWLEGEMENT

The authors are deeply grateful to Uri Erez, whose humbleness is the only reason for his absence from the authors list.

Footnotes

Here, the term “high probability” is used to state that this probability can be made as high as desired by increasing R. We explicitly quantify the relation between R and the desired “no-overload” probability.

Note that conditioning on the event that no overload error occurred until time n, changes the statistics of $E_{n}^{p}$ . Thus, applying the union bound correctly here requires some more care. See [35] for more details.

We say that a discrete-time process {X_n} is bandlimited, if there exists some γ < π such that S_X(e^jω) = 0 for all ω ∈ (−π, −γ) ∪ (γ,π). Since our analysis takes quantization noise into account, it is quite robust to slight deviations from the assumption that S_X(e^jω) is strictly band limited. In particular, as long as S_X (e^jω) ≪ D, for all ω ∈ (−π,−γ) ∪ (γ,π), where D is the target MSE distortion, our analysis remains valid.

Contributor Information

Or Ordentlich, Email: or.ordentlich@mail.huji.ac.il, Hebrew University of Jerusalem, Israel.

Gizem Tabak, Email: tabak2@illinois.edu, University of Illinois, Urbana-Champaign, USA.

Pavan Kumar Hanumolu, Email: hanumolu@illinois.edu, University of Illinois, Urbana-Champaign, USA.

Andrew C. Singer, Email: acsinger@illinois.edu, University of Illinois, Urbana-Champaign, USA.

Gregory W. Wornell, Email: gww@mit.edu, Massachusetts Institute of Technology, MA, USA.

REFERENCES

[1].Walden R, “Analog-to-digital converter survey and analysis,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 4, pp. 539–550, 1999. [Google Scholar]
[2].Le B, Rondeau TW, Reed JH, and Bostian CW, “Analog-to-digital converters,” IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 69–77, November 2005. [Google Scholar]
[3].Berger T, Rate distortion theory: A mathematical basis for data compression. Prentice-Hall, 1971. [Google Scholar]
[4].Jayant NS and Noll P, Digital coding of waveforms: principles and applications to speech and video. Englewood Cliffs, NJ: Prentice-Hall, 1984. [Google Scholar]
[5].Cover T and Thomas J, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley-Interscience, 2006. [Google Scholar]
[6].Hovin M, Olsen A, Lande TS, and Toumazou C, “Delta-sigma modulators using frequency-modulated intermediate values,” IEEE Journal of Solid-State Circuits, vol. 32, no. 1, pp. 13–22, January 1997. [Google Scholar]
[7].Straayer MZ and Perrott MH, “A 12-bit, 10-MHz bandwidth, continuous-time ΣΔ ADC with a 5-bit, 950-MS/s VCO-based quantizer,” IEEE Journal of Solid-State Circuits, vol. 43, no. 4, pp. 805–814, Apr. 2008. [Google Scholar]
[8].Telatar E, “Capacity of multi-antenna Gaussian channels,” European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595, Nov-Dec 1999. [Google Scholar]
[9].Tse D and Viswanath P, Fundamentals of Wireless Communication. Cambridge: Cambridge University Press, 2005. [Google Scholar]
[10].Larsson EG, Edfors O, Tufvesson F, and Marzetta TL, “Massive MIMO for next generation wireless systems,” IEEE Communications Magazine, vol. 52, no. 2, pp. 186–195, February 2014. [Google Scholar]
[11].Ordentlich O and Erez U, “Integer-forcing source coding,” IEEE Transactions on Information Theory, vol. 63, no. 2, pp. 1253–1269, February 2017. [Google Scholar]
[12].Ericson T and Ramamoorthy V, “Modulo-PCM: A new source coding scheme,” in ICASSP ‘79. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, April 1979, pp. 419–422. [Google Scholar]
[13].Forney G, “Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference,” IEEE Transactions on Information Theory, vol. 18, no. 3, pp. 363–378, May 1972. [Google Scholar]
[14].Ramamoorthy V, “A novel speech coder for medium and high bit rate applications using modulo-PCM principles,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 356–368, April 1985. [Google Scholar]
[15].Noll P, “On predictive quantizing schemes,” The Bell System Technical Journal, vol. 57, no. 5, pp. 1499–1532, May 1978. [Google Scholar]
[16].Zamir R, Kochman Y, and Erez U, “Achieving the Gaussian ratedistortion function by prediction,” IEEE Transactions on Information Theory, vol. 54, no. 7, pp. 3354–3364, July 2008. [Google Scholar]
[17].Boufounos PT, “Universal rate-efficient scalar quantization,” IEEE Transactions on Information Theory, vol. 58, no. 3, pp. 1861–1872, March 2012. [Google Scholar]
[18].Valsesia D and Boufounos PT, “Universal encoding of multispectral images,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2016, pp. 4453–4457. [Google Scholar]
[19].Bhandari A, Krahmer F, and Raskar R, “On unlimited sampling,” in 2017 International Conference on Sampling Theory and Applications (SampTA), July 2017, pp. 31–35. [Google Scholar]
[20].Bhandari A, Krahmer F, and Raskar R, “Unlimited sampling of sparse signals,” in IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2018. [Google Scholar]
[21].Domanovitz E and Erez U, “Outage probability bounds for integerforcing source coding,” in Proceedings of the IEEE Information Theory Workshop (ITW 2017), Kaohsiung, Taiwan, Nov. 2017. [Google Scholar]
[22].Zamir R, Shamai (Shitz) S, and Erez U, “Nested linear/lattice codes for structured multiterminal binning,” IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1250–1276, June 2002. [Google Scholar]
[23].Tomlinson M, “New automatic equalizer employing modulo arithmetic,” Electron. Lett, vol. 7, pp. 138–139, Mar. 1971. [Google Scholar]
[24].Harashima H and Miyakawa H, “Matched-transmission technique for channels with intersymbol interference,” IEEE Transactions on Communications, vol. 20, no. 4, pp. 774–780, Aug. 1972. [Google Scholar]
[25].Hong S-N and Caire G, “Compute-and-forward strategies for cooperative distributed antenna systems,” Information Theory, IEEE Transactions on, vol. 59, no. 9, pp. 5227–5243, September 2013. [Google Scholar]
[26].Nazer B and Gastpar M, “Compute-and-forward: Harnessing interference through structured codes,” IEEE Transactions on Information Theory, vol. 57, no. 10, pp. 6463–6486, Oct. 2011. [Google Scholar]
[27].van Valburg J and van de Plassche RJ, “An 8-b 650-MHz folding ADC,” IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1662–1666, December 1992. [Google Scholar]
[28].Venkataramani R and Bresler Y, “Perfect reconstruction formulas and bounds on aliasing error in sub-Nyquist nonuniform sampling of multiband signals,” IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2173–2183, 2000. [Google Scholar]
[29].Vetterli M, Marziliano P, and Blu T, “Sampling signals with finite rate of innovation,” IEEE transactions on Signal Processing, vol. 50, no. 6, pp. 1417–1428, 2002. [Google Scholar]
[30].Mishali M and Eldar YC, “From theory to practice: Sub-Nyquist sampling of sparse wideband analog signals,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 375–391, April 2010. [Google Scholar]
[31].Erez U and Zamir R, “Achieving log(1 + SNR) on the AWGN channel with lattice encoding and decoding,” IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2293–2314, Oct. 2004. [Google Scholar]
[32].Gray RM, “Quantization noise spectra,” IEEE Transactions on Information Theory, vol. 36, no. 6, pp. 1220–1244, November 1990. [Google Scholar]
[33].Ordentlich O and Erez U, “Precoded integer-forcing universally achieves the MIMO capacity to within a constant gap,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 323–340, January 2015. [Google Scholar]
[34].Feng C, Silva D, and Kschischang F, “An algebraic approach to physical-layer network coding,” IEEE Transactions on Information Theory, vol. 59, no. 11, pp. 7576–7596, November 2013. [Google Scholar]
[35].Ordentlich O and Erez U, “Performance analysis and optimal filter design for sigma-delta modulation via duality with DPCM,” IEEE Transactions on Information Theory, Submitted June., under revision 2015. [Google Scholar]
[36].Gray RM et al. , “Toeplitz and circulant matrices: A review,” Foundations and Trends⃝R in Communications and Information Theory, vol. 2, no. 3, pp. 155–239, 2006. [Google Scholar]
[37].Gersho A and Gray RM, Vector quantization and signal compression. Springer Science & Business Media, 2012, vol. 159. [Google Scholar]
[38].Lenstra AK, Lenstra HW, and Lovász L, “Factoring polynomials with rational coefficients,” Mathematische Annalen, vol. 261, no. 4, pp. 515–534, 1982. [Google Scholar]
[39].Rabaey JM, Chandrakasan A, and Nikolic B, Digital integrated circuits: a design perspective . Pearson Education, 2003. [Google Scholar]
[40].Bussgang JJ, “Crosscorrelation functions of amplitude-distorted Gaussian signals,” 1952. [Google Scholar]
[41].Zamir R and Berger T, “Multiterminal source coding with high resolution,” IEEE Transactions on Information Theory, vol. 45, no. 1, pp. 106–117, 1999. [Google Scholar]

[R1] [1].Walden R, “Analog-to-digital converter survey and analysis,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 4, pp. 539–550, 1999. [Google Scholar]

[R2] [2].Le B, Rondeau TW, Reed JH, and Bostian CW, “Analog-to-digital converters,” IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 69–77, November 2005. [Google Scholar]

[R3] [3].Berger T, Rate distortion theory: A mathematical basis for data compression. Prentice-Hall, 1971. [Google Scholar]

[R4] [4].Jayant NS and Noll P, Digital coding of waveforms: principles and applications to speech and video. Englewood Cliffs, NJ: Prentice-Hall, 1984. [Google Scholar]

[R5] [5].Cover T and Thomas J, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley-Interscience, 2006. [Google Scholar]

[R6] [6].Hovin M, Olsen A, Lande TS, and Toumazou C, “Delta-sigma modulators using frequency-modulated intermediate values,” IEEE Journal of Solid-State Circuits, vol. 32, no. 1, pp. 13–22, January 1997. [Google Scholar]

[R7] [7].Straayer MZ and Perrott MH, “A 12-bit, 10-MHz bandwidth, continuous-time ΣΔ ADC with a 5-bit, 950-MS/s VCO-based quantizer,” IEEE Journal of Solid-State Circuits, vol. 43, no. 4, pp. 805–814, Apr. 2008. [Google Scholar]

[R8] [8].Telatar E, “Capacity of multi-antenna Gaussian channels,” European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595, Nov-Dec 1999. [Google Scholar]

[R9] [9].Tse D and Viswanath P, Fundamentals of Wireless Communication. Cambridge: Cambridge University Press, 2005. [Google Scholar]

[R10] [10].Larsson EG, Edfors O, Tufvesson F, and Marzetta TL, “Massive MIMO for next generation wireless systems,” IEEE Communications Magazine, vol. 52, no. 2, pp. 186–195, February 2014. [Google Scholar]

[R11] [11].Ordentlich O and Erez U, “Integer-forcing source coding,” IEEE Transactions on Information Theory, vol. 63, no. 2, pp. 1253–1269, February 2017. [Google Scholar]

[R12] [12].Ericson T and Ramamoorthy V, “Modulo-PCM: A new source coding scheme,” in ICASSP ‘79. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, April 1979, pp. 419–422. [Google Scholar]

[R13] [13].Forney G, “Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference,” IEEE Transactions on Information Theory, vol. 18, no. 3, pp. 363–378, May 1972. [Google Scholar]

[R14] [14].Ramamoorthy V, “A novel speech coder for medium and high bit rate applications using modulo-PCM principles,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 356–368, April 1985. [Google Scholar]

[R15] [15].Noll P, “On predictive quantizing schemes,” The Bell System Technical Journal, vol. 57, no. 5, pp. 1499–1532, May 1978. [Google Scholar]

[R16] [16].Zamir R, Kochman Y, and Erez U, “Achieving the Gaussian ratedistortion function by prediction,” IEEE Transactions on Information Theory, vol. 54, no. 7, pp. 3354–3364, July 2008. [Google Scholar]

[R17] [17].Boufounos PT, “Universal rate-efficient scalar quantization,” IEEE Transactions on Information Theory, vol. 58, no. 3, pp. 1861–1872, March 2012. [Google Scholar]

[R18] [18].Valsesia D and Boufounos PT, “Universal encoding of multispectral images,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2016, pp. 4453–4457. [Google Scholar]

[R19] [19].Bhandari A, Krahmer F, and Raskar R, “On unlimited sampling,” in 2017 International Conference on Sampling Theory and Applications (SampTA), July 2017, pp. 31–35. [Google Scholar]

[R20] [20].Bhandari A, Krahmer F, and Raskar R, “Unlimited sampling of sparse signals,” in IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2018. [Google Scholar]

[R21] [21].Domanovitz E and Erez U, “Outage probability bounds for integerforcing source coding,” in Proceedings of the IEEE Information Theory Workshop (ITW 2017), Kaohsiung, Taiwan, Nov. 2017. [Google Scholar]

[R22] [22].Zamir R, Shamai (Shitz) S, and Erez U, “Nested linear/lattice codes for structured multiterminal binning,” IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1250–1276, June 2002. [Google Scholar]

[R23] [23].Tomlinson M, “New automatic equalizer employing modulo arithmetic,” Electron. Lett, vol. 7, pp. 138–139, Mar. 1971. [Google Scholar]

[R24] [24].Harashima H and Miyakawa H, “Matched-transmission technique for channels with intersymbol interference,” IEEE Transactions on Communications, vol. 20, no. 4, pp. 774–780, Aug. 1972. [Google Scholar]

[R25] [25].Hong S-N and Caire G, “Compute-and-forward strategies for cooperative distributed antenna systems,” Information Theory, IEEE Transactions on, vol. 59, no. 9, pp. 5227–5243, September 2013. [Google Scholar]

[R26] [26].Nazer B and Gastpar M, “Compute-and-forward: Harnessing interference through structured codes,” IEEE Transactions on Information Theory, vol. 57, no. 10, pp. 6463–6486, Oct. 2011. [Google Scholar]

[R27] [27].van Valburg J and van de Plassche RJ, “An 8-b 650-MHz folding ADC,” IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1662–1666, December 1992. [Google Scholar]

[R28] [28].Venkataramani R and Bresler Y, “Perfect reconstruction formulas and bounds on aliasing error in sub-Nyquist nonuniform sampling of multiband signals,” IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2173–2183, 2000. [Google Scholar]

[R29] [29].Vetterli M, Marziliano P, and Blu T, “Sampling signals with finite rate of innovation,” IEEE transactions on Signal Processing, vol. 50, no. 6, pp. 1417–1428, 2002. [Google Scholar]

[R30] [30].Mishali M and Eldar YC, “From theory to practice: Sub-Nyquist sampling of sparse wideband analog signals,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 375–391, April 2010. [Google Scholar]

[R31] [31].Erez U and Zamir R, “Achieving log(1 + SNR) on the AWGN channel with lattice encoding and decoding,” IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2293–2314, Oct. 2004. [Google Scholar]

[R32] [32].Gray RM, “Quantization noise spectra,” IEEE Transactions on Information Theory, vol. 36, no. 6, pp. 1220–1244, November 1990. [Google Scholar]

[R33] [33].Ordentlich O and Erez U, “Precoded integer-forcing universally achieves the MIMO capacity to within a constant gap,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 323–340, January 2015. [Google Scholar]

[R34] [34].Feng C, Silva D, and Kschischang F, “An algebraic approach to physical-layer network coding,” IEEE Transactions on Information Theory, vol. 59, no. 11, pp. 7576–7596, November 2013. [Google Scholar]

[R35] [35].Ordentlich O and Erez U, “Performance analysis and optimal filter design for sigma-delta modulation via duality with DPCM,” IEEE Transactions on Information Theory, Submitted June., under revision 2015. [Google Scholar]

[R36] [36].Gray RM et al. , “Toeplitz and circulant matrices: A review,” Foundations and Trends⃝R in Communications and Information Theory, vol. 2, no. 3, pp. 155–239, 2006. [Google Scholar]

[R37] [37].Gersho A and Gray RM, Vector quantization and signal compression. Springer Science & Business Media, 2012, vol. 159. [Google Scholar]

[R38] [38].Lenstra AK, Lenstra HW, and Lovász L, “Factoring polynomials with rational coefficients,” Mathematische Annalen, vol. 261, no. 4, pp. 515–534, 1982. [Google Scholar]

[R39] [39].Rabaey JM, Chandrakasan A, and Nikolic B, Digital integrated circuits: a design perspective . Pearson Education, 2003. [Google Scholar]

[R40] [40].Bussgang JJ, “Crosscorrelation functions of amplitude-distorted Gaussian signals,” 1952. [Google Scholar]

[R41] [41].Zamir R and Berger T, “Multiterminal source coding with high resolution,” IEEE Transactions on Information Theory, vol. 45, no. 1, pp. 106–117, 1999. [Google Scholar]

PERMALINK

A Modulo-Based Architecture for Analog-to-Digital Conversion

Or Ordentlich

Gizem Tabak

Pavan Kumar Hanumolu

Andrew C Singer

Gregory W Wornell

Abstract

I. INTRODUCTION

Fig. 1.

A. Our Contributions

1). Oversampled Modulo ADC:

Fig. 2.

Fig. 3.

2). A Phase-Domain Implementation of Modulo ADC via Ring Oscillators:

Fig. 4.

Fig. 5.

3). Modulo ADCs for Jointly Stationary Processes:

Fig. 6.

B. Related Work

C. Organization

II. Preliminaries on Ideal Modulo ADC

A. Modulo ADCs for Scalar Stationary Processes

B. Modulo ADCs for Random Vectors

Fig. 7.

III. OVERSAMPLED MODULO-ADC

IV. IMPLEMENTATION VIA RING OSCILLATORS

V. Numerical Experiments

Fig. 8.

A. Design of System Parameters

B. Evaluation Method

C. Results and Discussion

Fig.9.

VI. MODULO ADCS FOR JOINTLY STATIONARY PROCESSES

Fig. 10.

VII. Conclusions and Outlook

ACKNOWLEGEMENT

Footnotes

Contributor Information

REFERENCES

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases