arXiv:2310.03956v1 [Preprint], Version 1. 2023 Oct 6.

Gradient Descent Provably Solves Nonlinear Tomographic Reconstruction

Sara Fridovich-Keil 1, Fabrizio Valdivia 2, Gordon Wetzstein 1, Benjamin Recht 3, Mahdi Soltanolkotabi 4
PMCID: PMC10593065  PMID: 37873016

Abstract

In computed tomography (CT), the forward model consists of a linear Radon transform followed by an exponential nonlinearity based on the attenuation of light according to the Beer–Lambert Law. Conventional reconstruction often involves inverting this nonlinearity as a preprocessing step and then solving a convex inverse problem. However, this nonlinear measurement preprocessing required to use the Radon transform is poorly conditioned in the vicinity of high-density materials, such as metal. This preprocessing makes CT reconstruction methods numerically sensitive and susceptible to artifacts near high-density regions. In this paper, we study a technique where the signal is directly reconstructed from raw measurements through the nonlinear forward model. Though this optimization is nonconvex, we show that gradient descent provably converges to the global optimum at a geometric rate, perfectly reconstructing the underlying signal with a near minimal number of random measurements. We also prove similar results in the under-determined setting where the number of measurements is significantly smaller than the dimension of the signal. This is achieved by enforcing prior structural information about the signal through constraints on the optimization variables. We illustrate the benefits of direct nonlinear CT reconstruction with cone-beam CT experiments on synthetic and real 3D volumes. We show that this approach reduces metal artifacts compared to a commercial reconstruction of a human skull with metal dental crowns.

1. Introduction

Computed tomography (CT) is a core imaging modality in modern medicine (Food & Administration, 2023). X-ray CT is used to diagnose a wide array of conditions, plan treatments such as surgery or chemotherapy, and monitor their effectiveness over time. It can image any part of the body, and is widely performed as an outpatient imaging procedure.

CT systems work by rotating an X-ray source and detector around the patient, measuring how much of the emitted X-ray intensity reaches the detector at each angle. Because different tissues absorb X-rays at different rates, each of these measurements records a projection of the patient’s internal anatomy along the exposure angle. Algorithms then combine these projection measurements at different angles to recover a 2D or 3D image of the patient. This image is then interpreted by a medical professional (e.g. physician, radiologist, or medical physicist) to help diagnose, monitor, or plan treatment for a disease or injury.

CT scanners in use today typically consider the image reconstruction task as a linear inverse problem, in which the measurements are linear projections of the signal at known angles. Omitting measurement noise, we can write this standard linear measurement model as:

$$\hat{y}_i = a_i^T x, \qquad (1.1)$$

where $x$ is a vectorized version of the unknown signal (which commonly lies in 2D or 3D) and $a_i$ is a known, nonnegative measurement vector that denotes the weight each entry in the signal contributes to the integral $\hat{y}_i$ along measurement ray $i$. Computed over a set of regularly-spaced ray angles, this is exactly the Radon transform (Radon, 1917). This linear measurement model is quite convenient, as it enables efficient computations using the Fourier slice theorem—which equates linear projections in real-space to evaluation of slices through Fourier space—as well as strong recovery guarantees from compressive sensing (Kak & Slaney, 2001; Foucart & Rauhut, 2013; Bracewell, 1990). This linear projection model is accurate for signals of low density, for which the incident X-rays pass through largely unperturbed.

However, consider the common setting in which the signal contains regions of density high enough to occlude X-rays, such as the metal implants used in dental crowns and artificial joints. Such high-density regions produce nonlinear measurements for which the Fourier slice theorem, and standard compressed sensing results, no longer hold. In practice, tomographic reconstruction algorithms that assume a linear projection as the measurement model produce streak-like artifacts around high-density regions, potentially obscuring otherwise measurable and meaningful signal.

To avoid such artifacts, in this paper we consider a nonlinear measurement model, which correctly models signals with arbitrary density. Equation (1.1) then becomes:

$$y_i = 1 - \exp(-a_i^T x), \qquad (1.2)$$

where the exponential nonlinearity accounts for occlusion and is due to the Beer-Lambert Law. In practice, the partial occlusions captured by eq. (1.2) are commonly incorporated into a linear model by inverting the nonlinearity, converting raw measurements $y_i$ from eq. (1.2) into processed measurements $\hat{y}_i = -\ln(1 - y_i)$ for which eq. (1.1) holds. Indeed, this logarithmic preprocessing step is built into commercial CT scanners (Fu et al., 2016), though some additional preprocessing for calibration and denoising is often performed before the logarithm. The logarithm is well-conditioned for $y_i \approx 0$ but becomes numerically unstable as $y_i$ approaches unity, which corresponds to total X-ray absorption. This is particularly problematic for rays that pass through high-density materials, such as metal, as well as for very low-dose CT scans that use fewer X-ray photons.
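The severity of this instability is easy to see in a few lines of code. The NumPy sketch below (ours, not part of any commercial pipeline; the ray-integral values are hypothetical stand-ins for soft tissue, bone, and metal) evaluates the Beer-Lambert measurement of eq. (1.2) and the factor by which the linearizing logarithm amplifies any measurement error.

```python
import numpy as np

# Error amplification of the linearizing logarithm y_hat = -log(1 - y):
# a perturbation dy in the raw measurement becomes dy / (1 - y) after the log,
# i.e. it is amplified by exp(a^T x) for a ray with line integral a^T x.
for label, ray_integral in [("soft tissue", 0.1), ("bone", 1.0), ("metal", 7.0)]:
    y = 1.0 - np.exp(-ray_integral)          # Beer-Lambert measurement, eq. (1.2)
    amplification = 1.0 / (1.0 - y)          # = exp(ray_integral)
    print(f"{label:11s}  y = {y:.6f}   error amplification = {amplification:.1f}")
```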

Instead, we study reconstruction through direct inversion of eq. (1.2) via iterative gradient descent. We optimize a squared loss function over these nonlinear measurements yi, which is optimal for the case of Gaussian measurement noise—though extending this analysis to more realistic noise models is also of interest (Fu et al., 2016). By avoiding the ill-conditioned logarithm of a near-zero measurement, this approach is well-suited to CT reconstruction with low-dose X-rays as well as CT reconstruction with reduced metal artifacts. However, direct reconstruction through eq. (1.2) is more challenging than reconstruction through the linearized eq. (1.1), because the resulting loss function is nonconvex. While the linear inverse problem defined by eq. (1.1) can be solved in closed form by methods such as Filtered Back Projection (Radon, 1917; Kak & Slaney, 2001; Natterer, 2001), the nonlinear inverse problem defined by eq. (1.2) requires an iterative solution to a nonconvex optimization problem. We show that gradient descent with appropriate stepsize successfully recovers the global optimum of this nonconvex objective, suggesting that direct optimization through eq. (1.2) is a viable and desirable alternative to current methods that use eq. (1.1).

Concretely, we make the following contributions:

  • We propose a Gaussian model of eq. (1.2) and show that gradient descent converges to the global optimum of this model at a geometric rate, despite the nonconvex formulation. These results hold with a near minimal number of random measurements. To prove this result we utilize and build upon intricate arguments for uniform concentration of empirical processes.

  • We also extend our result to a compressive sensing setting to show that a structured signal can be recovered from far fewer measurements than its dimension. In this case prior information about the signal structure is enforced via a convex regularizer; our result holds for any convex regularizer. We show that the required number of measurements is commensurate with an appropriate notion of statistical dimension that captures how well the regularizer enforces the structural assumptions about the signal. For example, for an $s$-sparse signal and an $\ell_1$ regularizer our results require on the order of $s\log(n/s)$ measurements, where $n$ is the dimension of the signal. This is the optimal sample complexity even for linear measurements.

  • We perform an empirical comparison of reconstruction quality in 3D cone-beam CT on both synthetic and real volumes, where our real dataset consists of a human skull with metal dental crowns. We show that direct reconstruction through eq. (1.2) yields reduction in metal artifacts compared to reconstruction by inverting the nonlinearity into eq. (1.1).

2. Problem Formulation

In practice, the measurement vectors $a_i$ are sparse, nonnegative, highly structured, and dependent on the rays $i$, as only a small subset of signal values in $x$ will contribute to any particular ray. These vectors correspond to the weights in a discretized ray integral (projection) along the $i$-th measurement ray in the Radon transform (Radon, 1917). We use these real, ray-structured measurement vectors in our synthetic and real-data experiments.

In our theoretical analysis, we make two simplifying alterations to eq. (1.2): (1) we model $a_i$ as a standard Gaussian vector, where the Gaussian randomness is an approximation of the randomness in the choice of ray direction, and (2) we wrap the inner product $a_i^T x$ in a ReLU, to capture the physical reality that the raw integral of density along a ray, and the corresponding sensor measurement, must always be nonnegative. This nonnegativity is implicit in eq. (1.2) because $a_i$ represents a ray integral with only nonnegative weights as its entries, and the true density signal $x$ is also nonnegative; in our model eq. (2.1) we make nonnegativity explicit (subscript $+$ denotes ReLU):

$$y_i = f(a_i^T x), \quad \text{where} \quad f(\cdot) = 1 - \exp\left(-(\cdot)_+\right). \qquad (2.1)$$

Here $y_i$ is a measurement corresponding to ray $i$, $x \in \mathbb{R}^n$ is the signal we want to recover, and $a_i \in \mathbb{R}^n$ are i.i.d. random Gaussian measurement vectors distributed as $\mathcal{N}(0, I_n)$. In this paper we consider a least-squares loss of the form

$$\mathcal{L}(z) = \frac{1}{2m}\sum_{i=1}^{m}\left(y_i - f(a_i^T z)\right)^2, \qquad (2.2)$$

which is optimal in the presence of Gaussian measurement noise. However, in our analysis we focus on the noiseless setting. We minimize this loss using subgradient descent starting from $z_0 = 0 \in \mathbb{R}^n$, with step size $\mu_t$ in step $t$. More specifically, the iterates take the form

$$z_t = z_{t-1} - \mu_t \nabla\mathcal{L}(z_{t-1}) = z_{t-1} - \frac{\mu_t}{m}\sum_{i=1}^{m} a_i\, f'(a_i^T z_{t-1})\left(f(a_i^T z_{t-1}) - y_i\right).$$

Here, we use the following subdifferential of f:

$$f'(u) = \begin{cases} 0, & \text{if } u < 0 \\ \tfrac{1}{2}, & \text{if } u = 0 \\ \exp(-u), & \text{if } u > 0. \end{cases}$$
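As a concrete illustration of this update rule, the following NumPy sketch implements eq. (2.1)-(2.2) and the subgradient iterations on a small synthetic instance of the Gaussian model. The problem sizes, the constant step size after the first iteration, and the iteration count are hand-picked for this toy run and are not values prescribed by the analysis; the first step size follows the $\mu_1$ used in Theorem 1, assuming $\|x\|_2$ is known (Section 3 describes how to estimate it).

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def f(u):
    """Forward nonlinearity f(u) = 1 - exp(-(u)_+) from eq. (2.1)."""
    return 1.0 - np.exp(-np.maximum(u, 0.0))

def f_prime(u):
    """Subgradient of f: 0 if u < 0, 1/2 if u = 0, exp(-u) if u > 0."""
    return np.where(u > 0, np.exp(-np.maximum(u, 0.0)), np.where(u == 0, 0.5, 0.0))

def subgradient(z, A, y):
    """A subgradient of the least-squares loss in eq. (2.2)."""
    u = A @ z
    return A.T @ (f_prime(u) * (f(u) - y)) / len(y)

# Toy instance of the Gaussian measurement model (sizes chosen by hand).
n, m = 100, 1000
x = rng.normal(size=n)
x /= np.linalg.norm(x)                      # ||x||_2 = 1 for this run
A = rng.normal(size=(m, n))
y = f(A @ x)                                # noiseless measurements

norm_x = np.linalg.norm(x)                  # assumed known here; see Section 3 for an estimate
mu1 = 4.0 * math.exp(-norm_x**2 / 2.0) / math.erfc(norm_x / math.sqrt(2.0))
z = -mu1 * subgradient(np.zeros(n), A, y)   # first step from z_0 = 0

for _ in range(1000):                       # later steps with a small hand-picked step size
    z = z - 0.25 * subgradient(z, A, y)

print("relative error ||z - x|| / ||x|| =", np.linalg.norm(z - x) / norm_x)
```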

We also consider a regularized (compressive sensing) setting where the number of measurements m is significantly smaller than the dimension n of the signal. In this case we optimize the augmented loss function

$$\mathcal{L}(z) = \frac{1}{2m}\sum_{i=1}^{m}\left(y_i - f(a_i^T z)\right)^2 + \lambda\,\mathcal{R}(z) \qquad (2.3)$$

via subgradient descent. Here, $\mathcal{R}(z)$ is a regularizer enforcing a priori structure about the signal, with regularization weight $\lambda$. In our experiments, we use 3D total variation as $\mathcal{R}$, to encourage our reconstructed structure to have sparse gradients in 3D space.
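For reference, a minimal NumPy sketch of one possible 3D total variation regularizer (anisotropic, using forward differences) is shown below; the paper does not specify the exact TV variant, so this particular choice is an illustrative assumption.

```python
import numpy as np

def tv3d(vol):
    """Anisotropic 3D total variation of a volume: sum of absolute forward
    differences along each axis (one common choice of TV, used here as R)."""
    return (np.abs(np.diff(vol, axis=0)).sum()
            + np.abs(np.diff(vol, axis=1)).sum()
            + np.abs(np.diff(vol, axis=2)).sum())

# Example: the augmented objective of eq. (2.3) for a volume `vol`, a data-fidelity
# value `data_loss(vol)`, and regularization weight `lam` would be
#     data_loss(vol) + lam * tv3d(vol)
```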

We note that $\mathcal{L}$ is a nonconvex objective, so it is not obvious whether or not subgradient descent will reach the global optimum. Do the iterates converge to the correct solution? How many iterations are required? How many measurements? How does the number of measurements depend on the signal structure and the choice of regularizer? In the following sections, we take steps to answer these questions.

3. Global Convergence in the Unregularized Setting

Our first result shows that in the unregularized setting, direct gradient-based updates converge globally at a geometric rate. We defer the proof of Theorem 1 to Appendix B.

Theorem 1 Consider the problem of reconstructing a signal $x \in \mathbb{R}^n$ from $m$ nonlinear CT measurements of the form $y_i = 1 - e^{-(a_i^T x)_+}$, where the measurement vectors $a_i$ are generated i.i.d. $\mathcal{N}(0, I_n)$. We consider a least-squares loss as in eq. (2.2) and run gradient updates of the form

$$z_t = z_{t-1} - \mu_t \nabla\mathcal{L}(z_{t-1})$$

starting from $z_0 = 0 \in \mathbb{R}^n$ with $\mu_1 = 4\exp\!\left(-\frac{\|x\|_2^2}{2}\right)\frac{1}{\operatorname{erfc}\left(\frac{\|x\|_2}{\sqrt{2}}\right)}$ and $\mu_t = \mu e^{-5\|x\|_2}$ with $\mu \leq c_0$ for $t > 1$. Here, erfc is the complementary error function. As long as the number of measurements obeys

$$m \geq c_1 e^{c_2\|x\|_2}\|x\|_2^2\, n$$

then

$$\|z_t - x\|_2^2 \leq \left(1 - \mu e^{-10\|x\|_2}\right)^t \|x\|_2^2$$

holds with probability at least $1 - 5e^{-c_3 n} - 3e^{-m/2}$. Here, $c_0, c_1, c_2,$ and $c_3$ are fixed positive numerical constants.

Theorem 1 answers some of the key questions from the previous section in the affirmative. Even though the nonlinear CT reconstruction problem is a nonconvex optimization, gradient descent converges to the global optimum, the true signal, at a geometric rate.

Further, the number of required measurements m is on the order of n, the dimension of the signal, which is near-minimal even for a linear forward model. In Theorem 2 we prove global convergence with even fewer measurements in the compressive sensing setting, when some prior knowledge of the signal structure is enforced through a convex regularizer.

We note that the initial step size $\mu_1$ used in Theorem 1 is a function of the signal norm $\|x\|_2$, which is a priori unknown. However, we briefly describe how this quantity can be estimated from the available measurements. By averaging over the $m$ measurements, we have

$$\frac{1}{m}\sum_{i=1}^{m} y_i = \frac{1}{m}\sum_{i=1}^{m}\left(1 - e^{-(a_i^T x)_+}\right) = 1 - \frac{1}{m}\sum_{i=1}^{m} e^{-(g_i)_+\|x\|_2}$$

where $g_i$ are i.i.d. standard Gaussian random variables. Since $e^{-(g_i)_+\|x\|_2}$ is a 1-Lipschitz function of $g_i$, this quantity concentrates around its mean

$$\mathbb{E}\!\left[e^{-(g_i)_+\|x\|_2}\right] = \frac{1}{2}\left(1 + \exp\!\left(\frac{\|x\|_2^2}{2}\right)\operatorname{erfc}\!\left(\frac{\|x\|_2}{\sqrt{2}}\right)\right).$$

We can invert this relationship to get a close estimate of $\|x\|_2$ from the average measurement value.
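A minimal sketch of this inversion, assuming SciPy is available, is shown below: the expected mean measurement is a monotonically increasing function of $\|x\|_2$, so a one-dimensional root finder recovers the norm (the scaled complementary error function erfcx is used to avoid overflow).

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import erfcx          # erfcx(t) = exp(t**2) * erfc(t), overflow-safe

def expected_mean_measurement(c):
    """E[1 - exp(-(g)_+ c)] for g ~ N(0, 1): equals (1/2)(1 - exp(c^2/2) erfc(c/sqrt(2)))
    and increases from 0 toward 1/2 as c = ||x||_2 grows."""
    return 0.5 * (1.0 - erfcx(c / np.sqrt(2.0)))

def estimate_signal_norm(y, c_max=50.0):
    """Estimate ||x||_2 from measurements y by inverting the relation above (a sketch)."""
    target = float(np.clip(np.mean(y), 1e-12, expected_mean_measurement(c_max) - 1e-12))
    return brentq(lambda c: expected_mean_measurement(c) - target, 0.0, c_max)

# Example check on synthetic measurements y_i = 1 - exp(-(a_i^T x)_+):
rng = np.random.default_rng(0)
n, m = 50, 20000
x = rng.normal(size=n); x *= 2.0 / np.linalg.norm(x)
y = 1.0 - np.exp(-np.maximum(rng.normal(size=(m, n)) @ x, 0.0))
print("true ||x||_2 = 2.0, estimated:", estimate_signal_norm(y))
```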

We also note that both the convergence rate and the number of measurements in Theorem 1 are exponentially dependent on $\|x\|_2$. This is natural because as $\|x\|_2$ increases towards infinity the measurements $y_i = 1 - e^{-(a_i^T x)_+}$ approach the constant value 1 and the corresponding gradient of the loss approaches zero. Intuitively, this corresponds to trying to recover a CT scan of a metal box; if the walls of the box become infinitely absorbing of X-rays, we cannot hope to see inside it. Nonetheless, for real and realistic metal components in our experiments (Section 5) we do find good signal recovery following this approach.

4. Global Convergence in the Regularized Setting

We now turn our attention to the regularized setting. Our measurements again take the form $y_i = 1 - e^{-(a_i^T x)_+}$ for $i = 1, 2, \ldots, m$, where $x \in \mathbb{R}^n$ is the unknown but now a priori "structured" signal. In this case we wish to use many fewer measurements $m$ than the number of variables $n$, to reduce the X-ray exposure to the patient without sacrificing the resolution of the reconstructed image or volume $x$. Because the number of equations $m$ is significantly smaller than the number of variables $n$, there are infinitely many reconstructions obeying the measurement constraints. However, it may still be possible to recover the original signal by exploiting knowledge of its structure. To this aim, let $\mathcal{R} : \mathbb{R}^n \rightarrow \mathbb{R}$ be a regularization function that reflects some notion of "complexity" of the "structured" solution. For the sake of our theoretical analysis we will use the following constrained optimization problem in lieu of eq. (2.3) to recover the signal:

$$\min_{z \in \mathbb{R}^n}\; \mathcal{L}(z) = \frac{1}{2m}\sum_{i=1}^{m}\left(y_i - f(a_i^T z)\right)^2 \quad \text{subject to} \quad \mathcal{R}(z) \leq \mathcal{R}(x). \qquad (4.1)$$

We solve this optimization problem using projected gradient updates of the form

$$z_{t+1} = \mathcal{P}_{\mathcal{K}}\left(z_t - \mu_{t+1}\nabla\mathcal{L}(z_t)\right). \qquad (4.2)$$

Here, $\mathcal{P}_{\mathcal{K}}(z)$ denotes the projection of $z \in \mathbb{R}^n$ onto the constraint set

$$\mathcal{K} = \left\{z \in \mathbb{R}^n : \mathcal{R}(z) \leq \mathcal{R}(x)\right\}. \qquad (4.3)$$
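To make the projected update concrete, the sketch below instantiates eq. (4.1)-(4.3) with $\mathcal{R} = \ell_1$, for which $\mathcal{P}_{\mathcal{K}}$ is the standard Euclidean projection onto an $\ell_1$ ball. The sizes, step size, and iteration count are hand-picked for this toy example and are not the constants from our analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(u):
    return 1.0 - np.exp(-np.maximum(u, 0.0))          # eq. (2.1)

def subgrad(z, A, y):
    u = A @ z
    fp = np.where(u > 0, np.exp(-np.maximum(u, 0.0)), np.where(u == 0, 0.5, 0.0))
    return A.T @ (fp * (f(u) - y)) / len(y)

def project_l1_ball(v, radius):
    """Euclidean projection onto K = {z : ||z||_1 <= radius}."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - radius) / idx > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

# s-sparse signal observed through m << n nonlinear measurements (toy sizes).
n, m, s = 300, 150, 5
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = 0.4
A = rng.normal(size=(m, n))
y = f(A @ x)

z = np.zeros(n)
for _ in range(5000):                                  # projected subgradient updates, eq. (4.2)
    z = project_l1_ball(z - 0.05 * subgrad(z, A, y), np.abs(x).sum())

print("relative error ||z - x|| / ||x|| =", np.linalg.norm(z - x) / np.linalg.norm(x))
```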

We wish to characterize the rate of convergence of the projected gradient updates eq. (4.2) as a function of the number of measurements, the available prior knowledge of the signal structure, and how well the choice of regularizer encodes this prior knowledge. For example, if we know our unknown signal $x$ is approximately sparse, using an $\ell_1$ norm for the regularizer is superior to using an $\ell_2$ regularizer. To make these connections precise and quantitative, we need a few definitions which we adapt verbatim from Oymak et al. (2017); Oymak & Soltanolkotabi (2017b); Soltanolkotabi (2019b).

Definition 1 (Descent set and cone) The set of descent of a function $\mathcal{R}$ at a point $x$ is defined as

$$\mathcal{D}_{\mathcal{R}}(x) := \left\{h : \mathcal{R}(x + h) \leq \mathcal{R}(x)\right\}.$$

The cone of descent, or tangent cone, is the conic hull of the descent set, or the smallest closed cone $\mathcal{C}_{\mathcal{R}}(x)$ that contains the descent set, i.e. $\mathcal{D}_{\mathcal{R}}(x) \subseteq \mathcal{C}_{\mathcal{R}}(x)$.

The size of the descent cone $\mathcal{C}_{\mathcal{R}}(x)$ determines how well the regularizer $\mathcal{R}$ captures the structure of the unknown signal $x$. The smaller the descent cone, the more precisely the regularizer describes the properties of the signal. We quantify the size of the descent cone using the notion of mean (Gaussian) width.

Definition 2 (Gaussian width) The Gaussian width of a set $\mathcal{C} \subseteq \mathbb{R}^p$ is defined as:

$$\omega(\mathcal{C}) := \mathbb{E}_g\left[\sup_{z \in \mathcal{C}} \langle g, z\rangle\right],$$

where the expectation is taken over $g \sim \mathcal{N}(0, I_p)$. Throughout we use $\mathcal{B}^n$ / $\mathbb{S}^{n-1}$ to denote the unit ball/sphere of $\mathbb{R}^n$.
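As a quick illustration of Definition 2 (not the descent-cone computation used in our theory), the Gaussian width of simple sets can be estimated by Monte Carlo; for the unit $\ell_2$ ball the supremum is $\|g\|_2 \approx \sqrt{n}$, while for the much smaller unit $\ell_1$ ball it is $\|g\|_\infty \approx \sqrt{2\log n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 5000
G = rng.normal(size=(trials, n))

# omega(C) = E[ sup_{z in C} <g, z> ], estimated by averaging over Gaussian draws.
width_l2_ball = np.linalg.norm(G, axis=1).mean()   # C = unit l2 ball: sup_z <g,z> = ||g||_2
width_l1_ball = np.abs(G).max(axis=1).mean()       # C = unit l1 ball: sup_z <g,z> = ||g||_inf

print(f"omega(unit l2 ball) ~ {width_l2_ball:.2f}   (sqrt(n) = {np.sqrt(n):.2f})")
print(f"omega(unit l1 ball) ~ {width_l1_ball:.2f}   (sqrt(2 log n) = {np.sqrt(2 * np.log(n)):.2f})")
```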

We now have all the definitions in place to quantify how well the function $\mathcal{R}$ captures the properties of the unknown signal $x$. This naturally leads us to the definition of the minimum required number of measurements.

Definition 3 (minimal number of measurements) Let $\mathcal{C}_{\mathcal{R}}(z)$ be a cone of descent of $\mathcal{R}$ at $z$. We define the minimal sample function as

$$\mathcal{M}(\mathcal{R}, z) := \omega^2\left(\mathcal{C}_{\mathcal{R}}(z) \cap \mathcal{B}^n\right).$$

We shall often use the shorthand $m_0 = \mathcal{M}(\mathcal{R}, z)$ with the dependence on $\mathcal{R}, z$ implied. Here we define $m_0$ for an arbitrary point $z$, but we will apply the definition at the signal $x$.

We note that m0 is exactly the minimum number of samples required for structured signal recovery from linear measurements when using convex regularizers. Specifically, the optimization problem

$$\arg\min_{z}\; \frac{1}{2m}\sum_{i=1}^{m}\left(y_i - a_i^T z\right)^2 \quad \text{subject to} \quad \mathcal{R}(z) \leq \mathcal{R}(x), \qquad (4.4)$$

succeeds at recovering the unknown signal $x$ with high probability from $m$ measurements of the form $y_i = a_i^T x$ if and only if $m \geq m_0$.¹ Given that in our Gaussian-approximated nonlinear CT reconstruction problem we have less information (we lose information when the input to the ReLU is negative), we cannot hope to recover structured signals from $m < m_0$ measurements when using (4.1). Therefore, we can use $m_0$ as a lower bound on the minimum number of measurements required for the projected gradient descent iterations of eq. (4.2) to succeed in recovering the signal of interest. With these definitions in place we are now ready to state our theorem in the regularized/compressive sensing setting. We defer the proof of Theorem 2 to Appendix C.

Theorem 2 Consider the problem of reconstructing a signal $x \in \mathbb{R}^n$ from $m$ nonlinear CT measurements of the form $y_i = 1 - e^{-(a_i^T x)_+}$, where the measurement vectors $a_i$ are generated i.i.d. $\mathcal{N}(0, I_n)$. We consider a constrained least-squares loss as in eq. (4.1) and run projected gradient updates of the form in eq. (4.2) starting from $z_0 = 0 \in \mathbb{R}^n$ with $\mu_1 = 4\exp\!\left(-\frac{\|x\|_2^2}{2}\right)\frac{1}{\operatorname{erfc}\left(\frac{\|x\|_2}{\sqrt{2}}\right)}$ and $\mu_t = \mu e^{-5\|x\|_2}$ with $\mu \leq c_0\left(1 + \sqrt{n/m}\right)^{-2}$ for $t > 1$. Here, erfc is the complementary error function. As long as the number of measurements obeys

$$m \geq c_1 e^{c_2\|x\|_2}\|x\|_2^2\, m_0,$$

with m0 denoting the minimal number of samples per Definition 3, then

$$\|z_t - x\|_2^2 \leq \left(1 - \mu e^{-10\|x\|_2}\right)^t \|x\|_2^2$$

holds with probability at least $1 - 5e^{-c_3 m_0} - 3e^{-m/2}$. Here, $c_0, c_1, c_2,$ and $c_3$ are fixed positive numerical constants.

Theorem 2 parallels Theorem 1, likewise showing fast geometric convergence to the global optimum despite nonconvexity. In this regularized setting, the sample complexity of our nonlinear reconstruction problem is on the order of $m_0$, the number of measurements required for linear compressive sensing. In other words, the number of measurements required for regularized nonlinear CT reconstruction from raw measurements is within a constant factor of the number of measurements needed for the same reconstruction from linearized measurements. This is the optimal sample complexity for this nonlinear reconstruction task. For instance, for an $s$-sparse signal for which $m_0 \sim s\log(n/s)$, the above theorem states that on the order of $s\log(n/s)$ nonlinear CT measurements suffice for our direct gradient-based approach to succeed. Finally, we would like to emphasize that the above result is rather general, as it applies to any type of structure in the signal and can also deal with any convex regularizer.

5. Experiments

We support our theoretical analysis with experimental evidence that gradient-based optimization through the nonlinear CT forward model is effective for a wide range of signal densities, including signals that are dense enough that the same optimization procedure through the linearized forward model produces noticeable “metal artifacts.”

All of our experiments are based on the JAX implementation of Plenoxels (Sara Fridovich-Keil and Alex Yu et al., 2022), with a dense 3D grid of optimizable density values connected by trilinear interpolation. We use a cone-beam CT setup and optimize with mild total variation regularization. Our experiments do not focus on speed or measurement sparsity, though we fully expect our optimization objective to pair naturally with efficient ray sampling implementations and regularizers of choice.

5.1. Synthetic Data

Our synthetic experiments use a ground truth volume defined by the standard Shepp-Logan phantom (Shepp & Logan, 1974) in 3D, with the following modifications:

  • We scale down the voxel density values by a factor of 4, to more closely mimic the values in our real cone-beam CT skull dataset.

  • We adjust one of the ellipsoids to be slightly larger than standard (to make it more visible), and gradually increase its ground truth density to simulate a spectrum from soft tissue to bone to metal.

We simulate CT observations of this synthetic volume and then reconstruct using either the linearized forward model with the logarithm and eq. (1.1), or directly using eq. (1.2). We also use a small amount of total variation regularization, and constrain results to be nonnegative.

Results of this synthetic experiment are presented in Figure 1. As the density of the test ellipsoid increases, the linearized reconstruction experiences increasingly severe “metal artifacts,” while the nonlinear reconstruction continues to closely match the ground truth. PSNR values are reported over the entire reconstructed volume compared to the ground truth, where PSNR is defined as −10 log10 (MSE) and MSE is the mean squared voxel-wise error.
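For completeness, the PSNR computation used for these comparisons is just a few lines (a sketch matching the stated definition):

```python
import numpy as np

def psnr(recon, ground_truth):
    """PSNR = -10 log10(MSE), where MSE is the mean squared voxel-wise error,
    matching the definition used for the comparisons in Figure 1."""
    mse = np.mean((recon - ground_truth) ** 2)
    return -10.0 * np.log10(mse)
```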

Figure 1:

Synthetic experiments using the Shepp-Logan phantom, showing a slice through the reconstructed 3D volume. From top to bottom, we increase the density of the central test ellipsoid to simulate soft tissue, bone, and metal. Nonlinear reconstruction is robust even to dense “metal” elements of the target signal.

Note that this synthetic experiment does not include any measurement noise or miscalibration; the instability of the logarithm with respect to dense signals arises even when the only noise is due to numerical precision. We also note that even the densest synthetic “metal” ellipsoid we test is no denser than what we observe in our real CBCT skull dataset in Figure 2, with a real metal dental crown.

Figure 2:

Real experiments using a 3D human skull phantom with a metal dental crown; here we show a cross-section of the reconstructed volume. Note the streak artifact to the left of the metal crown (annotated in red) and the X artifact below it (annotated in purple) in the reference linearized reconstruction, which are not present in the nonlinear reconstruction.

5.2. Real Data

Our real data experiment uses a cone-beam CT phantom made from a human skull with metal dental crowns on some of the teeth. In Figure 2 we show slices of our nonlinear reconstruction compared to a reference commercial linearized reconstruction, as no ground truth is available for the real volume. The nonlinear reconstruction exhibits reduced metal-induced streak artifacts compared to the commercial reconstruction, highlighted in red and purple in the leftmost panel.

6. Related Work

Tomographic reconstruction.

The measurement model in eq. (1.2) is a discretized corollary of the Beer-Lambert Law that governs the attenuation of light as it passes through absorptive media. Inverting the exponential nonlinearity in this model recovers the Radon transform summarized by eq. (1.1), in which measurements are linear projections of the signal at chosen measurement angles. The Radon transform has a closed-form inverse transform, Filtered Back Projection (Radon, 1917; Kak & Slaney, 2001; Natterer, 2001), that leverages the Fourier slice theorem (Bracewell, 1990; Kak & Slaney, 2001). Filtered Back Projection is a well-understood algorithm that can be computed efficiently, and is a standard option in commercial CT scanners, but its reconstruction quality can suffer in the presence of either limited measurement angles or metal (highly absorptive) signal components (Fu et al., 2016).

Many methods exist to improve the quality of CT reconstruction in the limited-measurement regime, such as limited baseline tomography, which is of clinical interest because not all viewpoints may be accessible and every measurement angle requires exposing the patient to ionizing X-ray radiation. These methods typically involve augmenting the data-fidelity loss function with a regularization term that describes some prior knowledge of the signal to be reconstructed. Such priors include sparsity (implemented through an $\ell_1$ norm) in a chosen basis, such as wavelets (Foucart & Rauhut, 2013; Chambolle et al., 1998), as well as gradient sparsity (implemented through total variation regularization) (Candes et al., 2006). Compressive sensing theory guarantees correct recovery with fewer measurements in these settings, as long as the true signal is well-described by the chosen prior (Foucart & Rauhut, 2013). CT reconstruction with priors cannot be solved in closed form, but as long as the regularization is convex we are guaranteed that iterative optimization methods such as gradient descent, ISTA (Chambolle et al., 1998), and FISTA (Beck & Teboulle, 2009) will be successful.

Recently, reconstruction with even fewer measurements has been proposed by leveraging deep learning, through either neural scene representation (Rückert et al., 2022) or data-driven priors (Szczykutowicz et al., 2022). These methods may sacrifice convexity, and theoretical guarantees, in favor of more flexible and adaptive regularization that empirically reduces reconstruction artifacts in the limited-measurement regime. However, these methods are still based on the linear measurement model of eq. (1.1), making them susceptible to reconstruction artifacts near highly absorptive metal components. In some cases neural methods may reduce metal artifacts compared to traditional algorithms, but this reduction is achieved by leveraging strong and adaptive prior knowledge rather than the measurements of the present signal.

We propose to resolve these metal artifacts by reconstructing from raw nonlinear X-ray absorption measurements, rather than the preprocessed, linearized measurements produced by standard CT scanners. Our method may pair particularly well with new photon-counting CT scanners (Shikhaliev et al., 2005), which were approved by the FDA in 2021 (Food & Administration, 2021). These scanners measure raw X-ray photon counts, which should enable finer-grained noise modeling and correction as well as our method for principled reconstruction of signals with metal.

Signal reconstruction from nonlinear measurements.

There are a growing number of papers focused on reconstructing a signal from nonlinear measurements or single index models. Early papers on this topic focus on phase retrieval and ReLU nonlinearities (Oymak et al., 2018; Soltanolkotabi, 2017; Candes et al., 2015) and approximate reconstruction (Oymak & Soltanolkotabi, 2017a). These papers do not handle the compressive sensing/structured signal reconstruction setting. The paper (Soltanolkotabi, 2019a) deals with reconstruction of structured signals for intensity and absolute-value nonlinearities, but only achieves the optimal sample complexity locally. A more recent paper (Mei et al., 2018) deals with a variety of nonlinearities with bounded-derivative activations. However, this paper does not handle non-differentiable activations and only deals with simple structured signals such as sparse ones. In contrast to these works, our activation is non-differentiable, we handle arbitrary structures in the signal, and our results apply for any convex regularizer.

7. Discussion

In this paper, we consider the CT reconstruction problem from raw nonlinear measurements of the form $y_i = 1 - e^{-a_i^T x}$ for a signal $x$ and random measurement weights $a_i$. Although this nonlinear measurement model can be easily transformed into a linear model via a logarithmic preprocessing step $\hat{y}_i = -\ln(1 - y_i) = a_i^T x$, and this transformation is common practice in clinical CT reconstruction, the logarithm is numerically unstable when the measurements approach unity. This occurs frequently in practice, notably when the signal $x$ contains metal and especially for low-dose CT scanners that reduce radiation exposure. In this setting, traditional linear reconstruction methods tend to produce "metal artifacts" such as streaks around metal implants. Reconstruction directly through the raw nonlinear measurements avoids this numerically unstable preprocessing, in exchange for solving a nonconvex nonlinear least squares objective instead of convex linear least squares.

We prove that gradient descent finds the global optimum in CT reconstruction from raw nonlinear measurements, recovering exactly the true signal x despite the nonconvex optimization. Moreover, it converges at a geometric rate, which is considered fast even for convex optimization. This nonconvex optimization requires order n measurements, where n is the dimension of the unknown signal, the same order sample complexity as if we had reconstructed through a linear forward model.

We also extend our theoretical results to the compressive sensing setting, in which prior structural knowledge of the signal x, enforced through a regularizer, allows for reconstruction with far fewer measurements than the dimension of the signal. Our results in this setting again parallel standard results from the linear reconstruction problem, even though we consider a nonlinear forward model and optimize a nonconvex formulation.

We also compare linearized and nonlinear CT reconstruction experimentally in the setting of 3D cone-beam CT, using both a synthetic 3D Shepp-Logan phantom for which we know the ground truth volume as well as a real human skull phantom with metal dental crowns. In both cases, we find that nonlinear reconstruction reduces metal artifacts compared to linearized reconstruction, whether that linearized reconstruction is done by gradient descent or a commercial algorithm.

Our work is a promising first step towards higher-quality CT reconstruction in the presence of metal components and low-dose X-rays, offering both practical and theoretical guidance for trustworthy reconstruction. Future work may extend our results both theoretically and experimentally to consider more realistic measurement noise settings such as Poisson noise, which is particularly timely given the emergence of new photon-counting CT scanners.

8. Acknowledgments

We would like to thank Claudio Landi and Giovanni Di Domenico, and their CT company SeeThrough for providing cone-beam CT measurements and their reference reconstruction of a human skull phantom. This material is based upon work supported by the National Science Foundation under award number 2303178 to SFK. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. FV is partially supported by the SUPERB REU program. MS is supported by the NIH Director’s New Innovator award # DP2LM014564, the Packard Fellowship in Science and Engineering, a Sloan Fellowship in Mathematics, an NSF CAREER award #1846369 and DARPA FastNICS program, and NSF-CIF award #2008443.

A. APPENDIX

B. Proof of Theorem 1

In the following subsections we prove Theorem 1, beginning with a proof outline.

B.1. Proof Outline

The proof consists of the following four steps.

Step I: First iteration: Welcome to the neighborhood.

In the first step, we show that as long as the number of measurements is sufficiently large ($m \geq 5n$), the first iteration obeys

$$\|z_1 - x\|_2 \leq \tfrac{1}{4}\|x\|_2 \qquad (B.1)$$

with probability at least $1 - 2e^{-m/2} - e^{-cn}$, for a constant $c$. We prove this in Appendix B.2.

Step II: Local pseudoconvexity.

In the second step, we show that our nonconvex objective function is locally strongly pseudoconvex inside this local neighborhood of eq. (B.1). Specifically, we show the correlation inequality

$$\nabla\mathcal{L}(z)^T(z - x) \geq \alpha\|z - x\|_2^2 \qquad (B.2)$$

holds with $\alpha = \frac{1}{2}e^{-(5\|x\|_2 + 2)}$, with probability at least $1 - 4e^{-n}$, as long as the number of measurements $m$ is at least $c_1 e^{c_2\|x\|_2}\|x\|_2^2\, n$ for constants $c_1$ and $c_2$.

We compute the $\alpha$ in Equation (B.2) in two cases, where case 1 corresponds to $\frac{x^T(z-x)}{\|x\|_2\|z-x\|_2} \geq -0.6$ and case 2 corresponds to $\frac{x^T(z-x)}{\|x\|_2\|z-x\|_2} < -0.6$. These cases are analyzed in Appendix B.3 and Appendix B.4, respectively. Combining cases 1 and 2, we have that the lower bound on $\alpha$ from case 2 lower bounds the bound from case 1, as shown in Figure 3, so we use that bound in eq. (B.2). The sample complexity in eq. (B.2) is the maximum over the sample complexities of cases 1 and 2, which are $m \geq c\,\frac{n}{\alpha^2\|x\|_2^2}$ and $m \geq C\,\frac{n}{(2\sqrt{\alpha} - \alpha)^2}$, respectively, for constants $c$ and $C$.

Figure 3:

Case 2 provides a lower bound on the expected correlation for both cases.

Step III: Smoothness.

In the third step, we show

$$\|\nabla\mathcal{L}(z)\|_2 \leq L\|z - x\|_2 \qquad (B.3)$$

holds with probability at least $1 - e^{-\frac{1}{2}(m+n)}$, for a constant $L$. This smoothness condition is proved in Appendix B.5.

Step IV: Completing the proof via combining Steps I-III.

Finally, we combine the first three steps into a complete proof. At the core of the proof is the following lower bound on the correlation between the loss gradient and the error vector

$$\nabla\mathcal{L}(z)^T(z - x) \geq A\|z - x\|_2^2 + B\|\nabla\mathcal{L}(z)\|_2^2 \qquad (B.4)$$

for positive constants A and B. This starting point is similar to the proof of Lemma 7.10 in Candes et al. (2015), with some modifications necessary for the nonlinear CT reconstruction problem. To prove eq. (B.4) we first combine eq. (B.2) and eq. (B.3) to conclude that

$$\nabla\mathcal{L}(z)^T(z - x) \geq \frac{\alpha}{2}\|z - x\|_2^2 + \frac{\alpha}{2}\|z - x\|_2^2 \geq \frac{\alpha}{2}\|z - x\|_2^2 + \frac{\alpha}{2L^2}\|\nabla\mathcal{L}(z)\|_2^2.$$

Thus eq. (B.4) holds with $A = \frac{\alpha}{2}$ and $B = \frac{\alpha}{2L^2}$, with probability at least $1 - 4e^{-n} - e^{-\frac{1}{2}(m+n)} - 2e^{-m/2} - e^{-cn}$, for a constant $c$ (by a union bound over the first three steps of the proof). Using eq. (B.4) with an adequate choice of stepsize suffices to prove geometric convergence.

$$\|z_{t+1} - x\|_2^2 = \|z_t - \mu_{t+1}\nabla\mathcal{L}(z_t) - x\|_2^2 \overset{(a)}{=} \|z_t - x\|_2^2 - 2\mu_{t+1}\nabla\mathcal{L}(z_t)^T(z_t - x) + \mu_{t+1}^2\|\nabla\mathcal{L}(z_t)\|_2^2 \overset{(b)}{\leq} (1 - 2\mu_{t+1}A)\|z_t - x\|_2^2 + \mu_{t+1}\left(\mu_{t+1} - 2B\right)\|\nabla\mathcal{L}(z_t)\|_2^2 \overset{(c)}{\leq} (1 - 2\mu_{t+1}A)\|z_t - x\|_2^2.$$

In (a) we expand the square. In (b) we apply eq. (B.4). In (c) we choose $\mu_{t+1} \in (0, 2B]$, making the second term nonpositive. Applying this relation inductively over $T$ steps of gradient descent yields geometric convergence with rate $1 - 2\mu A$, provided that the step size $\mu_t$ is at most $2B$ for $t > 1$.

The number of measurements $m$ required in Theorem 1 is the maximum over the number of samples required for each of the first two steps of the proof. For the first step, $m \geq 5n$ measurements are sufficient to reach our neighborhood of radius $\frac{1}{4}\|x\|_2$. For case 1 in the second step, $m \geq c_1\frac{n}{\alpha^2\|x\|_2^2}$ measurements are sufficient for correlation concentration. For case 2 in the second step, $m \geq c_2\frac{n}{(2\sqrt{\alpha} - \alpha)^2}$ measurements are sufficient for concentration. Maximizing over these bounds yields the result in Theorem 1 (note that the constants change during the maximization).

B.2. First Step: Welcome to the Neighborhood

We first consider what happens in expectation when we take our first gradient step starting from an initialization at z0=0. The expectation is over the randomness in the Gaussian measurement vector a. We have

$$\begin{aligned}\mathbb{E}_a[z_1] &= 0 - \mu_1\mathbb{E}_a\!\left[\nabla\mathcal{L}(z;a)\big|_{z=0}\right] = -\mu_1\mathbb{E}_a\!\left[a\, f'(0)\left(f(0) - y\right)\right] \overset{(a)}{=} \frac{\mu_1}{2}\mathbb{E}_a\!\left[a\, y\right] \overset{(b)}{=} \frac{\mu_1}{2}\mathbb{E}_a\!\left[a\left(1 - \exp(-(a^T x)_+)\right)\right] \\ &\overset{(c)}{=} -\frac{\mu_1}{2}\mathbb{E}_a\!\left[a\exp(-(a^T x)_+)\right] \overset{(d)}{=} -\frac{\mu_1}{2}\mathbb{E}_a\!\left[\frac{xx^T}{\|x\|_2^2}\, a\exp(-(a^T x)_+)\right] \overset{(e)}{=} -\frac{\mu_1}{2}\frac{x}{\|x\|_2}\mathbb{E}_g\!\left[g\exp(-(g)_+\|x\|_2)\right] \\ &\overset{(f)}{=} \frac{\mu_1}{4}\exp\!\left(\frac{\|x\|_2^2}{2}\right)\operatorname{erfc}\!\left(\frac{\|x\|_2}{\sqrt{2}}\right)x \overset{(g)}{=} x.\end{aligned}$$

In (a) we evaluate $f(0) = 0$ and $f'(0) = \frac{1}{2}$, where for the latter we use $\frac{1}{2}$ as the subdifferential even though $f$ is nondifferentiable at 0 due to the non-differentiability of the ReLU (this choice is also justified as it is the expected gradient of $f$ around a small random initialization around 0). In (b) we plug in the value of the measurement $y$. In (c) we use linearity of expectation and evaluate the first term, $\mathbb{E}_a[a] = 0$. In (d) we separate the leading $a$ into components parallel and orthogonal to $x$, and evaluate the expectation of the orthogonal term to zero. In (e) we replace $a^T x$ with $g\|x\|_2$ for a scalar standard Gaussian $g$, as these have the same distribution. In (f) we evaluate the remaining expectation. In (g) we choose $\mu_1 = 4\exp\!\left(-\frac{\|x\|_2^2}{2}\right)\frac{1}{\operatorname{erfc}\left(\frac{\|x\|_2}{\sqrt{2}}\right)}$ so that in expectation, our first step exactly recovers the signal $x$.
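This expectation identity is easy to check numerically: the sketch below (ours, with hypothetical sizes) draws a large number of Gaussian measurements, takes the single step $z_1 = \frac{\mu_1}{2m}\sum_i a_i y_i$ with the $\mu_1$ above, and verifies that $z_1$ lands close to $x$.

```python
import numpy as np
from math import erfc, exp, sqrt

rng = np.random.default_rng(0)
n, m = 10, 500_000                  # large m so the empirical mean approximates the expectation
x = rng.normal(size=n)
x *= 0.8 / np.linalg.norm(x)        # hypothetical signal with ||x||_2 = 0.8
norm_x = np.linalg.norm(x)

A = rng.normal(size=(m, n))
y = 1.0 - np.exp(-np.maximum(A @ x, 0.0))

mu1 = 4.0 * exp(-norm_x**2 / 2.0) / erfc(norm_x / sqrt(2.0))
z1 = (mu1 / (2.0 * m)) * (A.T @ y)  # first step from z_0 = 0, i.e. z_1 = -mu_1 * grad(0)

print("||z_1 - x|| / ||x|| =", np.linalg.norm(z1 - x) / norm_x)   # should be on the order of 1%
```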

Because in practice we do not have access to an expectation over infinite measurements, we also care about the concentration of this first gradient step. This first-step concentration determines the local neighborhood around the signal x that our gradient descent will operate within for the remaining iterations.

$$\begin{aligned}-\mu_1\nabla\mathcal{L}(z=0) &= \frac{\mu_1}{2m}\sum_{i=1}^{m} a_i\left(1 - \exp(-(a_i^T x)_+)\right) \\ &\overset{(a)}{=} \frac{\mu_1}{2m}\sum_{i=1}^{m}\frac{xx^T}{\|x\|_2^2}\, a_i\left(1 - \exp(-(a_i^T x)_+)\right) + \frac{\mu_1}{2m}\sum_{i=1}^{m}\left(I - \frac{xx^T}{\|x\|_2^2}\right) a_i\left(1 - \exp(-(a_i^T x)_+)\right) \\ &\overset{(b)}{=} \frac{\mu_1}{2m}\sum_{i=1}^{m} g_i\left(1 - \exp(-(g_i)_+\|x\|_2)\right)\frac{x}{\|x\|_2} + \frac{\mu_1}{2m}\sum_{i=1}^{m}\left(I - \frac{xx^T}{\|x\|_2^2}\right) a_i\left(1 - \exp(-(g_i)_+\|x\|_2)\right) \\ &\overset{(c)}{=} \frac{\mu_1}{2m}\sum_{i=1}^{m} g_i\left(1 - \exp(-(g_i)_+\|x\|_2)\right)\frac{x}{\|x\|_2} + \frac{\mu_1}{2m}\sqrt{\sum_{i=1}^{m}\left(1 - \exp(-(g_i)_+\|x\|_2)\right)^2}\; a \\ &\overset{(d)}{=} \frac{\mu_1}{2m}\sum_{i=1}^{m}\hat{g}_i\,\frac{x}{\|x\|_2} + \frac{\mu_1}{2m}\sqrt{\sum_{i=1}^{m}\tilde{g}_i}\; a.\end{aligned}$$

In (a) we separate $a_i$ into components parallel and orthogonal to $x$. In (b) we replace $a_i^T x$ with $\|x\|_2 g_i$ for a standard scalar Gaussian $g_i$, as these have the same distribution. In (c) we simplify the second term by rewriting it in terms of a standard Gaussian vector $a$ with $n-1$ free dimensions (constrained to be orthogonal to $x$), with $a$ independent of $g_i$ for all $i$. In (d) we write $\hat{g}_i := g_i\left(1 - \exp(-(g_i)_+\|x\|_2)\right)$ and $\tilde{g}_i := \left(1 - \exp(-(g_i)_+\|x\|_2)\right)^2$.

Note that the first term has expectation $\frac{\mu_1}{4}\exp\!\left(\frac{\|x\|_2^2}{2}\right)\operatorname{erfc}\!\left(\frac{\|x\|_2}{\sqrt{2}}\right)x = x$ as computed above, and the second term has expectation zero because $\tilde{g}_i$ and $a$ are independent, and $a$ is mean zero. The first term is aligned with the signal $x$ but with a scaling factor $\frac{\mu_1}{2m\|x\|_2}\sum_{i=1}^{m}\hat{g}_i$; we can bound the deviation of this scaling from its mean using Hoeffding's concentration bound for sums of sub-Gaussian random variables. We use the definition of sub-Gaussianity provided by Definition 2.2 in Wainwright (2019), for which $\hat{g}_i$ has sub-Gaussian parameter 1. We omit the leading constant $\frac{\mu_1}{2m\|x\|_2}$, and apply the Hoeffding bound as presented in Proposition 2.5 in Wainwright (2019) to conclude that

$$\left|\sum_{i=1}^{m}\left(\hat{g}_i - \mathbb{E}[\hat{g}]\right)\right| \leq s$$

with probability at least $1 - 2e^{-\frac{s^2}{2m}}$.

The second term is a nonnegative scaling $\frac{\mu_1}{2m}\sqrt{\sum_{i=1}^{m}\tilde{g}_i}$ times a standard $n$-dimensional Gaussian vector $a$ with $n-1$ degrees of freedom (because it is orthogonal to $x$). Since $\tilde{g}_i$ is bounded in $[0, 1]$, we can upper bound this scaling by $\frac{\mu_1}{2\sqrt{m}}$. Then the norm of the random vector $a$ can also be bounded using Exercise 5.2.4 in Vershynin (2018), to conclude that

$$\|a\|_2 \leq \sqrt{\mathbb{E}\!\left[\|a\|_2^2\right]} + t \overset{(a)}{=} \sqrt{n-1} + t \overset{(b)}{\leq} (1+t)\sqrt{n}$$

with probability at least $1 - e^{-ct^2 n}$ for a constant $c$, where in (a) we evaluate the expectation of a Gaussian norm with $n-1$ degrees of freedom and in (b) we upper bound $n-1$ by $n$ and do a change of variables to replace $t$ with $t\sqrt{n}$. Putting these together with a union bound, we have that

$$\begin{aligned}\|z_1 - \mathbb{E}[z_1]\|_2 = \|z_1 - x\|_2 &\leq \frac{\mu_1}{2m}\left|\sum_{i=1}^{m}\left(\hat{g}_i - \mathbb{E}[\hat{g}]\right)\right| + \frac{\mu_1}{2\sqrt{m}}\|a\|_2 \leq \frac{\mu_1}{2}\left(\frac{s}{m} + (1+t)\sqrt{\frac{n}{m}}\right) \\ &= 2\exp\!\left(-\frac{\|x\|_2^2}{2}\right)\frac{1}{\operatorname{erfc}\left(\frac{\|x\|_2}{\sqrt{2}}\right)}\left(\frac{s}{m} + (1+t)\sqrt{\frac{n}{m}}\right) \overset{(a)}{=} 2\exp\!\left(-\frac{\|x\|_2^2}{2}\right)\frac{1}{\operatorname{erfc}\left(\frac{\|x\|_2}{\sqrt{2}}\right)}\left(s + (1+t)\sqrt{\frac{n}{m}}\right)\end{aligned}$$

with probability at least $1 - 2e^{-\frac{s^2 m}{2}} - e^{-ct^2 n}$, where in (a) we change variables and replace $s$ with $sm$, and $c$ is a constant. If we choose $s = t = 1$, this simplifies to

$$\|z_1 - x\|_2 \leq 2\exp\!\left(-\frac{\|x\|_2^2}{2}\right)\frac{1}{\operatorname{erfc}\left(\frac{\|x\|_2}{\sqrt{2}}\right)}\left(1 + 2\sqrt{\frac{n}{m}}\right)$$

with probability at least $1 - 2e^{-m/2} - e^{-cn}$, for a constant $c$. For our first step to lie within a distance of $\frac{1}{4}\|x\|_2$, we need the number of measurements to satisfy

$$m \geq n\left(\frac{\|x\|_2}{16}\exp\!\left(\frac{\|x\|_2^2}{2}\right)\operatorname{erfc}\!\left(\frac{\|x\|_2}{\sqrt{2}}\right) - \frac{1}{2}\right)^{-2}.$$

The denominator in this expression is lower bounded by 0.2 for all $\|x\|_2$, so we can also guarantee the first-step concentration to this neighborhood using $m \geq 5n$ measurements.

B.3. Correlation Concentration: Case 1

Consider the correlation

$$\nabla\mathcal{L}(z)^T(z - x) = \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T z \geq 0\}}\, e^{-a_i^T z}\left(e^{-(a_i^T x)_+} - e^{-a_i^T z}\right)\left(a_i^T h\right)$$

where $h := z - x$. Now note that if we have $a_i^T x \geq 0$ and $a_i^T h \geq 0$ it implies $a_i^T x + a_i^T h = a_i^T z \geq 0$. Thus we have

$$1_{\{a_i^T z \geq 0\}} \geq 1_{\{a_i^T x \geq 0\}}\, 1_{\{a_i^T h \geq 0\}}.$$

Using the above we can conclude that

$$\begin{aligned}\nabla\mathcal{L}(z)^T(z - x) &\overset{(a)}{\geq} \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\{a_i^T h \geq 0\}}\, e^{-a_i^T z}\left(e^{-a_i^T x} - e^{-a_i^T z}\right)\left(a_i^T h\right) \\ &\overset{(b)}{=} \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\{a_i^T h \geq 0\}}\, e^{-2a_i^T x}\, e^{-a_i^T h}\left(1 - e^{-a_i^T h}\right)\left(a_i^T h\right).\end{aligned}$$

In (a) we plug in the indicator inequality, and accordingly remove the now-superfluous ReLU. In (b) we use the $h$ notation and regroup terms. To continue, we divide both sides by $\|h\|_2^2$ and use the notation $\hat{h} = \frac{h}{\|h\|_2}$.

$$\frac{1}{\|h\|_2^2}\nabla\mathcal{L}(z)^T(z - x) \geq \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\{a_i^T \hat{h} \geq 0\}}\, e^{-2a_i^T x}\, \frac{e^{-\|h\|_2 a_i^T \hat{h}}\left(1 - e^{-\|h\|_2 a_i^T \hat{h}}\right)a_i^T \hat{h}}{\|h\|_2}.$$

To continue note that the function

$$g(x, s) = \frac{e^{-sx}\left(1 - e^{-sx}\right)}{s}$$

has non-positive derivative as

$$\frac{\partial g}{\partial s} = \frac{e^{-2sx}\left(2sx - e^{sx}(sx+1) + 1\right)}{s^2} \leq 0$$

for all values of s and x. This implies that g(x,s) is a non-increasing function of s. Thus, we can conclude that

$$\frac{1}{\|h\|_2^2}\nabla\mathcal{L}(z)^T(z - x) \geq \frac{1}{rm}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\{a_i^T \hat{h} \geq 0\}}\, e^{-2a_i^T x}\, e^{-r a_i^T \hat{h}}\left(1 - e^{-r a_i^T \hat{h}}\right)a_i^T \hat{h}.$$

Thus we can focus on lower bounding

$$\frac{1}{rm}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\{a_i^T \hat{h} \geq 0\}}\, e^{-2a_i^T x}\, e^{-r (a_i^T \hat{h})_+}\left(1 - e^{-r (a_i^T \hat{h})_+}\right)\left(a_i^T \hat{h}\right)_+$$

over the set

$$\left\{\hat{h} \in \mathbb{S}^{n-1} : \frac{x^T \hat{h}}{\|x\|_2} \geq \rho\right\},$$

where we reintroduce superfluous ReLUs around $a_i^T \hat{h}$ as it will be convenient in the next steps. To continue, note that the function $f(h) = e^{-rh}\left(1 - e^{-rh}\right)h$ has derivative

$$f'(h) = e^{-2rh}\left(e^{rh}(1 - rh) + 2rh - 1\right).$$

It is easy to verify numerically that for $h \geq 0$ this gradient is maximized around $h_{\max} \approx \frac{0.402673}{r}$, with maximum value $f'(h_{\max}) \approx 0.312334$. Thus, for all $h \geq 0$

$$f'(h) \leq \frac{1}{3}.$$
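This numerical claim can be reproduced with a short script (a sketch; note that with $u = rh$ the derivative depends on $u$ only, so one scan over $u$ covers all $r > 0$):

```python
import numpy as np

# With u = r*h, the derivative f'(h) = exp(-2rh)(exp(rh)(1 - rh) + 2rh - 1)
# depends only on u, so a single scan over u covers every r > 0.
u = np.linspace(0.0, 20.0, 2_000_001)
fprime = np.exp(-2.0 * u) * (np.exp(u) * (1.0 - u) + 2.0 * u - 1.0)

print("max f'   =", fprime.max(), "attained at u =", u[fprime.argmax()])   # ~0.3123 at ~0.4027
print("max |f'| =", np.abs(fprime).max(), "<= 1/3:", bool(np.abs(fprime).max() <= 1.0 / 3.0))
```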

As a result the function $g(h) = e^{-(h)_+}\left(1 - e^{-(h)_+}\right)(h)_+$ is a $\frac{1}{3}$-Lipschitz function of $h$. Thus for any $h_1 \in \mathbb{R}^d$ and $h_2 \in \mathbb{R}^d$ we have

$$\left|e^{-(a_i^T h_2)_+}\left(1 - e^{-(a_i^T h_2)_+}\right)\left(a_i^T h_2\right)_+ - e^{-(a_i^T h_1)_+}\left(1 - e^{-(a_i^T h_1)_+}\right)\left(a_i^T h_1\right)_+\right| \leq \frac{1}{3}\left|a_i^T(h_2 - h_1)\right|.$$

We now define the random variable $\mathcal{X}_i(\hat{h}) = 1_{\{a_i^T x \geq 0\}}\, e^{-2a_i^T x}\, e^{-(a_i^T \hat{h})_+}\left(1 - e^{-(a_i^T \hat{h})_+}\right)\left(a_i^T \hat{h}\right)_+$ and note that the Lipschitzness of $g$ implies that for any $h_1, h_2 \in \mathbb{R}^d$ we have

$$\left|\mathcal{X}_i(h_2) - \mathcal{X}_i(h_1)\right| = 1_{\{a_i^T x \geq 0\}}\, e^{-2a_i^T x}\left|e^{-(a_i^T h_2)_+}\left(1 - e^{-(a_i^T h_2)_+}\right)\left(a_i^T h_2\right)_+ - e^{-(a_i^T h_1)_+}\left(1 - e^{-(a_i^T h_1)_+}\right)\left(a_i^T h_1\right)_+\right| \leq \frac{1}{3}\left|a_i^T(h_2 - h_1)\right|.$$

Since $a_i^T(h_2 - h_1)$ is a sub-Gaussian random variable with sub-Gaussian norm on the order of $\|h_2 - h_1\|_2$, we have that

$$\left\|\mathcal{X}_i(h_2) - \mathcal{X}_i(h_1)\right\|_{\psi_2} \leq \frac{c}{2}\|h_2 - h_1\|_2$$

for some constant $c$. We use $c$ to denote any universal constant; note that this constant may vary between different lines. Thus using the centering rule for sub-Gaussian random variables, the centered processes $\bar{\mathcal{X}}_i(h) := \mathcal{X}_i(h) - \mathbb{E}[\mathcal{X}_i(h)]$ obey

$$\left\|\bar{\mathcal{X}}_i(h_2) - \bar{\mathcal{X}}_i(h_1)\right\|_{\psi_2} \leq c\|h_2 - h_1\|_2.$$

Using the rotational invariance property of sub-Gaussian random variables, this implies that the stochastic process

$$\mathcal{X}(h) := \frac{1}{m}\sum_{i=1}^{m}\left(\mathcal{X}_i(h) - \mathbb{E}[\mathcal{X}_i(h)]\right)$$

has sub-Gaussian increments. That is,

$$\left\|\mathcal{X}(h_2) - \mathcal{X}(h_1)\right\|_{\psi_2} \leq \frac{c}{\sqrt{m}}\|h_2 - h_1\|_2.$$

Thus using Exercise 8.6.5 of Vershynin (2018) we can conclude that

$$\sup_{\|\hat{h}\|_2 = 1}\frac{\left|\mathcal{X}(\hat{h})\right|}{r} \leq \frac{c}{\sqrt{m}\, r}\left(\sqrt{n} + u\right)$$

holds with probability at least $1 - 2e^{-u^2}$. Thus, we conclude that for all $h$ obeying $\|h\|_2 \leq r$ we have

$$\frac{1}{\|h\|_2^2}\nabla\mathcal{L}(z)^T(z - x) \geq \frac{1}{r}\mathbb{E}\!\left[1_{\{a^T x \geq 0\}}\, 1_{\{a^T \hat{h} \geq 0\}}\, e^{-2a^T x}\, e^{-r a^T \hat{h}}\left(1 - e^{-r a^T \hat{h}}\right)a^T \hat{h}\right] - \frac{c}{r}\sqrt{\frac{n}{m}}$$

with probability at least $1 - 2e^{-n}$, for a constant $c$.

We can estimate and lower bound the expectation above using a numerical average over many (50,000) two-dimensional Gaussian samples, with the two dimensions corresponding to $a^T \hat{h}$ and $\frac{a^T x}{\|x\|_2}$, minimizing over all correlations between $x$ and $h$ at least $\rho$ (i.e. all correlations in this case). We arrive at the following lower bound, for $r = \frac{1}{4}\|x\|_2$ and $\rho = -0.6$.

$$\frac{1}{r}\mathbb{E}\!\left[1_{\{a^T x \geq 0\}}\, 1_{\{a^T \hat{h} \geq 0\}}\, e^{-2a^T x}\, e^{-r a^T \hat{h}}\left(1 - e^{-r a^T \hat{h}}\right)a^T \hat{h}\right] \geq e^{-(10\|x\|_2 + 7)}.$$

This bound is illustrated in a “proof by picture” in Figure 4.

Figure 4:

Lower bound on the expected correlation in case 1.
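A sketch of the Monte Carlo estimate described above is given below; it draws correlated two-dimensional Gaussian samples for $(a^T x/\|x\|_2,\, a^T \hat{h})$, averages the case-1 integrand, and minimizes over a grid of admissible correlations (the grid resolution and sample count are our choices).

```python
import numpy as np

rng = np.random.default_rng(0)

def case1_expectation(norm_x, corr, samples=50_000):
    """Monte Carlo estimate of the case-1 expectation for one correlation value
    between x and h-hat (a sketch of the numerical estimate described above)."""
    r = norm_x / 4.0
    g1 = rng.normal(size=samples)                   # a^T x / ||x||_2
    g2 = rng.normal(size=samples)
    ah = corr * g1 + np.sqrt(1.0 - corr**2) * g2    # a^T h-hat with the given correlation
    ax = norm_x * g1
    vals = ((ax >= 0) * (ah >= 0) * np.exp(-2.0 * ax)
            * np.exp(-r * ah) * (1.0 - np.exp(-r * ah)) * ah / r)
    return vals.mean()

norm_x = 1.0
estimate = min(case1_expectation(norm_x, c) for c in np.linspace(-0.6, 1.0, 33))
print("Monte Carlo minimum over correlations in [-0.6, 1]:", estimate)
print("claimed lower bound exp(-(10 ||x||_2 + 7))        :", np.exp(-(10.0 * norm_x + 7.0)))
```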

B.4. Correlation Concentration: Case 2

In this case we will focus on controlling the correlation inequality in the region where

$$\left\{h \in \mathbb{R}^n : \|h\|_2 \leq r \;\text{ and }\; \frac{x^T h}{\|x\|_2\|h\|_2} \leq \rho\right\}.$$

Consider the correlation

$$\begin{aligned}\nabla\mathcal{L}(z)^T(z - x) &= \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T z \geq 0\}}\, e^{-a_i^T z}\left(e^{-(a_i^T x)_+} - e^{-a_i^T z}\right)\left(a_i^T h\right) \\ &\overset{(a)}{\geq} \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\{a_i^T z \geq 0\}}\, e^{-a_i^T z}\left(e^{-a_i^T x} - e^{-a_i^T z}\right)\left(a_i^T h\right) \\ &\overset{(b)}{=} \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\{a_i^T h \geq -a_i^T x\}}\, e^{-2a_i^T x}\, e^{-a_i^T h}\left(1 - e^{-a_i^T h}\right)\left(a_i^T h\right) \\ &\overset{(c)}{\geq} \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\{0 \geq a_i^T h \geq -a_i^T x\}}\, e^{-2a_i^T x}\, e^{-a_i^T h}\left(1 - e^{-a_i^T h}\right)\left(a_i^T h\right).\end{aligned}$$

In (a) we provide a lower bound by introducing an additional indicator function on $a_i^T x$, which allows us to remove the ReLU. In (b) we use the notation $h := z - x$ and combine terms. In (c) we again provide a lower bound by adding an indicator to restrict to $a_i^T h \leq 0$. By flipping the sign of $h$ we can alternatively lower bound

$$\frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\{0 \leq a_i^T h \leq a_i^T x\}}\, e^{-2a_i^T x}\, e^{a_i^T h}\left(e^{a_i^T h} - 1\right)\left(a_i^T h\right)$$

over the set

$$\left\{h \in \mathbb{R}^n : \|h\|_2 \leq r \;\text{ and }\; \frac{x^T h}{\|x\|_2\|h\|_2} \leq \rho\right\}.$$

To this aim note that for $s \geq 0$ we have $e^{s} \geq 1$ and $\left(e^{s} - 1\right)s \geq s^2$. Thus,

$$\nabla\mathcal{L}(z)^T(z - x) \geq \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\{0 \leq a_i^T h \leq a_i^T x\}}\, e^{-2a_i^T x}\left(a_i^T h\right)^2.$$

To continue, we introduce the notation $\hat{h} = \frac{h}{\|h\|_2}$ and divide both sides by $\|h\|_2^2$. Thus we have

$$\frac{1}{\|h\|_2^2}\nabla\mathcal{L}(z)^T(z - x) \geq \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\left\{0 \leq a_i^T \hat{h} \leq a_i^T x/\|h\|_2\right\}}\, e^{-2a_i^T x}\left(a_i^T \hat{h}\right)^2 \geq \frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\left\{0 \leq a_i^T \hat{h} \leq \frac{a_i^T x}{r}\right\}}\, e^{-2a_i^T x}\left(a_i^T \hat{h}\right)^2.$$

Thus it suffices to lower bound

$$\frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\left\{0 \leq a_i^T \hat{h} \leq \frac{a_i^T x}{r}\right\}}\, e^{-2a_i^T x}\left(a_i^T \hat{h}\right)^2$$

over

$$\left\{\hat{h} \in \mathbb{S}^{n-1} : \frac{x^T \hat{h}}{\|x\|_2} \leq \rho\right\}.$$

To continue using Jensen’s inequality we have

$$\frac{1}{\|h\|_2^2}\nabla\mathcal{L}(z)^T(z - x) \geq \left(\frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, 1_{\left\{0 \leq a_i^T \hat{h} \leq \frac{a_i^T x}{r}\right\}}\, e^{-a_i^T x}\, a_i^T \hat{h}\right)^2 \geq \left(\frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T x \geq 0\}}\, \mathcal{S}\!\left(a_i^T \hat{h};\, \frac{a_i^T x}{r}\right) e^{-a_i^T x}\right)^2$$

where we have defined the function

$$\mathcal{S}(v; w) = \begin{cases} 0 & v < 0 \\ v & 0 \leq v \leq \frac{w}{2} \\ w - v & \frac{w}{2} \leq v \leq w \\ 0 & v \geq w \end{cases}$$

which is a 1-Lipschitz function of $v$. We now define the random variable $\mathcal{X}_i(\hat{h}) = 1_{\{a_i^T x \geq 0\}}\, \mathcal{S}\!\left(a_i^T \hat{h};\, \frac{a_i^T x}{r}\right) e^{-a_i^T x}$ and note that the Lipschitzness of $\mathcal{S}$ implies that for any $h_1, h_2 \in \mathbb{R}^d$ we have

$$\left|\mathcal{X}_i(h_2) - \mathcal{X}_i(h_1)\right| \leq \left|a_i^T(h_2 - h_1)\right|.$$

Since $a_i^T(h_2 - h_1)$ is a sub-Gaussian random variable with sub-Gaussian norm on the order of $\|h_2 - h_1\|_2$, we have that

$$\left\|\mathcal{X}_i(h_2) - \mathcal{X}_i(h_1)\right\|_{\psi_2} \leq \frac{c}{2}\|h_2 - h_1\|_2$$

for some constant $c$. We use $c$ to denote any universal constant; note that this constant may vary between different lines. Using the centering rule for sub-Gaussian random variables, the centered processes $\bar{\mathcal{X}}_i(h) := \mathcal{X}_i(h) - \mathbb{E}[\mathcal{X}_i(h)]$ obey

$$\left\|\bar{\mathcal{X}}_i(h_2) - \bar{\mathcal{X}}_i(h_1)\right\|_{\psi_2} \leq c\|h_2 - h_1\|_2.$$

Using the rotational invariance property of sub-Gaussian random variables, this implies that the stochastic process

$$\mathcal{X}(h) := \frac{1}{m}\sum_{i=1}^{m}\left(\mathcal{X}_i(h) - \mathbb{E}[\mathcal{X}_i(h)]\right)$$

has sub-Gaussian increments. That is,

$$\left\|\mathcal{X}(h_2) - \mathcal{X}(h_1)\right\|_{\psi_2} \leq \frac{c}{\sqrt{m}}\|h_2 - h_1\|_2.$$

Thus using Exercise 8.6.5 of Vershynin (2018) we can conclude that

$$\sup_{\|\hat{h}\|_2 = 1,\; \cos^{-1}\!\left(\frac{x^T \hat{h}}{\|x\|_2}\right) \geq \delta}\left|\mathcal{X}(\hat{h})\right| \leq \frac{c}{\sqrt{m}}\left(\sqrt{n}\, e^{-\frac{n\cos^2(\delta)}{2}} + u\right)$$

holds with probability at least $1 - 2e^{-u^2}$. In the last line we used the fact that the surface area of a spherical cap at distance at least $\epsilon$ away from the center is bounded by $e^{-\frac{n\epsilon^2}{2}}$. By using $u = \sqrt{n}$, this implies that

$$\frac{1}{\|h\|_2^2}\nabla\mathcal{L}(z)^T(z - x) \geq \left(\mathbb{E}\!\left[1_{\{a^T x \geq 0\}}\, \mathcal{S}\!\left(a^T \hat{h};\, \frac{a^T x}{r}\right) e^{-a^T x}\right] - c\sqrt{\frac{n}{m}}\right)^2$$

holds with probability at least $1 - 2e^{-n}$, for a constant $c$.

We can estimate and lower bound the expectation above using a numerical average over many (50,000) two-dimensional Gaussian samples, with the two dimensions corresponding to $a^T \hat{h}$ and $\frac{a^T x}{\|x\|_2}$, minimizing over all correlations between $x$ and $h$ at most $\rho$ (i.e. all correlations in this case). We arrive at the following lower bound, for $r = \frac{1}{4}\|x\|_2$ and $\rho = -0.6$.

$$\mathbb{E}\!\left[1_{\{a^T x \geq 0\}}\, \mathcal{S}\!\left(a^T \hat{h};\, \frac{a^T x}{r}\right) e^{-a^T x}\right] \geq 2e^{-(5\|x\|_2 + 2)}.$$

This bound is illustrated in a "proof by picture" in Figure 5.

Figure 5:

Lower bound on the expected correlation in case 2.

B.5. Bounding the Gradient Norm

Consider the gradient and note that

$$\|\nabla\mathcal{L}(z)\|_2 = \sup_{u \in \mathbb{S}^{n-1}}\frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T z \geq 0\}}\, e^{-(a_i^T z)_+}\left(e^{-(a_i^T x)_+} - e^{-(a_i^T z)_+}\right)\left(a_i^T u\right),$$

where $\mathbb{S}^{n-1}$ denotes the set of all real $n$-dimensional unit-norm vectors. To continue, we use the Cauchy-Schwarz inequality:

$$\begin{aligned}\|\nabla\mathcal{L}(z)\|_2 &\leq \sqrt{\frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T z \geq 0\}}\, e^{-2(a_i^T z)_+}\left(e^{-(a_i^T x)_+} - e^{-(a_i^T z)_+}\right)^2}\;\sqrt{\sup_{u \in \mathbb{S}^{n-1}}\frac{1}{m}\sum_{i=1}^{m}\left(a_i^T u\right)^2} \\ &= \sqrt{\frac{1}{m}\sum_{i=1}^{m} 1_{\{a_i^T z \geq 0\}}\, e^{-2(a_i^T z)_+}\left(e^{-(a_i^T x)_+} - e^{-(a_i^T z)_+}\right)^2}\;\frac{\|A\|}{\sqrt{m}} \leq \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(e^{-(a_i^T x)_+} - e^{-(a_i^T z)_+}\right)^2}\;\frac{\|A\|}{\sqrt{m}} \\ &\overset{(a)}{\leq} \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(a_i^T(z - x)\right)^2}\;\frac{\|A\|}{\sqrt{m}} \leq \frac{\|A\|^2}{m}\|z - x\|_2,\end{aligned}$$

where $\|A\|$ is the operator norm of the matrix comprised by stacking the vectors $a_i$, and in (a) we used the fact that the function $f(z) = e^{-(z)_+}$ is 1-Lipschitz. Finally, using the fact that $\|A\| \leq 2\left(\sqrt{m} + \sqrt{n}\right)$ with probability at least $1 - e^{-0.5(m+n)}$, we conclude that

$$\|\nabla\mathcal{L}(z)\|_2 \leq 8\left(1 + \frac{n}{m}\right)\|z - x\|_2 =: L\|z - x\|_2$$

with probability at least $1 - e^{-0.5(m+n)}$, where $L$ is a constant since we have the number of measurements at least a constant times the number of unknowns. This completes the proof of smoothness of the gradient towards the global optimum.

C. Proof of Theorem 2

The general strategy of the proof is similar to Theorem 1 but requires delicate modifications in each step. Concretely, we have the following four steps.

Step I: First iteration: Welcome to the neighborhood.

In the first step we show that the first iteration obeys

$$\|z_1 - x\|_2 \leq \tfrac{1}{4}\|x\|_2$$

with high probability as long as $m \geq c\, m_0$. We prove this in subsection C.1.

Step II: Local pseudoconvexity.

In this step we prove that the loss function is locally strongly pseudoconvex. Specifically we show that for all $z \in \mathcal{K}$ that also belong to the local neighborhood $\mathcal{N}(x) := \left\{z \in \mathbb{R}^n : \|z - x\|_2 \leq \tfrac{1}{4}\|x\|_2\right\}$ we have

$$\left\langle\nabla\mathcal{L}(z),\, z - x\right\rangle \geq \alpha\|z - x\|_2^2$$

with high probability as long as

$$m \geq c_1 e^{c_2\|x\|_2}\|x\|_2^2\, m_0.$$

Here, the value of α is the same as in Theorem 1 (see Appendix B.1). We prove this in subsection C.2 by again considering two cases.

Step III: Local smoothness.

We also use the fact that the loss function is locally smooth, that is,

$$\|\nabla\mathcal{L}(z)\|_2 \leq 8\left(1 + \frac{n}{m}\right)\|z - x\|_2 =: L\|z - x\|_2$$

holds with probability at least $1 - e^{-0.5(m+n)}$, per Section B.5.

Step IV: Completing the proof via combining steps I-III.

In this step we show how to combine the previous steps to complete the proof of the theorem. First, note that by the first step the first iteration will belong to the local neighborhood $\mathcal{N}(x)$ and thus belongs to the set $\mathcal{K} \cap \mathcal{N}(x)$. Next, note that

$$\|z_{t+1} - x\|_2 = \left\|\mathcal{P}_{\mathcal{K}}\left(z_t - \mu_{t+1}\nabla\mathcal{L}(z_t)\right) - x\right\|_2 = \left\|\mathcal{P}_{\mathcal{K}-x}\left(h_t - \mu_{t+1}\nabla\mathcal{L}(z_t)\right)\right\|_2 \leq \left\|\mathcal{P}_{\mathcal{C}}\left(h_t - \mu_{t+1}\nabla\mathcal{L}(z_t)\right)\right\|_2 \leq \left\|h_t - \mu_{t+1}\nabla\mathcal{L}(z_t)\right\|_2,$$

where $h_t := z_t - x$.

Squaring both sides and using the local pseudoconvexity inequality from Step II we conclude that

$$\|z_{t+1} - x\|_2^2 \leq \left\|h_t - \mu_{t+1}\nabla\mathcal{L}(z_t)\right\|_2^2 = \|h_t\|_2^2 - 2\mu_{t+1}\left\langle h_t, \nabla\mathcal{L}(z_t)\right\rangle + \mu_{t+1}^2\|\nabla\mathcal{L}(z_t)\|_2^2 \leq \|h_t\|_2^2 - 2\mu_{t+1}\alpha\|h_t\|_2^2 + \mu_{t+1}^2\|\nabla\mathcal{L}(z_t)\|_2^2.$$

Next we use the local smoothness from Step III to conclude that

$$\|z_{t+1} - x\|_2^2 \leq \|h_t\|_2^2 - 2\mu_{t+1}\alpha\|h_t\|_2^2 + \mu_{t+1}^2\|\nabla\mathcal{L}(z_t)\|_2^2 \leq \|h_t\|_2^2 - 2\mu_{t+1}\alpha\|h_t\|_2^2 + \mu_{t+1}^2 L^2\|h_t\|_2^2 \leq \left(1 - \mu_{t+1}\alpha\right)\|h_t\|_2^2 = \left(1 - \mu_{t+1}\alpha\right)\|z_t - x\|_2^2,$$

where in the last line we used the fact that $\mu_{t+1} \leq \frac{\alpha}{L^2}$, completing the proof.

C.1. Proof of Step I

The beginning of this proof is the same as the unregularized version where we note that

$$\begin{aligned}\|z_1 - x\|_2 &= \left\|\mathcal{P}_{\mathcal{K}}\!\left(\frac{\mu_1}{2m}\sum_{i=1}^{m}\hat{g}_i\frac{x}{\|x\|_2} + \frac{\mu_1}{2m}\sqrt{\textstyle\sum_{i=1}^{m}\tilde{g}_i}\;a\right) - x\right\|_2 = \left\|\mathcal{P}_{\mathcal{K}-x}\!\left(\frac{\mu_1}{2m}\sum_{i=1}^{m}\hat{g}_i\frac{x}{\|x\|_2} + \frac{\mu_1}{2m}\sqrt{\textstyle\sum_{i=1}^{m}\tilde{g}_i}\;a - x\right)\right\|_2 \\ &= \left\|\mathcal{P}_{\mathcal{K}-x}\!\left(\frac{\mu_1}{2m}\sum_{i=1}^{m}\left(\hat{g}_i - \mathbb{E}[\hat{g}]\right)\frac{x}{\|x\|_2} + \frac{\mu_1}{2m}\sqrt{\textstyle\sum_{i=1}^{m}\tilde{g}_i}\;a\right)\right\|_2 \leq \left\|\mathcal{P}_{\mathcal{C}}\!\left(\frac{\mu_1}{2m}\sum_{i=1}^{m}\left(\hat{g}_i - \mathbb{E}[\hat{g}]\right)\frac{x}{\|x\|_2} + \frac{\mu_1}{2m}\sqrt{\textstyle\sum_{i=1}^{m}\tilde{g}_i}\;a\right)\right\|_2 \\ &\leq \sup_{v \in \mathcal{C}\cap\mathbb{S}^{n-1}} v^T\!\left(\frac{\mu_1}{2m}\sum_{i=1}^{m}\left(\hat{g}_i - \mathbb{E}[\hat{g}]\right)\frac{x}{\|x\|_2} + \frac{\mu_1}{2m}\sqrt{\textstyle\sum_{i=1}^{m}\tilde{g}_i}\;a\right) \leq \frac{\mu_1}{2m}\left|\sum_{i=1}^{m}\left(\hat{g}_i - \mathbb{E}[\hat{g}]\right)\right| + \frac{\mu_1}{2\sqrt{m}}\sup_{v \in \mathcal{C}\cap\mathbb{S}^{n-1}} v^T a.\end{aligned}$$

Now similar to the unregularized case we have

$$\frac{\mu_1}{2m}\left|\sum_{i=1}^{m}\left(\hat{g}_i - \mathbb{E}[\hat{g}]\right)\right| + \frac{\mu_1}{2\sqrt{m}}\sup_{v \in \mathcal{C}\cap\mathbb{S}^{n-1}} v^T a \leq \frac{\mu_1}{2}\left(\frac{s}{m} + (1+t)\sqrt{\frac{m_0}{m}}\right) = 2\exp\!\left(-\frac{\|x\|_2^2}{2}\right)\frac{1}{\operatorname{erfc}\left(\frac{\|x\|_2}{\sqrt{2}}\right)}\left(\frac{s}{m} + (1+t)\sqrt{\frac{m_0}{m}}\right) \overset{(a)}{=} 2\exp\!\left(-\frac{\|x\|_2^2}{2}\right)\frac{1}{\operatorname{erfc}\left(\frac{\|x\|_2}{\sqrt{2}}\right)}\left(s + (1+t)\sqrt{\frac{m_0}{m}}\right)$$

with probability at least $1 - 2e^{-\frac{s^2 m}{2}} - e^{-ct^2 m_0}$, where in (a) we change variables and replace $s$ with $sm$, and $c$ is a constant. If we choose $s = t = 1$, this simplifies to

$$\|z_1 - x\|_2 \leq 2\exp\!\left(-\frac{\|x\|_2^2}{2}\right)\frac{1}{\operatorname{erfc}\left(\frac{\|x\|_2}{\sqrt{2}}\right)}\left(1 + 2\sqrt{\frac{m_0}{m}}\right)$$

with probability at least $1 - 2e^{-m/2} - e^{-c m_0}$, for a constant $c$. For our first step to lie within a distance of $\frac{1}{4}\|x\|_2$, we need the number of measurements to satisfy

$$m \geq m_0\left(\frac{\|x\|_2}{16}\exp\!\left(\frac{\|x\|_2^2}{2}\right)\operatorname{erfc}\!\left(\frac{\|x\|_2}{\sqrt{2}}\right) - \frac{1}{2}\right)^{-2}.$$

The denominator in this expression is lower bounded by 0.2 for all $\|x\|_2$, so we can also guarantee the first-step concentration to this neighborhood using $m \geq 5m_0$ measurements.

C.2. Proof of Step II

The proof of this step is virtually identical to that of the unregularized case. The only difference is that when we apply Exercise 8.6.5 of Vershynin (2018), $n$ is replaced with $m_0$ (indeed this exercise is stated with $m_0$). As a result, in the two cases we conclude:

Case I: In this case using the above yields

$$\frac{1}{\|h\|_2^2}\nabla\mathcal{L}(z)^T(z - x) \geq \frac{1}{r}\mathbb{E}\!\left[1_{\{a^T x \geq 0\}}\, 1_{\{a^T \hat{h} \geq 0\}}\, e^{-2a^T x}\, e^{-r a^T \hat{h}}\left(1 - e^{-r a^T \hat{h}}\right)a^T \hat{h}\right] - \frac{c}{r}\sqrt{\frac{m_0}{m}}$$

holds with probability at least $1 - 2e^{-m_0}$, for a constant $c$. Case II: In this case using the above yields

$$\frac{1}{\|h\|_2^2}\nabla\mathcal{L}(z)^T(z - x) \geq \left(\mathbb{E}\!\left[1_{\{a^T x \geq 0\}}\, \mathcal{S}\!\left(a^T \hat{h};\, \frac{a^T x}{r}\right) e^{-a^T x}\right] - c\sqrt{\frac{m_0}{m}}\right)^2$$

holds with probability at least $1 - 2e^{-m_0}$, for a constant $c$.

Thus the remainder of the proof is identical and the only needed change in this entire step is to replace n with m0.

Footnotes

1

We would like to note that $m_0$ only approximately characterizes the minimum number of samples required. A more precise characterization is $\phi^{-1}\!\left(\omega^2\!\left(\mathcal{C}_{\mathcal{R}}(x)\cap\mathcal{B}^n\right)\right) \approx \omega^2\!\left(\mathcal{C}_{\mathcal{R}}(x)\cap\mathcal{B}^n\right)$, where $\phi(t) = \sqrt{2}\,\frac{\Gamma\!\left(\frac{t+1}{2}\right)}{\Gamma\!\left(\frac{t}{2}\right)} \approx \sqrt{t}$. However, since our results have unspecified constants we avoid this more accurate characterization.

References

  1. Beck Amir and Teboulle Marc. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
  2. Bracewell R. N. Numerical Transforms. Science, 248(4956):697–704, May 1990. doi: 10.1126/science.248.4956.697.
  3. Candes E. J., Romberg J., and Tao T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006. doi: 10.1109/TIT.2005.862083.
  4. Candes Emmanuel J, Li Xiaodong, and Soltanolkotabi Mahdi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
  5. Chambolle Antonin, De Vore Ronald A, Lee Nam-Yong, and Lucier Bradley J. Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage. IEEE Transactions on Image Processing, 7(3):319–335, 1998.
  6. U.S. Food and Drug Administration. FDA Clears First Major Imaging Device Advancement for Computed Tomography in Nearly a Decade, 2021. URL https://www.fda.gov/news-events/press-announcements/fda-clears-first-major-imaging-device-advancement-computed-tomography-nearly-decade.
  7. U.S. Food and Drug Administration. Computed Tomography (CT), 2023. URL https://www.fda.gov/radiation-emitting-products/medical-x-ray-imaging/computed-tomography-ct.
  8. Foucart Simon and Rauhut Holger. A Mathematical Introduction to Compressive Sensing. Birkhauser, 2013.
  9. Fu Lin, Lee Tzu-Cheng, Kim Soo Mee, Alessio Adam M, Kinahan Paul E, Chang Zhiqian, Sauer Ken, Kalra Mannudeep K, and De Man Bruno. Comparison between pre-log and post-log statistical models in ultra-low-dose CT reconstruction. IEEE Transactions on Medical Imaging, 36(3):707–720, 2016.
  10. Kak Avinash C and Slaney Malcolm. Principles of Computerized Tomographic Imaging. SIAM, 2001.
  11. Mei Song, Bai Yu, and Montanari Andrea. The landscape of empirical risk for nonconvex losses. The Annals of Statistics, 46(6A):2747–2774, 2018.
  12. Natterer Frank. The Mathematics of Computerized Tomography. Society for Industrial and Applied Mathematics, 2001.
  13. Oymak Samet and Soltanolkotabi Mahdi. Fast and reliable parameter estimation from nonlinear observations. SIAM Journal on Optimization, 27(4):2276–2300, 2017a. doi: 10.1137/17M1113874.
  14. Oymak Samet and Soltanolkotabi Mahdi. Fast and reliable parameter estimation from nonlinear observations. SIAM Journal on Optimization, 27(4):2276–2300, 2017b.
  15. Oymak Samet, Recht Benjamin, and Soltanolkotabi Mahdi. Sharp time–data tradeoffs for linear inverse problems. IEEE Transactions on Information Theory, 64(6):4129–4158, 2017.
  16. Oymak Samet, Recht Benjamin, and Soltanolkotabi Mahdi. Isometric sketching of any set via the Restricted Isometry Property. Information and Inference: A Journal of the IMA, 7(4):707–726, 2018. doi: 10.1093/imaiai/iax019.
  17. Radon Johann. Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Ber. Verh. Sächs. Akad. Wiss. Leipzig, Math.-Phys. Klasse, 69, 1917.
  18. Rückert Darius, Wang Yuanhao, Li Rui, Idoughi Ramzi, and Heidrich Wolfgang. NeAT: Neural adaptive tomography. ACM Trans. Graph., 41(4), July 2022. doi: 10.1145/3528223.3530121.
  19. Fridovich-Keil Sara and Yu Alex, Tancik Matthew, Chen Qinhong, Recht Benjamin, and Kanazawa Angjoo. Plenoxels: Radiance fields without neural networks. In CVPR, 2022.
  20. Shepp Lawrence A and Logan Benjamin F. The Fourier reconstruction of a head section. IEEE Transactions on Nuclear Science, 21(3):21–43, 1974.
  21. Shikhaliev Polad M, Xu Tong, and Molloi Sabee. Photon counting computed tomography: concept and initial results. Medical Physics, 32(2):427–436, 2005.
  22. Soltanolkotabi Mahdi. Learning ReLUs via gradient descent. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pp. 2004–2014, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964.
  23. Soltanolkotabi Mahdi. Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization. IEEE Transactions on Information Theory, 65(4):2374–2400, 2019a. doi: 10.1109/TIT.2019.2891653.
  24. Soltanolkotabi Mahdi. Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization. IEEE Transactions on Information Theory, 65(4):2374–2400, 2019b.
  25. Szczykutowicz Timothy P, Toia Giuseppe V, Dhanantwari Amar, and Nett Brian. A review of deep learning CT reconstruction: concepts, limitations, and promise in clinical practice. Current Radiology Reports, 10(9):101–115, 2022.
  26. Vershynin Roman. High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47. Cambridge University Press, 2018.
  27. Wainwright Martin J. High-Dimensional Statistics: A Non-Asymptotic Viewpoint, volume 48. Cambridge University Press, 2019.
