Published in final edited form as: Magn Reson Chem. 2022 Jun 20;60(11):1076–1086. doi: 10.1002/mrc.5289

Input layer regularization for magnetic resonance relaxometry biexponential parameter estimation

Michael Rozowski 1,2, Jonathan Palumbo 3, Jay Bisen 3, Chuan Bi 3, Mustapha Bouhrara 3, Wojciech Czaja 2, Richard G Spencer 3

Abstract

Many methods have been developed for estimating the parameters of biexponential decay signals, which arise throughout magnetic resonance relaxometry (MRR) and the physical sciences. This is an intrinsically ill-posed problem so that estimates can depend strongly on noise and underlying parameter values. Regularization has proven to be a remarkably efficient procedure for providing more reliable solutions to ill-posed problems, while, more recently, neural networks have been used for parameter estimation. We re-address the problem of parameter estimation in biexponential models by introducing a novel form of neural network regularization which we call input layer regularization (ILR). Here, inputs to the neural network are composed of a biexponential decay signal augmented by signals constructed from parameters obtained from a regularized nonlinear least-squares estimate of the two decay time constants. We find that ILR results in a reduction in the error of time constant estimates on the order of 15%–50% or more, depending on the metric used and signal-to-noise level, with greater improvement seen for the time constant of the more rapidly decaying component. ILR is compatible with existing regularization techniques and should be applicable to a wide range of parameter estimation problems.

Keywords: biexponentials, deep learning, MRI, neural network, parameter estimation, regularization, relaxometry

1 ∣. INTRODUCTION

1.1 ∣. Background: Biexponential analysis

Biexponential models occur throughout magnetic resonance relaxometry and related experiments[1-5] as well as in chemistry,[6-8] biomedicine,[9-14] population dynamics,[15] and physics.[16-18]

This model is of the form:

$s(t; T_{2,1}, T_{2,2}, c_1, c_2) = c_1 e^{-t/T_{2,1}} + c_2 e^{-t/T_{2,2}} + \eta(t),$ (1)

with s(t) indicating signal amplitude at measurement time t. In other settings, the data acquisition variable may represent an indirectly detected time, or a diffusion b value. The model is linear in the initial population fractions, c1, c2 > 0, but nonlinear with respect to the corresponding decay constants 0 ms < T2,1 < T2,2. The function η(t) is a random process representing noise. We will use biomedical magnetic resonance relaxometry (MRR), in which T2,1 and T2,2 are transverse relaxation times, as our illustrative example[5]; the two components then represent two distinct water compartments in tissue. We further restrict our analysis to the two-parameter problem of known c1 and c2 and unknown T2,1 and T2,2. This is somewhat restrictive but suffices to illustrate the current status of our developments.

The noiseless, discretized version of Equation (1) is

$G(\mathbf{p}) \overset{\mathrm{def}}{=} \left( c_1 e^{-t_n/T_{2,1}} + c_2 e^{-t_n/T_{2,2}} \right)_{n=1}^{N},$ (2)

indicating signal acquisition at times $\{t_n\}_{n=1}^{N}$, where $\mathbf{p} = (T_{2,1}, T_{2,2}, c_1, c_2)$ is the vector of model parameters to be estimated. We take t1 = 0, although transverse relaxometry acquisition generally starts at a minimum echo time t1 = TE. The model G is an explicit function of the vector of model parameters; its dependence on measurement times is understood but often omitted in what follows.
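As a concrete illustration, the following Python sketch generates a noisy discretized biexponential decay of the form of Equation (2). The number of echoes, the echo spacing, the SNR convention (noise standard deviation taken as (c1 + c2)/SNR), and the parameter values are illustrative assumptions rather than the acquisition settings used in this work.

```python
import numpy as np

def biexp_signal(t, T21, T22, c1=0.6, c2=0.4, snr=None, rng=None):
    """Discretized biexponential decay (Eq. 2), optionally with additive Gaussian noise."""
    s = c1 * np.exp(-t / T21) + c2 * np.exp(-t / T22)
    if snr is not None:
        rng = rng or np.random.default_rng()
        sigma = (c1 + c2) / snr          # assumed SNR convention: initial amplitude / noise SD
        s = s + rng.normal(0.0, sigma, size=t.shape)
    return s

# Illustrative acquisition: 64 samples starting at t1 = 0 with 8 ms spacing
t = np.arange(64) * 8.0
s = biexp_signal(t, T21=30.0, T22=150.0, snr=100)
```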

Many approaches have been developed to estimate p,[19-21] with nonlinear least-squares[22] (NLLS) remaining very popular. Incorporating constraints on the model's parameters, the problem is

$\mathbf{p}^{*} \overset{\mathrm{def}}{=} \underset{\mathbf{p} \in F}{\operatorname{arg\,min}} \, \lVert G(\mathbf{p}) - \mathbf{s} \rVert_2^2 = \underset{\mathbf{p} \in F}{\operatorname{arg\,min}} \left[ \sum_{n=1}^{N} \left( c_1 e^{-t_n/T_{2,1}} + c_2 e^{-t_n/T_{2,2}} - s_n \right)^2 \right].$ (3)

Above, $\mathbf{s} = \{s_n\}_{n=1}^{N}$ with $s_n = s(t_n)$ is a vector containing the experimental signal. Also, $\lVert a \rVert_p \equiv \left( \sum_{n=1}^{N} \lvert a_n \rvert^p \right)^{1/p}$ for p ≥ 1 is the p-norm of an N-vector a. In NLLS, the estimate $\mathbf{p}^{*}$ of the true parameters $\mathbf{p}$ is one that minimizes the sum of squared residuals $r_n^2 = \left( c_1 e^{-t_n/T_{2,1}} + c_2 e^{-t_n/T_{2,2}} - s_n \right)^2$ over a feasible set of parameters F. The choice of F permits the introduction of prior information, such as nonnegativity bounds. For MRR, we define our estimated T2,1 and T2,2 values as T2,1est and T2,2est, respectively.
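A minimal sketch of the constrained NLLS fit of Equation (3) using scipy.optimize.least_squares is shown below; the starting values, the lower bound used to enforce positivity, and the convention of sorting the two estimates so that T2,1est ≤ T2,2est are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_nlls(t, s, c1=0.6, c2=0.4, p0=(30.0, 150.0)):
    """NLLS estimate of (T2,1, T2,2) per Eq. (3), with c1 and c2 known."""
    def residuals(p):
        T21, T22 = p
        return c1 * np.exp(-t / T21) + c2 * np.exp(-t / T22) - s

    fit = least_squares(residuals, p0, bounds=(1e-3, np.inf))   # positivity bounds
    return tuple(np.sort(fit.x))                                # order so T2,1_est <= T2,2_est
```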

More recently, neural networks (NNs) have been used for parameter estimation, including for biexponential models.[23-27] Training requires a large set of biexponential curves with known parameters; this is generally not available experimentally, but may readily be simulated in the case of a known signal model with well-understood noise properties. A choice must also be made for the distribution of the noise η(t), which can be taken as Gaussian for MRR obtained in a spectroscopic mode, as in this work, or Rician or higher-order chi-squared distributions for multi-coil image acquisition. Conveniently, knowledge of reasonable parameter ranges may be readily incorporated into construction of the training data.

Estimating the parameters of a biexponential model is inherently difficult, with distinct sets of parameters yielding nearly indistinguishable decay curves; this is one hallmark of an ill-posed inverse problem for parameter estimation. This results in a high degree of instability of derived solutions with respect to even small amounts of noise; stabilization of ill-posed inverse problems is therefore a major topic of investigation.[28] An important strategy is to replace the original parameter estimation problem with a related but more stable problem; this is one definition of regularization. Available methods exhibit a trade-off between estimation variance, which is decreased by the regularization procedure, and bias, which is increased.[29]

1.2 ∣. Regularization

Tikhonov regularization is accomplished through the addition of a penalty term to the objective function that is to be minimized, so that Equation (3) is replaced by the regularized nonlinear least-squares problem (R-NLLS):

$\mathbf{p}_{\lambda} \overset{\mathrm{def}}{=} \underset{\mathbf{p} \in \mathbb{R}_+^{2}}{\operatorname{arg\,min}} \left( \lVert G(\mathbf{p}) - \mathbf{s} \rVert_2^2 + \lambda^2 \lVert \mathbf{p} \rVert_2^2 \right),$ (4)

where λ > 0 is called the regularization parameter. This balances the size of the ℓ2-norm of the solution against the objective function, avoiding a fine-structured solution that fits the noisy data well but generalizes poorly to additional data sets.[30] Qualitatively, increasing λ biases the length of the parameter vector, $\lVert \mathbf{p} \rVert_2$, to be small. In the limit λ → ∞ the penalty term dominates, and the solution satisfies $\mathbf{p}_{\lambda} \to 0$. Conversely, decreasing λ diminishes the impact of the penalty term, and in the limit λ → 0 the solution of Equation (4) converges to the solution of Equation (3), that is, $\mathbf{p}_{\lambda} \to \mathbf{p}_{0} = \mathbf{p}^{*}$. A number of methods are available to analogously regularize NNs.[31-33]
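The regularized problem of Equation (4) can be solved with the same least-squares machinery by stacking the scaled parameter vector λp onto the data residual, since ‖G(p) − s‖² + λ²‖p‖² is itself a sum of squares. The sketch below illustrates this; the starting point and bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_rnlls(t, s, lam, c1=0.6, c2=0.4, p0=(30.0, 150.0)):
    """Tikhonov-regularized NLLS (Eq. 4): minimize ||G(p) - s||^2 + lam^2 ||p||^2."""
    def residuals(p):
        T21, T22 = p
        data_res = c1 * np.exp(-t / T21) + c2 * np.exp(-t / T22) - s
        return np.concatenate([data_res, lam * np.asarray(p)])   # augmented residual

    fit = least_squares(residuals, p0, bounds=(1e-3, np.inf))
    return tuple(np.sort(fit.x))                                 # (T2,1_lambda, T2,2_lambda)
```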

Equation (4) is a conventional expression for Tikhonov regularization of a nonlinear inverse problem. However, it can be difficult to ascribe physical significance to the penalty term; for example, in our case, there is no physical basis for limiting the size of $\lVert \mathbf{p} \rVert_2^2 = (T_{2,1}^{\mathrm{est}})^2 + (T_{2,2}^{\mathrm{est}})^2$. This is in contrast to the common case in MR in which the recovered p represents a probability distribution function, describing in effect the number of spins relaxing at a given T2 over a grid of possible values. In that case, it is physically meaningful to impose properties such as sparsity or smoothness on the solution. Limitations on solution norm or total variation may also be imposed. These considerations do not hold for the problem at hand or for many nonlinear inverse problems with essentially uncorrelated parameters, but it remains true that regularization decreases parameter instability with respect to noise at the expense of introducing additional bias into parameter estimation. In effect, here we are incorporating a penalty term for mathematical rather than for physical reasons.

Our present work involves the incorporation of two separate concepts. First, we establish a conventional NN framework for decay time constant estimation in the biexponential model.[23] Second, we provide a novel means of regularization of the NN; we include a regularized version of the biexponential data in the input layer to improve stability.

An important perspective on our method is the recognition that NNs provide a natural means for combining distinct sources of information; rather than replacing the original ill-posed problem by one that is related but more well-posed, we instead train a NN that maps a concatenated pair of time series (s,G(pλ)), respectively the noisy signal decay and a noiseless decay resulting from regularized parameter estimation, to new parameter estimates. This augments, rather than replaces, the original problem. We will show that this leads to an improvement in parameter estimation in comparison to a NN that is trained using only the noisy decay s as input.
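A sketch of the corresponding input construction is given below: the (ND, Reg) input concatenates the noisy decay with the noiseless decay reconstructed from the R-NLLS estimates, while the (ND, ND) input simply duplicates the noisy decay. It reuses the biexp_signal and fit_rnlls sketches above, and the regularization parameter value shown is illustrative only.

```python
import numpy as np

def make_ilr_input(t, s, lam, c1=0.6, c2=0.4):
    """(ND, Reg) input: noisy decay s concatenated with G(p_lambda) from R-NLLS."""
    T21_lam, T22_lam = fit_rnlls(t, s, lam, c1, c2)     # R-NLLS sketch defined above
    g_reg = c1 * np.exp(-t / T21_lam) + c2 * np.exp(-t / T22_lam)
    return np.concatenate([s, g_reg])                   # length 2N

x_nd_nd = np.concatenate([s, s])                        # baseline (ND, ND) input, also length 2N
x_nd_reg = make_ilr_input(t, s, lam=1.6e-4)             # illustrative regularization parameter
```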

2 ∣. METHODS

The workflow for the construction of synthetic data sets, training and validation of NNs, and testing is illustrated in Figure 1, with a more detailed description found in Section S1 of the supporting information (SI).

FIGURE 1. Our procedure consists of three steps: data set generation, NN training and validation, and NN testing. A conventional neural network for parameter estimation takes as input the biexponential decay curve. We call such a network (ND, ND), where ND stands for “noisy decay.” Our input layer regularized network is denoted (ND, Reg)

2.1 ∣. Neural network architectures

To demonstrate ILR, we separately trained and compared the performance of two NNs with the same architectures but differing in their input, denoted X. The first, which we call (ND, ND), takes as input a concatenated vector of two identical time series X=(s,s), where s is a synthetic noisy decay (ND). The second, called (ND, Reg), takes as input a concatenated vector of two different time series X=(s,G(pλ)), where G(pλ) is generated from parameters estimated with R-NLLS, that is, Equation (4). Thus, the input layers of (ND, Reg) and (ND, ND) have the same length, 2N, identical elements in the first N slots, and different elements in the second N slots.

Both networks, (ND, ND) and (ND, Reg), were structured with an input layer of 128 nodes, four hidden layers of 32, 256, 256, and 32 nodes, and an output layer with two nodes defining the parameter estimates T2,1est and T2,2est. This architecture was based on past work[25,34] that used NNs for related problems.
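A sketch of this architecture in PyTorch is shown below, assuming N = 64 (so an input length of 2N = 128) and ReLU activations; the activation function and other training details are assumptions on our part, with the settings actually used described in Section S1.3 of the SI.

```python
import torch.nn as nn

class BiexpNet(nn.Module):
    """Fully connected network: 128 inputs -> hidden layers of 32, 256, 256, 32 -> 2 outputs."""
    def __init__(self, n_in=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, 32), nn.ReLU(),
            nn.Linear(32, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 32), nn.ReLU(),
            nn.Linear(32, 2),                 # outputs (T2,1_est, T2,2_est)
        )

    def forward(self, x):
        return self.net(x)
```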

We compared (ND, Reg) with (ND, ND), rather than with a network, called (ND), whose input layer is simply X=s, in order to avoid comparing NNs with different input layer widths. Nevertheless, we have found that (ND, ND) performs virtually identically to (ND).

We consider (ND, ND) to be the baseline NN for parameter estimation since it directly takes noisy experimental data as input and yields an estimate of the decay constants (T2,1est,T2,2est).[23,24,26,34,35] In contrast, (ND, Reg) represents a novel NN input layer architecture, and comparison of this with (ND, ND) is the topic of this manuscript.

2.2 ∣. Training and validation

Details regarding training and validation can be found in Section S1.3 of the SI. We note that we have selected c1=0.6 and c2=0.4 throughout, although other values yielded very similar results (not shown).

2.3 ∣. Testing

We evaluated the performance of each trained NN using four distinct metrics calculated from its predictions using the pairs of input and target samples (X,y) in the testing data set T. These were: mean percent error (MPE), bias, variance, and mean squared error (MSE). Each of these provides independent insight into performance. Mean percent error is an easily-interpreted metric of average performance expressed in relative terms. Bias and variance are essential elements in the analysis of regularization techniques, providing respectively a measure of accuracy and precision, with MSE = bias² + variance. We computed two versions of each of these four metrics. The first version is a function of two variables; it depends on the pair of target decay time constants (T2,1true, T2,2true). The second version is a function of one of these variables, obtained by averaging over the other. Such averaged results are presented in the SI for MPE and MSE. The functions of both T2,1true and T2,2true are denoted with uppercase letters. We use a subscript, 1 or 2, on the function name to indicate whether it describes T2,1est or T2,2est. The precise definitions of these metrics are found in Section S1.4 of the SI.
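For a fixed target pair and a set of estimates over repeated noise realizations, these metrics can be computed as in the sketch below. The MPE form shown (mean absolute percent error) is an assumption standing in for Equation (S4); the bias, variance, and MSE relations follow the decomposition quoted above.

```python
import numpy as np

def metrics(estimates, target):
    """Metrics for one decay constant over repeated noise realizations.

    estimates : 1-D array of T2 estimates from a NN or NLLS
    target    : the true T2 value
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - target
    variance = estimates.var()
    mse = bias**2 + variance                                     # MSE = bias^2 + variance
    mpe = 100.0 * np.mean(np.abs(estimates - target)) / target   # assumed form of Eq. (S4)
    return {"mpe": mpe, "bias": bias, "variance": variance, "mse": mse}
```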

3 ∣. RESULTS

Comparisons of (ND, Reg) with (ND, ND) will be based on the four metrics defined above. The main results can be found below, with further details presented in Section S2 of the SI.

3.1 ∣. Mean percent error performance

We wish to compare the two NNs based on their combined performance in estimating T2,1 and T2,2; accordingly, we defined a metric MPEdiff:

$\mathrm{MPE}_{\mathrm{diff}}(T_{2,1}^{\mathrm{true}}, T_{2,2}^{\mathrm{true}}) \overset{\mathrm{def}}{=} \left[ \mathrm{MPE}_1^{(\mathrm{ND,ND})}(T_{2,1}^{\mathrm{true}}, T_{2,2}^{\mathrm{true}}) + \mathrm{MPE}_2^{(\mathrm{ND,ND})}(T_{2,1}^{\mathrm{true}}, T_{2,2}^{\mathrm{true}}) \right] - \left[ \mathrm{MPE}_1^{(\mathrm{ND,Reg})}(T_{2,1}^{\mathrm{true}}, T_{2,2}^{\mathrm{true}}) + \mathrm{MPE}_2^{(\mathrm{ND,Reg})}(T_{2,1}^{\mathrm{true}}, T_{2,2}^{\mathrm{true}}) \right],$ (5)

where MPE1 and MPE2 are defined by Equation (S4).
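Given MPE grids for both networks evaluated over the (T2,1true, T2,2true) plane, the difference map of Equation (5) is a simple elementwise combination; the sketch below assumes the four input arrays have already been computed, and the argument names are hypothetical.

```python
import numpy as np

def mpe_diff(mpe1_ndnd, mpe2_ndnd, mpe1_ndreg, mpe2_ndreg):
    """Eq. (5): positive entries indicate that (ND, Reg) outperforms (ND, ND)."""
    return (np.asarray(mpe1_ndnd) + np.asarray(mpe2_ndnd)) - (
        np.asarray(mpe1_ndreg) + np.asarray(mpe2_ndreg)
    )
```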

Figure 2 shows results for the metric defined in Equation (5) as a heat map. Blue indicates superiority of (ND, Reg), while red indicates superiority of (ND, ND). Each pixel represents an average over Nω = 20 noise realizations. Finally, note that the convention T2,2true ≥ T2,1true leads to results being generated only for the upper left half of the parameter space defined by the heat map axes. One region of particular interest is on the left, where T2,1true values are small and hence difficult to estimate due to the rapid decay of the corresponding signal component. The diagonal region, where T2,1true ≈ T2,2true, also represents a zone of parameters where estimation is difficult.[36]

FIGURE 2. Heat maps defining the relative mean percent error (MPE) performance of the two NNs, according to Equation (5) with averaging over Nω = 20 noise realizations, as a function of (T2,1true, T2,2true). Blue regions indicate superiority of (ND, Reg) and red the superiority of (ND, ND). Results are shown for SNR = 1 (left), where neither method performs well, and SNR = 30 (center) and SNR = 100 (right), where (ND, Reg) outperforms (ND, ND) over the majority of the heat map region. The regularization parameter was fixed at λ = 1.6 × 10⁻⁴

At this stage of development, a single fixed λ is used throughout all of parameter space, and so represents a compromise chosen to achieve the best overall results across the entire heat map. With this choice, the blue regions clearly dominate the red regions for moderate-to-high SNR values, with the exception of certain regions near the diagonal. This suggests that tuning the regularization parameter λ to that region would be particularly important. The results for SNR = 1 indicate that neither method performs well in that regime.

3.2 ∣. Bias-variance performance

We examine the accuracy, represented by bias, and precision, represented by variance, of (ND, Reg) and (ND, ND) as estimators of the decay constants. Smaller variance indicates less sensitivity to noise.

Figure 3 shows the bias and variance of T2,1est, computed via Equations (S6) and (S7), respectively. Three SNR levels are shown with a fixed λ = 1.6 × 10⁻⁴. For low SNR, (ND, ND) and (ND, Reg) perform virtually identically and poorly. However, for larger SNR, (ND, Reg) is very effective at limiting bias compared with (ND, ND) across the full range of target T2,1true values. Variance, on the other hand, is similar between the two NNs. We attribute these results to the fact that while the introduction of regularization generally leads to decreased variance at the expense of increased bias,[30] in the present analysis the NNs are, in effect, trained on bias as the loss function. Comparable results were obtained for bias and variance in the estimation of T2,2true, although the differences between the two NNs were smaller. See SI, Figure S5.

FIGURE 3. Bias (b1) (left) and variance (var1) (right) in T2,1est as defined by Equations (S6) and (S7) for (ND, ND) (green) and (ND, Reg) (blue) for three levels of SNR. The regularization parameter is fixed at λ = 1.6 × 10⁻⁴. The location of the maximum error is indicated with a vertical line and the value of the mean error is indicated by a horizontal line

3.3 ∣. Mean squared error performance

3.3.1 ∣. Mean squared error as a function of one target decay constant

Figure 4 shows the comparison of (ND, Reg) and (ND, ND) in terms of average mean squared error in T2,1est and T2,2est. As was also seen in Figure S2, both methods fail at the lowest values of SNR. For larger SNR, (ND, Reg) consistently outperforms (ND, ND), with improvements in T2,1est of ~20% for SNR = 10, ~50% for SNR = 30, and ~75% for SNR = 100. More modest, but nevertheless substantial, improvements were seen for T2,2est: ~15% for SNR = 30 and ~33% for SNR = 100.

FIGURE 4. Average mean squared error (mse) for T2,1est (top) and T2,2est (bottom) defined by Equation (S9) across five SNR values from 1 to 100. The regularization parameter was fixed at λ = 1.6 × 10⁻⁴. Values on the blue (for (ND, Reg)) and green (for (ND, ND)) curves (left-side ordinate) correspond to the average value of mse1 (upper panel) over all target values of T2,1true or mse2 (lower panel) over all target values of T2,2true in the testing set. Average percent improvement (api) as defined by Equation (S10) of (ND, Reg) compared with (ND, ND) is displayed on the right-side ordinate. We see that the MSE of (ND, Reg) is lower than that of (ND, ND), indicating improvement through ILR. Greater improvements are seen for larger values of SNR (red curve). Overall, the mean squared error metric shows results similar to those shown in Figure S2 for average mpe

3.3.2 ∣. Heat map of the sum of mean squared error over target decay constants

Analogous to Equation (5) in Section 3.1, we define a mean squared error difference metric MSEdiff to quantify the difference between the performance of the two NNs:

$\mathrm{MSE}_{\mathrm{diff}}(T_{2,1}^{\mathrm{true}}, T_{2,2}^{\mathrm{true}}) \overset{\mathrm{def}}{=} \left[ \mathrm{MSE}_1^{(\mathrm{ND,ND})}(T_{2,1}^{\mathrm{true}}, T_{2,2}^{\mathrm{true}}) + \mathrm{MSE}_2^{(\mathrm{ND,ND})}(T_{2,1}^{\mathrm{true}}, T_{2,2}^{\mathrm{true}}) \right] - \left[ \mathrm{MSE}_1^{(\mathrm{ND,Reg})}(T_{2,1}^{\mathrm{true}}, T_{2,2}^{\mathrm{true}}) + \mathrm{MSE}_2^{(\mathrm{ND,Reg})}(T_{2,1}^{\mathrm{true}}, T_{2,2}^{\mathrm{true}}) \right]$ (6)

where MSE1 is the mean squared error in T2,1est and MSE2 is the mean squared error in T2,2est computed via Equation (S8), and the NN used is indicated by a superscript.

Corresponding heat maps as a function of (T2,1true, T2,2true) for three values of SNR are shown in Figure 5. At low SNR, both methods perform poorly and are comparable. At more realistic SNR, (ND, Reg) clearly outperforms (ND, ND) over most of the range of targets. However, (ND, Reg) performs worse than (ND, ND) over much of the diagonal region for SNR = 100. We attribute this to the fact that λ is fixed for all target values, and this region likely is over-regularized; see SI Figure S5. Similar results were seen in Figure 2.

FIGURE 5. Heat maps defining the relative mean squared error (MSE) performance of the two NNs, MSEdiff according to Equation (6) with averaging over Nω = 20 noise realizations. Blue regions indicate superiority of (ND, Reg) and red the superiority of (ND, ND). Results are shown for SNR = 1 (left), where neither method performs well, and SNR = 30 (center) and SNR = 100 (right), where (ND, Reg) outperforms (ND, ND) over the majority of the heat map region. The regularization parameter was fixed at λ = 1.6 × 10⁻⁴. Note that the scale of the color bars varies between the three subplots

3.4 ∣. Application to myelin T2 mapping of the human brain

We apply ILR to pixel-wise analysis of a sequence of T2-weighted human brain images. Each pixel provides a biexponential decay as a function of echo time and was processed independently with an (ND, ND) NN and an (ND, Reg) NN. The networks were trained with an SNR of 241, which is the average SNR over the brain parenchyma in the shortest echo-time image. Noise was added pixel-wise to construct an additional data set with mean SNR = 30 for comparison purposes, and additional (ND, ND) and (ND, Reg) NNs were trained at this SNR. Further acquisition and processing details are found in Sections S1.5.1 and S1.5.2, respectively, of the SI. Results are shown in Figure 6.

FIGURE 6. T2,1 and T2,2 maps generated pixel-wise with (ND, ND) and (ND, Reg). Values for c1 and c2 were taken as the average of the estimates from Bi et al.[37] Comparison is made between maps obtained at high (left; mean SNR = 241) and low (center; noise added to achieve mean SNR = 30) SNR. The right-hand panel compares the sensitivity to noise of the two NNs, as defined by Equation (7). As seen, the (ND, Reg) NN maintains its performance to a substantially greater extent than (ND, ND) as SNR decreases

There is no gold standard for these in vivo images, so that comparison of bias is problematic. However, we can instead compare (ND, ND) and (ND, Reg) with respect to sensitivity to noise and evaluate whether the introduction of ILR improves the stability of the parameter estimation problem. We quantify this by determining, for each of the two NNs separately, the absolute difference between the images obtained at SNR = 241 and SNR = 30. A smaller difference for (ND, Reg), as measured by the difference between these two sensitivity maps, indicates a desirable lower sensitivity to noise. The expression used is

$s_{\mathrm{diff}}(i,j) \overset{\mathrm{def}}{=} \left[ s_1^{(\mathrm{ND,Reg})}(i,j) + s_2^{(\mathrm{ND,Reg})}(i,j) \right] - \left[ s_1^{(\mathrm{ND,ND})}(i,j) + s_2^{(\mathrm{ND,ND})}(i,j) \right],$ (7)

with s1(i,j) indicating the noise sensitivity of an estimate of T2,1 from either (ND, Reg) or (ND, ND) as defined by Equation (S11) in Section S1.5.2 of SI, and s2(i,j) the corresponding expression for T2,2; superscripts indicate the neural network that was used. A map of this quantity is displayed in the right-hand panel of Figure 6, with the negative values seen demonstrating that (ND, Reg) is less sensitive to noise than (ND, ND) across the majority of the figure. We note the obvious difference in the mean value of T2,1 for the two SNR levels; we attribute this to the introduction of additional bias for SNR = 30 due to the nonlinear estimation implemented by the NNs,[38] with T2,2 evidently being less sensitive to this effect.
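A sketch of this comparison is given below; it assumes that Equation (S11) defines noise sensitivity as the pixelwise absolute difference between the parameter maps obtained at the two SNR levels, which is an assumption about the SI definition rather than a quotation of it.

```python
import numpy as np

def sensitivity(map_hi_snr, map_lo_snr):
    """Pixelwise noise sensitivity of one parameter map (assumed form of Eq. S11)."""
    return np.abs(np.asarray(map_hi_snr) - np.asarray(map_lo_snr))

def s_diff(T21_reg, T22_reg, T21_nd, T22_nd):
    """Eq. (7): each argument is a (high-SNR map, low-SNR map) pair for one NN.
    Negative values indicate that (ND, Reg) is less sensitive to noise."""
    s_reg = sensitivity(*T21_reg) + sensitivity(*T22_reg)
    s_nd = sensitivity(*T21_nd) + sensitivity(*T22_nd)
    return s_reg - s_nd
```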

4 ∣. DISCUSSION

The applicability of NNs to parameter estimation has long been recognized,[23] and they are now widely used for this purpose.[24,26,35,39] Application of ILR to functional forms other than the biexponential model studied here is straightforward. Models of current interest in MR include kurtosis,[40] the stretched exponential,[4,41] and mcDESPOT.[42,43] Multiexponential models with more than two components may also be evaluated, with, for example, six independent parameters estimated for a three-component model. However, we expect such an extension to present substantial problems, both in terms of training over a large hypercube in parameter space and due to the extreme ill-conditioning of such functional forms. The current analysis is also applicable to components defined by T1 values, which are less frequently encountered due to the timescale of intercompartment exchange but are still of importance. The constant offset to the exponential term in this signal model does not affect noise amplification,[36] the calculation of which involves derivatives,[5] so the behavior is expected to follow closely that of the biexponential T2 model investigated here. Further, we expect ILR to be of use in areas outside of MR.[44-49]

The biexponential model incorporates four unknown parameters, namely, compartment sizes and decay time constants. More practical applications would require estimation of all of these, or a three-parameter analysis when the data can be normalized. Nevertheless, the improved estimation of the two decay constants as presented here shows the potential value of the approach, and may itself have applications.[50-53] We note that we have performed simulations for c1 values ranging from 0.05 to 0.7, finding that in all cases (ND, Reg) outperforms (ND, ND) to roughly the same extent as in the results presented in detail.

The extension of our work to the estimation of all four parameters in the biexponential model, (T2,1, T2,2, c1, c2), is straightforward. The only step that changes in an essential way is data set generation, since all four parameters would require estimation by R-NLLS. Because of the disparate dimensions and scales of the time constants and component fractions, it may be worthwhile to introduce a diagonal matrix of weights into the penalty term of R-NLLS so that the parameters contribute to the penalty in a roughly equal manner, as sketched below. A similar adjustment to the loss function during NN training may also be implemented. Similar comments apply to parameter estimation for other models.
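One way to implement such a weighted penalty is sketched below for the four-parameter case: a diagonal weight matrix W rescales the parameters inside the Tikhonov term so that decay constants (in ms) and dimensionless fractions contribute comparably. The weight values, bounds, and starting point are illustrative assumptions, not values used in this work.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_rnlls_weighted(t, s, lam, weights=(1.0, 1.0, 100.0, 100.0),
                       p0=(30.0, 150.0, 0.5, 0.5)):
    """Four-parameter R-NLLS: minimize ||G(p) - s||^2 + lam^2 ||W p||^2, with W diagonal."""
    W = np.asarray(weights)

    def residuals(p):
        T21, T22, c1, c2 = p
        data_res = c1 * np.exp(-t / T21) + c2 * np.exp(-t / T22) - s
        return np.concatenate([data_res, lam * W * p])

    fit = least_squares(residuals, p0, bounds=(1e-3, np.inf))
    return fit.x                     # (T2,1, T2,2, c1, c2) estimates
```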

ILR may also be useful for classification problems. In image classification, for example, a NN is trained on and applied to vectorized versions of the input images. ILR should improve the accuracy of matching this data structure to the correct output class or classes in the presence of noise.[54]

Overfitting, viewed as creation of a NN model with weights that are finely-tuned to a particular data set, is a major potential pitfall in NN design and leads to models that are not generalizable to new data sets. The most straightforward solution to this problem is to provide the training step with more data; since this can be the most problematic aspect of NN development, a variation on this, in which the input data is re-used following transformations, may instead be implemented.[55]

Regularization represents another approach to avoiding overfitting, where the goal is to penalize models that have excessive complexity. Weight decay introduces a modified loss function that incorporates a term proportional to the ℓ2-norm of the set of weights.[31] This is similar to Tikhonov regularization of conventional parameter estimation problems in that it penalizes large weights, which are characteristic of over-fit models. The ℓ1-norm may instead be used to promote sparsity in the weight amplitudes.[56] Weight dropout regularization is unique to NN methods; with each training cycle of forward and backward propagation, a different random subset of weights within a single layer or multiple layers is set to zero with a pre-established probability.[32] This reduces the influence of any particular weight, acting to ensure substantial amplitude in all weights. Early stopping is also an effective form of regularization, and was employed in the present work. Here, NN performance is evaluated at each training epoch, with overfitting being characterized by an increase in error rate over a validation set.[33] As an alternative, performance may first improve and then become stable as a function of epoch number, with this stability then defining a suitable epoch number for training.
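The sketch below illustrates how these three mechanisms are typically combined in PyTorch: weight decay is passed to the optimizer, dropout layers are interposed between hidden layers, and early stopping monitors a validation loss. The layer sizes, dropout probability, weight-decay coefficient, patience, and the placeholder random data are all illustrative assumptions, not the settings used in this work.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(256, 256), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(256, 2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)  # weight decay = L2 penalty
loss_fn = nn.MSELoss()

# Placeholder data standing in for the synthetic training/validation sets
x_tr, y_tr = torch.randn(1024, 128), torch.rand(1024, 2)
x_va, y_va = torch.randn(256, 128), torch.rand(256, 2)

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(500):
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(x_tr), y_tr)
    loss.backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(x_va), y_va).item()
    if val < best_val:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # early stopping on the validation loss
            break
```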

All of the regularization methods described above have in common that they act on NN weights. ILR, in contrast, alters the input layer by augmenting the raw data with data that is, in some sense, regularized. It therefore bears some similarity to input data transformations; there is no direct effect on weights, and the weight amplitudes resulting from this procedure remain to be investigated. ILR and the other methods outlined above may also be combined; a systematic study of such combinations remains to be carried out.

We based our hyperparameter choices on previous literature[25,34] and our empirical observations. It is likely that performance can be further improved by implementing a more systematic approach to network architecture and hyperparameter optimization.

The statistical distribution of the data set chosen for training, validation, and testing a NN should correspond to that of the real-world data to which it will be applied. In the present case, with synthetic data, we can incorporate prior knowledge by setting bounds on the two estimated time constants. For myelin mapping in the brain, for example, we may take 20 ms ≤ T2,1 ≤ 50 ms and 50 ms ≤ T2,2 ≤ 500 ms, with T2,2 considered to be the transverse relaxation time of brain parenchyma. In our application, we found it very useful to extend the time constant values in the training set beyond those in the testing set, to accommodate NLLS recovery of values outside this a priori range from the noisy data; this improved the performance of both the (ND, ND) and (ND, Reg) NNs. Results for both MPE1 and MSE1 were degraded when the target (T2,1, T2,2) pair was near the boundary of the range of time constants in the training set. This behavior is independent of the use of ILR. Finally, the spacing between adjacent values of T2,1 and T2,2 in the training grid is determined by the required precision of the estimates, with the tradeoff of the increased training time required to maintain training set density.[57]
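A sketch of such a training grid is shown below, using the myelin-mapping bounds quoted above; the extension margin and grid densities are illustrative assumptions.

```python
import numpy as np

margin = 0.2                                   # illustrative 20% extension beyond the prior bounds
T21_grid = np.linspace(20.0 * (1 - margin), 50.0 * (1 + margin), 40)    # ms
T22_grid = np.linspace(50.0 * (1 - margin), 500.0 * (1 + margin), 80)   # ms

# Keep only pairs consistent with the convention T2,1 < T2,2
targets = [(a, b) for a in T21_grid for b in T22_grid if a < b]
```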

ILR as described requires knowledge of the noise level in the signal. In the present case, the noise level may be accurately estimated from data collected from the signal tail, but more general signal forms would present a more difficult challenge. One possibility would be to greatly expand the training set to incorporate a range of plausible SNR values; the effectiveness of this approach remains to be investigated.
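For the decay signals considered here, the noise standard deviation can be estimated from the last few points of the acquisition, where the biexponential has essentially decayed away; the sketch below assumes such a noise-dominated tail and an SNR defined relative to the initial amplitude c1 + c2, both of which are assumptions for illustration.

```python
import numpy as np

def estimate_snr(s, c1=0.6, c2=0.4, n_tail=10):
    """Estimate SNR from a noise-dominated signal tail (assumed tail length and SNR convention)."""
    sigma = np.std(np.asarray(s)[-n_tail:], ddof=1)
    return (c1 + c2) / sigma
```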

In settings in which a discretized probability distribution function (PDF) is sought to define model parameters,[56] the addition of a regularization term serves, for example, to reduce outliers (ℓ2-norm penalty), promote sparsity (ℓ1-norm penalty), or promote smoothness (derivative or total variation penalty).[58,59] However, regularization of the NLLS problem according to Equation (4) does not have an apparent physical basis. The effects noted above, relevant for a PDF, make little sense when estimating parameters that are more-or-less independent. Thus, we have introduced regularization for the mathematical goal of reducing variance rather than on physical grounds. We have not formally studied the effect of implementing alternative penalties for the construction of ILR.

There are two particularly time-consuming steps in implementing ILR. The first is the creation of the training set, which requires the solution of many R-NLLS problems. While this can be mitigated by vectorizing our computations to the greatest extent possible, for example in the construction of noisy decays and the reconstruction of discrete signals from their recovered time constants during training, it is less clear how to vectorize multiple R-NLLS problems. Reasonable alternatives include the use of OpenMP for multi-threading the solution of R-NLLS problems over noise realizations.[60] In addition, a message passing interface (MPI) implementation could partition the R-NLLS problems and assign them to separate nodes on a computing cluster.
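Since each R-NLLS problem is independent, training-set generation can also be distributed across processes; the Python sketch below uses multiprocessing as a stand-in for the OpenMP or MPI approaches mentioned above and reuses the fit_rnlls sketch defined earlier.

```python
from functools import partial
from multiprocessing import Pool

import numpy as np

def fit_many_rnlls(t, signals, lam, n_workers=8):
    """Solve one R-NLLS problem per noisy decay in `signals`, in parallel."""
    fit = partial(fit_rnlls, t, lam=lam)     # fit_rnlls: the R-NLLS sketch given earlier
    with Pool(n_workers) as pool:
        return np.array(pool.map(fit, list(signals)))
```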

The second time-intensive task is NN training. It may be possible to accelerate this through a systematic exploration of the minimal number of training samples that provides a high-performing and generalizable network. In addition, careful selection of the grid spacing of target time constants may serve to reduce training time by avoiding unnecessary precision in the predicted values. One potential approach would be to increase the density of samples in regions of (T2,1, T2,2)-space where NLLS problems are known to be more ill-posed while decreasing density in better-conditioned regions.

There are several advantages to using a NN to estimate time constants as compared with classical methods. For example, Prony's method[20] requires sampling times to be equally spaced, and is known to be highly sensitive to noise.[61] In contrast, NN approaches have no requirement on the spacing of acquisition times, and can perform well even with limited SNR (Table S2). For curve stripping,[62,63] an estimate is required for the time constant of the more rapidly-decaying signal, and the underlying time constants T2,1, T2,2 should be separated by at least a factor of roughly three.[63] Neither of these limitations applies to NN estimation. Bayesian methods[64] yield a joint probability distribution for recovered parameters, with high-dimensional marginalization often implemented to define the PDF for a single variable. In contrast, NN analysis provides direct parameter estimates. Further important advantages of the NN approach include the fact that once a network is trained, parameter estimation is extremely fast, as well as the ability to incorporate prior knowledge through the design of training data sets.

It is well known that the efficacy of classical methods for biexponential analysis depends on the separation of the underlying time constants and on SNR, so it is natural to examine the dependence of ILR on these quantities. While NNs can perform well even when decay constants are not well-separated, larger separation does decrease mean percent error, and to a greater extent for (ND, Reg) than for (ND, ND). Performance becomes less sensitive to the difference in time constants as SNR increases to the moderate regime (cf. Figures 2 and 5). We interpret this as indicating that the variability in the estimates is already fairly low, so that the dominant effect of regularization is to increase bias (cf. the bottom left panel in Figure 3), although this increase is somewhat ameliorated through use of the (ND, Reg) network as compared with (ND, ND) (cf. Figure 3) for a reasonable choice of the regularization parameter.

In this initial implementation of ILR, a single regularization parameter λ has been used for the entire training data set. This value will not be optimal for all pairs of relaxation times; for example, less regularization may be needed for widely-separated values of T2,1 and T2,2 than for closely-spaced values. In applications, the underlying relaxation time constants will be unknown, so that this approach is reasonable. Moreover, the optimal value of λ will change from noise realization to noise realization even for given values of the time constants. Training a NN to estimate an optimal regularization parameter for each individual input signal, as has been proposed for the introduction of regularization into a NN loss function,[65] may be effective for ILR as well.

In addition to demonstrating the performance advantages of ILR on a large range of simulated data, we have also demonstrated its applicability to in vivo brain imaging. In this case, we do not have a gold standard to assess accuracy, but we were able to demonstrate the greater resistance to noise in parameter estimation with (ND, Reg) as compared with (ND, ND); see Figure 6. This desirable property is one of the hallmarks of regularization in general. Additionally, only a single average SNR value was used when training (ND, Reg). In effect, this assumes that the T2 decay from each pixel exhibits the same SNR, while in fact the SNR map obtained after NESMA filtering (see Section S1.5 of the SI) indicates that decays exhibit SNRs between 200 and 800. This may be addressed by training several (ND, Reg) networks for a variety of SNRs and analyzing the decay from a given pixel using the appropriate NN. Moreover, we developed NNs with c1 = 0.1, the average value across the c1 map obtained in a different study.[37] This is essentially required in our current implementation, in which we estimate only the two decay constants. Ongoing work is concerned with extending our analysis to a three- or four-parameter model that includes estimates of component fractions. Nevertheless, the experimental results provided serve to illustrate the application of ILR to brain imaging.

5 ∣. CONCLUSION

We have achieved improvements in performance of 15%–50% or more, depending on the metric used and SNR, for a NN designed to estimate the relaxation constants of biexponential decay by incorporating a new type of regularization, which we call input layer regularization (ILR). Our studies were performed at SNR levels typical for biomedical magnetic resonance relaxometry. This new approach can readily be expanded to a greater number of parameters and to other signal models. Further, it is compatible with conventional methods for regularizing NNs. Finally, we demonstrated the application of ILR on human brain relaxometry data by mapping T2,1, attributed to the transverse relaxation time of myelin water.

Supplementary Material

Supporting Materials

ACKNOWLEDGEMENTS

This work was supported in part by the Intramural Research Program of the National Institute on Aging of the National Institutes of Health (JP, JB, CB, and RGS). Wojciech Czaja was supported in part by the NSF DMS 1738003 grant.

Funding information

National Institute on Aging, Grant/Award Number: Intramural Research Program; National Science Foundation, Grant/Award Number: DMS 1738003

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found in the online version of the article at the publisher's website.

REFERENCES

  • [1]. Bouhrara M, Reiter DA, Celik H, Bonny J, Lukas V, Fishbein KW, Spencer RG, Magn. Reson. Med 2015, 73(1), 352.
  • [2]. Celik H, Bouhrara M, Reiter DA, Fishbein KW, Spencer RG, J. Magn. Reson 2013, 236, 134.
  • [3]. Neil JJ, Bretthorst GL, Magn. Reson. Med 1993, 29(5), 642.
  • [4]. Reiter DA, Magin RL, Li W, Trujillo JJ, Velasco MP, Spencer RG, Magn. Reson. Med 2016, 76(3), 953.
  • [5]. Spencer RG, Bi C, NMR Biomed. 2020, 33(12).
  • [6]. Devos O, Ghaffari M, Vitale R, de Juan A, Sliwa M, Ruckebusch C, Anal. Chem 2021, 93(37), 12504.
  • [7]. Niesner R, Bülent P, Schlüsche P, Gericke K, ChemPhysChem 2004, 5(8), 1141.
  • [8]. Sun T, Zhang ZY, Grattan KTV, Palmer AW, Rev. Sci. Instrum 1997, 68(1), 58.
  • [9]. Mulkern RV, Vajapeyam S, Robertson RL, Caruso PA, Rivkin MJ, Maier SE, Magn. Reson. Imaging 2001, 19(5), 659.
  • [10]. Sainsbury EJ, Ashley JJ, Eur. J. Clin. Pharmacol 1986, 30(2), 243.
  • [11]. Shan Q, Kuang S, Zhang Y, He B, Wu J, Zhang T, Wang J, Abdom. Radiol 2020, 45(1), 90.
  • [12]. Sheiner LB, Beal SL, J. Pharmacokinet. Biopharm 1981, 9(5), 635.
  • [13]. Urso R, Blardi P, Giorgi G, Eur. Rev. Med. Pharmacol. Sci 2002, 6(2), 33.
  • [14]. Zhang J, Suo S, Liu G, Zhang S, Zhao Z, Xu J, Wu G, Korean J. Radiol 2019, 20(5), 791.
  • [15]. Reynolds AM, Ropert-Coudert Y, Kato A, Chiaradia A, MacIntosh AJJ, Anim. Behav 2015, 108, 67.
  • [16]. He Y, Matei L, Jung HJ, McCall KM, Chen M, Stoumpos CC, Liu Z, Peters JA, Chung DY, Wessels BW, Wasielewski MR, Nat. Commun 2018, 9(1), 1609.
  • [17]. Otero J, Jensen AJ, L'Abée-Lund JH, Stenseth NC, Storvik GO, Vøllestad LA, PLoS One 2011, 6(8), e24005.
  • [18]. Rust B, O'Leary D, Mullen K, Modelling type Ia supernova light curves, in Exponential Data Fitting and its Applications, Bentham Science Publishers, Oak Park, IL 2010.
  • [19]. Balasubramanyam C, Ajay MS, Spandana KR, Shetty A, Seetharamu KN, WSEAS Trans. Math 2014, 13, 406.
  • [20]. de Prony BGR, J. Ec. Polytech. (Paris) 1795, 1(22), 24.
  • [21]. Liu F, Kijowski R, El Fakhri G, Feng L, Magn. Reson. Med 2021, 85(6), 3211.
  • [22]. Bromage GE, Comput. Phys. Commun 1983, 30(3), 229.
  • [23]. Bishop CM, Roach CM, Rev. Sci. Instrum 1992, 63(10), 4450.
  • [24]. Gambhir SS, Keppenne CL, Banerjee PK, Phelps ME, Phys. Med. Biol 1998, 43(6), 1659.
  • [25]. Liu H, Xiang Q, Tam R, Dvorak AV, MacKay AL, Kolind SH, Traboulsee A, Vavasour IM, Li DKB, Kramer JK, Laule C, NeuroImage 2020, 210, 116551.
  • [26]. Parasram T, Daoud R, Xiao D, J. Magn. Reson 2021, 325, 106930.
  • [27]. Smith JT, Yao R, Sinsuebphon N, Rudkouskaya A, Un N, Mazurkiewicz J, Barroso M, Yan P, Intes X, Proc. Natl. Acad. Sci. U.S.A 2019, 116(48), 24019.
  • [28]. Kabanikhin SI, Inverse and Ill-Posed Problems: Theory and Applications, De Gruyter 2011.
  • [29]. Low MG, Ann. Stat 1995, 23(3), 824.
  • [30]. Aster RC, Borchers B, Thurber CH, Parameter Estimation and Inverse Problems, Elsevier 2019.
  • [31]. Krogh A, Hertz JA, A simple weight decay can improve generalization, in Advances in Neural Information Processing Systems 1992, 950.
  • [32]. Labach A, Salehinejad H, Valaee S, Survey of dropout methods for deep neural networks, arXiv:1904.13310 [cs], 2019.
  • [33]. Yao Y, Rosasco L, Caponnetto A, Constr. Approx 2007, 26(2), 289.
  • [34]. Liu H, Tam R, Kramer JK, Laule C, Analyzing multiexponential T2 decay data using a neural network, in Proc. Intl. Soc. Mag. Reson. Med 27, Montreal, Québec, Canada 2019.
  • [35]. Kaandorp MPT, Barbieri S, Klaassen R, Laarhoven HWM, Crezee H, While PT, Nederveen AJ, Gurney-Champion OJ, Magn. Reson. Med 2021, 86(4), 2250.
  • [36]. Bi C, Fishbein K, Bouhrara M, Spencer RG, Sci. Rep 2022, 12(5773).
  • [37]. Bi C, Ou M-Y, Bouhrara M, Spencer RG, arXiv e-prints 2021, arXiv:2102.10039.
  • [38]. Box MJ, J. R. Stat. Soc. Series B Stat. Methodol 1971, 33(2), 171.
  • [39]. Antil H, Elman HC, Onwunta A, Verma D, arXiv:2102.03974, 2021.
  • [40]. Jensen JH, Helpern JA, NMR Biomed. 2010, 23(7), 698.
  • [41]. Magin RL, Li W, Velasco MP, Trujillo J, Reiter DA, Morgenstern A, Spencer RG, J. Magn. Reson 2011, 210(2), 184.
  • [42]. Bouhrara M, Spencer RG, NeuroImage 2016, 127, 456.
  • [43]. Deoni SCL, Rutt BK, Arun T, Pierpaoli C, Jones DK, Magn. Reson. Med 2008, 60(6), 1372.
  • [44]. Gokus T, Cognet L, Duque JG, Pasquali M, Hartschuh A, Lounis B, J. Phys. Chem. C 2010, 114(33), 14025.
  • [45]. Midtgard U, Hales JR, Fawcett AA, Sejrsen P, J. Appl. Physiol 1987, 63(3), 962.
  • [46]. Pereyra V, Exponential Data Fitting and its Applications, vol. 1, Bentham 2010.
  • [47]. Rust BW, O'Leary DP, Mullen KM, in Exponential Data Fitting and its Applications, vol. 1, Bentham 2010, 145.
  • [48]. Shkolnik AS, Karachinsky LY, Gordeev NY, Zegrya GG, Evtikhiev VP, Pellegrini S, Buller GS, Appl. Phys. Lett 2005, 86(21), 211112.
  • [49]. ter Horst G, Pratt DW, Kommandeur J, J. Chem. Phys 1981, 74(6), 3616.
  • [50]. Bobrovnik SA, J. Biochem. Biophys. Methods 2000, 42(1), 49.
  • [51]. Peeters J, Li L, J. Appl. Phys 1993, 73(5), 2477.
  • [52]. Vora JP, Burch A, Peters JR, Owens DR, Diabetes Care 1992, 15(11), 1484.
  • [53]. Field RW, Benoist d'Azy O, Lavollée M, Lopez-Delgado R, Tramer A, J. Chem. Phys 1998, 78(6), 2838.
  • [54]. Noh H, You T, Mun J, Han B, Regularizing deep neural networks by noise: Its interpretation and optimization, arXiv:1710.05179 [cs], 2017.
  • [55]. Shorten C, Khoshgoftaar TM, J. Big Data 2019, 6(1), 60.
  • [56]. Sabett C, Hafftka A, Sexton K, Spencer RG, Concepts Magn. Reson., Part A 2017, 46A(2), e21427.
  • [57]. Keogh E, Mueen A, in Encyclopedia of Machine Learning and Data Mining, Springer, US 2017, 314.
  • [58]. Emmert-Streib F, Dehmer M, Mach. Learn. Knowl. Extraction 2019, 1(1), 359.
  • [59]. Hansen PC, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM Monographs on Mathematical Modeling and Computation, SIAM 1998.
  • [60]. Dagum L, Menon R, IEEE Comput. Sci. Eng. 1998, 5(1), 46.
  • [61]. Kahn MH, Mackisack MS, Osborne MR, Smyth GK, J. Comput. Graph. Statist 1992, 1(4), 329.
  • [62]. Foss SD, Biometrics 1969, 25(3), 580.
  • [63]. Kirkup L, Sutherland J, Comput. Phys 1988, 2(6), 64.
  • [64]. Bretthorst LG, Hutton WC, Garbow JR, Ackerman JJH, Concepts Magn. Reson. A: Bridg. Educ. Res 2005, 27A(2), 55.
  • [65]. Afkham BM, Chung J, Chung M, Inverse Probl. 2021, 37(10), 105017. https://doi.org/10.1088/1361-6420/ac245d
