Abstract
Purpose:
To develop neural network (NN)-based quantitative MRI parameter estimators with minimal bias and a variance close to the Cramér-Rao bound.
Theory and Methods:
We generalize the mean squared error loss to control the bias and variance of the NN’s estimates, which involves averaging over multiple noise realizations of the same measurements during training. Bias and variance properties of the resulting NNs are studied for two neuroimaging applications.
Results:
In simulations, the proposed strategy reduces the estimates’ bias throughout parameter space and achieves a variance close to the Cramér-Rao bound. In vivo, we observe good concordance between parameter maps estimated with the proposed NNs and traditional estimators, such as non-linear least-squares fitting, while state-of-the-art NNs show larger deviations.
Conclusion:
The proposed NNs have greatly reduced bias compared to those trained using the mean squared error and offer significantly improved computational efficiency over traditional estimators with comparable or better accuracy.
Keywords: quantitative MRI, neural networks, Cramér-Rao bound, parameter estimation, efficiency
1 |. INTRODUCTION
Unbiased parameter estimators are critical for achieving accurate, precise, and reproducible quantitative MRI (qMRI). An unbiased estimator that achieves the Cramér-Rao bound (CRB)—the theoretical floor for the variance of an unbiased estimator1—is referred to as (statistically) efficient.2 While the typical maximum likelihood and least squares estimators used in qMRI—e.g., dictionary matching3 and non-linear least squares—are asymptotically efficient with respect to the number of measurements (assuming zero-mean, uncorrelated, homoscedastic Gaussian noise in the case of least squares),2,4,5 they are computationally inefficient, especially when fitting high-dimensional models. Neural networks (NNs) offer significantly reduced fitting time and robustness at inference, eliminating an important barrier to the clinical adoption of qMRI methods.6–8
Regression neural networks are typically minimum mean squared error (MSE) estimators, i.e., they are trained to minimize the quadratic loss over an empirical dataset. Like other Bayesian approaches,9,10 their performance is sensitive to the prior (i.e., the training data distribution): they can achieve a smaller MSE than an efficient estimator by trading off bias for variance. However, this is true only “on average”—i.e., for parameters close to the mean of the training distribution—and implies that a minimum-MSE NN performs well only if the in vivo data distribution is known a priori. This may lead to overly optimistic results when validating the NN’s performance in the laboratory setting, and degraded performance in a clinical setting where the qMRI parameters may change in unpredictable ways. While a certain amount of bias might be tolerable for clinical diagnostic use, this is unlikely to be true if the bias varies throughout parameter space, as is typically the case with minimum-MSE NNs. Further, bias impedes inter-method comparability. To minimize this sensitivity to the prior, in this work, we generalize the MSE loss to better control the bias in the NN’s estimates.
Though we desire statistical efficiency in the NN, efficient estimators may, in general, not exist or be impractical to find for any given qMRI application, which usually involves a nonlinear multiparametric estimation problem and constraints on parameter space.2,11 The specific biophysical model and measurement scheme employed, as well as areas of parameter space that are inherently difficult-to-estimate, all contribute to this challenge.12 In this work, we therefore soften the requirement that the NN be efficient and instead seek only to promote the associated properties during training; i.e., that the NN have minimal bias and a variance close to or below the CRB.
2 |. THEORY
2.1 |. Generalization of the MSE
We begin by writing the multi-variate MSE loss typically used in qMRI, e.g., in DRONE,8 given by
$$ L_{\mathrm{MSE}} = \frac{1}{|\mathcal{D}|} \sum_{(\mathbf{s}, \boldsymbol{\theta}) \in \mathcal{D}} \sum_{p=1}^{P} \left( \hat{\theta}_p(\mathbf{s}) - \theta_p \right)^2, \tag{1} $$
where $\mathbf{s}$ is the measurement vector, $\boldsymbol{\theta} \in \mathbb{R}^P$ is the ground-truth parameter vector, $\hat{\boldsymbol{\theta}}(\mathbf{s})$ is the NN’s estimate, $P$ is the number of parameters to estimate, and $|\mathcal{D}|$ is the number of samples in the training dataset $\mathcal{D}$.
An important limitation of Eq. (1) is that parameters with different units cannot directly be summed together. This can be addressed using a weighted loss:
$$ L_{\mathrm{wMSE}} = \frac{1}{|\mathcal{D}|} \sum_{(\mathbf{s}, \boldsymbol{\theta}) \in \mathcal{D}} \left( \hat{\boldsymbol{\theta}}(\mathbf{s}) - \boldsymbol{\theta} \right)^{T} W \left( \hat{\boldsymbol{\theta}}(\mathbf{s}) - \boldsymbol{\theta} \right), \tag{2} $$
where $W \in \mathbb{R}^{P \times P}$ is a positive semi-definite weighting matrix that tunes each parameter’s contribution to the overall loss. A diagonal matrix weights each parameter individually, where weights carrying the inverse squared units of their respective parameters render the cost function dimensionless. The choice $W = I$ (the identity) reduces Eq. (2) back to Eq. (1).
Next, we average over $N$ different noise realizations of the measurements (we assume additive white complex Gaussian noise in this work), yielding the weighted MSE loss
$$ L_{\mathrm{wMSE}} = \frac{1}{|\mathcal{D}|} \sum_{(\mathbf{s}, \boldsymbol{\theta}) \in \mathcal{D}} \frac{1}{N} \sum_{n=1}^{N} \left( \hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \boldsymbol{\theta} \right)^{T} W \left( \hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \boldsymbol{\theta} \right), \tag{3} $$
where $\mathbf{s}_n$ denotes the $n$th noise realization of the measurement $\mathbf{s}$.
As shown in Appendix A, this enables decomposition of the loss into the bias and variance as
$$ L_{\mathrm{wMSE}} = \frac{1}{|\mathcal{D}|} \sum_{(\mathbf{s}, \boldsymbol{\theta}) \in \mathcal{D}} \left( \mathbf{b}^{T} W \mathbf{b} + \operatorname{tr}(W \Sigma) \right), \tag{4} $$
where $\Sigma$ is the (uncorrected) sample covariance of $\hat{\boldsymbol{\theta}}$ over the noise realizations, $\operatorname{tr}(\cdot)$ denotes the matrix trace, and the bias is given by
$$ \mathbf{b} = \bar{\boldsymbol{\theta}} - \boldsymbol{\theta}, \tag{5} $$
$$ \bar{\boldsymbol{\theta}} = \frac{1}{N} \sum_{n=1}^{N} \hat{\boldsymbol{\theta}}(\mathbf{s}_n). \tag{6} $$
Here, the bias is approximated using the sample mean $\bar{\boldsymbol{\theta}}$, so its error w.r.t. the true bias decreases as $1/\sqrt{N}$.13
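To make the quantities in Eqs. (4)–(6) concrete, the following minimal Julia sketch (our released code is in Julia, see the Data Availability Statement) computes the sample bias and covariance from a matrix of estimates; the function name and array layout are illustrative assumptions rather than our released implementation.

```julia
using Statistics, LinearAlgebra

# θ̂ is a P×N matrix holding the NN's estimates for N noise realizations of
# one measurement; θ is the length-P ground-truth parameter vector.
function sample_bias_cov(θ̂::AbstractMatrix, θ::AbstractVector)
    N = size(θ̂, 2)
    θ̄ = vec(mean(θ̂, dims=2))   # sample mean over noise realizations, Eq. (6)
    b = θ̄ .- θ                  # sample bias, Eq. (5)
    E = θ̂ .- θ̄                  # deviations from the sample mean
    Σ = (E * E') / N             # uncorrected sample covariance
    return b, Σ
end
```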
As suggested by Diskin et al.,14 we can promote properties similar to an efficient estimator in the trained NN by penalizing the squared bias in addition to the MSE. This leads to the weighted “bias-constrained error”14 loss
$$ L_{\mathrm{wBCE}} = \frac{1}{|\mathcal{D}|} \sum_{(\mathbf{s}, \boldsymbol{\theta}) \in \mathcal{D}} \left( \mathbf{b}^{T} W \mathbf{b} + \operatorname{tr}(W \Sigma) + \lambda\, \mathbf{b}^{T} W \mathbf{b} \right), \tag{7} $$
where the non-negative tuning parameter $\lambda$ controls the bias’ contribution to the overall cost. While a larger $\lambda$ generally decreases the bias at the cost of increased variance, it can be difficult to identify a $\lambda$ in Eq. (7) that optimally reduces the bias without increasing the variance above the CRB across all estimated parameters in $\boldsymbol{\theta}$.
2.2 |. Variance-Constrained Bias Loss
To achieve this goal, our main contribution is to modify Eq. (7) to explicitly penalize deviations from an efficient estimator. Here, we primarily consider the CRB-weighting we proposed in Ref. 15: $W_{\mathrm{CRB}}$, a matrix with the inverses of the individual parameters’ CRBs for the measurement $\mathbf{s}$ along the main diagonal and 0 elsewhere. This choice of weighting leads us to formulate the loss
$$ L_{\mathrm{wVCB}} = \frac{1}{|\mathcal{D}|} \sum_{(\mathbf{s}, \boldsymbol{\theta}) \in \mathcal{D}} \left( \mathbf{b}^{T} W \mathbf{b} + \lambda \left[ \operatorname{tr}(W \Sigma) - \nu P \right]_{+} \right), \tag{8} $$
where the first term penalizes the bias and the second term, where $[x]_{+} := \max(x, 0)$, is a variance penalty parameterized by $\nu$; e.g., $\nu = 1$ penalizes variances exceeding the CRB. Note Eq. (8) reduces to Eq. (1) for $N = 1$ and $W = I$, and is equivalent (up to an overall scaling) to Eq. (7) for $\nu = 0$ and a correspondingly rescaled $\lambda$.
Since $\lambda \to \infty$ is equivalent to imposing a variance constraint, in analogy to the “bias-constrained error” term coined in Ref. 14, we refer to Eq. (8) as the weighted “variance-constrained bias” loss. However, a hard constraint requires the NN to uniformly achieve the CRB across the training set, which may not be possible without increasing the bias. We thus relax this constraint in practice by using a finite $\lambda$.
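Building on the sketch above, the loss in Eq. (8) for a single training sample could then be written as follows, with the diagonal CRB-weighting passed as a vector. This is a schematic illustration under the assumptions stated earlier: in practice, the loss must be expressed in an automatic-differentiation-friendly framework, and the sum over the training set is omitted here.

```julia
# w = diag(W_CRB): inverse CRBs of the P parameters for this measurement.
# Returns Eq. (8) for one sample: weighted squared bias plus a penalty on
# the CRB-normalized variance exceeding ν.
function vcb_loss(θ̂::AbstractMatrix, θ::AbstractVector,
                  w::AbstractVector; λ=1.0, ν=1.0)
    b, Σ = sample_bias_cov(θ̂, θ)
    bias_term = sum(w .* b .^ 2)                           # bᵀWb for diagonal W
    var_term  = max(sum(w .* diag(Σ)) - ν * length(θ), 0)  # [tr(WΣ) − νP]₊
    return bias_term + λ * var_term
end
```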
In the following, we study the use of the proposed NN training strategy for two qMRI applications outlined below.
3 |. METHODS
3.1 |. Pulse Sequences
For our first application, we used the hybrid-state16 sequence described in Ref. 17 to extract six biophysical parameters of a 2-pool quantitative magnetization transfer (qMT) model,18–20 i.e., the normalized fractional semi-solid spin-pool size $m_0^s$, the relaxation rates $R_1^f$, $R_2^f$ of the free spin-pool, the exchange rate $R_x$, and the relaxation rate/time $R_1^s$, $T_2^s$ of the semi-solid spin-pool. To compute the CRB, we additionally considered three nuisance parameters: the complex-valued scaling $M_0$ and the field inhomogeneities $B_0$ and $B_1^+$, but ignored them for parameter estimation. This hybrid-state sequence was optimized for a minimal CRB of a subset of these parameters in individual 4 s-long cycles with antiperiodic boundaries.16,17
In our second application, we considered the 2D single-slice inversion-recovery MR-fingerprinting FISP (MRF-FISP) sequence21 designed to estimate a single compartment’s $T_1$ and $T_2$. We additionally considered only the complex-valued scaling $M_0$ to compute the CRB. We used spoiling along the slice-select direction, 1 ms sinc-pulses with a time-bandwidth product of 4, and the $T_R$, $T_E$, and inversion delay (time between the adiabatic inversion pulse and the first sinc-pulse) of the original sequence.21
3.2 |. Training Data
For the qMT model, we simulated a dictionary of approximately 600,000 fingerprints with the generalized Bloch framework20 using randomly generated parameters drawn from a mixture of Gaussian distributions centered around the values expected to be measured in vivo. Gaussians were preferred to uniform distributions to capture the high-dimensional parameter space with comparably few samples while still approximating the distribution expected in vivo. We heuristically chose 80% of fingerprints to have parameters typical for gray and white matter, 10% parameters typical for fat, and 10% parameters typical for CSF, each drawn from truncated normal distributions with tissue-specific means, standard deviations, and truncation limits.17,22,23 Field inhomogeneities were simulated with $B_0$ (uniform distribution) and $B_1^+$ spanning the ranges expected in vivo. We used an SVD of the full simulated dictionary to compute a temporal subspace of rank 15 and compress the dictionary to emulate the typical subspace reconstruction measurement process;24–28 i.e., $\mathbf{s} \in \mathbb{C}^{15}$ in this case. For practical reasons, we used only a subset of these samples for training the NN, of which 20% were reserved for testing.
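As an illustration of the sampling and compression steps, consider the following Julia sketch; the distribution parameters and dictionary dimensions are placeholders rather than the values used in this work, and the Bloch simulation itself is omitted.

```julia
using Distributions, LinearAlgebra

# Draw one parameter (here m0s) from a truncated normal distribution;
# mean, standard deviation, and truncation limits are illustrative only.
m0s = rand(truncated(Normal(0.2, 0.1), 0.0, 1.0))

# Stand-in for the simulated dictionary: columns are fingerprints.
D = randn(ComplexF32, 1000, 10_000)   # (#timepoints × #fingerprints), dummy data

# Rank-15 temporal subspace via an SVD of the full dictionary; compression
# emulates the subspace reconstruction, i.e., each NN input is s = U₁₅ᴴd ∈ ℂ¹⁵.
U15 = svd(D).U[:, 1:15]
S = U15' * D                          # 15 × #fingerprints compressed inputs
```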
For the MRF-FISP sequence, we computed a dictionary of fingerprints with Bloch simulations spanning grey/white matter, fat, and CSF values at 3T: $T_1$ in the ranges [500:2:1500] ms, [250:2.4:550] ms, and [3000:16:5000] ms (min:stepsize:max), and $T_2$ in the ranges [10:0.38:200] ms, [60:0.65:140] ms, and [1500:8.1:2500] ms, respectively.21,22 The transmit field strength was assumed to be uniform, i.e., $B_1^+ = 1$. We accounted for the slice profile by taking the complex average of 1324 isochromats, 600 of which were uniformly distributed between the FWHM points of the slice profile computed with the small flip angle approximation.29 For CSF’s long relaxation times, we instead simulated a total of 5300 isochromats across the slice profile. This dictionary was used for computing the rank-10 temporal subspace, dictionary compression, and network training.
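The tissue-specific grid above could be assembled as in the following sketch, using the stated min:stepsize:max ranges; the Bloch/slice-profile simulation itself (here the hypothetical `simulate_fisp`) is omitted.

```julia
# (T1, T2) ranges in ms for grey/white matter, fat, and CSF, respectively.
T1_ranges = (500:2:1500, 250:2.4:550, 3000:16:5000)
T2_ranges = (10:0.38:200, 60:0.65:140, 1500:8.1:2500)

# Cartesian product within each tissue class, concatenated across classes.
params = [(T1, T2) for (T1s, T2s) in zip(T1_ranges, T2_ranges)
                   for T1 in T1s for T2 in T2s]
# fingerprints = [simulate_fisp(T1, T2) for (T1, T2) in params]  # hypothetical
```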
3.3 |. Loss Functions
We consider two variants of the loss in Eq. (8):
MSE-CRB:15 $N = 1$ and $W = W_{\mathrm{CRB}}$ (no noise averaging);
Bias-Reduced: $N \gg 1$, $W = W_{\mathrm{CRB}}$, and $\nu = 1$.
Strategy (1) is a state-of-the-art approach that improves upon the MSE (Eq. (1)) by accounting for variations in scale across the different parameters and ensuring robustness to difficult-to-estimate parameters during training. While Eq. (8) shows that the loss would be 0 for an efficient NN, the CRB-weighting does not in itself ensure efficiency. Hence, strategy (2) improves on (1) by introducing averaging over multiple noise realizations to enable finer control over the NN’s bias and variance properties. We empirically determined a suitable regularization strength $\lambda$ for both qMRI applications. While $\nu = 1$ is chosen for simplicity, $\nu > 1$ could also be used to account for the variance of the sample variance for finite $N$.13
3.4 |. Network Architecture and Training
For the qMT model, we use a slightly modified version of the NN architecture we used in Ref. 15: 11 fully connected layers with skip connections and batch normalization, a maximum layer width of 1024, and an input layer where $\mathbf{s} \in \mathbb{C}^{15}$ (the 15 temporal coefficients) is split into real and imaginary parts, with a total of 2,187,138 trainable parameters. The outputs are also constrained using ReLUs capped at the maximum values expected in vivo (note they could also be clamped at the minimum). We train the NNs using the Rectified ADAM optimizer,30 a learning rate of $10^{-4}$, and a batch size of 256.
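A schematic Flux.jl rendition of this type of architecture is sketched below; the layer count, widths, and the output caps `θmax` are illustrative placeholders, not our exact released configuration.

```julia
using Flux

P = 6                                   # number of qMT parameters to estimate
θmax = Float32[1, 5, 50, 100, 5, 0.1]   # placeholder caps at in vivo maxima

# Fully connected block with a skip connection and batch normalization.
block(w) = SkipConnection(Chain(Dense(w, w), BatchNorm(w, relu)), +)

model = Chain(
    Dense(30, 1024, relu),        # input: 15 complex coefficients → 30 reals
    block(1024),
    block(1024),
    Dense(1024, P),
    x -> clamp.(x, 0f0, θmax),    # capped ReLU: constrain outputs to [0, θmax]
)

opt = RADAM(1f-4)   # Rectified Adam, learning rate 10⁻⁴ (Optimisers.RAdam in newer Flux)
```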
For MRF-FISP, we adapt the original DRONE architecture,8 retaining the same 3 fully connected layers but modifying the input layer to be the real and imaginary parts of $\mathbf{s} \in \mathbb{C}^{10}$ (the 10 temporal coefficients), using ReLU activations, incorporating batch normalization, and constraining the output values using ReLUs capped at the maximum values expected in vivo (5 s for both $T_1$ and $T_2$), with a total of 98,402 trainable parameters. We train the NNs using the ADAM optimizer,31 a learning rate of $10^{-4}$, and a batch size of 256.
For each pulse sequence, we first trained a NN with $N = 1$ and $W = W_{\mathrm{CRB}}$ to convergence. This network is used to initialize the Bias-Reduced NNs, which were trained for 500 epochs with the values of $N$, $\lambda$, and $\nu$ given above. From the same initialization, we trained an NN with the MSE-CRB loss for a further 500 epochs with a batch size of 51,200 for fair comparison to the state-of-the-art.
To ensure robustness to variable noise levels, we added white complex Gaussian noise to the training data for each of the $N$ noise realizations. A random SNR is selected for each measurement at each training epoch, which is used to generate the noise realizations.
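The noise injection could look as follows in Julia; the SNR definition, its sampling range, and the number of noise realizations shown here are assumptions for illustration.

```julia
using LinearAlgebra

# Add one white complex Gaussian noise realization to a fingerprint s at a
# given SNR (defined here as norm(s)/σ, an assumption of this sketch).
add_noise(s, snr) = s .+ (norm(s) / snr) .* randn(ComplexF64, length(s))

s = randn(ComplexF64, 15)                  # dummy subspace fingerprint
snr = 10.0^(1 + rand())                    # random SNR per epoch, e.g., 10–100
noisy = [add_noise(s, snr) for _ in 1:64]  # e.g., N = 64 noise realizations
```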
3.5 |. Simulation Experiments
To study the bias and variance properties of the proposed NNs, we simulate measurements for both our applications as described in Section 3.2, but instead corrupt them with many independent realizations of white complex Gaussian noise. For the qMT sequence, we compared our NNs to non-linear least squares (NLLS) fitting using the Levenberg-Marquardt algorithm,32 initialized with the ground truth. For MRF-FISP, we compared to dictionary matching, a discretized maximum likelihood estimator commonly used in the MRF literature.3
3.6 |. In Vivo Experiments
To evaluate the effect of the proposed NN estimators in vivo, we scanned two healthy volunteers on a 3T Prisma system (Siemens, Erlangen, Germany) with a 32-channel head coil after obtaining informed consent in agreement with our IRB’s requirements. For the qMT sequence, we scanned the whole brain of one subject using 3D radial koosh-ball k-space sampling with a 2D golden means pattern,33 reshuffled to improve k-space coverage and minimize eddy current artifacts,34 with a 256 mm isotropic FOV and 1.24 mm isotropic effective resolution, repeating the hybrid-state sequence for a 12 min scan time. For MRF-FISP, we acquired a single axial slice in subject two’s brain using 2D golden-angle radial k-space sampling instead of spirals, with 10 cycles of the sequence (10 radial spokes per frame) for a 3.4 min scan time.
For both sequences, we use the low-rank inversion subspace reconstruction approach24,26 to reconstruct coefficient images directly in the subspace. A locally low-rank penalty35,36 is used to suppress artifacts for the qMT sequence (which, we note, modifies the noise distribution from the white complex Gaussian noise assumed during NN training). The reconstructed coefficients are used voxel-by-voxel as NN inputs to estimate the biophysical parameters.
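Voxel-wise inference then reduces to reshaping the coefficient images into per-voxel inputs, as in the following sketch with dummy stand-ins for the reconstruction output and the trained network.

```julia
using Flux

coeff = randn(ComplexF32, 64, 64, 1, 15)   # nx × ny × nz × rank (dummy data)
model = Dense(30, 6)                        # stand-in for the trained NN

nx, ny, nz, r = size(coeff)
X = transpose(reshape(coeff, :, r))          # rank × #voxels coefficient matrix
Xri = [real.(X); imag.(X)]                   # split into real and imaginary parts
θ̂ = model(Xri)                               # P × #voxels parameter estimates
maps = reshape(transpose(θ̂), nx, ny, nz, :)  # reassemble parameter maps
```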
4 |. RESULTS
Fig. 1 visualizes how the proposed strategy reduces the NN’s variable bias along one axis in parameter space. While the MSE-CRB NN can correctly estimate the simulated changes in $R_2^f$, most other parameters exhibit a bias that also varies throughout parameter space. The Bias-Reduced strategy yields overall reduced bias, at the cost of overall increased variance in some parameters. However, we observe for the Bias-Reduced NN that the only major effect of increasing $R_2^f$ is an increased variance of all parameters’ estimates—consistent with the expected increase in CRB.
FIGURE 1.
Boxplot comparison of simulated qMT parameter fits with networks trained using a state-of-the-art loss (MSE-CRB, the Cramér-Rao bound weighted mean squared error criterion15) and the proposed Bias-Reduced loss, assuming a fixed SNR. As an example, we vary only $R_2^f$ (the free spin-pool’s transverse relaxation rate) while keeping all other parameters constant (red reference lines are the ground truth). The proposed strategy significantly reduces the variable bias in all but one of the other parameters throughout parameter space (assessed using Welch’s unequal variances $t$-test).
Fig. 2 analyzes the simulated bias and variance as a function of SNR for a single point in parameter space corresponding to white matter.17 In several parameters, the MSE-CRB NN achieves the lowest variance for all SNR values at the cost of increased bias. The proposed loss reduces the bias for most parameters and SNR levels (with exceptions at high SNR), and the resulting variance, while larger, more closely follows the CRB. Notably, the Bias-Reduced NN outperforms NLLS in $T_2^s$ in both bias and variance, despite the latter’s initialization with the ground truth (initialization is inapplicable for NNs at inference).
FIGURE 2.
Normalized bias and standard deviation of the magnetization transfer parameter estimates as a function of SNR. The estimates are based on simulations using typical white matter values,17 which are used for normalization. We compare neural networks trained using the MSE-CRB15 and proposed Bias-Reduced losses to non-linear least squares (NLLS) and a hypothetical efficient estimator, which has zero bias and a variance equal to the Cramér-Rao bound. The proposed strategy is similar in performance to NLLS in all parameters except $T_2^s$, where it more closely matches an efficient estimator. This analysis, repeated for grey matter values, is shown in Sup. Fig. S2.
Fig. 3 evaluates the impact of $\lambda$ for the Bias-Reduced loss across the entire test set, where each fingerprint has a random SNR. The proposed strategy significantly reduces the overall bias for all parameters relative to the MSE-CRB criterion (Wilcoxon signed-rank test), with a variance less than or equal to the CRB for the majority of fingerprints. Smaller $\lambda$ values lead to further reduced overall bias at the cost of increased overall variance. Similar analyses of the NN’s performance for parameter and SNR values beyond the ranges used for training are shown in Sup. Figs. S1 and S3, respectively. Sup. Fig. S4 shows that, while a NN trained using Eq. (7) has a similar bias, the proposed strategy has more uniform variance properties across all estimated parameters.
FIGURE 3.
Normalized histograms of the CRB-weighted squared bias (A–F) and variance (G–L) of the magnetization transfer parameter estimates $\hat{\theta}_{p,i}$, where $i$ indexes across the test set (where each fingerprint has a random SNR) and $p$ indexes across parameters. Note the scaling of the y-axis truncates the left-most bins in each subplot. The MSE-CRB15 loss (blue) offers the lowest variance but the highest bias overall in comparison to the proposed Bias-Reduced strategies with a larger (red) and smaller (green) $\lambda$. A smaller $\lambda$ reduces the overall bias slightly at the outsized cost of an increased proportion of fingerprints exceeding the CRB (Eq. (8)). A comparison to Eq. (7) is shown in Sup. Fig. S4.
Fig. 4 investigates the effect the proposed strategy has on in vivo parameter fits. In comparison to the MSE-CRB NN, the Bias-Reduced strategy yields improved visual contrast in some parameter maps, likely a result of a reduced bias towards the white matter prior in the training data. There is greater correspondence with the reference NLLS maps in the harder-to-estimate parameters, though substantial differences remain in $T_2^s$, which is in line with Fig. 2F.
FIGURE 4.
In vivo magnetization transfer parameter maps fitted with the MSE-CRB15 and proposed Bias-Reduced neural networks in comparison to a non-linear least squares (NLLS) reference. The Bias-Reduced network offers the highest visual contrast (magnifications) and has improved consistency with NLLS in all parameters (red arrows) except for $T_2^s$, consistent with Fig. 2F.
Similar results are seen in simulation and in vivo using the FISP sequence, though they are less pronounced. Sup. Fig. S5 shows that the proposed Bias-Reduced strategy has the lowest overall bias throughout parameter space in comparison to the MSE-CRB NN and dictionary matching. As seen in Fig. 5, all three estimators perform similarly in vivo with respect to $T_1$, but the Bias-Reduced strategy produces the $T_2$ maps most similar to those of dictionary matching. This is quantified in the box-plots in Fig. 5G, which, in line with Fig. 1, demonstrate that the proposed Bias-Reduced strategy reduces ROI-dependent bias in vivo. Specifically, bias is reduced for the extreme $T_2$ values measured in the splenium, which have a comparably large CRB (cf. Fig. S5G). In Sup. Fig. S7, we show that the improvement offered by the proposed strategy is not simply due to averaging over multiple noise realizations.
FIGURE 5.
In vivo $T_1$ and $T_2$ maps acquired using the MRF-FISP sequence and fitted using NNs trained with two different strategies, in comparison to a dictionary-matching-based reference. (G) analyzes the $T_2$ values within the two white matter ROIs drawn in (F), where outliers are not plotted. The Bias-Reduced NN yields parameter maps more similar to those of dictionary matching, but with the benefit of improved computational efficiency. Similar accuracy and precision to dictionary matching are also observed in simulation (Sup. Figs. S5–S6).
5 |. DISCUSSION
We propose a simple training loss enabling control over the bias and variance properties of NN parameter estimators. We show empirically in two qMRI applications that the proposed loss reduces the bias in comparison to the traditional MSE loss while keeping the variance close to the CRB. Such NNs are beneficial for developing and validating new qMRI biomarkers, particularly for advanced biophysical models that attempt to move beyond the standard Bloch equations; e.g., myelin water imaging, magnetization transfer, and diffusion. The proposed NNs are also expected to be more robust to pathology, where deviations from the prior are unpredictable and cannot be known a priori.
In this article, we considered the CRB-weighted MSE loss, which emphasizes different areas of parameter space depending on their ease of estimation. This, in addition to the training data distribution, can be thought of as priors that affect the NN’s generalization capabilities. The MSE-CRB-trained NN learns to minimize the loss by reducing the variance at the cost of bias towards the prior. The proposed approach reduces the impact of these priors while promoting the properties of an efficient estimator throughout parameter space. In general, we expect that the proposed method is more beneficial for high-dimensional qMRI models with difficult-to-estimate parameters that cannot feasibly be sampled on a uniform grid to generate training data, e.g., the Standard Model of diffusion.37
We obtained similar results when employing NN architectures of varying sizes, and postulate that the NN only needs to have sufficient expressivity to capture the complexity of the training data and estimation problem. For example, recent work regarding the “interpolation point” of NNs suggests that the number of trainable parameters should be greater than the number of examples times the number of estimated parameters.38 However, a thorough investigation of the best choice of architecture and nonlinear activation functions is outside this article’s scope.
As ground-truth parameters are often unknown in vivo, our approach helps reduce uncertainty about the quality of the NN’s estimates. As we view the regression NN through the lens of an estimator, our approach is related to other work on quantifying uncertainty in NNs, e.g., by estimating the variance of the NN’s predictions in a Bayesian framework.39–42 By focusing on promoting the properties of efficient estimators, our approach avoids the limitations of training NNs to reproduce the fallible estimates of traditional estimators.6–8 While self-supervised methods also encourage unbiasedness to some degree,43 they are primarily effective for the parameters with the largest signal derivatives. Encoding bias reduction into the NN’s weights during training also reduces the need to apply a computationally expensive bias correction after estimation.44,45
An important limitation lies in our adoption of the common assumption that the signal is perfectly described by the biophysical model plus white complex Gaussian noise. Unmodeled biophysical effects in the experimental data, however, are usually not Gaussian distributed. Imaging artifacts from various sources,46 advanced imaging techniques such as parallel imaging,47,48 and regularized image reconstruction49 can further alter the residuals’ distribution. Our approach does not reduce bias resulting from this “data mismatch,” which applies to all estimators considered in this article.
There are several interesting avenues for future work. Eq. (4) weights both the bias and covariance terms with the same weighting matrix $W$, and the employed CRB-weighting de-emphasizes difficult-to-estimate areas of parameter space during training. If the bias needs to be further reduced in areas where the CRB is large, one could design a $W$ that only normalizes the different parameters by their average value within the training dataset. Further, the MSE-CRB strategy considers only the individual parameters’ variances and ignores the off-diagonal elements of the covariance matrix in Eq. (4), which is equivalent to considering the estimation of each parameter separately. However, this assumption may not necessarily hold true for the employed network architecture, and future work will consider separate NNs trained to regress each parameter individually. Eq. (4) also offers the flexibility to design a non-diagonal $W$ to explicitly penalize the covariances, which could be beneficial for a joint statistical analysis of the parameter estimates.
6 |. CONCLUSION
A tunable generalization of the MSE loss enables training NNs that are more similar to efficient estimators than those trained with the traditional MSE loss. The proposed NNs are well-suited for the development and validation of new quantitative biomarkers.
ACKNOWLEDGMENTS
This work was supported by NIH grants F30 AG077794, R01 NS131948, T32 GM136573, and was performed under the rubric of the Center for Advanced Imaging Innovation and Research (CAI2R), an NIBIB Biomedical Technology Resource Center (P41 EB017183).
Funding Information
National Institutes of Health; Grant Numbers: F30 AG077794, R01 NS131948, T32 GM136573, P41 EB017183
APPENDIX
A. DERIVATION OF THE WEIGHTED MSE LOSS
For simplicity, here we consider only the sample mean over the noise realizations in Eq. (3), ignoring the sum over different samples within the training set. Now,
$$
\begin{aligned}
\frac{1}{N} \sum_{n=1}^{N} \left( \hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \boldsymbol{\theta} \right)^{T} W \left( \hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \boldsymbol{\theta} \right)
&\overset{\text{(i)}}{=} \operatorname{tr}\!\left( W \frac{1}{N} \sum_{n=1}^{N} \left( \hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \boldsymbol{\theta} \right) \left( \hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \boldsymbol{\theta} \right)^{T} \right) \\
&\overset{\text{(ii)}}{=} \operatorname{tr}\!\left( W \frac{1}{N} \sum_{n=1}^{N} \left( \hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \bar{\boldsymbol{\theta}} + \mathbf{b} \right) \left( \hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \bar{\boldsymbol{\theta}} + \mathbf{b} \right)^{T} \right) \\
&\overset{\text{(iii)}}{=} \operatorname{tr}\!\left( W \left( \Sigma + \mathbf{b} \mathbf{b}^{T} \right) \right) \\
&\overset{\text{(iv)}}{=} \mathbf{b}^{T} W \mathbf{b} + \operatorname{tr}(W \Sigma),
\end{aligned}
$$
where (i) uses the linearity and cyclic property of the trace, i.e., $x^T W x = \operatorname{tr}(W x x^T)$, (ii) follows from inserting Eq. (5), (iii) uses the fact that $\sum_{n=1}^{N} (\hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \bar{\boldsymbol{\theta}}) = 0$ and the definition of the (uncorrected) sample covariance as $\Sigma = \frac{1}{N} \sum_{n=1}^{N} (\hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \bar{\boldsymbol{\theta}})(\hat{\boldsymbol{\theta}}(\mathbf{s}_n) - \bar{\boldsymbol{\theta}})^{T}$, and (iv) uses $\operatorname{tr}(W \mathbf{b} \mathbf{b}^{T}) = \mathbf{b}^{T} W \mathbf{b}$.
Note that since Eq. (3) is written as a sample mean—which is necessary for training a network in practice—it only approximates the mean squared error written as an expectation over the noise distribution. The derivation shown here holds within this approximation of the expectation using a sample mean.
SUPPORTING INFORMATION
The following supporting information is available as part of the online article:
Figure S1. Comparison of the MSE-CRB and Bias-Reduced NNs’ CRB-weighted squared bias and variance for 500 qMT fingerprints (each with a random SNR) randomly sampled from a mixture of Gaussian distributions truncated at non-physical values (e.g., non-negativity constraints are still imposed). Fingerprints that are outside the cutoff ranges of the training data distribution in any parameter are colored red, and otherwise blue. The x-axis shows the Euclidean distance from the mean of the training distribution, weighted by the standard deviations (i.e., calculated from z-scores of the individual parameters), and thus (approximately) follows a Chi distribution. The dashed black line corresponds to the Cramér-Rao bound. For both networks, the bias is generally higher for fingerprints outside the training distribution. The proposed training strategy reduces the overall bias for fingerprints both in and outside of the distribution.
Figure S2. Repetition of Fig. 2 showing the normalized absolute percent bias and percent standard deviation of the 2-pool qMT parameters in grey matter 17 as a function of SNR for neural networks trained using the MSE-CRB15 and proposed Bias-Reduced losses in comparison to non-linear least squares (NLLS) and a hypothetical efficient estimator. Here, similarly to white matter, the proposed strategy performs similarly to NLLS in all parameters except , where the performance is more in line with an efficient estimator.
Figure S3. Repetition of Fig. 3 where all fingerprints in the test set have a random SNR higher than the range of SNRs seen during training. In this case, the bias is overall higher for both networks. While this suggests somewhat impaired generalization of the employed NN architecture,50 it is also consistent with normalization by smaller Cramér-Rao bounds—which account for the decreased noise level—and an expected floor to the accuracy of the NN estimator that is related purely to measurement noise. The proposed strategy for reducing bias still holds outside of the training range of SNRs, albeit to a somewhat lesser degree for some parameters.
Figure S4. Repetition of Fig. 3 comparing NNs trained with the MSE-CRB, the proposed Bias-Reduced, and the bias-constrained (Eq. (7)) losses, the latter with an optimized $\lambda$. While the bias-constrained approach has a similar bias to the Bias-Reduced strategy, it has less uniform variance properties across the estimated qMT parameters, with a longer tail past the CRB line.
Figure S5. Normalized bias and standard deviation of the FISP-based estimates as a function of $T_1$ and $T_2$, using NNs trained with two different strategies in comparison to a dictionary-matching-based reference (C,F). (A,D) The Cramér-Rao bound weighted mean squared error (MSE-CRB).15 (B,E) The Bias-Reduced strategy achieves the lowest overall bias throughout parameter space with a variance similar to the CRB reference (G). The green circle marks the average white matter values measured in vivo (Fig. 5).
Figure S6. Repetition of Sup. Fig. S5 showing the normalized absolute percent bias and percent standard deviation of the FISP-based estimates instead. In this case, the performance is similar between NNs trained using both strategies and the reference.
Figure S7. Comparison of in vivo FISP $T_1$ and $T_2$ maps estimated using NNs trained with the typical mean squared error (MSE) criterion in comparison to the proposed method and dictionary matching. With only one noise realization ($N = 1$), small $T_2$ values are poorly represented in the overall MSE loss, contributing to poor fits in vivo (E, consistent with Fig. 5 of Ref. 8). While this is somewhat mitigated by averaging over multiple noise realizations ($N > 1$), the resulting maps are still biased (F), which is ameliorated by use of the proposed Bias-Reduced strategy (G).
DATA AVAILABILITY STATEMENT
Julia code to train neural networks for the qMT model is available in the GitHub repository andrewwmao/BiasReducedNetworks. Example scripts are also provided to reproduce Figs. 1, 2, and 4, except for the non-linear least squares fitting, the code for which is already available at JakobAsslaender/MRIgeneralizedBloch.jl.
REFERENCES
1. Cramér H. Mathematical Methods of Statistics. Princeton University Press; 1946.
2. Kay SM. Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Philadelphia, PA: Prentice Hall; 1993.
3. Ma D, Gulani V, Seiberlich N, et al. Magnetic resonance fingerprinting. Nature. 2013;495(7440):187–192.
4. Newey WK, McFadden D. Large sample estimation and hypothesis testing. In: Handbook of Econometrics, Vol. 4. Elsevier; 1994:2111–2245.
5. Wu CF. Asymptotic theory of nonlinear least squares estimation. The Annals of Statistics. 1981;9(3):501–513.
6. Liu H, Xiang QS, Tam R, et al. Myelin water imaging data analysis in less than one minute. NeuroImage. 2020;210:116551.
7. Lee J, Lee D, Choi JY, Shin D, Shin HG, Lee J. Artificial neural network for myelin water imaging. Magnetic Resonance in Medicine. 2020;83(5):1875–1883.
8. Cohen O, Zhu B, Rosen MS. MR fingerprinting Deep RecOnstruction NEtwork (DRONE). Magnetic Resonance in Medicine. 2018;80(3):885–894.
9. McGivney D, Deshmane A, Jiang Y, et al. Bayesian estimation of multicomponent relaxation parameters in magnetic resonance fingerprinting. Magnetic Resonance in Medicine. 2018;80(1):159–170.
10. Bouhrara M, Spencer RG. Improved determination of the myelin water fraction in human brain using magnetic resonance imaging through Bayesian analysis of mcDESPOT. NeuroImage. 2016;127:456–471.
11. Somekh-Baruch A, Leshem A, Saligrama V. On the non-existence of unbiased estimators in constrained estimation problems. IEEE Transactions on Information Theory. 2018;64(8):5549–5554.
12. Assländer J. A perspective on MR fingerprinting. Journal of Magnetic Resonance Imaging. 2021;53(3):676–685.
13. Ahn S, Fessler JA. Standard errors of mean, variance, and standard deviation estimators. Tech. Rep. 413, Comm. and Sign. Proc. Lab., Dept. of EECS, Univ. of Michigan, Ann Arbor, MI; 2003.
14. Diskin T, Eldar YC, Wiesel A. Learning to estimate without bias. IEEE Transactions on Signal Processing. 2023;71:2162–2171.
15. Zhang X, Duchemin Q, Liu K, et al. Cramér-Rao bound-informed training of neural networks for quantitative MRI. Magnetic Resonance in Medicine. 2022;88(1):436–448.
16. Assländer J, Novikov DS, Lattanzi R, Sodickson DK, Cloos MA. Hybrid-state free precession in nuclear magnetic resonance. Communications Physics. 2019;2(1).
17. Assländer J, Mao A, Beck ES, et al. On multi-path longitudinal spin relaxation in brain tissue. arXiv:2301.08394; 2023.
18. Henkelman RM, Huang X, Xiang QS, Stanisz GJ, Swanson SD, Bronskill MJ. Quantitative interpretation of magnetization transfer. Magnetic Resonance in Medicine. 1993;29(6):759–766.
19. Helms G, Hagberg GE. In vivo quantification of the bound pool T1 in human white matter using the binary spin-bath model of progressive magnetization transfer saturation. Physics in Medicine and Biology. 2009;54(23).
20. Assländer J, Gultekin C, Flassbeck S, Glaser SJ, Sodickson DK. Generalized Bloch model: A theory for pulsed magnetization transfer. Magnetic Resonance in Medicine. 2021:1–15.
21. Jiang Y, Ma D, Seiberlich N, Gulani V, Griswold MA. MR fingerprinting using fast imaging with steady state precession (FISP) with spiral readout. Magnetic Resonance in Medicine. 2015;74(6):1621–1631.
22. Bojorquez JZ, Bricq S, Acquitter C, Brunotte F, Walker PM, Lalande A. What are normal relaxation times of tissues at 3T? Magnetic Resonance Imaging. 2017;35:69–80.
23. Stanisz GJ, Odrobina EE, Pun J, et al. T1, T2 relaxation and magnetization transfer in tissue at 3T. Magnetic Resonance in Medicine. 2005;54(3):507–512.
24. McGivney DF, Pierre E, Ma D, et al. SVD compression for magnetic resonance fingerprinting in the time domain. IEEE Transactions on Medical Imaging. 2014;33(12):2311–2322.
25. Tamir JI, Uecker M, Chen W, et al. T2 shuffling: Sharp, multicontrast, volumetric fast spin-echo imaging. Magnetic Resonance in Medicine. 2017;77(1):180–195.
26. Assländer J, Cloos MA, Knoll F, Sodickson DK, Hennig J, Lattanzi R. Low rank alternating direction method of multipliers reconstruction for MR fingerprinting. Magnetic Resonance in Medicine. 2018;79(1):83–96.
27. Zhao B, Setsompop K, Adalsteinsson E, et al. Improved magnetic resonance fingerprinting reconstruction with low-rank and subspace modeling. Magnetic Resonance in Medicine. 2018;79(2):933–942.
28. Mao A, Flassbeck S, Gultekin C, Assländer J. Cramér-Rao bound optimized subspace reconstruction in quantitative MRI. arXiv:2305.00326; 2023.
29. Malik SJ, Sbrizzi A, Hoogduin H, Hajnal JV. Equivalence of EPG and isochromat-based simulation of MR signals. Proc. Intl. Soc. Mag. Reson. Med. 2016:3196.
30. Liu L, Jiang H, He P, et al. On the variance of the adaptive learning rate and beyond. 8th International Conference on Learning Representations (ICLR); 2020. arXiv:1908.03265.
31. Kingma DP, Ba J. Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations (ICLR); 2015. arXiv:1412.6980.
32. Marquardt DW. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics. 1963;11(2):431–441.
33. Chan RW, Ramsay EA, Cunningham CH, Plewes DB. Temporal stability of adaptive 3D radial MRI using multidimensional golden means. Magnetic Resonance in Medicine. 2009;61(2):354–363.
34. Flassbeck S, Assländer J. Minimization of eddy current artifacts in sequences with periodic dynamics. Magnetic Resonance in Medicine. 2024;91(3):1067–1074.
35. Trzasko J, Manduca A. Local versus global low-rank promotion in dynamic MRI series reconstruction. Proc. Intl. Soc. Mag. Reson. Med. 2011:4371.
36. Zhang T, Pauly JM, Levesque IR. Accelerating parameter mapping with a locally low rank constraint. Magnetic Resonance in Medicine. 2015;73(2):655–661.
37. Novikov DS, Fieremans E, Jespersen SN, Kiselev VG. Quantifying brain microstructure with diffusion MRI: Theory and parameter estimation. NMR in Biomedicine. 2019;32(4):1–53.
38. Belkin M, Hsu D, Ma S, Mandal S. Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proceedings of the National Academy of Sciences. 2019;116(32):15849–15854.
39. Lambert B, Forbes F, Tucholka A, Doyle S, Dehaene H, Dojat M. Trustworthy clinical AI solutions: a unified review of uncertainty quantification in deep learning models for medical image analysis. arXiv:2210.03736; 2022.
40. Abdar M, Pourpanah F, Hussain S, et al. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information Fusion. 2021;76:243–297.
41. Jalal A, Arvinte M, Daras G, Price E, Dimakis AG, Tamir JI. Robust compressed sensing MRI with deep generative priors. 35th Conference on Neural Information Processing Systems; 2021. arXiv:2108.01368.
42. Luo G, Blumenthal M, Heide M, Uecker M. Bayesian MRI reconstruction with joint uncertainty estimation using diffusion models. Magnetic Resonance in Medicine. 2023;90(1):295–311.
43. Luu HM, Park SH. SIMPLEX: Multiple phase-cycled bSSFP quantitative magnetization transfer imaging with physics-guided simulation learning of neural network. NeuroImage. 2023;284:120449.
44. Kosmidis I. Bias in parametric estimation: Reduction and useful side-effects. Wiley Interdisciplinary Reviews: Computational Statistics. 2014;6(3):185–196.
45. Whitaker ST, Nielsen JF, Fessler JA. End-to-end scan parameter optimization for improved myelin water imaging. Proc. Intl. Soc. Mag. Reson. Med. 2022:1273.
46. Aja-Fernández S, Vegas-Sánchez-Ferrero G. Statistical Analysis of Noise in MRI. Springer International; 2016.
47. Varadarajan D, Haldar JP. A majorize-minimize framework for Rician and non-central chi MR images. IEEE Transactions on Medical Imaging. 2015;34(10):2191–2202.
48. Bouhrara M, Spencer RG. Fisher information and Cramér-Rao lower bound for experimental design in parallel imaging. Magnetic Resonance in Medicine. 2018;79(6):3249–3255.
49. Fessler JA. Mean and variance of implicitly defined biased estimators (such as penalized maximum likelihood): Applications to tomography. IEEE Transactions on Image Processing. 1996;5(3):493–506.
50. Mohan S, Kadkhodaie Z, Simoncelli EP, Fernandez-Granda C. Robust and interpretable blind image denoising via bias-free convolutional neural networks. 8th International Conference on Learning Representations (ICLR); 2020.