Author manuscript; available in PMC: 2020 Jul 30.
Published in final edited form as: IEEE Trans Med Imaging. 2020 Feb 5;39(7):2451–2460. doi: 10.1109/TMI.2020.2969425

Model Comparison Metrics Require Adaptive Correction if Parameters are Discretized: Proof-of-Concept Applied to Transient Signals in Dynamic PET

Heather Liu 1, Evan D Morris 2
PMCID: PMC7392400  NIHMSID: NIHMS1608358  PMID: 32031932

Abstract

Linear parametric neurotransmitter PET (lp-ntPET) is a novel kinetic model that estimates the temporal characteristics of a transient neurotransmitter component in PET data. To preserve computational simplicity in estimation, the parameters of the nonlinear term that describes this transient signal are discretized, and only a limited set of values is allowed for each parameter. This makes linear estimation possible, implemented using predefined basis functions that incorporate the discretized parameters. Implementing the model with discretized parameters poses unique challenges for significance testing. Significance testing employs model comparison metrics to determine whether including a basis function significantly improves the fit, i.e. whether a transient signal is present in the PET data. A false positive occurs when the bases overfit data that do not contain a transient component. The number of parameters in a model, p, determines the degrees of freedom in the model. In turn, p is crucial for calculating model selection metrics and for controlling the false positive rate (FPR). In this work, we first explore the effect of parameter discretization on FPR by fitting simulated null data with varying numbers of bases, and we demonstrate that FPR depends on the number of bases. We then propose a correction to the number of parameters in the model, $p_{\text{eff}}$, which adapts to the number of bases used. Implementing model selection with $p_{\text{eff}}$ maintains a stable FPR independent of the number of bases.

Keywords: lp-ntPET, parametric imaging, model comparison, goodness of fit, basis functions, constrained optimization

I. Introduction

POSITRON emission tomography (PET) makes it possible to image molecular targets with high specificity. Kinetic models are necessary to quantify physiological properties of the target and to describe tracer-target dynamics. The linear parametric neurotransmitter model (lp-ntPET) estimates the timing of transient neurotransmitter (NT) release occurring within a single scan [1]-[7]. The model has been applied successfully to characterize the dynamics of smoking-induced dopamine (DA) release [8], [9] as well as mu-opioid receptor occupancy after naloxone administration [10]. Further efforts have extended the model's utility, such as the development of nonparametric algebraic methods [11] and its incorporation into direct reconstruction of PET data [12].

lp-ntPET is formulated as the sum of two components: 1) the tracer component, which quantifies the steady-state properties of the system, and 2) the NT component, which characterizes a transient NT signal that competes with the tracer. lp-ntPET is the linearized version of the ntPET model [5]. Because of its linear form, lp-ntPET can estimate the parameters of the tracer and NT components at the voxel level with high computational efficiency: thousands of voxels can be fitted in just minutes. To implement linear estimation, the parameters in the NT component that describe the timing of the transient signal are discretized. A limited set of plausible values for these timing parameters is defined before estimation. Each discrete combination of timing parameters forms one basis function, and all combinations together form a library of ‘bases’ that represents all candidate timing profiles of the transient signal. The remaining parameters of the model are then estimated by linear fitting with each of the bases, and the combination of linear parameters and basis function that produces the best fit is retained as the set of optimal parameters.

lp-ntPET (the “full model”) is susceptible to overfitting and false positive detection of a transient NT signal. Thus, significance testing is essential to control the false positive rate (FPR). Without the NT component, lp-ntPET is identical to the MRTM model (hence, the “restricted model”) [13]. Model comparison metrics can be used to evaluate the significance of improvement in the fit by the full model over the restricted model. In essence, these metrics adjudicate the need for including the basis functions during fitting, which in turn indicates the presence of a true positive transient in the PET signal.

Previous work by our group has characterized the performance of different model comparison metrics for comparing the full model to the restricted model [4], [14]. However, the effect of parameter discretization on model comparison and FPR has not yet been explored. Model comparison metrics (and statistical tests of models, in general) require precise knowledge of the number of parameters in the full model, $p_{\text{full}}$. However, a limited (discretized) set of values for the timing parameters cannot fully span their respective parameter spaces. Consequently, those parameters contribute only fractional degrees of freedom to the model. To properly implement model selection, we propose that it is necessary to determine the “effective” number of parameters in a model. We expect that the effective number of parameters in the full model, $p_{\text{full}}^{\text{eff}}$, reflects the fractional degrees of freedom contributed by the timing parameters, such that $p_{\text{full}}^{\text{eff}} < p_{\text{full}}$. We hypothesize that $p_{\text{full}}^{\text{eff}}$ depends on the number of bases provided for fitting: the greater the number of distinct bases, the more of the parameter space is covered, and the greater the apparent degrees of freedom in the full model. Implementing model comparison with $p_{\text{full}}^{\text{eff}}$ should therefore alleviate the dependence of FPR on the size of the basis library.

To address our hypothesis, we use simulations of null PET data to demonstrate the dependence of FPR on the number of bases. We then use this relationship between the number of bases and FPR to determine $p_{\text{full}}^{\text{eff}}$, the corrected number of parameters in the full model for a particular number of bases. This correction adapts the number of parameters in the full model so that model selection achieves a uniform FPR independent of the number of bases. We evaluate the ability of $p_{\text{full}}^{\text{eff}}$ to remove the dependence of FPR on the number of bases in dynamic 3D and 4D phantom data. Finally, we assess the performance of $p_{\text{full}}^{\text{eff}}$ in human data from null [11C]Raclopride scans and demonstrate that a stable FPR is maintained in real data.

II. Theory

A. The lp-ntPET Model

lp-ntPET (1a), the “full model”, is a multilinear compartmental model containing a time-varying term that describes a transient NT signal (1b). The model is composed of a tracer component and an NT component. The NT component characterizes the effect of a transient, time-varying NT signal through γ, the peak amplitude, and three timing parameters: tD, the signal start time relative to injection; tP, the peak time relative to injection; and α, the decay rate. These timing parameters are discretized in our linear implementation. u(t) is the unit step function.

$$C_T(t) = R_1 C_R(t) + k_2 \int_0^t C_R(u)\,du - k_{2a} \int_0^t C_T(u)\,du - \gamma \int_0^t C_T(u)\,h_i(u)\,du \qquad (1a)$$

where,

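$$h_i(t) = \left(\frac{t - t_D}{t_P - t_D}\right)^{\alpha} \exp\left(\alpha\left(1 - \frac{t - t_D}{t_P - t_D}\right)\right) u(t - t_D) \qquad (1b)$$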

The tracer component is composed of the three time-invariant parameters representing kinetic constants describing the tracer. This component is identical to the MRTM [13] (2), the “restricted model”.

$$C_T(t) = R_1 C_R(t) + k_2 \int_0^t C_R(u)\,du - k_{2a} \int_0^t C_T(u)\,du \qquad (2)$$

$C_T$ and $C_R$ are the concentrations of tracer in the target and reference compartments, respectively. The target compartment contains the NT signal. The reference compartment is used as a proxy for the input function of tracer introduced into the system. $R_1$ is the ratio of tracer delivery to the target and reference compartments; $k_2$ is the efflux rate related to the free diffusion of tracer; $k_{2a}$ is the efflux rate that incorporates the effects of specific binding of the tracer to the target.

In the basis function implementation of the full model, the NT timing parameters are restricted to a predetermined set of discrete values. All possible combinations of predetermined tD, tP, and α values form a unique library of bases that may vary in both number and timing characteristics. Each basis function is generated by (1b) and then incorporated into (1a) to produce a fit and resultant sum of squared errors (SSE). The combination of the basis function and linear parameters that produces the lowest SSE is retained as the final set of estimated parameters.
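The exhaustive basis search described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB/COMKAT implementation: it assumes trapezoidal cumulative integration on the frame times and ordinary least squares via `numpy.linalg.lstsq`, and the helper `simulate_ct`, the toy reference curve and the 12-basis library are hypothetical, used only to generate test data that satisfy (1a) exactly on the discrete grid.

```python
import numpy as np

def response_function(t, t_d, t_p, alpha):
    """h_i(t) of eq. (1b): zero before the onset t_d, peak value 1 at t_p."""
    s = np.clip((t - t_d) / (t_p - t_d), 0.0, None)   # normalized time since onset
    return np.where(t >= t_d, s**alpha * np.exp(alpha * (1.0 - s)), 0.0)

def cumtrapz0(y, t):
    """Cumulative trapezoidal integral of y over t, equal to 0 at t[0]."""
    out = np.zeros(len(y))
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))
    return out

def fit_restricted(t, ct, cr):
    """MRTM, eq. (2): regress C_T on [C_R, int C_R, -int C_T]."""
    X = np.column_stack([cr, cumtrapz0(cr, t), -cumtrapz0(ct, t)])
    theta, *_ = np.linalg.lstsq(X, ct, rcond=None)
    return theta, float(np.sum((ct - X @ theta) ** 2))

def fit_full(t, ct, cr, bases):
    """lp-ntPET, eq. (1a): one linear fit per basis; keep the lowest SSE.

    Returns (basis index, [R1, k2, k2a, gamma], SSE).
    """
    common = [cr, cumtrapz0(cr, t), -cumtrapz0(ct, t)]
    best = (-1, None, np.inf)
    for i, h in enumerate(bases):
        X = np.column_stack(common + [-cumtrapz0(ct * h, t)])
        theta, *_ = np.linalg.lstsq(X, ct, rcond=None)
        sse = float(np.sum((ct - X @ theta) ** 2))
        if sse < best[2]:
            best = (i, theta, sse)
    return best

def simulate_ct(t, cr, R1, k2, k2a, gamma, h):
    """Hypothetical test helper: C_T that satisfies eq. (1a) exactly when the
    integrals use the same trapezoidal rule as the fit."""
    ir = cumtrapz0(cr, t)
    ct = np.zeros(len(t))
    ct[0] = R1 * cr[0]                 # every integral is zero at t[0]
    it = ith = 0.0                     # running integrals of C_T and C_T*h
    for j in range(1, len(t)):
        dt = t[j] - t[j - 1]
        rhs = (R1 * cr[j] + k2 * ir[j]
               - k2a * (it + 0.5 * dt * ct[j - 1])
               - gamma * (ith + 0.5 * dt * ct[j - 1] * h[j - 1]))
        ct[j] = rhs / (1.0 + 0.5 * dt * (k2a + gamma * h[j]))
        it += 0.5 * dt * (ct[j - 1] + ct[j])
        ith += 0.5 * dt * (ct[j - 1] * h[j - 1] + ct[j] * h[j])
    return ct

# Hypothetical example: uniform 1-min grid, toy reference curve, 12-basis library.
t = np.arange(0.0, 91.0, 1.0)
cr = t * np.exp(-t / 10.0)
params = [(td, tp, a) for td in (35.0, 40.0, 45.0) for tp in (50.0, 55.0) for a in (1.0, 2.0)]
bases = np.array([response_function(t, *p) for p in params])
ct = simulate_ct(t, cr, R1=1.0, k2=0.4, k2a=0.1, gamma=0.05, h=bases[3])
idx, theta, sse_full = fit_full(t, ct, cr, bases)
_, sse_res = fit_restricted(t, ct, cr)
print(idx, params[idx], np.round(theta, 4), sse_full < sse_res)
```

Because the noiseless toy data are generated with basis 3, the search should return that basis and recover the linear parameters; with real data the search simply keeps whichever basis yields the lowest SSE.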

B. Controlling False Positives With Model Comparison Metrics

Model selection metrics evaluate the significance of the improvement in fit that is achieved by including additional parameters in a model. In this case, the metrics evaluate the advantage given by including the NT component in (1a). When used to fit data without an NT signal (null data), basis functions are, by definition, extraneous and any improvement in fit given by the NT component is therefore overfitting. A false positive is defined in this context as a fit to null data for which the full model is erroneously selected over the restricted model. We examined the behavior of three common model selection metrics in determining false positives: the F-statistic (3), the corrected Akaike Information Criterion (AICc) (4) [15], and the Bayesian Information Criterion (BIC) (5) [16].

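$$F = \frac{(\mathrm{SSE}_{\text{res}} - \mathrm{SSE}_{\text{full}})/(p_{\text{full}} - p_{\text{res}})}{\mathrm{SSE}_{\text{full}}/(n - p_{\text{full}})} \qquad (3)$$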
$$\mathrm{AICc} = 2p + n\ln\left(\frac{\mathrm{SSE}}{n}\right) + \frac{2p^2 + 2p}{n - p - 1} \qquad (4a)$$

We define:

$$\Delta\mathrm{AICc} = \mathrm{AICc}_{\text{full}} - \mathrm{AICc}_{\text{res}} \qquad (4b)$$

$$\mathrm{BIC} = p\ln(n) + n\ln\left(\frac{\mathrm{SSE}}{n}\right) \qquad (5a)$$

We define:

$$\Delta\mathrm{BIC} = \mathrm{BIC}_{\text{full}} - \mathrm{BIC}_{\text{res}} \qquad (5b)$$

Subscripts “full” and “res” indicate the full and restricted models, respectively. SSE is the sum of squared errors from the fit; p is the number of parameters in the model; n is the number of data points being fitted, i.e. the number of frames per scan.

The criteria for selecting the full model over the restricted model are as follows: 1) the F-statistic must exceed the critical value $F_{\text{crit}}$ at the 5% significance level (i.e., a threshold of 0.05), where $F_{\text{crit}}$ is determined by $p_{\text{full}} - p_{\text{res}}$ degrees of freedom in the numerator and $n - p_{\text{full}}$ degrees of freedom in the denominator; 2) ΔAICc must be less than zero; 3) ΔBIC must be less than zero.
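As an illustration, (3)-(5b) and the three selection rules can be evaluated directly from a pair of SSEs. The sketch below assumes $p_{\text{res}} = 3$ (R1, k2, k2a) and a default $p_{\text{full}} = 7$; the frame count n = 36 and the SSE values in the usage lines are hypothetical. It uses `scipy.stats.f.ppf` for $F_{\text{crit}}$, which also accepts the non-integer degrees of freedom that arise when an effective number of parameters is used.

```python
import numpy as np
from scipy.stats import f as f_dist

def model_comparison(sse_full, sse_res, n, p_full=7.0, p_res=3.0, sig_level=0.05):
    """Eqs. (3)-(5b) and the three selection rules for full vs. restricted fits.

    p_full may be fractional (an effective number of parameters); the F
    distribution accepts non-integer degrees of freedom.
    """
    def aicc(p, sse):
        return 2 * p + n * np.log(sse / n) + (2 * p**2 + 2 * p) / (n - p - 1)

    def bic(p, sse):
        return p * np.log(n) + n * np.log(sse / n)

    F = ((sse_res - sse_full) / (p_full - p_res)) / (sse_full / (n - p_full))
    F_crit = f_dist.ppf(1.0 - sig_level, p_full - p_res, n - p_full)
    d_aicc = aicc(p_full, sse_full) - aicc(p_res, sse_res)
    d_bic = bic(p_full, sse_full) - bic(p_res, sse_res)
    return {"F": F, "F_crit": F_crit, "dAICc": d_aicc, "dBIC": d_bic,
            "select_F": F > F_crit, "select_AICc": d_aicc < 0, "select_BIC": d_bic < 0}

# Hypothetical numbers: n = 36 frames, restricted-model SSE = 10.
no_gain = model_comparison(sse_full=10.0, sse_res=10.0, n=36)
big_gain = model_comparison(sse_full=1.0, sse_res=10.0, n=36)
print(no_gain["select_AICc"], big_gain["select_AICc"])
```

With no improvement in SSE, every criterion rejects the full model because of the parameter penalty; a tenfold reduction in SSE is selected by all three.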

C. False Positive Rate and “Effective” Number of Parameters

The false positive rate (FPR) is the fraction of the total number of null data sets, k, for which the full model is determined to be superior by a given metric. FPR is defined for each model comparison metric, respectively, as:

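$$\mathrm{FPR}_F = \frac{1}{k}\sum_{i=1}^{k} \left[F_i > F_{\text{crit},0.95}\right] \qquad (6)$$
$$\mathrm{FPR}_{\mathrm{AICc}} = \frac{1}{k}\sum_{i=1}^{k} \left[\Delta\mathrm{AICc}_i < 0\right] \qquad (7)$$
$$\mathrm{FPR}_{\mathrm{BIC}} = \frac{1}{k}\sum_{i=1}^{k} \left[\Delta\mathrm{BIC}_i < 0\right] \qquad (8)$$

Here, [·] equals 1 when the enclosed condition is true and 0 otherwise.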

Traditionally, all parameters implied within the right-hand sides of (6)-(8) are assumed to be known. According to our hypothesis, however, $p_{\text{full}}$ should actually be a variable that increases with greater coverage of parameter space, i.e., with the inclusion of more distinct bases. If we stipulate a constant FPR on the left-hand sides of (6)-(8) and explicitly solve for $p_{\text{full}}$ in each case, we obtain the “effective” number of parameters for a specific implementation of the full model. We denote this unknown variable as $p_{\text{full}}^{\text{eff}}$. Put simply, the roles of the known and unknown variables are switched between the calculation of FPR and that of $p_{\text{full}}^{\text{eff}}$, which can then be determined numerically or analytically. $p_{\text{full}}^{\text{eff}}$ is essentially a modified $p_{\text{full}}$ that recalibrates the distribution of each model comparison metric so that a desired fraction (FPR) of null instances surpasses the critical threshold ($F_{\text{crit}}$ for the F-statistic; 0 for ΔAICc and ΔBIC). We will refer to F, ΔAICc, and ΔBIC calculated with $p_{\text{full}}^{\text{eff}}$ in (3)-(5) as $F^{\text{eff}}$, $\Delta\mathrm{AICc}^{\text{eff}}$, and $\Delta\mathrm{BIC}^{\text{eff}}$, respectively. These adapted model selection metrics account for the number of bases used during implementation of the full model.
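Switching the known and unknown variables can be illustrated for ΔAICc. Because ΔAICc increases with $p_{\text{full}}$ (for $p_{\text{full}} < n - 1$), FPR in (7) is a non-increasing step function of $p_{\text{full}}$, so the value giving a target FPR can be bracketed and found by bisection. Bisection is a swapped-in root finder chosen for this sketch (the study used MATLAB's quasi-Newton routine), and the toy null experiment is hypothetical: a generic linear model with 3 restricted regressors, where the "full" model adds the best of m random candidate regressors to mimic a basis search. It is not lp-ntPET, but it shows the same qualitative effect of library size on the effective number of parameters.

```python
import numpy as np

def delta_aicc(p_full, sse_full, sse_res, n, p_res=3.0):
    """Eq. (4b) for arrays of SSEs; p_full may be fractional."""
    def aicc(p, sse):
        return 2 * p + n * np.log(sse / n) + (2 * p**2 + 2 * p) / (n - p - 1)
    return aicc(p_full, sse_full) - aicc(p_res, sse_res)

def fpr_aicc(p_full, sse_full, sse_res, n, p_res=3.0):
    """Eq. (7): fraction of null fits for which dAICc < 0."""
    return float(np.mean(delta_aicc(p_full, sse_full, sse_res, n, p_res) < 0))

def solve_p_eff(sse_full, sse_res, n, target=0.05, p_res=3.0, tol=1e-6):
    """Smallest p_full whose FPR does not exceed `target`, found by bisection.

    The bracket keeps FPR(lo) > target >= FPR(hi) until hi - lo < tol.
    """
    lo, hi = p_res, n - 2.0   # FPR ~ 1 at p_res; the AICc penalty explodes near n - 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fpr_aicc(mid, sse_full, sse_res, n, p_res) > target:
            lo = mid
        else:
            hi = mid
    return hi

# Toy null experiment (hypothetical): 3 restricted regressors; the "full" model
# adds the best of m candidate regressors, mimicking a basis search.
rng = np.random.default_rng(0)
n, k = 30, 2000
Q, _ = np.linalg.qr(rng.standard_normal((n, 3)))   # orthonormal restricted design

def resid(A):
    """Residuals after projecting out the restricted regressors."""
    return A - Q @ (Q.T @ A)

R = resid(rng.standard_normal((n, k)))   # the noiseless part lies in span(Q), so only noise remains
sse_res = np.sum(R**2, axis=0)
Z = resid(rng.standard_normal((n, 20)))  # 20 candidate "bases"

def best_sse_full(m):
    """Lowest full-model SSE over the first m candidates; adding candidate z
    reduces the SSE by (r.z)^2 / (z.z) (Frisch-Waugh-Lovell)."""
    gain = (Z[:, :m].T @ R) ** 2 / np.sum(Z[:, :m] ** 2, axis=0)[:, None]
    return sse_res - gain.max(axis=0)

p_eff = {m: solve_p_eff(best_sse_full(m), sse_res, n) for m in (1, 5, 20)}
print(p_eff)   # larger candidate libraries need a larger p_eff for a 5% FPR
```

Because the candidate sets are nested, a larger library can only lower each full-model SSE, so the effective number of parameters can only grow with m.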

We expect $p_{\text{full}}^{\text{eff}}$ to lie between 4 and 7 because, of the 7 parameters in the full model, {R1, k2, k2a, γ} are continuous and {tD, tP, α} are discretized. The 4 continuous parameters span their parameter spaces because they are explicitly calculated. Thus, notwithstanding correlation, they contribute 4 full degrees of freedom to the model. The 3 discretized parameters, contained within the basis functions, cannot span their parameter spaces and therefore contribute only fractional degrees of freedom. Taken together, the 3 discretized parameters contribute up to, but less than, 3 additional degrees of freedom to the model.

III. Methods

A. Simulations of Ideal Null Data

Noiseless time-activity curves (TACs) were simulated using the simplified reference tissue model (SRTM) [17] to represent [11C]Raclopride uptake in the striatum and cerebellum. For the striatum, SRTM parameters were set to: R1 = 1, k2 = 0.42 min−1, BPND = 3. A noiseless cerebellum curve was simulated using the 1-tissue compartment model (K1 = 0.0918 mL/(min g), k2 = 0.4484 min−1). The arterial input function was taken from a human scan (Siemens HRRT) following bolus injection of 20 mCi into a male subject (85.45 kg). The noiseless cerebellum curve was used as the reference region input, CR(t). Simulated data were binned into 1-minute frames for the first 10 minutes and 3-minute frames for the remainder of the 90-minute scan. The noiseless striatal data simulated from SRTM were then fitted with MRTM (2), and this fitted MRTM curve was taken as the ground truth. Noisy data were generated by adding homoscedastic Gaussian noise to the noiseless MRTM curve. Ten thousand TACs were generated at each of 11 noise levels, ranging from region-level to voxel-level noise. These data were considered ideal because the fit of the restricted model to the noiseless data contained zero error. All simulations were implemented in MATLAB (R2017a, The MathWorks, Inc., Natick, MA) using COMKAT modeling routines [18].

B. Simulations of Realistic Null Data in 3D and 4D Phantom

Realistic data were simulated using the ntPET model [5] to resemble [11C]Raclopride uptake in both the striatum and cerebellum. Kinetic parameters were adapted from Pappata et al. [19], Morris et al. [20], and Fisher et al. [21]. Striatal parameters were set to: K1 = 0.07344 mL/(min g), k2 = 0.35872 min−1, kon = 0.0173 mL/(pmol min), koff = 0.1363 min−1, Bmax = 100 pmol/mL, Fv = 0.04 mL/mL, $k_{\text{on}}^{\text{DA}}$ = 0.25 mL/(pmol min), and $k_{\text{off}}^{\text{DA}}$ = 25 min−1 ($K_D$ = 100 nM). Basal DA concentration was set to 100 nM, so that 50% of receptors would be occupied at baseline. Cerebellum parameters were set to: K1 = 0.0918 mL/(min g), k2 = 0.4484 min−1, and $k_{\text{on}}^{\text{DA}} = k_{\text{off}}^{\text{DA}} = 0$. The same arterial input function described for the ideal data was used to simulate realistic data. Ten thousand striatal TACs with voxel-level noise and a noiseless reference region curve were simulated, using the same time bins as the ideal simulations. The noise added to the striatal TACs followed a Gaussian distribution with zero mean and a standard deviation modeled according to:

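$$\varepsilon_i = \text{noisescale} \times \sqrt{\frac{\mathrm{PET}_i \times e^{-\lambda t_i}}{\Delta t_i}} \times e^{\lambda t_i} \qquad (9)$$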

where $\mathrm{PET}_i$ is the signal at a single time point, i, without decay correction; λ is the decay constant for 11C; $\Delta t_i$ is the duration of the time frame; and $\varepsilon_i$ is the standard deviation of the additive error in the TAC, which was scaled to voxel-level noise [22].

Of the 10,000 simulated striatal TACs, 9900 were created to be null and 100 were created to be positive. The positive TACs contained randomly generated time-varying components that adhered to (1b). The amplitude and timing parameters for the positive simulations were drawn from the following distributions: γ ~ N(200 nM, 50 nM), tD ~ U(35 min, 50 min), tP ~ U(3 min, 30 min), and α ~ U(0.05, 1), where U(min, max) specifies a uniform distribution and N(mean, standard deviation) specifies a normal distribution. The distributions for tD and tP were discretized in 3-min intervals, and the possible α values were 0.05, 0.1, 0.5, and 1. In total, there were 240 possible combinations of the timing parameters.
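The count of 240 combinations follows from the grids above (6 values of tD, 10 of tP, and 4 of α). The sketch below enumerates them and draws parameters for the 100 positive TACs; it assumes that each discrete value is drawn with equal probability, which the text implies but does not state explicitly, and the seed is arbitrary.

```python
import itertools
import numpy as np

# Discrete grids described in the text (3-min spacing for t_D and t_P).
T_D = np.arange(35, 51, 3)           # 35, 38, ..., 50 -> 6 values (min)
T_P = np.arange(3, 31, 3)            # 3, 6, ..., 30  -> 10 values (min)
ALPHA = np.array([0.05, 0.1, 0.5, 1.0])

combos = list(itertools.product(T_D, T_P, ALPHA))
print(len(combos))                   # 6 * 10 * 4 = 240

def sample_positive(rng, size):
    """Draw amplitude and timing parameters for positive TACs.

    Assumption: each discrete timing value is equally likely.
    """
    gamma = rng.normal(200.0, 50.0, size)   # peak amplitude (nM)
    t_d = rng.choice(T_D, size)
    t_p = rng.choice(T_P, size)
    alpha = rng.choice(ALPHA, size)
    return gamma, t_d, t_p, alpha

gamma, t_d, t_p, alpha = sample_positive(np.random.default_rng(1), 100)
```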

For the 3D phantom (Fig. 1), 10,000 TACs were arranged in a 100 pixel × 100 pixel square; the 100 TACs containing a time-varying component were arranged in a 10 pixel × 10 pixel positive region in the center. Null TACs were assigned to the rest of the phantom.

Fig. 1.


2D parametric images of 3D phantom. A) simulated gamma (DA release) values; γ ~ N(200nM, 50nM). B) simulated alpha values; α ~ U(0.05, 1). C) simulated tD values; tD ~ U(35min, 50min). D) simulated tP values; tP ~ U(3min, 30min).

For the 4D phantom (Fig. 2), 10,000 TACs were arranged in a 50 voxel × 50 voxel × 4 voxel cuboid; the 100 TACs containing a time-varying component were arranged in a 5 voxel × 5 voxel × 4 voxel positive region placed in the center. Null TACs were assigned to the rest of the phantom. This arrangement of voxels was chosen to evoke the 4-slice precommissural striatum mask used in previous studies [9], [23].
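The two phantom layouts can be expressed as boolean masks marking the positive region (the fourth dimension of each phantom is time, so the spatial arrays are 2D and 3D). In this sketch, the helper `centered_block_mask` is hypothetical, and "centered" is interpreted as starting the block at `(size - block) // 2` along each axis.

```python
import numpy as np

def centered_block_mask(shape, block):
    """Boolean mask with a block of True values centered in an array of `shape`."""
    mask = np.zeros(shape, dtype=bool)
    slices = tuple(slice((s - b) // 2, (s - b) // 2 + b) for s, b in zip(shape, block))
    mask[slices] = True
    return mask

pos_3d = centered_block_mask((100, 100), (10, 10))     # 2D image over time: "3D phantom"
pos_4d = centered_block_mask((50, 50, 4), (5, 5, 4))   # 3D volume over time: "4D phantom"
print(pos_3d.sum(), pos_4d.sum(), pos_3d.size, pos_4d.size)  # 100 100 10000 10000
```

Null TACs would then be assigned wherever the mask is False, and the 100 positive TACs wherever it is True.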

Fig. 2.


3D parametric images of 4D phantom (shown in 4 slices). A) simulated gamma values; γ ~ N(200nM, 50nM). B) simulated alpha values; α ~ U(0.05, 1). C) simulated tD values; tD ~ U(35min, 50min). D) simulated tP values; tP ~ U(3min, 30min).

C. Fitting of Ideal Data

Each TAC was fitted with both the full and restricted models. The full model was implemented with libraries of varying numbers of bases, such that there were 10,000 unique sets of fits for each unique combination of noise level and number of bases. For illustration, Fig. 3 shows the libraries with the fewest and most bases. Fitting was implemented using a noiseless $C_T$ in the integral terms in order to eliminate correlated noise between $C_T$ on the left-hand side and $\int C_T$ on the right-hand side of (1a) and (2). Although this modification cannot be applied to real data ($C_T$ will never be noiseless), we sought to adhere to the assumptions of linearity as closely as possible in this idealized scenario.

Fig. 3.


Response function libraries for fitting with the full model. Fitting libraries varied between A) 6 and B) 1470 bases. Libraries were expanded at ~3 minute resolution, i.e. new tD and tP values were appended in 3 minute increments.

D. Determining FPR and $p_{\text{full}}^{\text{eff},5\%}$ From Ideal Data

F, ΔAICc, and ΔBIC were calculated using $p_{\text{full}}$ = 7 for all fits to the ideal data. First, (6)-(8) were applied to determine FPR as a function of noise and number of bases. Then, FPR was set to 0.05 and (6)-(8) were solved for $p_{\text{full}}^{\text{eff},5\%}$ as the unknown variable for every combination of noise and number of bases. $p_{\text{full}}^{\text{eff},5\%}$ was determined numerically with the quasi-Newton algorithm built into MATLAB.

The standard deviations of all FPRs are expected to be small because each FPR is calculated from a large number of data sets (10,000). To confirm this, 10 replicates of 10,000 data sets simulated at voxel-level noise were fitted with a library of 288 bases (a typical implementation of the model), and the standard deviation of the FPR was calculated from the 10 replicates.
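As a rough check on this expectation, one can treat each of the k null data sets as an independent Bernoulli trial (an assumption of this sketch, not part of the study's method), so the binomial standard error of an estimated FPR is approximately:

$$\mathrm{SD}\big(\widehat{\mathrm{FPR}}\big) \approx \sqrt{\frac{\mathrm{FPR}\,(1-\mathrm{FPR})}{k}}, \qquad k = 10{,}000$$

For FPR = 0.05 this gives $\sqrt{0.05 \times 0.95 / 10^4} \approx 0.0022$ (0.22 percentage points), and for FPR = 0.0148 it gives $\sqrt{0.0148 \times 0.9852 / 10^4} \approx 0.0012$, in line with the replicate-based value reported in the Results.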

E. Fitting of Realistic Phantom Data

All phantom TACs were fitted with both the full and restricted models. The full model was implemented with libraries of varying sizes, between 9 and 240 bases. Libraries varied in the resolution of the bases but preserved the range of α, tD, and tP (as shown in Fig. 4). Because of the random nature of the simulated timing parameters, it was necessary to preserve the minimum and maximum limits of each parameter in all fitting libraries. This prevented the estimated value of each timing parameter from being restricted to a range that did not include the parameter's true simulated value. Fitting was implemented with a noisy $C_T$ in the integral terms in (1a) and (2).
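One way to vary library resolution while preserving the span of each timing parameter is to space the values evenly between fixed end points. The sketch below does this with `np.linspace`; the even spacing, the helper `make_library`, and the grid sizes in the usage lines are assumptions for illustration (the exact libraries used in the study are those shown in Fig. 4).

```python
import itertools
import numpy as np

def make_library(n_td, n_tp, alphas, td_range=(35.0, 50.0), tp_range=(3.0, 30.0)):
    """Timing-parameter combinations (t_D, t_P, alpha) for one basis library.

    Assumption: t_D and t_P are evenly spaced with np.linspace, so both ends of
    each range appear in every library regardless of its resolution.
    """
    t_d = np.linspace(*td_range, n_td)
    t_p = np.linspace(*tp_range, n_tp)
    return list(itertools.product(t_d, t_p, alphas))

coarse = make_library(3, 3, (0.05, 1.0))            # 18 bases (hypothetical sizes)
fine = make_library(6, 10, (0.05, 0.1, 0.5, 1.0))   # 240 bases, 3-min spacing
for lib in (coarse, fine):
    arr = np.array(lib)
    print(len(lib), arr.min(axis=0), arr.max(axis=0))   # same extremes at both resolutions
```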

Fig. 4.


Response function libraries used for fitting phantom data. As library size increases, the resolution of bases increases but their span is preserved.

$F^{\text{eff},5\%}$, $\Delta\mathrm{AICc}^{\text{eff},5\%}$, and $\Delta\mathrm{BIC}^{\text{eff},5\%}$ were calculated for each pair of fits by the full and restricted models, using $p_{\text{full}}^{\text{eff},5\%}$ (instead of $p_{\text{full}}$) as determined for each library size at voxel-level noise. These values of $p_{\text{full}}^{\text{eff},5\%}$ are indicated by the blue contour in Fig. 6. Binary “significance masks” were produced to indicate F > Fcrit, ΔAICc < 0, or ΔBIC < 0 at each pixel or voxel.

Fig. 6.


Surface plot of the “effective” number of parameters, $p_{\text{full}}^{\text{eff},5\%}$, as determined from ΔAICc. Each combination of number of bases and noise contains the result from analysis of 10,000 pairs of fits. Approximate voxel- and region-level noise are indicated with bold contours.

$p_{\text{full}}^{\text{eff}}$ controls for false positives at the voxel level. Because of violations of linearity and imperfect adherence to model assumptions, FPR tends to be higher in real data than in ideal data, by as much as an order of magnitude. Thus, it is necessary to apply a second level of control of FPR at the image level. Cluster-size thresholding is conventionally used to eliminate false positives at the image level [14] and is commonly used to correct for multiple comparisons in voxel-wise analysis. Various cluster-size thresholds were applied to the significance masks. A single cluster was defined using a blob coloring algorithm based on six-neighborhood connectedness. The FPR was assessed after applying cluster-size thresholds varying between 1 and 30 pixels/voxels.
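Cluster-size thresholding of a binary significance mask can be sketched with connected-component labeling. This sketch uses `scipy.ndimage.label` (a standard labeling routine, not necessarily the blob coloring implementation used in the study) with face-only connectivity, which is the six-neighborhood in 3D and the four-neighborhood in 2D. It assumes that a threshold of `min_size` removes clusters with fewer than `min_size` voxels; the helper names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def cluster_threshold(sig_mask, min_size):
    """Remove connected clusters smaller than min_size voxels.

    Connectivity is face-only: 4-neighborhood in 2D, 6-neighborhood in 3D.
    """
    structure = ndimage.generate_binary_structure(sig_mask.ndim, 1)
    labels, _ = ndimage.label(sig_mask, structure=structure)
    sizes = np.bincount(labels.ravel())   # sizes[0] counts the background
    keep = sizes >= min_size
    keep[0] = False                       # never keep the background label
    return keep[labels]

def voxel_fpr(sig_mask, null_mask):
    """Fraction of truly null voxels that remain significant."""
    return float(sig_mask[null_mask].mean())

# Hypothetical 5 x 5 x 4 mask: one isolated voxel and one 3-voxel cluster.
sig = np.zeros((5, 5, 4), dtype=bool)
sig[0, 0, 0] = True
sig[2, 2, 0:3] = True
kept = cluster_threshold(sig, min_size=2)
print(kept.sum(), voxel_fpr(kept, np.ones_like(sig)))   # 3 voxels survive; FPR 0.03
```

In the phantoms, `null_mask` would be the complement of the known positive region; in the null human scans, every voxel in the striatal mask is null.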

F. Implementation in Human Baseline [11C]Raclopride Data

To assess the utility of $F^{\text{eff}}$, $\Delta\mathrm{AICc}^{\text{eff}}$, and $\Delta\mathrm{BIC}^{\text{eff}}$ in real PET data, all methods described above were applied to dynamic voxel-wise data from two null [11C]Raclopride scans. Both subjects were healthy adult males. Data from subject 1 (82.1 kg) were acquired following a bolus injection of 13.94 mCi. Data from subject 2 (85.5 kg) were acquired following a bolus injection of 19.73 mCi. No pharmacological or behavioral stimuli occurred before or during either scan. The reference region curve was derived from the cerebellum and was smoothed before fitting. A mask was used to identify 1004 voxels (voxel size: 2 mm × 2 mm × 2 mm, in MNI space) located in the precommissural striatum [23].

IV. Results

A. False Positive Rate and “Effective” Number of Parameters in Ideal Simulations

FPR increased at a saturable rate with the number of bases for each model comparison metric (Fig. 5). There was no overall dependence of FPR on noise, although some variation of FPR between noise levels can be observed. For the combination of voxel-level noise and 288 bases, the FPR was 1.48 ± 0.12% (mean ± standard deviation across the 10 replicates). All three model comparison metrics demonstrated similar overall behavior; however, ΔBIC yielded consistently lower FPR than ΔAICc or F. FPR was uniformly below 5% for all model comparison metrics when calculated with $p_{\text{full}}$ = 7. For space considerations, only results from ΔAICc are shown in the remainder of the Results section. The “effective” number of parameters, $p_{\text{full}}^{\text{eff},5\%}$, necessary to achieve an FPR of 5% is plotted versus noise level and number of bases in Fig. 6 for ΔAICc. Results for ΔBIC and F can be found in the Supplemental Material. $p_{\text{full}}^{\text{eff},5\%}$ increased at a saturable rate with the number of bases for all metrics and varied between 5 and 6.3, with no dependence on noise.

Fig. 5.


False positive rate as a function of noise and number of bases for A) F-statistic, B) ΔAICc and C) ΔBIC. FPR is determined from 10,000 pairs of fits with the full and restricted models, for each unique combination of noise and number of bases. FPR increases with number of bases and appears to saturate. There is no overall trend with noise. All model comparison metrics demonstrate similar overall behavior, but BIC is considerably more conservative.

B. Fitting of 3D and 4D Phantom Using $p_{\text{lp-ntPET}}^{\text{eff}}$

Fig. 7 shows a grid of sample significance masks for the 3D phantom using $\Delta\mathrm{AICc}^{\text{eff},5\%}$ at different cluster-size thresholds for voxel-level noise. The background surrounding the center of the phantom became progressively less noisy as the cluster-size threshold was increased. Fig. 8 shows FPR for the 3D and 4D phantoms as a function of number of bases and cluster-size threshold. For both phantoms, FPR appeared to be preserved at a constant level across different numbers of bases, and FPR decreased with increasing cluster-size threshold. A slight inflation of false positives was observed for low-resolution libraries (i.e., fewer bases) compared to all other libraries; this inflation was more prominent in the 4D phantom. In the 3D phantom (Fig. 8A), a cluster-size threshold of 18 pixels eliminated all false positives noncontiguous with the positive region. In the 4D phantom (Fig. 8B), a cluster-size threshold of 30 voxels eliminated nearly all false positives noncontiguous with the positive region.

Fig. 7.


Select 2D parametric binary images for $\Delta\mathrm{AICc}^{\text{eff}} < 0$ in the 3D phantom for various fitting libraries and cluster-size thresholds. White indicates a pixel for which the full model is determined to be superior to the restricted model. The 10 pixel × 10 pixel positive region is visible in the center of the phantom; the rest of the phantom is null. FPR decreases with increased cluster-size threshold but remains stable across number of bases. At low-resolution libraries (<100 bases), there is a slight inflation of FPR.

Fig. 8.


False positive rate (determined with $\Delta\mathrm{AICc}^{\text{eff}}$) as a function of number of bases and cluster-size threshold in A) 3D phantom and B) 4D phantom. FPR decreases with increased cluster-size threshold, but decreases more slowly in the 4D phantom, as indicated by the more gradual color gradient moving upwards. FPR remains largely stable across number of bases, with a slight inflation in FPR at low-resolution libraries (<100 bases), apparent in the 4D phantom.

C. Fitting of Human Baseline [11C] Raclopride Data

Application of $\Delta\mathrm{AICc}^{\text{eff},5\%}$ for significance testing on human null PET data demonstrated stable FPR across basis libraries of different sizes. Average FPR values (across both subjects) are shown in Table I. Supplemental Table I shows results calculated without the correction ($p_{\text{full}}$ = 7) for comparison. In this analysis, we defined any voxel for which $\Delta\mathrm{AICc}^{\text{eff},5\%} < 0$ as a false positive. Without any cluster-size thresholding, FPR was ~15%. Fig. 9 shows the binary significance images for $\Delta\mathrm{AICc}^{\text{eff},5\%} < 0$ fitted with 30 bases and thresholded at a cluster size of 9 voxels, in both subjects. A 9-voxel cluster-size threshold gave an average FPR of ~5% for all fitting libraries. The spatial locations of significant voxels differed between the subjects, suggesting that the significant clusters are, indeed, false positives. All significant voxels were eliminated at a cluster-size threshold of 24 voxels for both subjects. A slightly inflated FPR was produced for libraries with <60 bases.

TABLE I.

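FPR as Determined With $\Delta\mathrm{AICc}^{\text{eff}}$ in Human Data (columns: number of bases in fitting library)

| Cluster-size threshold (voxels) | 30 bases | 60 bases | 90 bases | 204 bases | 306 bases |
|---|---|---|---|---|---|
| 1 | 15.4% | 14.4% | 14.7% | 14.8% | 14.7% |
| 3 | 12.1% | 11.3% | 11.9% | 11.8% | 11.9% |
| 9 | 5.7% | 5.4% | 5.6% | 5.0% | 4.6% |
| 15 | 3.1% | 2.9% | 2.9% | 2.8% | 2.3% |
| 24 | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |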

Fig. 9.


Binary images for $\Delta\mathrm{AICc}^{\text{eff}} < 0$ with a cluster-size threshold of 9 voxels in null human PET scans. A) Data for subject 1 fitted with 30 bases. B) Data for subject 2 fitted with 30 bases. White indicates a significant voxel. All 4 contiguous coronal slices of the precommissural striatum are shown in AAL space. Voxel size is 2 mm × 2 mm × 2 mm. FPR is ~5% for both subjects. The two subjects show different spatial patterns of significant voxels, indicating that the activated clusters are false positives.

V. Discussion

We have demonstrated a dependence of FPR on the number of bases used in the implementation of the full lp-ntPET model. To alleviate this dependence, we developed a correction to the number of parameters in the full model, $p_{\text{full}}$, yielding the “effective” number of parameters, $p_{\text{full}}^{\text{eff}}$. $p_{\text{full}}^{\text{eff}}$ depends solely on the model and the number of bases. By using $p_{\text{full}}^{\text{eff}}$ to calculate any standard model comparison metric, the dependence of FPR on the number of bases can be eliminated. When $p_{\text{full}}^{\text{eff},5\%}$ is applied to ideal data, the corrected model comparison metrics yield a consistent FPR of 5%.

A. FPR and $p_{\text{full}}^{\text{eff}}$ Versus Resolution of Basis Library

Intuitively, the greater the number of bases, the more densely parameter space is covered by the discretized parameters. Unless significance testing is adapted to reflect this greater or lesser coverage of parameter space, the chance of overfitting will be correspondingly greater or lesser. We have demonstrated this by showing that FPR increased with the number of bases when 7 parameters were stipulated for the full model. In ideal simulated data with $p_{\text{full}}$ = 7, FPR was consistently lower than the expected 5%. In other words, the standard model comparison metrics, F, AICc, and BIC, over-penalized the full model for its number of parameters. This indicates that the full model in fact behaved as if it contained fewer than 7 parameters.

Our results confirm our hypothesis that the 3 discretized parameters in the basis function implementation of the full lp-ntPET model contribute fractional degrees of freedom, and thus $4 < p_{\text{full}}^{\text{eff}} < 7$. The discretized parameters should not be treated as full parameters during statistical testing because they do not span their full parameter spaces. $p_{\text{full}}^{\text{eff}}$ adjusts the number of parameters in the full model to reflect the number of bases. As more bases are included, the resolution of the library increases, and $p_{\text{full}}^{\text{eff}}$ increases asymptotically towards the full apparent degrees of freedom. In practice, model parameters are correlated and not completely identifiable, so $p_{\text{full}}^{\text{eff}}$ approaches a value less than 7.

B. Success of $p_{\text{full}}^{\text{eff},5\%}$ in Phantom and Human Data

Realistic data do not perfectly adhere to all assumptions of linearized reference tissue models [24]. The assumption of uncorrelated noise between the dependent variable, $C_T$, and the independent variables (which include $\int C_T$) is not strictly true. Thus, an FPR exceeding 5% was expected, and cluster-size thresholding is necessary as a secondary method to control false positives at the image level. Previous work showed that a cluster-size threshold of 15 voxels was necessary to achieve a 1% cluster-wise FPR (one in 100 activated clusters was a false positive) [14]. Here, we chose to define false positives at the voxel level. A cluster-size threshold of ~9 voxels was sufficient to achieve a 5% voxel-level FPR in a 4D phantom of mostly null voxels, and a threshold of 15-18 voxels was necessary to achieve a 1% voxel-level FPR. Comparable cluster-size thresholds did, in fact, achieve similar FPRs in the human data and the 4D phantom data.

The relationship between decreased FPR and increased cluster-size threshold was nearly identical between the 4D phantom and the human data, as seen in Table I and Fig. 8B. Both phantom and human data demonstrated slightly inflated FPR when the number of bases was <100. This trend is contrary to that observed without the correction to $p_{\text{full}}$ (Fig. 5) and suggests that $p_{\text{full}}^{\text{eff},5\%}$ overcompensated for low-resolution libraries. Without cluster-size thresholding, the human data had fewer false positives than the 4D phantom (15% vs. 21%). This can be understood by recognizing that false positives, calculated at the voxel level, were increased by the presence of a true positive cluster in the phantom. This positive cluster acted as a ‘seed’ for nearby false positive voxels, which allowed them to survive cluster thresholding. In addition, correlation between voxels was not introduced in the simulated data. It is possible that the correlation between voxels in the human data resulted in a lower FPR.

C. Limitations of the Simulations and the Model

1) FPR and Noise:

Our initial investigation of bootstrapping the data (data not shown) suggests that the ripples seen in Fig. 5 are an artifact of the limited number of simulated data sets. Note that 10,000 simulated curves yield only about 500 samples, or fewer, for which F > F_crit, ΔAICc < 0, or ΔBIC < 0. Thus, these critical thresholds are effectively determined by fewer than 5% of the samples and are therefore not fully stable.
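The size of this instability can be estimated with a simple binomial argument. The sketch below uses the normal approximation to the binomial standard error of an FPR estimated from a finite number of null curves; it is an analytic stand-in for illustration, not the bootstrap analysis mentioned above, and the function name is hypothetical.

```python
import math


def fpr_standard_error(fpr: float, n_curves: int) -> float:
    """Normal-approximation standard error of an FPR estimated from n_curves null fits."""
    return math.sqrt(fpr * (1.0 - fpr) / n_curves)


n_curves, fpr = 10_000, 0.05
expected_positives = fpr * n_curves           # about 500 curves pass the test
se = fpr_standard_error(fpr, n_curves)        # about 0.0022
low, high = fpr - 1.96 * se, fpr + 1.96 * se  # roughly 4.6% to 5.4%
print(f"{expected_positives:.0f} positives, SE = {se:.4f}, 95% CI = [{low:.3%}, {high:.3%}]")
```

Under this approximation, an estimated FPR of 5% from 10,000 curves is only resolved to within roughly half a percentage point, which is of the same order as small ripples in an FPR curve.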

Theoretically, FPR should not depend on noise level. While it may seem intuitive that increased noise should result in a higher FPR, the strength of using model comparison to determine false positives is that it is a “ratio method”. As noise increases, the error in the fit increases proportionally for both models. Thus, the ratio of the errors does not change appreciably, and the calculated FPR remains fairly stable across noise levels. Note that at zero noise, all model comparison metrics are undefined for null data because of division by zero.
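The ratio argument can be checked numerically. The sketch assumes the standard nested-model F statistic, F = [(SSE_res - SSE_full)/(p_full - p_res)] / [SSE_full/(n - p_full)], which may differ in detail from the paper's own equation; scaling the noise standard deviation by c multiplies both SSEs by about c², and the factor cancels. All numerical values are hypothetical.

```python
import math


def f_statistic(sse_res, sse_full, p_res, p_full, n):
    """Nested-model F statistic (assumed standard form); p_full may be fractional."""
    numerator = (sse_res - sse_full) / (p_full - p_res)
    denominator = sse_full / (n - p_full)
    return numerator / denominator


n, p_res, p_full = 30, 3, 5.5  # hypothetical values
base = f_statistic(1.00, 0.90, p_res, p_full, n)
for c in (0.5, 2.0, 10.0):  # noise standard deviation scaled by c, so SSE scales by c**2
    scaled = f_statistic(c**2 * 1.00, c**2 * 0.90, p_res, p_full, n)
    print(c, math.isclose(scaled, base))  # True for every c

# With zero noise, both fits are exact and the statistic becomes 0/0.
try:
    f_statistic(0.0, 0.0, p_res, p_full, n)
except ZeroDivisionError:
    print("undefined at zero noise")
```

The AICc and BIC differences behave the same way, because their data terms enter only through n ln(SSE_full/SSE_res), which is also unchanged when both SSEs are scaled by the same factor.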

2) 4D Phantom Data vs. Human Data:

There are some notable differences between the simulated phantom data and human data. In the 4D phantom, the location of the positive cluster was known, and thus easily distinguished from false positives. In the human data, all voxels determined to have a significant time-varying signal were considered to be false positives. Furthermore, noise and time-varying responses in the phantom were generated randomly and independently for each voxel. In real data, some noise correlation is expected as a result of the reconstruction process. Correlation of the timing and amplitude of the biological signal is also expected between neighboring voxels. Both noise and biological correlation would increase the similarity of the TACs in neighboring voxels. As a result, the cluster-size distribution of positive voxels could be skewed.

3) Other Factors That May Affect Selectivity:

This study explores solely the effect of number of bases on the selectivity of the model. However, other factors, such as time-frame binning, tracer kinetic characteristics, and the selection of the bases themselves, may also affect FPR.

4) Varying Sensitivity of Discrete Parameters:

The discretized parameters, t_D, t_P, and α, have different effects on the shape of the basis function. Thus, both the number of bases and which bases are included in the library affect the fit of the full model to the data. Not all bases are equally able to describe the noise in PET data. When varying the size of the libraries, we sought to be as equitable as possible, adding and removing the same number of values for each parameter. However, while t_D and t_P are discretized at the frame resolution, α is continuous-valued and does not affect the timing of the response function linearly. Thus, the increments in α could not be controlled as carefully. As a result, it is difficult to eliminate the effect of parameter sensitivity when incrementing the number of bases.
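A basis library of this kind can be sketched as the set of allowed (t_D, t_P, α) triplets. For illustration we assume the gamma-variate response function commonly used with lp-ntPET, h(t) = x^α exp(α(1 - x)) with x = (t - t_D)/(t_P - t_D) for t > t_D and 0 otherwise; this section does not restate the exact form, so treat it as an assumption. The frame times and α grid are hypothetical, and the step that turns each triplet into a basis function via the tissue curve is not shown.

```python
import math
from itertools import product


def response(t, t_d, t_p, alpha):
    """Assumed gamma-variate response: zero up to t_d, peak value 1 at t_p, shape set by alpha."""
    if t <= t_d:
        return 0.0
    x = (t - t_d) / (t_p - t_d)
    return x ** alpha * math.exp(alpha * (1.0 - x))


def build_library(t_d_values, t_p_values, alpha_values):
    """All (t_d, t_p, alpha) triplets in which the peak follows the onset."""
    return [(t_d, t_p, a)
            for t_d, t_p, a in product(t_d_values, t_p_values, alpha_values)
            if t_p > t_d]


# Hypothetical grid: onset and peak times on frame boundaries (min), alpha on a coarse grid.
frame_times = [20, 25, 30, 35, 40, 45]
library = build_library(frame_times[:-1], frame_times[1:], [0.25, 1, 4])
print(len(library))  # 45 triplets: 15 valid (t_d, t_p) pairs x 3 alpha values

# t_d and t_p shift the curve in time, whereas alpha changes its shape nonlinearly.
for alpha in (0.25, 1, 4):
    print(alpha, round(response(25, 20, 30, alpha), 3))
```

Adding one value to the t_D or t_P grid adds a predictable set of time-shifted curves, whereas an added α value changes the curve shape in a way that is not evenly spaced, which mirrors the difficulty described above.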

Nonetheless, our primary goal was to explore how the size of a discretized parameter space affects the apparent degrees of freedom of the model, i.e. its “effective” number of parameters. The selection of bases is a topic for a separate study that explores the sensitivity of the model to different temporal patterns of true positives.

D. Selectivity vs. Sensitivity

This work does not address the sensitivity of the full model; FPR relates only to its selectivity. However, the two are indirectly related. By correcting p_full, we can increase the resolution of the fitting library without inflating FPR. This could indirectly benefit sensitivity, because higher-resolution libraries should be better able to detect true positive signals of varying and unknown temporal profiles.

E. Broader Implications

Our observations regarding lp-ntPET and its basis function libraries could have broad implications for other models containing parameters that do not fully span their parameter spaces. SRTM is another example of a kinetic model that is implemented with basis functions for computational efficiency [25]. SRTM is often evaluated against other candidate models for characterization of novel tracers using model comparison metrics [26-29]. Our findings suggest that the number of parameters stipulated for SRTM during model comparison may require a correction based on the number of basis functions used. Spectral analysis, as applied to PET data [30], is also implemented using a limited number of exponential basis functions [31, 32]. While model comparison may not directly apply to spectral analysis, we speculate that the number of basis functions used affects the apparent model degrees of freedom during parameter estimation. Degrees of freedom matter not only in model comparison but also in the statistical evaluation of linear models in general.

Beyond our application of interest, we conjecture that any form of constrained optimization may impact a model’s apparent degrees of freedom. Non-negative fitting is commonly used to estimate parameters that, by their nature, can only be positive. Other boundaries and constraints placed on an estimated parameter may also limit the parameter’s effective degrees of freedom.

F. Practical Application to Other Data Sets

The method for determining p_full^eff presented in this work could be applied to any other tracer and scanner for estimating a transient signal that affects tracer uptake and cannot be modeled adequately with time-invariant parameters alone:

1) Use the known tracer kinetic constants, typical measurement variance for the scanner, and an arterial input function for the tracer to generate a large set of “null” data using the appropriate kinetic model and the noise model defined in (9). Null data should be simulated without any signal described by the basis functions.

2) Fit the null data with the restricted model and the full model, using basis libraries of varying sizes. When increasing the size of the library, add values for each parameter within a given range, such that for each successive library the density of sampling increases but the span of the parameter space does not. Add the same number of values to each successive library, and space the values for each parameter as evenly as possible. Compute the distribution of the desired model comparison metric from the fits to both models.

3) Set the desired FPR on the left-hand side of (6)-(8). Insert SSE_full, SSE_res, p_res, and n from each pair of fits into the right-hand side of (6)-(8). Ideally, the FPR would be selected considering prior information about the model’s receiver operating characteristic. The FPR of 5% used in this work was selected arbitrarily, based on the conventional statistical threshold of p = 0.05.

4) Solve numerically for p_full^eff for every basis library.
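One possible way to carry out steps 3 and 4 for the ΔAICc criterion is to treat the empirical FPR over the null fits as a function of p_full and bisect for the value at which it falls to the target. This is a sketch under stated assumptions, not necessarily the authors' numerical procedure: the standard least-squares AICc form, p_res = 3, n = 30 frames, and the randomly generated (SSE_full, SSE_res) pairs standing in for the step 2 fits are all illustrative, and the function names are hypothetical.

```python
import math
import random


def aicc(sse, n, p):
    """Least-squares AICc (assumed standard form); p may be fractional, requires p < n - 1."""
    return n * math.log(sse / n) + 2 * p + 2 * p * (p + 1) / (n - p - 1)


def fpr_at(p_full, fits, n, p_res):
    """Fraction of null curves for which the full model wins (Delta AICc < 0)."""
    wins = sum(1 for sse_full, sse_res in fits
               if aicc(sse_full, n, p_full) - aicc(sse_res, n, p_res) < 0)
    return wins / len(fits)


def solve_p_eff(fits, n, p_res, target=0.05, tol=1e-6):
    """Smallest p_full (to within tol) whose empirical FPR does not exceed target.

    The AICc penalty increases with p for p < n - 1, so the empirical FPR is
    non-increasing in p_full and bisection on (p_res, n - 2) applies.
    """
    lo, hi = float(p_res), n - 2.0
    if fpr_at(lo, fits, n, p_res) <= target or fpr_at(hi, fits, n, p_res) > target:
        raise ValueError("target FPR is not bracketed on (p_res, n - 2)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fpr_at(mid, fits, n, p_res) > target:
            lo = mid  # penalty still too small: too many null curves pass
        else:
            hi = mid
    return hi


# Synthetic stand-in for step 2: (SSE_full, SSE_res) pairs from fits to null data.
rng = random.Random(0)
fits = []
for _ in range(2000):
    sse_res = rng.uniform(0.8, 1.2)
    fits.append((sse_res * (1.0 - rng.uniform(0.0, 0.3)), sse_res))
print(round(solve_p_eff(fits, n=30, p_res=3), 3))
```

Repeating this for each basis library gives one p_full^eff per library. The same bracketing approach could be tried with ΔBIC, or with the F test using an F quantile function such as `scipy.stats.f.ppf` for F_crit, after checking that the empirical FPR is monotone in p_full over the bracket.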

VI. Conclusion

We have introduced the concept of an “effective” number of parameters for models that estimate variables from a discrete set of values. We showed that a discretized parameter contributes only a fractional degree of freedom to the model and tends towards a full parameter as it is allowed to take on more values. Model comparison is necessary for controlling false positive results that erroneously indicate the presence of a transient time-varying signal. However, the dependence of FPR on the number of bases means that the selectivity of lp-ntPET depends on its implementation, whereas selectivity should depend solely on the noise in the data and the model used. We have developed adaptive model comparison metrics that incorporate p^eff to properly account for the coverage of parameter space by the discretized parameters. Applying these adaptive metrics allows sensitivity to potentially increase as more bases are used, without a concomitant decrease in selectivity.

Supplementary Material

supp1-2969425

Acknowledgment

H. Liu and E. D. Morris would like to thank Dr. Edward Soares for his helpful statistical discussions.

This work was supported in part by the National Institutes of Health under grant R01DA038709 (Morris).

Footnotes

1

For clarity, we have augmented the superscripts of p_full^eff and ΔAICc^eff with ‘,5%’ when referring to values calculated specifically for FPR = 5%.

2

Supplementary materials are available in the supporting documents/multimedia tab.

Contributor Information

Heather Liu, Department of Biomedical Engineering, Yale University and the Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT 06520 USA.

Evan D. Morris, Department of Biomedical Engineering, Yale University, the Department of Radiology and Biomedical Imaging, and the Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06520 USA

References

[1] Constantinescu CC, Yoder KK, Kareken DA, Bouman CA, O’Connor SJ, Normandin MD, and Morris ED, “Estimation from PET data of transient changes in dopamine concentration induced by alcohol: support for a non-parametric signal estimation method,” Phys Med Biol, vol. 53, no. 5, pp. 1353–67, March 7, 2008.
[2] Morris ED, Constantinescu CC, Sullivan JM, Normandin MD, and Christopher LA, “Noninvasive visualization of human dopamine dynamics from PET images,” Neuroimage, vol. 51, no. 1, pp. 135–44, May 15, 2010.
[3] Normandin MD, and Morris ED, “Estimating neurotransmitter kinetics with ntPET: a simulation study of temporal precision and effects of biased data,” Neuroimage, vol. 39, no. 3, pp. 1162–79, February 1, 2008.
[4] Normandin MD, Schiffer WK, and Morris ED, “A linear model for estimation of neurotransmitter response profiles from dynamic PET data,” Neuroimage, vol. 59, no. 3, pp. 2689–99, February 1, 2012.
[5] Morris ED, Yoder KK, Wang C, Normandin MD, Zheng QH, Mock B, Muzic RF Jr., and Froehlich JC, “ntPET: a new application of PET imaging for characterizing the kinetics of endogenous neurotransmitter release,” Mol Imaging, vol. 4, no. 4, pp. 473–89, Oct-Dec 2005.
[6] Normandin MD, and Morris ED, “Temporal resolution of ntPET using either arterial or reference region-derived plasma input functions,” Conf Proc IEEE Eng Med Biol Soc, vol. 1, pp. 2005–8, 2006.
[7] Wang S, Kim S, Cosgrove KP, and Morris ED, “A framework for designing dynamic lp-ntPET studies to maximize the sensitivity to transient neurotransmitter responses to drugs: Application to dopamine and smoking,” Neuroimage, vol. 146, pp. 701–714, February 1, 2017.
[8] Morris ED, Kim SJ, Sullivan JM, Wang S, Normandin MD, Constantinescu CC, and Cosgrove KP, “Creating dynamic images of short-lived dopamine fluctuations with lp-ntPET: dopamine movies of cigarette smoking,” J Vis Exp, no. 78, August 6, 2013.
[9] Cosgrove KP, Wang S, Kim SJ, McGovern E, Nabulsi N, Gao H, Labaree D, Tagare HD, Sullivan JM, and Morris ED, “Sex differences in the brain’s dopamine signature of cigarette smoking,” J Neurosci, vol. 34, no. 50, pp. 16851–5, December 10, 2014.
[10] Johansson J, Hirvonen J, Lovro Z, Ekblad L, Kaasinen V, Rajasilta O, Helin S, Tuisku J, Siren S, Pennanen M, Agrawal A, Crystal R, Vainio PJ, Alho H, and Scheinin M, “Intranasal naloxone rapidly occupies brain mu-opioid receptors in human subjects,” Neuropsychopharmacology, March 13, 2019.
[11] Constantinescu CC, Bouman C, and Morris ED, “Nonparametric extraction of transient changes in neurotransmitter concentration from dynamic PET data,” IEEE Trans Med Imaging, vol. 26, no. 3, pp. 359–73, March 2007.
[12] Angelis GI, Gillam JE, Ryder WJ, Fulton RR, and Meikle SR, “Direct Estimation of Voxel-Wise Neurotransmitter Response Maps From Dynamic PET Data,” IEEE Trans Med Imaging, vol. 38, no. 6, pp. 1371–1383, June 2019.
[13] Ichise M, Liow JS, Lu JQ, Takano A, Model K, Toyama H, Suhara T, Suzuki K, Innis RB, and Carson RE, “Linearized reference tissue parametric imaging methods: application to [11C]DASB positron emission tomography studies of the serotonin transporter in human brain,” J Cereb Blood Flow Metab, vol. 23, no. 9, pp. 1096–112, September 2003.
[14] Kim SJ, Sullivan JM, Wang S, Cosgrove KP, and Morris ED, “Voxelwise lp-ntPET for detecting localized, transient dopamine release of unknown timing: Sensitivity Analysis and Application to Cigarette Smoking in the PET Scanner,” Human Brain Mapping, vol. 35, no. 9, pp. 4876–4891, 2014.
[15] Akaike H, “A new look at the statistical model identification,” IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, December 1974.
[16] Schwarz G, “Estimating the Dimension of a Model,” Ann Statist, vol. 6, no. 2, pp. 461–464, March 1978.
[17] Lammertsma AA, and Hume SP, “Simplified reference tissue model for PET receptor studies,” Neuroimage, vol. 4, no. 3 Pt 1, pp. 153–8, December 1996.
[18] Muzic RF Jr., and Cornelius S, “COMKAT: compartment model kinetic analysis tool,” J Nucl Med, vol. 42, no. 4, pp. 636–45, April 2001.
[19] Pappata S, Dehaene S, Poline JB, Gregoire MC, Jobert A, Delforge J, Frouin V, Bottlaender M, Dolle F, Di Giamberardino L, and Syrota A, “In vivo detection of striatal dopamine release during reward: a PET study with [(11)C]raclopride and a single dynamic scan approach,” Neuroimage, vol. 16, no. 4, pp. 1015–27, August 2002.
[20] Morris ED, Fisher RE, Alpert NM, Rauch SL, and Fischman AJ, “In vivo imaging of neuromodulation using positron emission tomography: Optimal ligand characteristics and task length for detection of activation,” Human Brain Mapping, vol. 3, no. 1, pp. 35–55, 1995.
[21] Fisher RE, Morris ED, Alpert NM, and Fischman AJ, “In vivo imaging of neuromodulatory synaptic transmission using PET: A review of relevant neurophysiology,” Human Brain Mapping, vol. 3, no. 1, pp. 24–34, 1995.
[22] Mazoyer BM, Huesman RH, Budinger TF, and Knittel BL, “Dynamic PET data analysis,” J Comput Assist Tomogr, vol. 10, no. 4, pp. 645–53, Jul-Aug 1986.
[23] Martinez D, Slifstein M, Broft A, Mawlawi O, Hwang DR, Huang Y, Cooper T, Kegeles L, Zarahn E, Abi-Dargham A, Haber SN, and Laruelle M, “Imaging human mesolimbic dopamine transmission with positron emission tomography. Part II: amphetamine-induced dopamine release in the functional subdivisions of the striatum,” J Cereb Blood Flow Metab, vol. 23, no. 3, pp. 285–300, March 2003.
[24] Salinas CA, Searle GE, and Gunn RN, “The simplified reference tissue model: model assumption violations and their impact on binding potential,” J Cereb Blood Flow Metab, vol. 35, no. 2, pp. 304–11, February 2015.
[25] Gunn RN, Lammertsma AA, Hume SP, and Cunningham VJ, “Parametric imaging of ligand-receptor binding in PET using a simplified reference region model,” Neuroimage, vol. 6, no. 4, pp. 279–87, November 1997.
[26] Alves IL, Vallez Garcia D, Parente A, Doorduin J, Dierckx R, Marques da Silva AM, Koole M, Willemsen A, and Boellaard R, “Pharmacokinetic modeling of [(11)C]flumazenil kinetics in the rat brain,” EJNMMI Res, vol. 7, no. 1, p. 17, December 2017.
[27] Kessler M, Mamach M, Beutelmann R, Lukacevic M, Eilert S, Bascunana P, Fasel A, Bengel FM, Bankstahl JP, Ross TL, Klump GM, and Berding G, “GABAA Receptors in the Mongolian Gerbil: a PET Study Using [(18)F]Flumazenil to Determine Receptor Binding in Young and Old Animals,” Mol Imaging Biol, May 17, 2019.
[28] Yaqub M, Boellaard R, van Berckel BN, Tolboom N, Luurtsema G, Dijkstra AA, Lubberink M, Windhorst AD, Scheltens P, and Lammertsma AA, “Evaluation of tracer kinetic models for analysis of [18F]FDDNP studies,” Mol Imaging Biol, vol. 11, no. 5, pp. 322–33, Sep-Oct 2009.
[29] Yaqub M, Boellaard R, van Berckel BN, Ponsen MM, Lubberink M, Windhorst AD, Berendse HW, and Lammertsma AA, “Quantification of dopamine transporter binding using [18F]FP-beta-CIT and positron emission tomography,” J Cereb Blood Flow Metab, vol. 27, no. 7, pp. 1397–406, July 2007.
[30] Cunningham VJ, and Jones T, “Spectral analysis of dynamic PET studies,” J Cereb Blood Flow Metab, vol. 13, no. 1, pp. 15–23, January 1993.
[31] Myers JFM, Rosso L, Watson BJ, Wilson SJ, Kalk NJ, Clementi N, Brooks DJ, Nutt DJ, Turkheimer FE, and Lingford-Hughes AR, “Characterisation of the contribution of the GABA-benzodiazepine α1 receptor subtype to [(11)C]Ro15-4513 PET images,” J Cereb Blood Flow Metab, vol. 32, no. 4, pp. 731–744, 2012.
[32] Turkheimer F, Moresco RM, Lucignani G, Sokoloff L, Fazio F, and Schmidt K, “The Use of Spectral Analysis to Determine Regional Cerebral Glucose Utilization with Positron Emission Tomography and [18F]Fluorodeoxyglucose: Theory, Implementation, and Optimization Procedures,” J Cereb Blood Flow Metab, vol. 14, no. 3, pp. 406–422, May 1994.
