Biostatistics (Oxford, England). 2021 Feb 22;23(3):875–890. doi: 10.1093/biostatistics/kxaa060

Assessing risk model calibration with missing covariates

Yei Eun Shin, Mitchell H Gail, Ruth M Pfeiffer
PMCID: PMC9608650  PMID: 33616159

Summary

When validating a risk model in an independent cohort, some predictors may be missing for some subjects. Missingness can be unplanned or by design, as in case-cohort or nested case–control studies, in which some covariates are measured only in subsampled subjects. Weighting methods and imputation are used to handle missing data. We propose methods to increase the efficiency of weighting to assess calibration of a risk model (i.e. bias in model predictions), which is quantified by the ratio of the number of observed events, $O$, to expected events, $E$, computed from the model. We adjust known inverse probability weights by incorporating auxiliary information available for all cohort members. We use survey calibration, which requires the weighted sum of the auxiliary statistics in the complete data subset to equal their sum in the full cohort. We show that a pseudo-risk estimate that approximates the actual risk value but uses only variables available for the entire cohort is an excellent auxiliary statistic to estimate $E$. We derive analytic variance formulas for $O/E$ with adjusted weights. In simulations, weight adjustment with pseudo-risk was much more efficient than inverse probability weighting and yielded consistent estimates even when the pseudo-risk was a poor approximation. Multiple imputation was often efficient but yielded biased estimates when the imputation model was misspecified. Using these methods, we assessed calibration of an absolute risk model for second primary thyroid cancer in an independent cohort.

Keywords: Case-cohort study, External validation, Missing, Model calibration, Nested case–control study, Pseudo-risk model, Survey calibration, Weight adjustment

1. Introduction

Statistical models that predict risk of disease incidence or mortality following disease onset have applications in clinical and public health settings. They are used to inform decisions for preventive interventions or treatments and to identify high-risk individuals for intensive screening for early detection of disease.

Once a risk model is developed, and before recommending it for broader use, one needs to assess how valid its predictions are, ideally in independent data. Two popular measures of the predictive performance of a risk model are calibration and discrimination. Calibration assesses bias in model predictions, and discrimination quantifies how different the predicted risks are in individuals with events compared with those without events. Discrimination is typically measured by the area under the receiver operating characteristic curve (AUC) (Pepe, 2003, p. 67). Here, we focus on calibration, as unbiased predictions are a key model feature for clinical and public health applications. We estimate calibration using the ratio of the number of observed events, $O$, that arise in an independent validation cohort to the number of events, $E$, predicted by the risk model, overall or in subgroups defined by predictors in the model or by risk deciles (Pfeiffer and Gail, 2017, Chapter 6). Other goodness-of-fit tests have been proposed, e.g., by Gong and others (2014). However, such comparisons are impeded when data on some of the model predictors are missing in the validation cohort. The example that motivated our work is an absolute risk model for second primary thyroid cancer (SPTC), developed using data from the Childhood Cancer Survivor Study (CCSS) in the USA, Canada, and Norway (Kovalchik and others, 2013). Validation of this model in the only two other such cohorts worldwide, the French and British childhood cancer survivors, was hampered by missing model predictors. In this article, we therefore propose and study various approaches to accommodate missing data in the validation of risk prediction models. We assume the independent validation data arise from a two-phase sampling design from a well-defined cohort. In phase 1, the validation cohort is sampled from a superpopulation, but not all model predictors are measured on all cohort members.
Specifically, we consider data missing completely at random, or missing at random due to phase 2 subsampling under two common designs for epidemiologic studies, the case-cohort design (Prentice, 1986) and the nested case–control design (Langholz and Thomas, 1990). Some model predictors are observed on everyone in the cohort (phase 1), while others are only observed on individuals sampled into a second phase. A standard approach is to weight the phase 2 sample back to the whole cohort, using inverse probability of sampling weights (also called “design weights”). This method yields Horvitz–Thompson estimates of $E$ (Horvitz and Thompson, 1952). This approach, however, does not use any information available for individuals who were not in the phase 2 sample.

Deville and Särndal (1992) introduced weight adjustment methods that incorporate such information, referred to as phase 1 information, by adjusting the weights to improve efficiency. The adjusted weights are computed such that the weighted total of the auxiliary statistics in phase 2 equals their total in the entire cohort. Auxiliary statistics are functions of variables that are measured on everyone in phase 1 and thus can be computed for all cohort members. The specific choice of auxiliary statistics matters most for gains in efficiency. Using all available phase 1 data directly as auxiliary statistics is often computationally burdensome and not efficient. Thus, only selected variables should be used; moreover, the functional form through which these variables enter the auxiliary statistics affects the efficiency gain through its relation with the design weights (Wu and Sitter, 2001; Breidt and Opsomer, 2017). We propose to use pseudo-risk estimates, computed from full cohort (phase 1) information, to calibrate the design weights efficiently. We compare our approach with multiple imputation, which is also popular for handling missing data (Rubin, 2004; White and Royston, 2009).

The remainder of the article is organized as follows. After introducing notation and the general set-up (Section 2), we present three approaches, classical sampling probability weighting, weight adjustment, and multiple imputation, for covariates in validation data that are missing completely at random, or missing by design due to case-cohort or nested case–control subsampling (Section 3). In Section 4, we derive variance estimators for weight-adjusted estimates of the $O/E$ calibration measure. We compare these approaches in various simulated scenarios (Section 5) and in a real data example (Section 6), before closing with a discussion (Section 7).
To avoid semantic confusion, the reader should distinguish “survey (or weight) calibration,” which is a method to improve efficiency of estimates, from “model calibration,” which assesses the degree of bias in a risk prediction model.

2. Notation and missing data set-up

2.1. Notation

The risk model $r(\tau; Z)$ estimates the probability of a dichotomous event, $D = 1$, occurring in the time interval of length $\tau$ given the predictors $Z$. Otherwise, $D = 0$. Unless needed for clarity, we also denote the model by $r(Z)$. This simple formulation applies to several important problems in clinical medicine and public health. A model $r$ could be an absolute risk model, when $D = 1$ denotes developing a specific disease in a defined age interval, $[a, a + \tau)$, in the presence of competing risks. An absolute risk model is also relevant when, following diagnosis, $D = 1$ represents death within $\tau$ from the diagnosed disease in the presence of competing causes of death. Absent competing risks, $r$ could refer to a pure risk model, for example when $r$ models overall survival after disease onset. We regard the risk model as fixed and assume that the data used to develop the model and those used for validation are independent.

We assume that a cohort $\mathcal{C}$ of $N$ individuals is available to assess calibration (bias) of the risk model $r$. For each individual $i = 1, \ldots, N$, we observe the outcome, $D_i$, and the time to event or censoring, $T_i = \min(T_i^*, C_i)$, where $T_i^*$ and $C_i$ denote the event time and the censoring time, respectively. Here, $C_i$ is assumed to be independent of $T_i^*$ given $Z_i$.

We call a model $r$ well calibrated in the cohort if $P(D = 1 \mid Z = z) = r(\tau; z)$ for every value $z$. If the model is well calibrated, then $\mathbb{E}(D) = \mathbb{E}\{r(\tau; Z)\}$, overall or in subgroups (Pfeiffer and Gail, 2017, Chapter 6). Assuming that the model predictors $Z_i$ are available for all subjects in the validation cohort, measures of model calibration thus typically compare the observed number of events in $\mathcal{C}$, $O_\tau = \sum_{i=1}^N D_i$, and the expected number of events computed from the model, $E_\tau = \sum_{i=1}^N r(\tau; Z_i)$, for a given risk projection period $\tau$.

In the presence of censoring, $O_\tau$ is well defined as above, but the computation of $E_\tau$ needs to be modified: for individuals who are censored before $\tau$ (for absolute risk models, only administrative censoring or loss to follow-up need to be considered), risk is projected only up to the censoring time $C_i$. As the outcome for censored individuals is also only observable until time $C_i$, this approach leads to unbiased assessment of calibration. The statistic we focus on in this article is the observed-to-expected ratio,

$$\frac{O_\tau}{E_\tau} = \frac{\sum_{i=1}^N D_i}{\sum_{i=1}^N r(\tau; Z_i)}, \qquad (2.1)$$

which estimates the corresponding superpopulation quantity, $\mu_O/\mu_E$. When all risk model predictors $Z$ are observed, inference on $O_\tau/E_\tau$ derives from i.i.d. phase 1 sampling of $\mathcal{C}$ from the superpopulation. For rare outcomes, $E_\tau$ can be regarded as fixed and $O_\tau$ as a Poisson random variable. With missing data, however, additional variation in estimates of $E_\tau$ arises from phase 2 sampling. We next describe settings with missingness and the impact on inference when $E_\tau$ has to be estimated. We emphasize that $O_\tau$ and $E_\tau$ are parameters of the sampled validation cohort $\mathcal{C}$, whereas $\mu_O = \mathbb{E}(O_\tau)$ and $\mu_E = \mathbb{E}(E_\tau)$ are the corresponding means in the superpopulation.
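To make (2.1) concrete, the following minimal Python sketch computes the O/E ratio overall or within subgroups. It assumes the model risks $r_i$ have already been computed for each subject (projected only to the censoring time where needed); the function name and the `groups` argument are illustrative, not part of the original method.

```python
from collections import defaultdict


def observed_to_expected(events, risks, groups=None):
    """Observed-to-expected ratio O/E as in (2.1).

    events: 0/1 indicators D_i of an observed event in the projection period
    risks:  model risks r_i, each already projected over the subject's own
            follow-up (tau, or the censoring time if censored earlier)
    groups: optional subgroup labels (e.g. risk deciles); if given, a dict of
            O/E ratios per subgroup is returned instead of the overall ratio
    """
    if len(events) != len(risks):
        raise ValueError("events and risks must have the same length")
    if groups is None:
        return sum(events) / sum(risks)
    observed = defaultdict(float)
    expected = defaultdict(float)
    for d, r, g in zip(events, risks, groups):
        observed[g] += d
        expected[g] += r
    return {g: observed[g] / expected[g] for g in expected}


D = [1, 0, 0, 1, 0]
r = [0.5, 0.25, 0.25, 0.5, 0.5]
print(observed_to_expected(D, r))  # O = 2, E = 2.0
print(observed_to_expected(D, r, groups=["a", "a", "b", "b", "b"]))
```

Summing before dividing, rather than averaging per-subject ratios, keeps the estimate well defined when individual risks are tiny.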

2.2. Patterns of missing model predictors and inclusion probability

We divide the model predictors into two categories, $Z = (X, W)$: predictors $X$ that are available for everybody in the validation cohort (phase 1), and predictors $W$ that are only observed for a subsample of individuals, those included in phase 2. We let $\xi_i$ denote a sampling indicator that is 1 if subject $i$ is included in phase 2 and 0 otherwise, for a given cohort $\mathcal{C}$.

The inclusion probabilities $\pi_i = P(\xi_i = 1 \mid \mathcal{C})$ depend on the missingness mechanism for $W$. Under missing completely at random (MCAR), missingness of $W$ does not depend on any other information and $\pi_i = p$ is constant for all $i$, where $p$ is the Bernoulli sampling inclusion probability. Under missing at random (MAR), the patterns of the missing covariates $W$ depend on other observed quantities. We consider the two most popular subsampling strategies for cohorts that fall into the MAR category, the case-cohort (CC) design (Prentice, 1986) and the nested case–control (NCC) design (Langholz and Thomas, 1990), where missingness of $W$ depends on outcome status, $D$. These designs are particularly relevant for large cohorts and rare outcomes, where it is more cost-effective to measure expensive covariates on all subjects who experience the event of interest during follow-up (cases), but only on a small subset of individuals who have not experienced the event (controls). For the CC design, a random subcohort is selected at the beginning of follow-up with a constant inclusion probability $p$, and all cases ($D_i = 1$) that develop outside the subcohort are included with $\pi_i = 1$. For the NCC design, every time a case develops during follow-up, $m$ individuals are selected from those at risk, and $W$ is measured for all cases ($\pi_i = 1$ if $D_i = 1$) and for the selected controls. Following Samuelsen (1997), $\pi_i$ for $D_i = 0$ is $1 - \prod_j \{1 - m/(n(T_j) - 1)\}$, where the product is taken over all $j$ with $D_j = 1$ and $e_i < T_j \le T_i$; $e_i$ is the cohort entry time of subject $i$, $T_i$ is the time to event or censoring, and $n(T_j)$ is the number of subjects at risk at $T_j$.
For MAR settings other than CC and NCC the methods we develop here can be applied as well, assuming that the inclusion probabilities are known or can be estimated under the MAR assumption.

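In summary, the inclusion probabilities for individuals $i = 1, \ldots, N$ are

$$\pi_i = \begin{cases} p, & \text{MCAR},\\ D_i + (1 - D_i)\,p, & \text{CC},\\ D_i + (1 - D_i)\Big[1 - \prod_{j:\, D_j = 1,\ e_i < T_j \le T_i}\Big\{1 - \dfrac{m}{n(T_j) - 1}\Big\}\Big], & \text{NCC}. \end{cases} \qquad (2.2)$$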
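The following Python sketch evaluates the three cases of (2.2). It is a minimal illustration under stated assumptions: the function and argument names are hypothetical, $n(T_j)$ counts subjects with entry time before $T_j$ and exit time at or after $T_j$, and the quadratic-time loop favors clarity over speed.

```python
def inclusion_probabilities(design, D, p=None, entry=None, exit_time=None, m=None):
    """Phase 2 inclusion probabilities pi_i as in (2.2).

    design:    "MCAR", "CC" or "NCC"
    D:         0/1 event indicators D_i
    p:         Bernoulli (MCAR) or subcohort (CC) sampling probability
    entry:     cohort entry times e_i (NCC only)
    exit_time: times to event or censoring T_i (NCC only)
    m:         number of controls sampled per case (NCC only)
    """
    n_subj = len(D)
    if design == "MCAR":
        return [p] * n_subj
    if design == "CC":
        return [1.0 if d == 1 else p for d in D]
    if design == "NCC":
        case_times = [exit_time[j] for j in range(n_subj) if D[j] == 1]
        pis = []
        for i in range(n_subj):
            if D[i] == 1:
                pis.append(1.0)  # every case is in phase 2
                continue
            never_sampled = 1.0
            for t in case_times:
                if entry[i] < t <= exit_time[i]:  # subject i is at risk at t
                    at_risk = sum(1 for k in range(n_subj) if entry[k] < t <= exit_time[k])
                    # m of the at_risk - 1 eligible controls are drawn at t.
                    never_sampled *= max(0.0, 1.0 - m / (at_risk - 1))
            pis.append(1.0 - never_sampled)
        return pis
    raise ValueError(f"unknown design: {design!r}")


# Four subjects entering at time 0; subjects 0 and 2 are cases; one control per case.
print(inclusion_probabilities("NCC", D=[1, 0, 1, 0], entry=[0, 0, 0, 0],
                              exit_time=[1, 2, 3, 4], m=1))
```

In the toy example, subject 3 is the only eligible control at the second case time, so it is sampled with certainty.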

3. Estimation of the expected number of events with missing covariates

3.1. Adjusted inclusion probability weighting

As the model predictors $Z_i = (X_i, W_i)$ are completely observed for all individuals in phase 2, an estimate of $E_\tau$, the expected number of events in the follow-up period $\tau$, is

$$\hat E_\tau^{\rm IPW} = \sum_{i=1}^N \xi_i\, w_i\, r(\tau; Z_i), \qquad (3.3)$$

where $w_i = 1/\pi_i$ is the inverse of the inclusion probability defined in (2.2).
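As a sketch of (3.3), the Python function below sums $r_i/\pi_i$ over phase 2 subjects. The short Monte Carlo check then averages the estimate over repeated MCAR draws from one simulated cohort, which should approximately recover the full-cohort $E_\tau$ (design unbiasedness). The cohort size, the uniform risks, and $p = 0.2$ are arbitrary assumptions for illustration.

```python
import random


def ht_expected_events(in_phase2, pi, risk):
    """Horvitz-Thompson estimate (3.3): sum of r_i / pi_i over phase 2 subjects.

    in_phase2: 0/1 sampling indicators xi_i for the whole cohort
    pi:        inclusion probabilities pi_i from (2.2)
    risk:      model risks r_i; only read where xi_i = 1, because W_i (and hence
               r_i) is available only for phase 2 subjects
    """
    return sum(r / p for s, p, r in zip(in_phase2, pi, risk) if s == 1)


# Monte Carlo check of design unbiasedness under MCAR for one simulated cohort.
rng = random.Random(1)
cohort_risk = [rng.uniform(0.0, 0.1) for _ in range(1000)]
full_cohort_E = sum(cohort_risk)
p = 0.2
estimates = []
for _ in range(500):
    xi = [1 if rng.random() < p else 0 for _ in cohort_risk]
    estimates.append(ht_expected_events(xi, [p] * len(xi), cohort_risk))
mean_estimate = sum(estimates) / len(estimates)
print(f"full-cohort E = {full_cohort_E:.2f}, mean HT estimate = {mean_estimate:.2f}")
```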

However, $\hat E_\tau^{\rm IPW}$, also known as the Horvitz–Thompson estimate (Horvitz and Thompson, 1952), only uses phase 2 information. We thus use survey calibration to incorporate information on the variables $X$, and possibly on additional variables that are available for everyone in the cohort, into the weights, to increase the efficiency of the weighted estimate of $E_\tau$. Survey calibration adjusts the inclusion probability weights $w_i$ via auxiliary statistics, denoted by $A_i$, which are based on the phase 1 variables, so that the weighted sum of $A_i$ over phase 2 equals the total sum of $A_i$ in the cohort, which is known (Deville and Särndal, 1992). The new adjusted weights $\tilde w_i$ satisfy

$$\min_{\tilde w}\ \sum_{i=1}^N \xi_i\, G(\tilde w_i, w_i) \quad \text{subject to} \quad \sum_{i=1}^N \xi_i\, \tilde w_i\, A_i = \sum_{i=1}^N A_i, \qquad (3.4)$$

for some distance measure $G$. After weight adjustment, $E_\tau$ is estimated by

$$\hat E_\tau^{\rm adj} = \sum_{i=1}^N \xi_i\, \tilde w_i\, r(\tau; Z_i). \qquad (3.5)$$

The constrained optimization problem (3.4) is solved by applying Newton’s method to a Lagrangian function. Here, we use the distance measure $G(\tilde w, w) = \tilde w \log(\tilde w/w) - \tilde w + w$ (Case 2 of Deville and Särndal (1992)), which is called raking or exponential tilting. This choice is appealing because it leads to a solution of the form $\tilde w_i = w_i \exp(\lambda^\top A_i)$, where $\lambda$ is a vector of Lagrange multipliers. In other words, the adjusted weight is a positive multiple of the original weight. See Appendix A of the Supplementary material available at Biostatistics online for further details on the computation. To avoid confusion, we henceforth call survey calibration weight adjustment, while “calibration” refers to model calibration.
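A minimal Python sketch of raking follows. Rather than reproducing the computation in Appendix A, it applies Newton's method directly to the calibration equations $\sum_i \xi_i w_i \exp(\lambda^\top A_i) A_i = \sum_i A_i$, which is one standard way to obtain the raking solution. The helper names are hypothetical, and the small Gaussian-elimination solver is included only to keep the example free of dependencies.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def rake_weights(w, aux, totals, tol=1e-10, max_iter=100):
    """Raking (exponential tilting) weight adjustment for (3.4).

    w:      design weights w_i = 1/pi_i of the phase 2 subjects
    aux:    auxiliary vectors A_i of the same phase 2 subjects
    totals: sum of A_i over the full (phase 1) cohort
    Returns adjusted weights w_i * exp(lam' A_i) whose weighted totals match.
    """
    k = len(totals)
    lam = [0.0] * k
    for _ in range(max_iter):
        adj = [wi * math.exp(sum(a * lv for a, lv in zip(ai, lam))) for wi, ai in zip(w, aux)]
        # Calibration equations: weighted phase 2 totals minus cohort totals.
        g = [sum(v * ai[j] for v, ai in zip(adj, aux)) - totals[j] for j in range(k)]
        if all(abs(gj) <= tol * max(1.0, abs(t)) for gj, t in zip(g, totals)):
            return adj
        # Jacobian of g with respect to lam: sum_i adj_i A_i A_i'.
        jac = [[sum(v * ai[j] * ai[q] for v, ai in zip(adj, aux)) for q in range(k)]
               for j in range(k)]
        step = solve_linear(jac, g)
        lam = [lv - s for lv, s in zip(lam, step)]
    raise RuntimeError("raking did not converge; the constraints may be infeasible")


# Three phase 2 subjects with pi_i = 0.5 and A_i = (1, a_i).
adjusted = rake_weights([2.0, 2.0, 2.0], [[1.0, 0.1], [1.0, 0.2], [1.0, 0.3]], [5.0, 1.1])
print(adjusted)
print(rake_weights([5.0, 5.0], [[1.0], [1.0]], [4.0]))
```

With only the intercept column and constant design weights, as in the second call, every adjusted weight is approximately $N/n = 4/2 = 2$, matching the empirical weights discussed in this section.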

The weighted estimator (3.3) is design unbiased by the definition of the inclusion probability weights, i.e., $\mathbb{E}(\hat E_\tau^{\rm IPW} \mid \mathcal{C}) = E_\tau$, and therefore unbiased for $\mu_E$, as $\mathbb{E}(E_\tau) = \mu_E$. The weight-adjusted estimator (3.5) is asymptotically design unbiased for $E_\tau$ if (3.4) has a solution $\tilde w$ and the regularity conditions of Deville and Särndal (1992) hold as $n \to \infty$, where $n = \sum_{i=1}^N \xi_i$ is the phase 2 sample size. Thus, the weight-adjusted estimator is also asymptotically unbiased for $\mu_E$. See Result 4 and the accompanying remark in Deville and Särndal (1992, p. 379) for more details.

The key to gaining efficiency is to choose auxiliary statistics $A_i$ whose sum is strongly correlated with the target estimator that one would use if there were no missing data, in our case $E_\tau = \sum_{i=1}^N r(\tau; Z_i)$. The weight-adjusted estimator is asymptotically equivalent to an estimator (Deville and Särndal, 1992, Result 5) constructed by weighted linear regression of $r(\tau; Z_i)$ on $A_i$ in phase 2. Larger correlations between $r(\tau; Z_i)$ and $A_i$ lead to a smaller variance of $\hat E_\tau^{\rm adj}$, as the residual error from the regression is reduced. Note that the choice of $A_i$ does not affect the consistency of $\hat E_\tau^{\rm adj}$ but can affect its variance.

For our target estimator, $E_\tau$, we suggest $A_i = \{1, r(\tau; X_i, \hat W_i)\}^\top$ as auxiliary statistics, where $\hat W_i$ denotes a predicted value of $W_i$. We call $r(\tau; X_i, \hat W_i)$ pseudo-risk estimates; they can be viewed as the first term in a Taylor expansion of the true risk $r(\tau; X_i, W_i)$ around $\hat W_i$: $r(\tau; X_i, W_i) = r(\tau; X_i, \hat W_i) + (W_i - \hat W_i)^\top \partial r(\tau; X_i, w)/\partial w\,|_{w = \hat W_i} + \cdots$. Better prediction of $W_i$ results in smaller values of $W_i - \hat W_i$, and therefore in higher correlations between $r(\tau; X_i, W_i)$ and $r(\tau; X_i, \hat W_i)$ and more precise estimates of $E_\tau$. We discuss this point further in Section 4.1, where we derive the analytic variance of $\hat E_\tau^{\rm adj}$.

To predict a univariate $W$, one can use a generalized linear model (GLM) weighted by $w_i$, e.g., weighted logistic or weighted linear regression in the phase 2 data, with predictors $(X, V)$ that are available in phase 1. Here, $V$ denotes a vector of other ancillary predictors for $W$. For multivariate $W$, one can fit marginal GLMs to each component of $W$ or use a multivariate linear regression model. As alternatives to $r(\tau; X_i, \hat W_i)$, estimates of the cumulative baseline hazard or of the cumulative hazard could be used. We then compute pseudo-risk estimates $r(\tau; X_i, \hat W_i)$ using the predicted values $\hat W_i$ for all individuals in the cohort, even for those included in phase 2.
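The sketch below illustrates the pseudo-risk construction in Python under explicit assumptions: a hypothetical proportional hazards pure risk model $r(\tau; X, W) = 1 - \exp\{-\Lambda_0(\tau)\,\mathrm{HR}_X^{X}\,\mathrm{HR}_W^{W}\}$ with made-up values $\Lambda_0(\tau) = 0.02$, $\mathrm{HR}_X = 2$ and $\mathrm{HR}_W = 6$; a simulated cohort with binary $X$, $W$, and an ancillary proxy $V$; and MCAR phase 2 sampling. $W$ is predicted by design-weighted logistic regression on $(1, X, V)$, fitted by Newton-Raphson, and $r(\tau; X_i, \hat W_i)$ is then computed for every cohort member.

```python
import math
import random


def solve_linear(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def weighted_logistic(rows, y, weights, max_iter=50, tol=1e-10):
    """Design-weighted logistic regression fitted by Newton-Raphson (IRLS)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for row, yi, wi in zip(rows, y, weights):
            mu = expit(sum(b * x for b, x in zip(beta, row)))
            for j in range(k):
                score[j] += wi * (yi - mu) * row[j]
                for q in range(k):
                    info[j][q] += wi * mu * (1.0 - mu) * row[j] * row[q]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def risk_model(x, w, cum_base_hazard=0.02, hr_x=2.0, hr_w=6.0):
    """Hypothetical pure risk model r(tau; X, W) = 1 - exp{-Lambda0 * HR_x^X * HR_w^W}."""
    return 1.0 - math.exp(-cum_base_hazard * hr_x ** x * hr_w ** w)


# Simulated cohort: X in phase 1, W in phase 2 only, V an ancillary proxy for W.
rng = random.Random(7)
N, p = 2000, 0.2
X = [1 if rng.random() < 0.5 else 0 for _ in range(N)]
W = [1 if rng.random() < 0.3 else 0 for _ in range(N)]
V = [w if rng.random() < 0.8 else 1 - w for w in W]
xi = [1 if rng.random() < p else 0 for _ in range(N)]
phase2 = [i for i in range(N) if xi[i] == 1]

# Step 1: fit W ~ 1 + X + V in phase 2, weighting by the design weights 1/p.
beta = weighted_logistic([[1.0, X[i], V[i]] for i in phase2],
                         [W[i] for i in phase2], [1.0 / p] * len(phase2))
# Step 2: predict W_hat and the pseudo-risk for every cohort member, phase 2 included.
W_hat = [expit(beta[0] + beta[1] * X[i] + beta[2] * V[i]) for i in range(N)]
pseudo_risk = [risk_model(X[i], W_hat[i]) for i in range(N)]
aux = [(1.0, pr) for pr in pseudo_risk]  # A_i = (1, pseudo-risk_i) for weight adjustment
```

Because $\hat W_i$ is a fitted probability rather than a 0/1 value, the pseudo-risk interpolates between the risks at $W = 0$ and $W = 1$.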

The inclusion weights are then calibrated using $A_i$ to improve the estimation of $E_\tau$. The first component of $A_i$ is 1 for all observations, leading to the constraint $\sum_{i=1}^N \xi_i \tilde w_i = N$ and thus standardizing the adjusted weights. For example, when $\pi_i$ is constant (e.g., under MCAR) and the weights are adjusted using only $A_i = 1$, one obtains $\tilde w_i = N/n$ for all $i$, where $n = \sum_{i=1}^N \xi_i$. These $\tilde w_i$ are also known as empirical weights (Robins and others, 1994, Section 6.1).

Figure 1 summarizes the weight adjustment methods based on pseudo-risk estimates.

Fig. 1.


Diagram of obtaining design weights and adjusted weights for estimating $E_\tau$, where $W$ are partially missing in a validation cohort. Without loss of generality we assume that $\xi_i = 1$ for $i = 1, \ldots, n$ and $\xi_i = 0$ for $i = n + 1, \ldots, N$. The estimator of $E_\tau$ with adjusted weights $\tilde w_i$, which incorporate all available information in the full cohort via pseudo-risk estimates, is $\hat E_\tau^{\rm adj} = \sum_{i=1}^n \tilde w_i\, r(\tau; X_i, W_i)$.

3.2. Multiple imputation

As an alternative to weighting, $E_\tau$ can also be estimated using multiple imputation for the missing predictors (Rubin, 2004). In the first step, one creates $M$ complete copies of the dataset, each based on imputed missing values drawn from an imputation model for their predictive distribution given the observed data. Such models are implemented in many statistical packages, e.g., mice in R uses multivariate imputation by chained equations (van Buuren and Groothuis-Oudshoorn, 2011). In the second step, one computes the statistic of interest for each of the $M$ complete datasets and uses their empirical mean as the final estimate.

Using this approach, we estimate $E_\tau$ as

$$\hat E_\tau^{\rm MI} = \frac{1}{M} \sum_{m=1}^M \sum_{i=1}^N r(\tau; X_i, W_i^{(m)}) = \sum_{i=1}^N \bar r_i, \qquad (3.6)$$

where $\bar r_i = M^{-1} \sum_{m=1}^M r(\tau; X_i, W_i^{(m)})$ is the mean of the risks evaluated at the imputed values $W_i^{(m)}$ of the $m$th imputation, $m = 1, \ldots, M$. For phase 2 individuals whose $W_i$ are observed (i.e., for individuals $i$ with $\xi_i = 1$), $W_i^{(m)} = W_i$ for $m = 1, \ldots, M$.
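A minimal Python sketch of (3.6) follows. The imputation step is a placeholder: `impute_w` is a hypothetical callable that draws one value of a missing $W_i$. In practice it would come from an imputation model fitted to all observed data, including outcome and survival information (for example with mice in R), which the toy imputer here does not do. The function also returns the per-imputation totals $\hat E^{(m)}$, which Rubin's rules need.

```python
import random


def mi_expected_events(x, w_obs, in_phase2, risk, impute_w, n_imputations, rng):
    """Multiple-imputation estimate (3.6) of the expected number of events.

    x, w_obs:  phase 1 covariates and observed W (None where in_phase2 == 0)
    risk:      callable r(x, w) returning the model risk
    impute_w:  hypothetical callable (x_i, rng) -> one draw of a missing W_i
    Returns (E_MI, totals), where totals[m] is the estimate from the m-th
    completed dataset; E_MI = mean(totals) = sum_i rbar_i.
    """
    totals = []
    for _ in range(n_imputations):
        total = 0.0
        for x_i, w_i, observed in zip(x, w_obs, in_phase2):
            # Observed W values are kept unchanged in every completed dataset.
            w_m = w_i if observed == 1 else impute_w(x_i, rng)
            total += risk(x_i, w_m)
        totals.append(total)
    return sum(totals) / n_imputations, totals


# Toy usage with a hypothetical risk model and a naive imputer that ignores x.
def toy_risk(x, w):
    return 0.01 * (1 + x) * (1 + 5 * w)


def toy_imputer(x, rng):
    return 1 if rng.random() < 0.3 else 0


E_mi, per_imputation = mi_expected_events(
    [0, 1, 1, 0], [0, None, 1, None], [1, 0, 1, 0], toy_risk, toy_imputer, 10, random.Random(3))
print(E_mi, per_imputation)
```

Any bias in the imputed $W$ passes straight into $\bar r_i$, which is the sensitivity to imputation model misspecification discussed below.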

We use all observed data in the imputation, including survival information, as recommended by White and Royston (2009). If the sampling pattern is related to the missing covariates, the inclusion probability weights, $w_i$, should be used in the imputation as well.

When the imputation model is misspecified, however, estimates of $E_\tau$ can be biased (Keogh and others, 2018). Unlike weight adjustment, the multiple imputation approach directly uses the imputed $W_i^{(m)}$ to estimate $E_\tau$, so that $\hat E_\tau^{\rm MI}$ is susceptible to bias if the imputed values of $W$ are biased due to misspecification of the imputation model. We confirmed this observation in simulations (Scenario II in Section 5.2).

4. Variance estimation

4.1. Variance of $\hat E_\tau$

To ease the notational burden, we assume that the projection period $\tau$ is fixed and, omitting the subscript $\tau$, use the notation $O = O_\tau$, $E = E_\tau$, and $r_i = r(\tau; Z_i)$. We decompose the variance of an estimator $\hat E$ by conditioning on the cohort $\mathcal{C}$ as

$$\mathrm{Var}(\hat E) = \mathbb{E}\{\mathrm{Var}(\hat E \mid \mathcal{C})\} + \mathrm{Var}\{\mathbb{E}(\hat E \mid \mathcal{C})\}, \qquad (4.7)$$

where the first term represents the variance due to phase 2 sampling from $\mathcal{C}$, and the second term represents the variance from sampling $\mathcal{C}$ itself from an infinite superpopulation.

Conditionally on the cohort $\mathcal{C}$, the weights, $w_i$ or $\tilde w_i$, and the risk values, $r_i$, are fixed, and only the inclusion indicators $\xi_i$ are random. For the weighted estimators (3.3) and (3.5) of $E$, $\mathbb{E}(\xi_i \mid \mathcal{C}) = \pi_i$, as $\xi_i$ has a Bernoulli distribution with $P(\xi_i = 1 \mid \mathcal{C}) = \pi_i$. Thus, $\mathrm{Var}(\xi_i \mid \mathcal{C}) = \pi_i(1 - \pi_i)$ and $\mathrm{Cov}(\xi_i, \xi_j \mid \mathcal{C}) = \pi_{ij} - \pi_i \pi_j$ for $i \ne j$, where $\pi_{ij} = P(\xi_i = \xi_j = 1 \mid \mathcal{C})$ is a joint inclusion probability.

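Letting $\pi_{ii} = \pi_i$, the variance of the inverse probability weighted estimator (3.3) is

$$\mathrm{Var}(\hat E^{\rm IPW}) = \mathbb{E}\Big\{\sum_{i=1}^N \sum_{j=1}^N (\pi_{ij} - \pi_i \pi_j)\, w_i w_j\, r_i r_j\Big\} + \mathrm{Var}\Big(\sum_{i=1}^N r_i\Big), \qquad (4.8)$$

which is estimated by

$$\widehat{\mathrm{Var}}(\hat E^{\rm IPW}) = \sum_{i=1}^N \sum_{j=1}^N \xi_i \xi_j\, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}}\, w_i w_j\, r_i r_j + \sum_{i=1}^N \xi_i w_i (r_i - \bar r)^2, \qquad (4.9)$$

where $\bar r = N^{-1} \sum_{i=1}^N \xi_i w_i r_i$. Both $\pi_i$ and $\pi_{ij}$ depend on the missingness mechanism. MCAR and CC samples are independent, and thus $\pi_{ij} = \pi_i \pi_j$ for $i \ne j$. NCC samples are dependent; for two controls $i \ne j$,

$$\pi_{ij} = \pi_i + \pi_j - 1 + (1 - \pi_i)(1 - \pi_j) \prod_k \Big[1 - \frac{m}{\{n(T_k) - m - 1\}\{n(T_k) - 2\}}\Big],$$

where the product is taken over all cases $k$ ($D_k = 1$) such that $\max(e_i, e_j) < T_k \le \min(T_i, T_j)$, $m$ denotes the number of controls selected for each case, and $n(T_k)$ is the number of subjects at risk at $T_k$ (Samuelsen, 1997).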
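The following Python sketch evaluates the variance estimator (4.9). By default it assumes independent sampling (MCAR or CC), so that only the diagonal terms $(1 - \pi_i) w_i^2 r_i^2$ of the double sum remain; for NCC, a hypothetical `pi_joint(i, j)` callable returning $\pi_{ij}$ can be passed in. The phase 1 term is computed as the design-weighted sum of squared deviations $\sum_i \xi_i w_i (r_i - \bar r)^2$, one consistent estimator of $\mathrm{Var}(\sum_i r_i)$. All inputs refer to phase 2 subjects only.

```python
def var_ipw(pi, risk, n_cohort, pi_joint=None):
    """Variance estimate (4.9) for the inverse probability weighted estimate (3.3).

    pi, risk: inclusion probabilities pi_i and risks r_i of the phase 2 subjects
    n_cohort: size N of the full cohort
    pi_joint: optional callable (i, j) -> pi_ij for i != j (e.g. for NCC);
              if omitted, sampling is assumed independent (pi_ij = pi_i * pi_j)
    """
    n = len(pi)
    w = [1.0 / p for p in pi]
    # Phase 2 term; the diagonal (pi_ii = pi_i) reduces to (1 - pi_i) w_i^2 r_i^2.
    phase2 = sum((1.0 - pi[i]) * (w[i] * risk[i]) ** 2 for i in range(n))
    if pi_joint is not None:
        # Only sampled pairs are summed, so pi_ij > 0 for every pair used here.
        for i in range(n):
            for j in range(n):
                if i != j:
                    pij = pi_joint(i, j)
                    phase2 += (pij - pi[i] * pi[j]) / pij * w[i] * w[j] * risk[i] * risk[j]
    # Phase 1 term: design-weighted estimate of Var(sum_i r_i).
    rbar = sum(wi * ri for wi, ri in zip(w, risk)) / n_cohort
    phase1 = sum(wi * (ri - rbar) ** 2 for wi, ri in zip(w, risk))
    return phase2 + phase1


print(var_ipw([0.5, 0.5], [0.2, 0.4], n_cohort=4))
```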

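The variance of the weight-adjusted estimator (3.5) is obtained following similar steps, with further details given in Appendix B of the Supplementary material available at Biostatistics online, as

$$\mathrm{Var}(\hat E^{\rm adj}) \approx \mathbb{E}\Big\{\sum_{i=1}^N \sum_{j=1}^N (\pi_{ij} - \pi_i \pi_j)\, w_i w_j\, e_i e_j\Big\} + \mathrm{Var}\Big(\sum_{i=1}^N r_i\Big), \qquad (4.10)$$

where $e_i = r_i - A_i^\top B$ denotes a residual of the adjusted-weighted linear model that regresses $r_i$ on $A_i$ with regression coefficient $B$. Stronger correlation of the auxiliary statistic $A_i$ with $r_i$ leads to smaller residuals, and thus a smaller first term in (4.10). Therefore, choosing the pseudo-risk estimate as an auxiliary statistic, as suggested in Section 3.1, can lead to large improvements in the efficiency of $\hat E^{\rm adj}$. An estimate of (4.10) is

$$\widehat{\mathrm{Var}}(\hat E^{\rm adj}) = \sum_{i=1}^N \sum_{j=1}^N \xi_i \xi_j\, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}}\, w_i w_j\, \hat e_i \hat e_j + \sum_{i=1}^N \xi_i w_i (r_i - \bar r)^2, \qquad (4.11)$$

where $\hat e_i = r_i - A_i^\top \hat B$ with $\hat B = \big(\sum_{i=1}^N \xi_i \tilde w_i A_i A_i^\top\big)^{-1} \sum_{i=1}^N \xi_i \tilde w_i A_i r_i$, and $\bar r = N^{-1} \sum_{i=1}^N \xi_i w_i r_i$.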

Note that we weight each term using the inclusion probability weights $w_i$, not $\tilde w_i$, in (4.9) and (4.11), because the adjusted weights are tailored to estimate $E$ efficiently, not the variance of its estimator. Comparing the variance formulas (4.8) and (4.10) shows that weight adjustment only affects the first term, replacing $r_i$ with $e_i$; the second terms in both formulas are identical. See also Appendix B of the Supplementary material available at Biostatistics online for variance estimation when $\pi_i$ is unknown and needs to be estimated.
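Similarly, the following Python sketch evaluates (4.11). The residuals $\hat e_i$ come from a regression of $r_i$ on $A_i$ weighted by the adjusted weights $\tilde w_i$, while, as noted above, the variance terms themselves use the design weights $w_i$. Independent sampling (MCAR or CC) is assumed, the phase 1 term is again the design-weighted sum of squared deviations of $r_i$, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def var_adjusted(pi, adj_w, aux, risk, n_cohort):
    """Variance estimate (4.11) for the weight-adjusted estimate (3.5),
    assuming independent phase 2 sampling (MCAR or CC).

    pi, adj_w, aux, risk: pi_i, adjusted weights, A_i and r_i of phase 2 subjects
    """
    k = len(aux[0])
    # Regression of r_i on A_i weighted by the adjusted weights gives B_hat.
    xtx = [[sum(v * a[j] * a[q] for v, a in zip(adj_w, aux)) for q in range(k)]
           for j in range(k)]
    xty = [sum(v * a[j] * r for v, a, r in zip(adj_w, aux, risk)) for j in range(k)]
    coef = solve_linear(xtx, xty)
    resid = [r - sum(c * x for c, x in zip(coef, a)) for a, r in zip(aux, risk)]
    # Both terms use the design weights w_i = 1/pi_i, not the adjusted weights.
    w = [1.0 / p for p in pi]
    phase2 = sum((1.0 - p) * (wi * e) ** 2 for p, wi, e in zip(pi, w, resid))
    rbar = sum(wi * r for wi, r in zip(w, risk)) / n_cohort
    phase1 = sum(wi * (r - rbar) ** 2 for wi, r in zip(w, risk))
    return phase2 + phase1


# Toy phase 2 sample with an intercept-only auxiliary statistic, A_i = 1.
print(var_adjusted([0.5, 0.5], [2.0, 2.0], [[1.0], [1.0]], [0.2, 0.4], n_cohort=4))
```

When $A_i$ reproduces $r_i$ exactly, the residuals vanish and only the phase 1 term remains, which is the limiting case of a perfect pseudo-risk.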

The variance of the multiple imputation estimate (3.6), which accommodates the additional variation resulting from the imputations, follows from Rubin’s formula (Rubin, 2004),

$$\widehat{\mathrm{Var}}(\hat E^{\rm MI}) = \bar U + \Big(1 + \frac{1}{M}\Big) B_M, \qquad (4.12)$$

where $\bar U = M^{-1} \sum_{m=1}^M \hat U^{(m)}$ and $B_M = (M - 1)^{-1} \sum_{m=1}^M (\hat E^{(m)} - \hat E^{\rm MI})^2$, with $\hat E^{(m)} = \sum_{i=1}^N r(X_i, W_i^{(m)})$ the estimate from the $m$th imputed dataset. Formula (4.12) can also be viewed as arising from the variance decomposition (4.7) with conditioning on the multiple imputed cohorts $\mathcal{C}^{(m)}$, where $\hat U^{(m)}$ is an estimate of $\mathrm{Var}(\sum_{i=1}^N r_i)$ computed from $\mathcal{C}^{(m)}$, for each $m = 1, \ldots, M$. The first term in (4.12) estimates the mean of the variances of $\hat E^{(m)}$ across the imputed datasets, and the second term estimates the variance between the imputed datasets.
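Rubin's rules (4.12) reduce to a few lines of Python. In this sketch, `estimates` holds the per-imputation estimates $\hat E^{(m)}$ and `within_variances` holds the completed-data variance estimates $\hat U^{(m)}$; both names are illustrative.

```python
def rubin_combine(estimates, within_variances):
    """Rubin's rules (4.12): pooled estimate and total variance from M imputations."""
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    pooled = sum(estimates) / m
    u_bar = sum(within_variances) / m                          # mean within-imputation variance
    b_m = sum((e - pooled) ** 2 for e in estimates) / (m - 1)  # between-imputation variance
    return pooled, u_bar + (1.0 + 1.0 / m) * b_m


print(rubin_combine([10.0, 12.0], [1.0, 1.0]))  # (11.0, 4.0)
```

The factor $1 + 1/M$ inflates the between-imputation variance to account for using a finite number of imputations.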

4.2. Variance of $O/\hat E$

The variance of the target estimate, the ratio of the observed to the expected number of events, $O/\hat E$, also depends on the variability of the observed number of events, $O$. When the outcome is rare, it is reasonable to assume that $O$ has a Poisson distribution with mean $\mu_O$ (Cameron and Trivedi, 2013, Chapter 1.1). We let $\mu_{\hat E}$ and $\sigma^2_{\hat E}$ denote the mean and variance of the estimator $\hat E$, respectively, and denote the covariance between $O$ and $\hat E$ by $\sigma_{O\hat E}$.

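Using Taylor linearization,

$$\mathrm{Var}\Big(\frac{O}{\hat E}\Big) \approx \frac{\mu_O}{\mu_{\hat E}^2} + \frac{\mu_O^2\, \sigma^2_{\hat E}}{\mu_{\hat E}^4} - \frac{2\mu_O\, \sigma_{O\hat E}}{\mu_{\hat E}^3}, \qquad (4.13)$$

which is estimated by

$$\widehat{\mathrm{Var}}\Big(\frac{O}{\hat E}\Big) = \frac{O}{\hat E^2} + \frac{O^2\, \hat\sigma^2_{\hat E}}{\hat E^4} - \frac{2 O\, \hat\sigma_{O\hat E}}{\hat E^3}. \qquad (4.14)$$

For each approach, $\hat\sigma^2_{\hat E}$ is given in (4.9), (4.11), and (4.12), and $\hat\sigma_{O\hat E}$ is

$$\hat\sigma_{O\hat E} = \sum_{i=1}^N \xi_i w_i D_i r_i - \frac{O\, \hat E}{N}.$$

Note that the first term of the covariance, $N\,\mathbb{E}(D r)$, is estimated by the sum of the cases’ risk values. For $\hat E^{\rm MI}$, (4.14) holds only when the imputation model is correctly specified and the estimate is unbiased.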
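A Python sketch of the delta-method variance (4.14) and of the covariance estimate $\sum_i \xi_i w_i D_i r_i - O\hat E/N$ follows. It assumes $\hat\sigma^2_{\hat E}$ has already been computed by one of the approaches above and treats $O$ as Poisson, so that its variance is estimated by $O$ itself; the function names are hypothetical.

```python
def cov_obs_expected(in_phase2, w, D, risk, observed, expected_hat, n_cohort):
    """sigma_hat_{O E_hat} = sum_i xi_i w_i D_i r_i - O * E_hat / N.

    Risks are only read for sampled cases (xi_i = 1, D_i = 1); other entries may be None.
    """
    cases = sum(wi * r for s, wi, d, r in zip(in_phase2, w, D, risk) if s == 1 and d == 1)
    return cases - observed * expected_hat / n_cohort


def var_oe(observed, expected_hat, var_expected, cov_oe):
    """Delta-method variance (4.14) of O / E_hat, with Var(O) estimated by O (Poisson)."""
    O, E = observed, expected_hat
    return O / E ** 2 + O ** 2 * var_expected / E ** 4 - 2.0 * O * cov_oe / E ** 3


print(var_oe(40, 50.0, 30.0, 5.0))
```

With $\hat\sigma^2_{\hat E} = \hat\sigma_{O\hat E} = 0$ the expression reduces to the Poisson term $O/\hat E^2$, the variance one would have with a fixed, fully observed $E$.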

4.3. Remark on efficiency of $O/\hat E$ compared with $O/E$

As noted in Section 3.1, $\hat E^{\rm IPW}$ (or $\hat E^{\rm adj}$) is (asymptotically) design unbiased, i.e., $\mathbb{E}(\hat E \mid \mathcal{C}) = E$. Similarly, by the law of iterated expectations, $\mu_{\hat E} = \mathbb{E}(E) = \mu_E$. As a result, the variance decomposition (4.7) simplifies to $\mathrm{Var}(\hat E) = \mathbb{E}\{\mathrm{Var}(\hat E \mid \mathcal{C})\} + \mathrm{Var}(E)$, and the variance expression (4.13) reduces to $\mathrm{Var}(O/\hat E) \approx \mathrm{Var}(O/E) + \mu_O^2\, \mathbb{E}\{\mathrm{Var}(\hat E \mid \mathcal{C})\}/\mu_E^4$. If the cohort data are fully available, $\mathrm{Var}(\hat E \mid \mathcal{C}) = 0$ and $\mathrm{Var}(O/\hat E) = \mathrm{Var}(O/E)$.

For longer projection periods $\tau$, the expected number of events $\mu_E$ increases, which leads to more precise estimates $O/E$ because the leading term of (4.13), $\mu_O/\mu_E^2$ (approximately $1/\mu_E$ for a well-calibrated model), gets smaller. We confirm this in simulation studies.

On the other hand, the relative efficiency (RE) of $O/\hat E$ compared with $O/E$, expressed as the ratio of their variances,

$$\mathrm{RE} = \frac{\mathrm{Var}(O/E)}{\mathrm{Var}(O/\hat E)}, \qquad (4.15)$$

decreases as $\tau$ increases, owing to the increase in the conditional variance $\mathbb{E}\{\mathrm{Var}(\hat E \mid \mathcal{C})\}$. Nevertheless, weight adjustment mitigates this increase, so that efficiency losses are smaller than those seen for inclusion probability weighting; that is, $\mathbb{E}\{\mathrm{Var}(\hat E^{\rm adj} \mid \mathcal{C})\}$ is smaller than $\mathbb{E}\{\mathrm{Var}(\hat E^{\rm IPW} \mid \mathcal{C})\}$. We confirm these observations in simulation studies (e.g., Figure 2).

Fig. 2.


Empirical relative efficiency (ratio of the empirical variance from the full cohort to that from the two-phase estimate) of observed-to-expected ratios over $\tau$; weight adjustment using pseudo-risk is the most efficient for all projection periods, compared with the other approaches using design weights and weights adjusted by phase 1 variables.

5. Simulations

We conducted two sets of simulations, the first to investigate the performance of the proposed pseudo-risk estimate as an auxiliary statistic for weight adjustment (Section 5.1) and the second to compare weight adjustment with multiple imputation (Section 5.2).

5.1. Efficiency of weight-adjusted calibration estimators using pseudo-risk as auxiliary statistic

We investigated the performance of weight adjustment compared with the inverse probability weighting method for estimating $E_\tau$. We also assessed the proposed auxiliary statistic, the pseudo-risk estimate, by comparing it with another choice of auxiliary statistics, namely all phase 1 covariates.

5.1.1. Risk model and data generation

We validated a pure risk model (that treats competing risks as random censoring), $r(\tau; Z)$, developed from the United States and Canada Childhood Cancer Survivor Study (CCSS) cohort, comprising Inline graphic 5-year survivors of a childhood cancer. The hazard function is $\lambda(t \mid Z) = \lambda_0(t) \exp(\beta^\top Z)$, where $\lambda_0(t)$ is a baseline hazard function, and $\beta$ are log-hazard ratios associated with the covariates $Z = (X, W)$. We assumed that $X$ includes the following binary risk factors: female gender (yes/no), “birth after 1970 (yes/no),” “age at the first primary cancer Inline graphic 15 years (yes/no),” “any thyroid nodule in lifetime (yes/no),” “any alkylating agents (yes/no),” and “any radiation treatment (yes/no),” and that $W$ was the binary risk factor “any radiation to the neck (yes/no).” Table 1 shows the corresponding hazard ratio estimates, obtained from the CCSS cohort for the predictors of model 2 of Kovalchik and others (2013), where the outcome of interest was diagnosis of a second primary thyroid cancer (SPTC). The baseline hazard function $\lambda_0(t)$ was estimated by the Breslow estimator (see Figure A1, Appendix D in Supplementary material available at Biostatistics online). We treated death from competing causes as a censoring event. The observed numbers of SPTCs in the CCSS cohort for projection lengths Inline graphic, and Inline graphic years were Inline graphic, Inline graphic, Inline graphic, and Inline graphic. This model was regarded as fixed.

Table 1.

Hazard ratio estimates from the Childhood Cancer Survivor Study (CCSS) cohort in the USA and Canada for the second primary thyroid cancer (SPTC) model predictors; all risk predictors are binary (yes = 1 or no = 0); “Radiation to neck” is considered to be a missing variable ($W$) in a simulation study using resampled cohorts.

Predictor set   Risk predictor                                    Hazard ratio from CCSS
X               Birth year after 1970                             1.69
X               Age at first primary cancer Inline graphic-year   3.05
X               Female                                            2.32
X               Thyroid nodule in lifetime                        7.05
X               Any alkylating agent                              1.63
X               Any radiation                                     1.40
W               Radiation to neck                                 6.01

We generated 500 validation cohorts of size Inline graphic by resampling subjects with replacement from the CCSS cohort. For each cohort, we used the following phase 2 sampling schemes: MCAR, MAR/CC, and MAR/NCC. To allow for comparisons across the designs, we first created the MAR/NCC subsample, with Inline graphic controls matched on time to each case, corresponding to a sample of $n$ unique individuals ($n$ varies between simulated cohorts because the number of cases is random and some controls are sampled repeatedly). We then created an MCAR subsample using Bernoulli sampling with $p = n/N$, where $N$ is the size of the full cohort (phase 1). The MAR/CC samples consisted of the same individuals as the MCAR samples, but in addition included all cases with $D_i = 1$; therefore, the phase 2 CC sample size was slightly larger than that of NCC or MCAR.

We assumed that $W$, “Radiation to neck,” a predictor with a high hazard ratio (6.01), was only available for phase 2 samples. We created a binary ancillary predictor $V$ = Inline graphic, where Inline graphic had a uniform distribution on the interval Inline graphic and was independent of $W$, so that Inline graphic and Inline graphic.

5.1.2. Estimation of Inline graphic

We estimated Inline graphic by weighted sums using three different weights: design weights, Inline graphic; weights adjusted using, as auxiliary statistics, the model covariates Inline graphic and the ancillary predictor Inline graphic, which are available in phase 1, Inline graphic; and weights adjusted using the pseudo-risk as the auxiliary statistic, Inline graphic. To predict Inline graphic for computing the pseudo-risk estimates Inline graphic, we used design-weighted logistic regression with predictors Inline graphic, Inline graphic, Inline graphic, and Inline graphic.
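The pseudo-risk step can be sketched as below. With only categorical phase 1 predictors, a saturated design-weighted logistic regression for a binary missing covariate has the design-weighted cell proportions among phase 2 subjects as its fitted probabilities, so no iterative fit is needed. Plugging the predicted value of the missing covariate into the risk model is one simple way to form a pseudo-risk and is an assumption here; the strata, function names, cumulative hazard, and data are hypothetical, while 6.01 is the Table 1 hazard ratio for radiation to the neck.

```python
import math
from collections import defaultdict


def weighted_cell_proportions(strata, x2, selected, weights):
    """Design-weighted estimate of P(X2 = 1 | stratum) from phase 2 subjects.
    For categorical predictors these equal the fitted probabilities of a
    saturated design-weighted logistic regression."""
    num, den = defaultdict(float), defaultdict(float)
    for s, x, sel, w in zip(strata, x2, selected, weights):
        if sel:  # X2 is observed only for phase 2 subjects
            num[s] += w * x
            den[s] += w
    return {s: num[s] / den[s] for s in den}


def pseudo_risk(linear_predictor_x1, beta2, predicted_x2, cum_hazard):
    """Pseudo-risk computable for every cohort member: the fixed risk model with
    the missing covariate replaced by its predicted value (assumed plug-in choice)."""
    return 1.0 - math.exp(-cum_hazard * math.exp(linear_predictor_x1 + beta2 * predicted_x2))


# Hypothetical phase 1 strata, e.g. combinations of phase 1 covariates and the ancillary predictor.
strata = ["a", "a", "b", "b", "b"]
x2 = [1, 0, 1, None, 0]                   # None: X2 missing (subject not in phase 2)
selected = [True, True, True, False, True]
weights = [2.0, 2.0, 1.0, 1.0, 3.0]       # design weights 1 / inclusion probability
p_hat = weighted_cell_proportions(strata, x2, selected, weights)
pseudo = [pseudo_risk(0.0, math.log(6.01), p_hat[s], 0.01) for s in strata]
```

With many predictors, a main-effects weighted logistic regression would replace the cell proportions, and strata without phase 2 subjects would need to be pooled.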

We assessed the efficiency of the estimates Inline graphic based on adjusted design weights when the phase 1 variables Inline graphic and the ancillary predictor of Inline graphic, namely Inline graphic, are used as auxiliary statistics (denoted by Inline graphic), and when the pseudo-risk estimates Inline graphic are used as auxiliary statistics (denoted by Inline graphic).
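Weight adjustment by survey calibration can be sketched as below. This is a minimal sketch assuming the linear (GREG-type) distance function of Deville and Särndal (1992), which has a closed-form solution; the paper's exact calibration function is not restated in this section, and the function names and data are hypothetical. The auxiliary vector may hold an intercept and the phase 1 covariates and ancillary predictor, or an intercept and the pseudo-risk. The calibrated estimate of the expected count is then the weighted sum of phase 2 risks.

```python
def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def calibrate_weights(aux, selected, weights):
    """Linear calibration of design weights. aux[i] is the auxiliary vector h_i, known for
    every phase 1 subject; phase 2 subjects get w_i * (1 + h_i' lam), with lam chosen so
    that their weighted sum of h_i equals the full-cohort sum. Others get weight 0."""
    k = len(aux[0])
    totals = [sum(h[j] for h in aux) for j in range(k)]  # phase 1 totals
    phase2 = [(h, w) for h, s, w in zip(aux, selected, weights) if s]
    weighted = [sum(w * h[j] for h, w in phase2) for j in range(k)]
    gram = [[sum(w * h[i] * h[j] for h, w in phase2) for j in range(k)] for i in range(k)]
    lam = solve(gram, [t - e for t, e in zip(totals, weighted)])
    return [w * (1.0 + sum(lj * hj for lj, hj in zip(lam, h))) if s else 0.0
            for h, s, w in zip(aux, selected, weights)]


# Hypothetical example: intercept + pseudo-risk as auxiliary statistics.
pseudo = [i / 1000 for i in range(50)]
risk = [1.1 * p for p in pseudo]              # true risks, known only in phase 2
selected = [i % 3 == 0 for i in range(50)]
design_w = [3.0] * 50
cal_w = calibrate_weights([(1.0, p) for p in pseudo], selected, design_w)
e_hat = sum(w * r for w, r, s in zip(cal_w, risk, selected) if s)
```

In this toy example the risk is exactly linear in the pseudo-risk, so the calibrated estimate reproduces the full-cohort total; a pseudo-risk that is strongly but not perfectly correlated with the risk gets close to that ideal. Linear calibration can produce negative weights, which bounded or raking distance functions avoid.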

5.1.3. Results

Table 2 summarizes Inline graphic estimates from Inline graphic validation cohorts resampled from the CCSS cohort. Means were close to the true Inline graphic ratios computed from the CCSS cohort (0.89, 0.88, 1.10, and 1.04 for projection lengths of 5, 10, 20, and 35 years, respectively), so the estimates were essentially unbiased; the largest departure was for the MAR/NCC estimates adjusted using the phase 1 covariates, whose Means were slightly low (e.g. 1.06 versus 1.10 at 20 years). The weight-adjusted estimates had smaller mean absolute deviations (Mads) and standard deviations (Sds) than the design-weighted estimates for all designs and all values of Inline graphic, except that under MAR/NCC at 20 and 35 years, adjustment using the phase 1 covariates did not reduce the Mad (nor, at 35 years, the Sd). The estimates Inline graphic using adjusted weights based on the pseudo-risk improved efficiency more than those using Inline graphic directly for weight adjustment. This indicates that efficiency gains depend on the choice of auxiliary statistics for weight adjustment; the pseudo-risk is a good choice because it is strongly correlated with the actual risk, as discussed in Section 3.1. Figure 2 shows the relative efficiencies (Res) compared with the full-cohort analysis. While Re decreased with increasing Inline graphic for all missingness mechanisms and approaches, the decreases were mitigated by weight adjustment, especially when the pseudo-risk was used as the auxiliary statistic.

Table 2.

Summary of observed-to-expected ratio estimates, the precision of the variance estimates, and 95% confidence interval coverage, based on Inline graphic resampled cohorts from the CCSS cohort; Inline graphic denotes the design-weighted estimates; Inline graphic denotes the weight-adjusted estimates using Inline graphic; Inline graphic denotes the weight-adjusted estimates using Inline graphic.

Inline graphic (years)  Statistic  FULL  MCAR: DW AW-X AW-PR  MAR/CC: DW AW-X AW-PR  MAR/NCC: DW AW-X AW-PR  Inline graphic
Columns within each design, in the order defined in the caption: DW, design-weighted; AW-X, weight-adjusted using the phase 1 covariates and the ancillary predictor; AW-PR, weight-adjusted using the pseudo-risk.
5 Mean 0.88 0.89 0.88 0.88 0.89 0.88 0.88 0.91 0.88 0.89
Mad 0 0.061 0.048 0.028 0.058 0.047 0.027 0.106 0.079 0.049
Sd 0.283 0.293 0.286 0.284 0.293 0.286 0.284 0.324 0.304 0.294
Se 0.276 0.288 0.283 0.278 0.287 0.282 0.279 0.308 0.287 0.281
Cr 0.92 0.93 0.92 0.92 0.92 0.92 0.92 0.93 0.91 0.93
10 Mean 0.88 0.88 0.88 0.87 0.88 0.88 0.88 0.89 0.86 0.88
Mad 0 0.061 0.047 0.029 0.059 0.046 0.027 0.084 0.071 0.041
Sd 0.161 0.175 0.168 0.163 0.174 0.168 0.163 0.192 0.183 0.174
Se 0.165 0.182 0.174 0.168 0.181 0.174 0.169 0.195 0.178 0.172
Cr 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.92 0.94
20 Mean 1.10 1.10 1.10 1.09 1.10 1.10 1.10 1.11 1.06 1.10
Mad 0 0.086 0.066 0.040 0.084 0.065 0.038 0.091 0.093 0.045
Sd 0.106 0.148 0.130 0.115 0.147 0.130 0.114 0.154 0.151 0.122
Se 0.108 0.153 0.135 0.117 0.151 0.134 0.117 0.152 0.139 0.119
Cr 0.96 0.96 0.95 0.96 0.96 0.95 0.96 0.95 0.90 0.94
35 Mean 1.04 1.04 1.04 1.04 1.04 1.04 1.04 1.05 1.01 1.04
Mad 0 0.086 0.066 0.038 0.085 0.064 0.038 0.080 0.087 0.042
Sd 0.090 0.139 0.120 0.101 0.138 0.118 0.099 0.132 0.132 0.104
Se 0.094 0.140 0.121 0.104 0.139 0.120 0.104 0.134 0.124 0.106
Cr 0.96 0.95 0.94 0.96 0.94 0.94 0.96 0.95 0.90 0.95

Mean, mean of estimates; Mad, mean absolute deviation of estimates; Sd, standard deviation of estimates; Se, mean of estimated standard errors; Cr, coverage rate of 95% confidence intervals.
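Relative efficiencies with respect to the full-cohort analysis can be approximated directly from the empirical Sds in Table 2. This sketch assumes that RE is the ratio of empirical variances of the O/E estimates (full cohort over phase 2 design), which may differ from the exact definition used for Figure 2. DW, AW-X, and AW-PR denote the design-weighted, covariate-adjusted, and pseudo-risk-adjusted estimates; only the MCAR columns are shown.

```python
# Empirical Sds of the O/E estimates from Table 2 (full cohort and MCAR design).
SD_FULL = {5: 0.283, 10: 0.161, 20: 0.106, 35: 0.090}
SD_MCAR = {
    5: {"DW": 0.293, "AW-X": 0.286, "AW-PR": 0.284},
    10: {"DW": 0.175, "AW-X": 0.168, "AW-PR": 0.163},
    20: {"DW": 0.148, "AW-X": 0.130, "AW-PR": 0.115},
    35: {"DW": 0.139, "AW-X": 0.120, "AW-PR": 0.101},
}


def relative_efficiency(sd_full, sd_method):
    """Variance ratio Var(full-cohort estimate) / Var(phase 2 estimate)."""
    return (sd_full / sd_method) ** 2


for t, sds in SD_MCAR.items():
    res = {name: round(relative_efficiency(SD_FULL[t], sd), 2) for name, sd in sds.items()}
    print(t, res)
```

On this scale the design-weighted RE falls from about 0.93 at 5 years to about 0.42 at 35 years, while the pseudo-risk-adjusted RE stays near 0.79 or above, in line with the pattern described for Figure 2.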

The coverage rates (Crs) of the 95% confidence intervals (CIs) for the weighting approaches were near the nominal 95% level, except for Inline graphic. When few events are observed (Inline graphic), the normal approximation may not be fully appropriate. A 95% CI based on 500 simulations around an estimate of Inline graphic is Inline graphic, which includes almost all the coverages shown. The means of the estimated standard errors (Ses) based on the variance formulas were close to the empirical Sds, indicating that the variance formulas given in Section 4 yield approximately unbiased variance estimates.
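The Monte Carlo interval for a coverage rate estimated from 500 simulations can be computed as a binomial Wald interval. This sketch assumes the interval is centered at the nominal level 0.95; the center and limits used in the text are not shown above, so the values here are illustrative, and the function name is hypothetical.

```python
import math


def coverage_mc_interval(coverage=0.95, n_sim=500, z=1.96):
    """Approximate 95% Monte Carlo interval for a coverage rate estimated from n_sim
    replications: coverage +/- z * sqrt(coverage * (1 - coverage) / n_sim)."""
    half_width = z * math.sqrt(coverage * (1.0 - coverage) / n_sim)
    return coverage - half_width, coverage + half_width


low, high = coverage_mc_interval()
print(f"({low:.3f}, {high:.3f})")  # (0.931, 0.969)
```

Under this assumption, an observed coverage of 0.93 sits at the lower edge of what Monte Carlo error alone would explain, and 0.92 or below falls outside it.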

5.2. Comparison of weight adjustment with multiple imputation

We compared the bias and efficiency in Inline graphic for weight adjustment and multiple imputation when the prediction models for Inline graphic used in the auxiliary statistic and the imputation models for Inline graphic are correctly specified (Scenario I) or misspecified (Scenario II).

5.2.1. Risk model and data generation

We generated univariate covariates for a validation cohort of Inline graphic subjects as follows. For Scenario I (correctly specified prediction/imputation model), each of Inline graphic was sampled from a multivariate normal distribution with mean Inline graphic, variances Inline graphic, and Inline graphic and Inline graphic. For Scenario II (misspecified prediction/imputation model), Inline graphic and Inline graphic came from a bivariate normal distribution with variances Inline graphic and Inline graphic, and Inline graphic, where Inline graphic followed a normal distribution with mean 0 and variance 1. Given Inline graphic, the event time Inline graphic for each cohort member was generated from an exponential distribution with parameter Inline graphic with Inline graphic, Inline graphic, and Inline graphic. We assumed administrative censoring and thus observed Inline graphic and the event indicator Inline graphic. Under these parameter choices, Inline graphic. We sampled individuals into phase 2 using the three missingness mechanisms MCAR, MAR/CC, and MAR/NCC, as described in Section 5.1. For the NCC design, we selected Inline graphic controls per case.
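The data-generating step can be sketched as follows. The parameter values, the bivariate (rather than multivariate) covariate structure, and the function names are hypothetical stand-ins for quantities elided above, and Scenario II's misspecified covariate model is not reproduced. The sketch draws exponential event times with administrative censoring at a hypothetical horizon tau and checks that O/E is close to 1 under the true model when each subject's expected count is the cumulative hazard over the observed follow-up.

```python
import math
import random


def simulate_cohort(n, beta0, beta1, beta2, rho, tau, rng):
    """HYPOTHETICAL Scenario-I-style generator. (X1, X2) is standard bivariate normal with
    correlation rho; the event time is exponential with rate exp(beta0 + beta1*X1 + beta2*X2);
    follow-up ends administratively at tau. Returns (x1, x2, observed time, event indicator)."""
    cohort = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        rate = math.exp(beta0 + beta1 * x1 + beta2 * x2)
        t_event = rng.expovariate(rate)
        cohort.append((x1, x2, min(t_event, tau), int(t_event <= tau)))
    return cohort


def observed_over_expected(cohort, beta0, beta1, beta2):
    """O/E under the true model; each subject's expected count is the cumulative
    hazard over the observed follow-up, rate * min(T, tau)."""
    observed = sum(delta for _, _, _, delta in cohort)
    expected = sum(math.exp(beta0 + beta1 * x1 + beta2 * x2) * t for x1, x2, t, _ in cohort)
    return observed / expected


cohort = simulate_cohort(20000, math.log(0.01), 0.5, 0.5, 0.5, 10.0, random.Random(2021))
print(round(observed_over_expected(cohort, math.log(0.01), 0.5, 0.5), 3))
```

For exponential times with administrative censoring, the expected cumulative hazard over follow-up equals the event probability, so the ratio differs from 1 only by sampling error.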

5.2.2. Estimation of Inline graphic

We estimated Inline graphic by Inline graphic and Inline graphic. For both scenarios, we predicted and imputed Inline graphic under the assumption that Inline graphic is linear in Inline graphic, which is correct for Scenario I but not for Scenario II. The prediction of Inline graphic for the pseudo-risk computations was based on design-weighted linear regression, and we created Inline graphic imputed data sets with Inline graphic.
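Estimates from multiply imputed data sets are usually combined with Rubin's rules (Rubin, 2004). The sketch below assumes that each imputed data set yields an estimate (for example, of the expected count) and its within-imputation variance; the function name and example numbers are hypothetical, and this section does not state exactly how the paper pooled its imputations.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, within_variances):
    """Combine M imputed-data estimates with Rubin's rules.
    Returns the pooled estimate and its total standard error."""
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputations")
    pooled = mean(estimates)
    within = mean(within_variances)   # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical expected counts and variances from M = 5 imputed data sets.
e_hats = [31.9, 32.4, 32.0, 32.6, 31.9]
e_vars = [1.6, 1.5, 1.7, 1.6, 1.5]
e_pooled, e_se = rubin_combine(e_hats, e_vars)
print(round(e_pooled, 2), round(e_se, 2))
```

The between-imputation term is what carries the uncertainty about the missing covariate; if the imputation model is misspecified, the pooled estimate can be biased even though this variance looks reasonable.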

5.2.3. Results

Table 3 summarizes results for Inline graphic simulated validation cohorts. When the model for Inline graphic was correctly specified for prediction and imputation (Scenario I), all Means were near Inline graphic, suggesting that the Inline graphic estimates are unbiased. Using Inline graphic in the Inline graphic estimation was more efficient than Inline graphic, as in Section 5.1, and almost as efficient as Inline graphic; both Inline graphic and Inline graphic led to smaller Mads and Sds than design weighting for all missingness mechanisms.

Table 3.

Summary of observed-to-expected ratio estimates, the precision of the estimates, and 95% confidence interval coverage, based on Inline graphic simulated validation cohorts.

  Statistic  FULL  MCAR: DW AW-PR MI  MAR/CC: DW AW-PR MI  MAR/NCC: DW AW-PR MI  Inline graphic
Columns within each design: DW, design-weighted; AW-PR, weight-adjusted using the pseudo-risk; MI, multiple imputation.
Scenario I: when the model for missing data is correct.
 Mean 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
 Mad 0.030 0.036 0.030 0.031 0.036 0.030 0.031 0.032 0.030 0.030
 Sd 0.037 0.046 0.037 0.037 0.045 0.037 0.037 0.039 0.037 0.037
 Se 0.038 0.045 0.038 0.038 0.045 0.038 0.038 0.041 0.038 0.039
 Cr 0.96 0.95 0.96 0.95 0.95 0.96 0.96 0.96 0.96 0.96
Scenario II: when the model for missing data is misspecified.
 Mean 1.00 1.00 1.00 0.94 1.00 1.00 0.94 1.00 1.00 0.94
 Mad 0.030 0.035 0.032 0.073 0.035 0.031 0.077 0.032 0.031 0.078
 Sd 0.037 0.044 0.039 0.101 0.044 0.039 0.108 0.040 0.038 0.109
 Se 0.038 0.046 0.039 0.075 0.045 0.039 0.077 0.041 0.039 0.076
 Cr 0.96 0.96 0.96 0.92 0.96 0.95 0.92 0.96 0.96 0.93

Mean, mean of estimates; Mad, mean absolute deviation of estimates; Sd, standard deviation of estimates; Se, mean of estimated standard errors; Cr, coverage rate of 95% confidence intervals.

When the model for predicting and imputing Inline graphic was misspecified (Scenario II), both Inline graphic and Inline graphic were unbiased, and Inline graphic was more precise than Inline graphic, although its standard deviation (Sd = 0.039) was not reduced as much as in Scenario I (Sd = 0.037). However, Inline graphic was biased, leading to biased estimates of Inline graphic (Mean Inline graphic). Moreover, the Sds of Inline graphic (Inline graphic) were more than twice as large as those of Inline graphic (Inline graphic) for all missingness mechanisms. These results indicate that weight adjustment is more robust to model misspecification than multiple imputation.

The variance formula (4.14) for the weight adjustment approach worked well regardless of whether the prediction model was correctly specified, as the coverages (Crs) of the 95% CIs for the weighting approaches were near the nominal 95% level. A 95% CI based on 500 simulations around an estimate of Inline graphic is Inline graphic, which includes almost all the coverages shown. However, for multiple imputation, the variance formula (4.14) worked well only when the imputation model was correctly specified; 95% CI coverage in Scenario II was slightly below nominal, at 0.92 to 0.93.

6. Data example

To illustrate our methods, we assessed the calibration of an absolute risk model for second primary thyroid cancer (SPTC) (Kovalchik and others, 2013, Model 2), which was developed from the CCSS data and two nested case–control studies, the Late Effects Study Group and the Nordic CCSS, and is implemented in the R package ‘thyroid’ (https://dceg.cancer.gov/tools/risk-assessment/tcrat).

For validation, we used the independent cohort of Inline graphic French childhood cancer survivors that Kovalchik and others (2013) also used to assess model performance; the following details are taken from their description. During follow-up, Inline graphic SPTCs were observed in the French cohort. Censoring events were loss to follow-up and end of study. The predictors female gender (yes/no), “birth after 1970” (yes/no), “age at the first primary cancer Inline graphic 15 years” (yes/no), “any alkylating agents” (yes/no), and “any radiation treatment” (yes/no) are fully observed in the cohort (Inline graphic). The predictors “any thyroid nodule in lifetime” (yes/no) and “any radiation to the neck” (yes/no) (Inline graphic) had Inline graphic of values missing completely at random (MCAR). To better assess the performance of the methods, we additionally deleted values completely at random to obtain an overall MCAR missingness rate of Inline graphic. To predict Inline graphic for the pseudo-risk computation, we separately regressed each component of Inline graphic on the phase 1 variables (Inline graphic), the two ancillary predictors “Hodgkin diagnosis (yes/no)” for the first primary cancer and “radiation absorbed dose to thyroid (in Gy)” (Inline graphic), and the survival information (times to event or censoring, Inline graphic, and outcome variables, Inline graphic), using logistic regression weighted by the inverse inclusion probabilities and fitted to the phase 2 data. The same variables were used to impute Inline graphic in the multiple imputation with Inline graphic imputed data sets.
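Raising the MCAR missingness rate in data that are already partly missing completely at random can be sketched as follows: each observed value is deleted independently with probability (target - current) / (1 - current), which makes the expected overall missing rate equal to the target. The function name, the 20% starting rate, and the use of 1 minus the target rate as the phase 2 inclusion probability are assumptions for illustration; the 44% target matches the Table 4 caption. In practice the inclusion probability might instead be estimated by the observed fraction.

```python
import random


def add_mcar_missingness(observed, target_rate, rng):
    """Delete further observed values completely at random so that the expected overall
    missing rate equals target_rate. `observed` holds True where the covariates are observed.
    Returns the new flags and the implied phase 2 inclusion probability (1 - target_rate)."""
    n = len(observed)
    current_rate = 1.0 - sum(observed) / n
    if not current_rate <= target_rate < 1.0:
        raise ValueError("target_rate must lie in [current missing rate, 1)")
    # P(delete | observed) = q gives current + (1 - current) * q = target in expectation.
    q = (target_rate - current_rate) / (1.0 - current_rate)
    new_observed = [o and rng.random() >= q for o in observed]
    return new_observed, 1.0 - target_rate


# Hypothetical example: 20% of values already MCAR-missing, target 44%.
flags = [True] * 8000 + [False] * 2000
new_flags, incl_prob = add_mcar_missingness(flags, 0.44, random.Random(7))
print(1 - sum(new_flags) / len(new_flags), incl_prob)
```

Because both deletion steps are independent of all data, the combined missingness remains MCAR and every subject has the same inclusion probability.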

Table 4 summarizes the calibration estimates from the French CCSS cohort. The model underestimated risk in the validation cohort: observed events exceeded expected events by 12% based on inverse probability weighting and on the weight-adjusted estimates of Inline graphic, and by 9% based on multiple imputation, although these departures were not statistically significant. Weight adjustment based on the pseudo-risk was much more efficient than using the MCAR sampling weights alone, with standard errors of 1.24 for Inline graphic and 1.52 for Inline graphic. This gain was also reflected, to a lesser extent, in the standard errors of the calibration estimates Inline graphic (0.188 versus 0.191). Multiple imputation resulted in larger standard errors for both Inline graphic and Inline graphic.

Table 4.

Summary of calibration estimates from validating the absolute risk model for second primary thyroid cancer (Kovalchik and others, 2013, Model 2) where the covariates “Radiation to neck” and “Thyroid nodule in lifetime” are 44% missing completely at random (MCAR) in the independent validation cohort, French CCSS. Standard errors are in parentheses.

Method  Inline graphic (observed events)  Inline graphic (estimated expected events, SE)  Inline graphic (observed-to-expected ratio, SE)
Inverse probability weighting Inline graphic 35 31.26 (1.52) 1.12 (0.191)
Weight adjustment via pseudo-risk Inline graphic 35 31.28 (1.24) 1.12 (0.188)
Multiple imputation Inline graphic 35 32.18 (2.27) 1.09 (0.194)
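As a quick check that the miscalibration in Table 4 is not significant, the ratios can be recomputed from the observed and estimated expected counts, and approximate 95% Wald intervals formed from the reported standard errors of the ratios. This sketch assumes a symmetric normal interval on the ratio scale, which may differ from the interval construction used in the paper; the function name is hypothetical.

```python
# (observed events, estimated expected events, SE of the ratio) from Table 4.
TABLE4 = {
    "Inverse probability weighting": (35, 31.26, 0.191),
    "Weight adjustment via pseudo-risk": (35, 31.28, 0.188),
    "Multiple imputation": (35, 32.18, 0.194),
}


def wald_interval(observed, expected, se_ratio, z=1.96):
    """Observed-to-expected ratio with a symmetric normal-approximation interval."""
    ratio = observed / expected
    return ratio, ratio - z * se_ratio, ratio + z * se_ratio


for method, (o, e_hat, se) in TABLE4.items():
    ratio, lo, hi = wald_interval(o, e_hat, se)
    print(f"{method}: O/E = {ratio:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

All three intervals include 1, consistent with the statement that the model's underestimation was not statistically significant.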

7. Discussion

In this article, we proposed efficient auxiliary statistics for weight adjustment to estimate the expected number of events, Inline graphic, predicted from a risk model; this quantity is needed to compute the ratio of observed to expected events, Inline graphic, a measure of calibration, in an independent validation cohort in which some model covariates are missing. We treated the risk model as fixed and focused on estimating the expected number of events efficiently, whereas Shin and others (2020) used weight calibration to improve the efficiency of the risk estimates themselves.

Weighting and multiple imputation are widely used to handle missing variables, and both methods require assumptions about the missingness mechanism. We considered three common missingness mechanisms encountered in cohort studies: missing completely at random, and missing due to sampling individuals under the case-cohort design or the nested case–control design. In each setting, the data can be viewed as arising from two-phase sampling. In phase 1, we measure certain variables on the entire cohort, which is sampled from a superpopulation. Phase 2 consists of a subsample of subjects for whom additional variables not available in phase 1 are obtained. The phase 2 sample thus has complete data on the risk model predictors. With weighting methods, the missingness mechanism defines the probabilities of inclusion in phase 2. Applying inverse probability of inclusion weights (Horvitz and Thompson, 1952) to the subjects with complete data reweights them to represent the entire validation cohort, but the resulting estimates can be inefficient. We used survey sampling methods (also called weight calibration, regression calibration, or model-assisted survey estimation) (Breidt and Opsomer, 2017) to obtain more efficient weights by utilizing phase 1 information. The key to improving efficiency is to find auxiliary statistics of the phase 1 data that are highly correlated with the statistic of interest that we would use if complete data were available in phase 1. We proposed a “pseudo-risk” as an auxiliary statistic, which can be computed from phase 1 data and is highly correlated with the actual risk one would use with complete cohort data. We showed that using this pseudo-risk as an auxiliary statistic led to large improvements in the precision of estimates of Inline graphic obtained from the model. As a result, the precision of the estimates Inline graphic was also improved.

In our analytic computations, we focused on the setting of rare outcomes and assumed that the observed number of events follows a Poisson distribution, a situation that is common in practice. However, when the Poisson assumption for Inline graphic is not appropriate, one can estimate its variance from the phase 1 data as Inline graphic, which reduces to Inline graphic for rare outcomes.

We handled censoring by shortening the projection period to the actual observation time, which leads to unbiased estimates of Inline graphic over an average follow-up duration. An alternative approach is to add to the observed number of events the number expected to occur among censored individuals, e.g., as suggested by Li and others (2018). This approach allows one to base calibration assessment on the expected number of events for risks computed up to Inline graphic for all individuals (including censored subjects). It thus assesses Inline graphic over the full interval Inline graphic but requires estimating the conditional survival function given the risk estimates.

Multiple imputation had efficiency gains similar to those of weight adjustment when the model for imputing the missing data was correctly specified. However, when the imputation model was misspecified, multiple imputation led to biased estimates, as also observed by Keogh and others (2018) and Seaman and others (2012). In contrast, weight-adjusted estimates are asymptotically consistent, provided the design-based inclusion probabilities are known (Breidt and Opsomer, 2017; Deville and Särndal, 1992), even when the auxiliary statistics (such as the pseudo-risk) are biased.

Some related literature should be mentioned. Ganna and others (2012) used inverse probability weighting to estimate pure covariate-specific risks from case-cohort and nested case–control designs, but they did not attempt to improve efficiency by weight adjustment or study estimates of the Inline graphic calibration measure. Whittemore and Halpern (2016) discussed a two-phase design, which is similar to a stratified case-cohort design, for comparing mean projected risks with observed risks. They used inverse probability weighting and discussed efficient stratification (partitioning) for sampling, but they did not consider weight adjustment for improving the efficiency of estimation for a given two-phase design.

Our approach for the nested case–control and case-cohort designs can also be extended to estimate the expected number of events in risk deciles for the whole population, extending, e.g., the approach of Chambers and Dunstan (1986). However, more work is needed to derive variance estimates for the corresponding calibration statistic in that setting. Moreover, when data are MCAR and covariates are missing for some cases, it is not clear to which risk decile a case with missing covariates should be assigned.

In summary, we found that weight adjustment based on pseudo-risks is a convenient and effective way to improve the precision of estimates of expected counts and observed-to-expected ratios in validation cohorts with missing risk factor information. In the examples we studied, such weight calibration yielded estimates at least as precise as multiple imputation and was robust to model misspecification.

Supplementary Material

kxaa060_Supplementary_Data

Acknowledgements

We thank Florence de Vathaire for access to the data of the French CCSS cohort.

Conflict of Interest: The authors declare no conflicts of interest.

Contributor Information

Yei Eun Shin, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA.

Mitchell H Gail, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA.

Ruth M Pfeiffer, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA.

Software

The R package rmodcal for assessing risk model calibration with missing covariates introduced in this paper is available at https://github.com/syeeun/rmodcal. Example code using a simulated data set is also provided in the package.

Supplementary materials

Supplementary material is available online at http://biostatistics.oxfordjournals.org.

Funding

The Intramural Research Program of the National Cancer Institute, Division of Cancer Epidemiology and Genetics.

References

  1. Breidt, F. J. and Opsomer, J. D. (2017). Model-assisted survey estimation with modern prediction techniques. Statistical Science 32, 190–205.
  2. Cameron, A. C. and Trivedi, P. K. (2013). Regression Analysis of Count Data, Vol. 53. Cambridge, UK: Cambridge University Press.
  3. Chambers, R. L. and Dunstan, R. (1986). Estimating distribution functions from survey data. Biometrika 73, 597–604.
  4. Deville, J.-C. and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association 87, 376–382.
  5. Ganna, A., Reilly, M., de Faire, U., Pedersen, N., Magnusson, P. and Ingelsson, E. (2012). Risk prediction measures for case-cohort and nested case-control designs: an application to cardiovascular disease. American Journal of Epidemiology 175, 715–724.
  6. Gong, G., Quante, A. S., Terry, M. B. and Whittemore, A. S. (2014). Assessing the goodness of fit of personal risk models. Statistics in Medicine 33, 3179–3190.
  7. Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 663–685.
  8. Keogh, R. H., Seaman, S. R., Bartlett, J. W. and Wood, A. M. (2018). Multiple imputation of missing data in nested case-control and case-cohort studies. Biometrics 74, 1438–1449.
  9. Kovalchik, S. A., Ronckers, C. M., Veiga, L. H. S., Sigurdson, A. J., Inskip, P. D., De Vathaire, F. and others. (2013). Absolute risk prediction of second primary thyroid cancer among 5-year survivors of childhood cancer. Journal of Clinical Oncology 31, 119.
  10. Langholz, B. and Thomas, D. C. (1990). Nested case-control and case-cohort methods of sampling from a cohort: a critical comparison. American Journal of Epidemiology 131, 169–176.
  11. Li, L., Greene, T. and Hu, B. (2018). A simple method to estimate the time-dependent receiver operating characteristic curve and the area under the curve with right censored data. Statistical Methods in Medical Research 27, 2264–2278.
  12. Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Medicine. Oxford, U.K.: Oxford University Press.
  13. Pfeiffer, R. M. and Gail, M. H. (2017). Absolute Risk: Methods and Applications in Clinical Management and Public Health. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. New York: CRC Press.
  14. Prentice, R. L. (1986). A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11.
  15. Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846–866.
  16. Rubin, D. B. (2004). Multiple Imputation for Nonresponse in Surveys, Volume 81. Hoboken, NJ: John Wiley & Sons.
  17. Samuelsen, S. O. (1997). A pseudolikelihood approach to analysis of nested case-control studies. Biometrika 84, 379–394.
  18. Seaman, S. R., Bartlett, J. W. and White, I. R. (2012). Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods. BMC Medical Research Methodology 12, 46.
  19. Shin, Y. E., Pfeiffer, R. M., Graubard, B. I. and Gail, M. H. (2020). Weight calibration to improve the efficiency of pure risk estimates from case-control samples nested in a cohort. Biometrics 76, 1087–1097.
  20. van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: multivariate imputation by chained equations in R. Journal of Statistical Software 45, 1–67.
  21. White, I. R. and Royston, P. (2009). Imputing missing covariate values for the Cox model. Statistics in Medicine 28, 1982–1998.
  22. Whittemore, A. S. and Halpern, J. (2016). Two-stage sampling designs for external validation of personal risk models. Statistical Methods in Medical Research 25, 1313–1329.
  23. Wu, C. and Sitter, R. R. (2001). A model-calibration approach to using complete auxiliary information from survey data. Journal of the American Statistical Association 96, 185–193.


Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
