Author manuscript; available in PMC: 2020 Oct 1.
Published in final edited form as: Cancer Causes Control. 2019 Aug 3;30(10):1145–1155. doi: 10.1007/s10552-019-01208-9

Improving the Diagnostic Accuracy of a Stratified Screening Strategy by Identifying the Optimal Risk Cutoff

John T Brinton 1, R Edward Hendrick 2, Brandy M Ringham 3, Mieke Kriege 4, Deborah H Glueck 5
PMCID: PMC6736710  NIHMSID: NIHMS1536546  PMID: 31377875

Abstract

Background:

The American Cancer Society (ACS) suggests using a stratified strategy for breast cancer screening. The strategy includes assessing risk of breast cancer, screening women at high risk with both MRI and mammography, and screening women at low risk with mammography alone. The ACS chose their cutoff for high risk using expert consensus.

Methods:

We propose instead an analytic approach that maximizes the diagnostic accuracy, measured by the area under the ROC curve (AUC), of a risk-based stratified screening strategy in a population. The inputs are the joint distribution of screening test scores and the odds of disease given risk score. Using the approach for breast cancer screening, we estimated the optimal risk cutoff for two different risk models: the Breast Cancer Surveillance Consortium (BCSC) model and a hypothetical model with much better discriminatory accuracy. Data on mammography and MRI test score distributions came from the Magnetic Resonance Imaging Screening Study Group.

Results:

A risk model with an excellent discriminatory accuracy (c-statistic = 0.947) yielded a reasonable cutoff where only about 20% of women had dual screening. However, the BCSC risk model (c-statistic = 0.631) lacked the discriminatory accuracy to differentiate between women who needed dual screening, and women who needed only mammography.

Conclusion:

Our research provides a general approach to optimize the diagnostic accuracy of a stratified screening strategy in a population, and to assess whether risk models lack enough discriminatory accuracy to make stratified screening a reasonable recommendation.

Keywords: cancer screening, stratified screening, risk assessment, ROC analysis

1. Introduction

In cancer screening, more intensive screening strategies are often recommended for people whose risk of cancer exceeds a cutoff value [33,25,1,4]. People at lower risk are given less intense screening or no screening at all. The approach can be termed a stratified screening strategy. In the United States, stratified screening strategies are recommended for breast cancer, endometrial cancer, and colorectal cancer [36,35].

Stratified screening matches screening intensity to risk. The underlying assumption of stratified screening is that for people at high risk, the benefits of detection outweigh the possible harms of additional screening. The early detection of cancers can reduce mortality, and improve the quality and length of life [16,30]. Yet cancer screening can also lead to false positive exams, undesirable events for anyone seeking cancer screening. The problem is compounded with regular screening, since even a low false positive rate can lead to a high cumulative false positive rate. For example, regular mammography screening will result in at least one false positive exam for 50% of women who undergo yearly mammography over 10 years [12].
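The arithmetic behind a cumulative false positive rate can be sketched as follows. This is a minimal illustration that assumes false positives are independent across screening rounds with a constant per-screen rate, which is a simplification and not a claim about the cited study; the function names are hypothetical.

```python
def cumulative_false_positive(p_per_screen, n_screens):
    """Probability of at least one false positive in n_screens rounds,
    assuming independent rounds with a constant per-screen rate."""
    return 1.0 - (1.0 - p_per_screen) ** n_screens


def per_screen_rate(cumulative, n_screens):
    """Per-screen rate implied by a cumulative rate, under the same
    independence assumption."""
    return 1.0 - (1.0 - cumulative) ** (1.0 / n_screens)


# Per-screen rate that would produce a 50% cumulative rate over 10 screens.
p = per_screen_rate(0.50, 10)
print(round(p, 3), round(cumulative_false_positive(p, 10), 3))
```

Under this assumption, a per-screen false positive rate below 7% is already enough to reach a cumulative rate of 50% over ten yearly screens.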

While stratified screening strategies have some appeal, the question is how to decide who should get intensive screening. In many cases, expert panels choose the risk level at which more intensive screening is recommended. Though expert panels usually select the risk cutoff after a review of evidence, development of guidelines may still be subjective. To illustrate the issue, consider the American Cancer Society (ACS) guidelines for adjunct breast MRI screening [33]. The ACS recommends that women over 30 years of age who have between 20% and 25% lifetime risk of breast cancer be screened with contrast-enhanced breast MRI in addition to mammography [33]. No information is given in the document as to how the cutoff for additional screening was chosen. In addition, a risk cutoff between 20–25% may leave women and their personal physicians without clear guidance for how to proceed.

Perhaps in light of this lack of clear guidance, Dr. Otis Brawley, chief medical and scientific officer of the American Cancer Society, argues that processes for developing risk cutoffs for cancer screening should be transparent [9]. One approach to make the process more transparent is to use an analytic approach to determine an optimal screening strategy, a tack taken by a variety of authors. Pepe et al. [26] assessed the classification power of biomarkers. Baker et al. [6] suggested using a utility function to assess whether combinations of biomarkers would result in a useful screening regimen. Gail and Pfeiffer [13] used a loss-function approach to model a risk-tool based decision for further screening. In a loss-function approach, one assigns a penalty to wrong decisions. Wrong decisions include falsely diagnosing disease in someone who is disease-free, and incorrectly declaring no disease in someone who has cancer.

In contrast to these utility-based and loss-based approaches, we propose a metric designed to measure the diagnostic accuracy of a stratified screening strategy, without consideration of cost or loss. The approach mirrors that used in the medical literature, where criteria based on accuracy often appear before any consideration of cost. For example, it took almost 3 years after Pisano et al. [27] compared the diagnostic accuracy of digital and film mammography for Tosteson et al. [38] to compare the cost-effectiveness of the two modalities. The rapid adoption of digital mammography, despite the lack of documentation of cost-effectiveness, suggests that accuracy, not cost, is the true metric that drives adoption of screening techniques.

In order to provide a method to quantify the population-wide accuracy of a screening strategy, we define a population based metric of diagnostic accuracy, based on a population-wide receiver operating characteristic (ROC) curve. A population-wide ROC curve for a strategy is a measure of the sensitivity and specificity of the strategy across all risk spectra of the population. For the population-wide curve, appropriate weighting is used to account for the differential risk of disease within each stratum. In turn, the area under the population-wide ROC curve quantifies the diagnostic accuracy of a screening strategy.

Using area under the ROC curve (AUC) as a metric, we propose methods to optimize the risk cutoff that defines the more intensely screened group. To illustrate the utility of the method, we find an optimal risk cutoff for the Breast Cancer Surveillance Consortium (BCSC) risk model [7]. For comparison, we apply the method to a hypothetical example where the risk assessment model has much greater discriminatory accuracy than the BCSC risk model. Results of this work are intended to help investigators choose evidence-based risk cutoffs that optimize the diagnostic accuracy of a stratified screening strategy.

2. Methods

2.1. Analytic Approach

We describe a method to obtain a risk cutoff that maximizes the diagnostic accuracy of a stratified screening strategy in a population. The approach requires two inputs: 1) the joint distribution of screening test scores, and 2) the odds of disease given the risk score.

2.1.1. Assumptions and Notation

Figure 1 shows a stratified screening strategy similar to the one suggested for breast cancer by Saslow et al. [33]. Every person undergoes risk assessment. People at high risk are given two screening tests. People at low risk are given one screening test. A high score on either or both screening tests indicates that the strategy is positive for disease. Typically, if the strategy is positive, a study participant would be referred for further work-up, which may lead to a confirmatory test, such as biopsy. For the purpose of this research, we assume that when a person has two screening tests with different scores, the maximum score of the two tests is used. This “worst scenario” approach corresponds to that described in Sardanelli et al. [32, pg. 98].

Figure 1:

Stratified Screening Strategy with Two Screening Tests. Figure 1 illustrates a stratified screening strategy and possible outcomes of the strategy based on risk assessment and two screening tests.

We assume that the two screening tests used in a stratified screening strategy have the same scale, and thus, the same threshold for suspicion of disease on either test. The requirement that the two screening tests are scored on the same scale is often made so that clinicians do not have to make assumptions about how one scale compares to another. It is a common practice in breast cancer screening, in which film-screen mammography, digital mammography, and contrast enhanced breast MRI are usually rated using the same scale [43,15,19,20,22,18,14,32,44,21], known as BI-RADS [34].

Finally, we assume that both the receiver operating characteristic (ROC) curves and the diagnostic accuracies (AUCs) of the screening tests are independent of the risk score. This assumption is reasonable when the physical process by which the screening test operates does not depend on the same process that modifies risk. For example, the sensitivity and specificity of screening breast MRI appear to be independent of family history, a surrogate measure of risk [18,19,20,22].

Individuals at high risk are given two ordinal categorical screening tests: screening test A and screening test B. Individuals at low risk are given only one screening test, test A. For each screening test, assume that the higher the screening test score, the higher the likelihood of disease. We assume that the two screening tests are scored independently.

Let $y_{idj}$ be the screening test score for individual $i \in \{1, 2, \ldots, N\}$, with disease status $d \in \{c, n\}$, on screening test $j \in \{A, B\}$. Here $c$ indicates the presence of disease and $n$ indicates the absence of disease. Let $T$ be the number of possible outcomes for each of the two screening tests, so that each test can take on one value from the possible set of outcomes $t \in \{1, 2, \ldots, T\}$. Test scores for individuals with disease and without disease each have a potentially bivariate discrete probability mass function. For a given disease status and for $t_A, t_B \in \{1, 2, \ldots, T\}$, assume that the two tests have a bivariate discrete probability mass function $f_{A,B|d}(y_{idA} = t_A, y_{idB} = t_B \mid d) = w_{t_A t_B d}$, where $w_{t_A t_B d}$ is the probability that screening test A and screening test B take on values $t_A$ and $t_B$, respectively. Under these assumptions, we have that

$$\sum_{t_A=1}^{T} \sum_{t_B=1}^{T} w_{t_A t_B n} = 1, \tag{1}$$

and

$$\sum_{t_A=1}^{T} \sum_{t_B=1}^{T} w_{t_A t_B c} = 1. \tag{2}$$

Recall our assumption of a common cutoff score of $\theta$ for both screening tests. For a high-risk participant, the stratified screening strategy is positive if $y_{idA} \ge \theta$, $y_{idB} \ge \theta$, or both. For a low-risk participant, the stratified screening strategy is positive only if $y_{idA} \ge \theta$.
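As a minimal sketch of this notation, the following Python stores a joint probability mass function for the two tests as a T × T table and applies the positivity rule for one participant. The 3-point scale and the probabilities are invented for illustration only, and `is_normalized` and `strategy_positive` are hypothetical helper names.

```python
# Illustrative joint PMF for diseased women on a 3-point scale.
# w_c[tA - 1][tB - 1] is the probability that test A scores tA and test B scores tB.
# These numbers are made up for the example and do not come from any study.
w_c = [
    [0.10, 0.05, 0.05],
    [0.05, 0.15, 0.10],
    [0.05, 0.10, 0.35],
]


def is_normalized(w, tol=1e-9):
    """Check Equations (1) and (2): the joint probabilities sum to one."""
    return abs(sum(sum(row) for row in w) - 1.0) < tol


def strategy_positive(y_a, y_b, theta, high_risk):
    """Positivity rule for one participant.

    High-risk participants receive both tests and the maximum score is used;
    low-risk participants receive test A only, so y_b is ignored.
    """
    score = max(y_a, y_b) if high_risk else y_a
    return score >= theta


print(is_normalized(w_c))
print(strategy_positive(1, 3, theta=2, high_risk=True))
print(strategy_positive(1, 3, theta=2, high_risk=False))
```

The same pair of scores can therefore be positive for a high-risk participant and negative for a low-risk participant, because the second test is only read in the high-risk stratum.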

Let $f_Z(z)$ and $F_Z(z)$ be the probability density and cumulative distribution functions for the risk scores, where $Z \in [z_{min}, z_{max}]$. Here, the risk score is assumed to have a continuous distribution. Let $\lambda \in [z_{min}, z_{max}]$ be the risk cutoff for categorizing an individual as low or high risk for cancer: individuals with risk scores less than $\lambda$ are deemed low-risk, while those with risk scores greater than or equal to $\lambda$ are high-risk. Let the index $k \in \{l, h\}$ denote risk strata. The probability that a person will be classified as low-risk is $r_l(\lambda) = \Pr[Z < \lambda]$. The probability of being high-risk is $r_h(\lambda) = 1 - r_l(\lambda)$.

Let $\pi$, $\pi_h(\lambda)$, and $\pi_l(\lambda)$ be the prevalence of disease in the general screening population, the high-risk stratum, and the low-risk stratum, respectively. Let $D^+$ be the event an individual has disease. Let $\Pr\{D^+ \mid Z = z\}$ be the conditional probability an individual has disease given a risk score. From Equation 6.6, p. 278, [29], the prevalence of disease in each stratum is given by

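$$\pi_l(\lambda) = \int_{z_{min}}^{\lambda} \Pr\{D^+ \mid Z = z\} \, f_Z(z) \, dz \tag{3}$$

and

$$\pi_h(\lambda) = \pi - \pi_l(\lambda) = \int_{\lambda}^{z_{max}} \Pr\{D^+ \mid Z = z\} \, f_Z(z) \, dz. \tag{4}$$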

Note that with $\lambda = z_{max}$, $\pi_l(\lambda) = \pi_l(z_{max}) = \pi$. Similarly, with $\lambda = z_{min}$, $\pi_h(\lambda) = \pi_h(z_{min}) = \pi$.

For conciseness, we write $\pi_l(\lambda)$ and $\pi_h(\lambda)$ as $\pi_l$ and $\pi_h$. Similarly, $r_l$ and $r_h$ denote the proportion of the entire population classified as low- or high-risk.
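The integrals in Equations (3) and (4) can be approximated numerically. The sketch below is illustrative only: it assumes a Beta(2, 10) density for the risk scores on [0, 1] and a logistic form for the probability of disease with made-up coefficients, and all function names are hypothetical.

```python
import math


def beta_pdf(z, a, b):
    """Density of a Beta(a, b) distribution; zero outside the open interval (0, 1)."""
    if z <= 0.0 or z >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(z) + (b - 1) * math.log(1 - z))


def prob_disease(z, b0=-6.0, b1=20.0):
    """Assumed logistic form for Pr{D+ | Z = z}; the coefficients are illustrative."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))


def integrate(f, lo, hi, n=2000):
    """Composite trapezoidal rule on [lo, hi] with n equal subintervals."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return total * h


def stratum_quantities(lam, z_min=0.0, z_max=1.0, a=2.0, b=10.0):
    """Return (r_l, r_h, pi_l, pi_h) for risk cutoff lam."""
    f_z = lambda z: beta_pdf(z, a, b)
    joint = lambda z: prob_disease(z) * f_z(z)
    r_l = integrate(f_z, z_min, lam)      # Pr[Z < lambda]
    pi_l = integrate(joint, z_min, lam)   # Equation (3)
    pi_h = integrate(joint, lam, z_max)   # Equation (4)
    return r_l, 1.0 - r_l, pi_l, pi_h


r_l, r_h, pi_l, pi_h = stratum_quantities(0.2)
print(r_l, r_h, pi_l, pi_h, pi_l + pi_h)
```

The two integrals add up to the overall prevalence for any cutoff, which gives a quick numerical check of the relation in Equation (4).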

2.1.2. Sensitivity and Specificity of the Stratified Screening Strategy

The sensitivity and specificity of the stratified screening strategy depend on the sensitivity and specificity values for the two component screening tests. The sensitivity for screening test A is given by

$$s_A = \sum_{t_A=\theta}^{T} \sum_{t_B=1}^{T} w_{t_A t_B c}. \tag{5}$$

The sensitivity for screening test B is given by

$$s_B = \sum_{t_A=1}^{T} \sum_{t_B=\theta}^{T} w_{t_A t_B c}. \tag{6}$$

The specificity of screening test A is given by

$$p_A = \sum_{t_A=1}^{\theta-1} \sum_{t_B=1}^{T} w_{t_A t_B n} \tag{7}$$

and the specificity for screening test B is given by

$$p_B = \sum_{t_A=1}^{T} \sum_{t_B=1}^{\theta-1} w_{t_A t_B n}. \tag{8}$$
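The marginal sensitivities and specificities can be read off the joint tables by summing rows or columns. The following sketch assumes the joint distributions are stored as nested lists indexed `w[tA - 1][tB - 1]`, and it computes specificity from the table for women without disease, as is standard. The example tables are invented and the function names are hypothetical.

```python
# Invented joint PMFs on a 3-point scale: w_c for women with disease,
# w_n for women without disease. Rows index test A, columns index test B.
w_c = [[0.10, 0.05, 0.05], [0.05, 0.15, 0.10], [0.05, 0.10, 0.35]]
w_n = [[0.70, 0.05, 0.02], [0.08, 0.05, 0.02], [0.03, 0.03, 0.02]]


def sensitivity_a(w_c, theta):
    """Pr(test A score >= theta | disease): sum the rows from theta upward."""
    return sum(sum(row) for row in w_c[theta - 1:])


def sensitivity_b(w_c, theta):
    """Pr(test B score >= theta | disease): sum the columns from theta upward."""
    return sum(sum(row[theta - 1:]) for row in w_c)


def specificity_a(w_n, theta):
    """Pr(test A score < theta | no disease): sum the rows below theta."""
    return sum(sum(row) for row in w_n[:theta - 1])


def specificity_b(w_n, theta):
    """Pr(test B score < theta | no disease): sum the columns below theta."""
    return sum(sum(row[:theta - 1]) for row in w_n)


theta = 2
print(sensitivity_a(w_c, theta), sensitivity_b(w_c, theta))
print(specificity_a(w_n, theta), specificity_b(w_n, theta))
```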

To derive formulae for the sensitivity and specificity of the stratified screening strategy, we give the probability of every possible outcome for the stratified screening strategy (Table 1).

Table 1:

Stratified Screening Strategy Outcomes and Associated Probabilities

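| Risk Level | Disease | Test A | Test B | Strategy | Probability |
|---|---|---|---|---|---|
| High | + | + | + | + | $j_h$ |
| High | + | − | + | + | $s_B \cdot \pi_h \cdot r_h - j_h$ |
| High | + | + | − | + | $s_A \cdot \pi_h \cdot r_h - j_h$ |
| High | + | − | − | − | $(\pi_h \cdot r_h - s_B \cdot \pi_h \cdot r_h) - (s_A \cdot \pi_h \cdot r_h - j_h)$ |
| High | − | + | + | + | $(r_h - \pi_h \cdot r_h) - p_A \cdot (r_h - \pi_h \cdot r_h) - (p_B \cdot (r_h - \pi_h \cdot r_h) - g_h)$ |
| High | − | − | + | + | $p_A \cdot (r_h - \pi_h \cdot r_h) - g_h$ |
| High | − | + | − | + | $p_B \cdot (r_h - \pi_h \cdot r_h) - g_h$ |
| High | − | − | − | − | $g_h$ |
| Low | + | + | No value | + | $s_A \cdot \pi_l \cdot r_l$ |
| Low | + | − | No value | − | $\pi_l - s_A \cdot \pi_l \cdot r_l$ |
| Low | − | + | No value | + | $(r_l - \pi_l \cdot r_l) - p_A \cdot (r_l - \pi_l \cdot r_l)$ |
| Low | − | − | No value | − | $p_A \cdot (r_l - \pi_l \cdot r_l)$ |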

The number of true positives (TP) for the strategy is the number of people called positive by the strategy who do, in fact, have cancer. False positives (FP), true negatives (TN) and false negatives (FN) are defined similarly. Then the sensitivity of the strategy is given by

$$\mathrm{Sens}(\theta, \lambda) = \frac{TP}{TP + FN}, \tag{9}$$

and the specificity by

$$\mathrm{Spec}(\theta, \lambda) = \frac{TN}{TN + FP}, \tag{10}$$

where

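$$j_h = r_h \times \pi_h \times \left\{ \sum_{t_A=\theta}^{T} \sum_{t_B=\theta}^{T} w_{t_A t_B c} \right\}, \tag{11}$$

$$g_h = r_h \times (1 - \pi_h) \times \left\{ \sum_{t_A=1}^{\theta-1} \sum_{t_B=1}^{\theta-1} w_{t_A t_B n} \right\}, \tag{12}$$

$$TP = (j_h) + (s_B \pi_h r_h - j_h) + (s_A \pi_h r_h - j_h) + (s_A \pi_l r_l), \tag{13}$$

$$FP = [(r_h - \pi_h r_h) - p_A (r_h - \pi_h r_h) - (p_B (r_h - \pi_h r_h) - g_h)] + [p_A (r_h - \pi_h r_h) - g_h] + [p_B (r_h - \pi_h r_h) - g_h] + [(r_l - \pi_l r_l) - p_A (r_l - \pi_l r_l)], \tag{14}$$

$$FN = [(\pi_h r_h - s_B \pi_h r_h) - (s_A \pi_h r_h - j_h)] + (\pi_l - s_A \pi_l r_l), \tag{15}$$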

and

$$TN = g_h + [p_A (r_l - \pi_l r_l)]. \tag{16}$$
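The cell probabilities can be combined into the strategy's sensitivity and specificity as in the sketch below. It makes two labeled assumptions: `pi_h` and `pi_l` are treated as within-stratum prevalences, so that the product of prevalence and stratum proportion is the probability of being diseased and in that stratum, and the low-risk false negative cell is written as `pi_l * r_l - s_a * pi_l * r_l` so that all cells sum to one. The tables are invented and `strategy_sens_spec` is a hypothetical name.

```python
# Invented joint PMFs, indexed w[tA - 1][tB - 1], for illustration only.
w_c = [[0.10, 0.05, 0.05], [0.05, 0.15, 0.10], [0.05, 0.10, 0.35]]
w_n = [[0.70, 0.05, 0.02], [0.08, 0.05, 0.02], [0.03, 0.03, 0.02]]


def strategy_sens_spec(w_c, w_n, theta, r_h, pi_h, pi_l):
    """Sensitivity and specificity of the stratified strategy at cutoff theta.

    Assumes pi_h and pi_l are within-stratum prevalences (an assumption of
    this sketch) and that the overall prevalence is strictly between 0 and 1.
    """
    r_l = 1.0 - r_h
    lo, hi = slice(None, theta - 1), slice(theta - 1, None)
    s_a = sum(sum(row) for row in w_c[hi])           # sensitivity of test A
    s_b = sum(sum(row[hi]) for row in w_c)           # sensitivity of test B
    p_a = sum(sum(row) for row in w_n[lo])           # specificity of test A
    both_pos_c = sum(sum(row[hi]) for row in w_c[hi])
    both_neg_n = sum(sum(row[lo]) for row in w_n[lo])

    j_h = r_h * pi_h * both_pos_c                    # both tests positive, diseased
    g_h = r_h * (1.0 - pi_h) * both_neg_n            # both tests negative, not diseased

    dis_h, dis_l = pi_h * r_h, pi_l * r_l            # diseased and in each stratum
    non_h, non_l = r_h - dis_h, r_l - dis_l          # non-diseased and in each stratum

    tp = j_h + (s_b * dis_h - j_h) + (s_a * dis_h - j_h) + s_a * dis_l
    fn = (dis_h - s_b * dis_h) - (s_a * dis_h - j_h) + (dis_l - s_a * dis_l)
    tn = g_h + p_a * non_l
    # The three high-risk false positive cells of Table 1 sum to non_h - g_h.
    fp = (non_h - g_h) + (non_l - p_a * non_l)
    return tp / (tp + fn), tn / (tn + fp)


print(strategy_sens_spec(w_c, w_n, theta=2, r_h=0.2, pi_h=0.02, pi_l=0.004))
```

Two limiting cases give a sanity check: with no one in the high-risk stratum the strategy reduces to test A alone, and with everyone in the high-risk stratum its specificity is the probability that both tests are negative in women without disease.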

2.1.3. Calculating the Diagnostic Accuracy of the Stratified Screening Strategy

Equations (9) and (10) define a receiver operating characteristic curve for the stratified screening strategy as a function of the risk cutoff, λ: for a fixed λ, varying the test cutoff θ traces out the curve. We can assess the diagnostic accuracy of the stratified screening strategy for each risk cutoff using the area under the receiver operating characteristic curve (AUC). We calculate the AUC of the stratified screening strategy for each risk cutoff using a trapezoidal rule approximation [37].
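A trapezoidal AUC over a small set of ROC points can be sketched as follows. The function takes the sensitivity and specificity at each test cutoff and adds the corner points (0, 0) and (1, 1); the name `trapezoid_auc` and the example values are hypothetical, and this is one common way to apply the rule rather than necessarily the exact routine used for the paper.

```python
def trapezoid_auc(sens, spec):
    """Area under the ROC curve by the trapezoidal rule.

    sens[i] and spec[i] are the sensitivity and specificity at the i-th cutoff.
    The points (0, 0) and (1, 1) are added so the curve spans the unit square.
    """
    points = {(1.0 - sp, se) for se, sp in zip(sens, spec)}
    points |= {(0.0, 0.0), (1.0, 1.0)}
    pts = sorted(points)  # sort by false positive rate, then sensitivity
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# One operating point with sensitivity 0.8 and specificity 0.7.
print(trapezoid_auc([0.8], [0.7]))
```

With a single operating point, the trapezoidal AUC equals the average of sensitivity and specificity, which is a convenient check on the implementation.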

Recall that in a stratified screening strategy, all people are assessed for risk of disease using a risk model, and those people with risk scores above a risk cutoff are given more intensive screening. Both the ROC curve and the AUC for the stratified screening strategy depend on the choice of risk cutoff for the strategy. Because the ROC and AUC depend on the risk cutoff, the diagnostic accuracy of the stratified screening strategy is a function of the risk cutoff.

2.1.4. Finding the Risk Cutoff that Maximizes the Diagnostic Accuracy of the Stratified Screening Strategy

We used a grid search approach [37] to identify an optimal risk cutoff for the strategy, λ*, which maximizes the AUC of the strategy. When no maximum exists, one approach is to select λ* = λmax. This choice corresponds to a screening strategy where all participants are given only Test A. Alternatively, one could choose λ* = λmin. This cutoff choice corresponds to adopting a screening strategy where all participants are given both screening tests. Giving everyone in the population both screening tests is a useful strategy when the two screening tests produce better diagnostic accuracy than either component screening test alone.
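A grid search over the risk cutoff can be sketched as below. The function accepts any callable that returns the strategy AUC for a given cutoff; the grid size, the tie-breaking rule, and the toy AUC curve are assumptions of this sketch, and `grid_search_cutoff` is a hypothetical name.

```python
def grid_search_cutoff(auc_of, z_min, z_max, n_grid=101):
    """Return (lambda_star, best_auc) over an evenly spaced grid of cutoffs.

    Ties are resolved in favor of the smallest cutoff, which is a choice made
    for this sketch; a flat AUC curve therefore returns z_min.
    """
    best_lam, best_auc = None, float("-inf")
    for i in range(n_grid):
        lam = z_min + (z_max - z_min) * i / (n_grid - 1)
        auc = auc_of(lam)
        if auc > best_auc:
            best_lam, best_auc = lam, auc
    return best_lam, best_auc


# Toy stand-in for the strategy AUC, peaked at a cutoff of 0.3.
toy_auc = lambda lam: 0.8 - (lam - 0.3) ** 2

print(grid_search_cutoff(toy_auc, 0.0, 1.0))
```

In practice `auc_of` would compute the stratum proportions and prevalences for the cutoff, evaluate sensitivity and specificity at every test cutoff, and return the trapezoidal AUC.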

2.2. Methods for Real and Hypothetical Examples

To evaluate the performance of the proposed stratified screening strategy we require the following information: 1) the joint distribution of screening test scores and 2) the distribution of the risk scores in the general population. From this information we are able to calculate the AUC of the strategy for a given risk score cutoff.

We provide two examples. In the first example, we use a risk model with moderate discriminatory accuracy developed with data from the Breast Cancer Surveillance Consortium (BCSC) [7]. In the second example, we use a hypothetical risk model with high discriminatory accuracy. The hypothetical risk model has higher discriminatory accuracy than the BCSC model (c-statistics of 0.947 versus 0.631).

2.2.1. Breast Cancer Surveillance Consortium Example

We used publicly available data from the Breast Cancer Surveillance Consortium (BCSC). The data included presence or absence of cancer, and a risk score calculated using the BCSC model [7]. The BCSC model is a risk assessment tool based on data from over 2 million women, aged 35 years or older, who had no previous breast cancer and did not have breast augmentation. The data are from community-based registries and cover a broad geographic swath of the United States. The BCSC model provides an estimate of the 5-year risk of breast cancer. Scores on the BCSC risk model range between 0.0 and 0.04. A score of 0.04 reflects a 5-year probability of cancer of approximately 0.025. The risk assessment tool had a c statistic of 0.631 (95% confidence interval [CI] = 0.618 to 0.644) for pre-menopausal women and 0.624 (95% CI = 0.619 to 0.630) for postmenopausal women [7]. The c statistic is a measure of discriminatory accuracy of a risk prediction model, and ranges between 0.5 for a risk assessment tool no better than chance and 1.0 for a perfect risk assessment tool [17].

The empirical distribution of risk scores in the screening population, fz (z), was estimated from the BCSC data [7]. We show the empirical distributions of the risk scores conditional on disease status in Figure 2a. Note that the calculation of the probability of breast cancer given the risk score uses the unconditional empirical distribution of the risk scores (i.e., the risk scores for both the cancers and the non-cancers combined). We display the conditional empirical distributions to illustrate the large amount of overlap between the two distributions. The large overlap is the underlying reason for the BCSC model’s modest discriminatory ability.

Figure 2:

Optimal Risk Cutoff for Adjunct Screening Using the BCSC Risk Model. Figure 2 illustrates a) the population distribution of five-year Breast Cancer Surveillance Consortium (BCSC) risk scores, b) the probability of disease conditional upon the risk scores and c) the risk cutoff which maximizes the AUC for a stratified screening strategy using the BCSC risk assessment tool.

Most models predict incident risk; however, for the application of our method we utilize the BCSC risk model data to estimate the prevalence of cancer in a screening population. The probability of breast cancer given the risk score, Pr {D+|Z = z}, was estimated from the BCSC assigned five-year risk scores by logistic regression of the disease status indicator on the risk score. The probability of disease given the risk score was estimated with the inverse logit as in Equation 17, with β0 = −7.4, and β1 = 92.7,

$$\Pr\{D^+ \mid Z = z\} = \frac{\exp(\beta_0 + z \beta_1)}{\exp(\beta_0 + z \beta_1) + 1}. \tag{17}$$

The probability of disease given the BCSC risk score is shown in Figure 2b.
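The inverse logit in Equation 17 can be evaluated directly. The sketch below uses the coefficients reported in the text for the BCSC example; the function name `prob_disease_given_risk` is hypothetical.

```python
import math

# Coefficients reported in the text for the BCSC example.
BETA0, BETA1 = -7.4, 92.7


def prob_disease_given_risk(z, beta0=BETA0, beta1=BETA1):
    """Equation 17: inverse logit of beta0 + z * beta1."""
    eta = beta0 + z * beta1
    return math.exp(eta) / (math.exp(eta) + 1.0)


for z in (0.0031, 0.04):
    print(z, prob_disease_given_risk(z))
```

With these coefficients, a risk score of 0.0031 maps to a probability of about 0.0008 and a score of 0.04 maps to about 0.024, close to the values quoted elsewhere in the text.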

The joint distribution of MRI and mammography test scores was obtained from unpublished data from the study of Kriege et al. [18]. Briefly, 1909 women with a cumulative lifetime risk of 15% or more were screened every year by mammography and MRI. Each modality was scored independently using a standardized Breast Imaging Reporting and Data System (BI-RADS) scale. Women were followed for the development of breast cancer for a median of 2.9 years. Of the 1909 women, 1795 had data on both mammography and MRI for at least one visit. Most women had multiple rounds of screening and BI-RADS scores for both mammography and MRI. To ensure that each woman appeared in the data set only once, we used the following approach. For women who eventually developed breast cancer (N = 45), the last screening scores before diagnosis were used. For women with no evidence of breast cancer (N = 1750), one pair of BI-RADS scores, including one for mammography and one for MRI, was chosen at random from all of the woman’s screening examinations. The AUC for mammography alone was 0.686, while the AUC for screening breast MRI was 0.827 [18].
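The rule for keeping one pair of scores per woman can be sketched as follows. The data layout is an assumption of this sketch: each woman's exams are a chronological list of (mammography, MRI) BI-RADS pairs, restricted to exams before any diagnosis, and `one_pair_per_woman` is a hypothetical name.

```python
import random


def one_pair_per_woman(exams, developed_cancer, rng):
    """Select a single (mammography, MRI) score pair for one woman.

    exams: chronological list of (mammography_score, mri_score) tuples,
    assumed to contain only screening rounds before any diagnosis.
    """
    if developed_cancer:
        return exams[-1]       # last screening scores before diagnosis
    return rng.choice(exams)   # one pair chosen at random


rng = random.Random(2019)      # fixed seed so the selection is reproducible
print(one_pair_per_woman([(1, 2), (2, 4)], developed_cancer=True, rng=rng))
print(one_pair_per_woman([(1, 1), (2, 1), (1, 2)], developed_cancer=False, rng=rng))
```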

2.2.2. Hypothetical Risk Model Example

We wanted to evaluate whether a risk prediction tool with better discriminatory accuracy would yield a clearer choice of risk cutoff, and thus we simulated a stratified screening strategy based on a risk model with better discriminatory accuracy.

Risk models have low discriminatory ability if the distributions of risk scores for women with disease and women without disease substantially overlap, as is the case for the BCSC risk tool (Figure 2a). Instead, we made distributional assumptions for the risk scores so that there was a strong separation between scores for women with disease and without disease. We assumed that risk scores for women without breast cancer had a beta distribution with parameters α = 3 and β = 21. We assumed that the risk scores for women with breast cancer were beta-distributed with parameters α = 9 and β = 21. We fixed the prevalence of disease at 0.006, the prevalence of breast cancer observed in Pisano et al. [27]. The resulting distribution of risk scores for the entire population, fz (z), is a mixture of the two beta distributions. The simulated risk score distributions for 100,000 women without disease and 600 women with disease are shown in Figure 3a.
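The simulation described above can be sketched with the standard library. The sketch draws risk scores from the two beta distributions and computes a c statistic on the raw scores from rank sums; the seed, the function names, and the use of the Mann-Whitney rank formula are assumptions of this sketch rather than the authors' code, and the exact value obtained depends on the seed.

```python
import random


def simulate_risk_scores(n_without=100_000, n_with=600, seed=0):
    """Draw risk scores from Beta(3, 21) (no disease) and Beta(9, 21) (disease)."""
    rng = random.Random(seed)
    without = [rng.betavariate(3, 21) for _ in range(n_without)]
    with_dz = [rng.betavariate(9, 21) for _ in range(n_with)]
    return without, with_dz


def c_statistic(without, with_dz):
    """Probability that a diseased woman has a higher score than a non-diseased
    woman, from rank sums (Mann-Whitney). Ties are not averaged, which is
    acceptable here because the scores are continuous."""
    labelled = sorted([(z, 0) for z in without] + [(z, 1) for z in with_dz])
    rank_sum = sum(rank for rank, (_, d) in enumerate(labelled, start=1) if d == 1)
    n1, n0 = len(with_dz), len(without)
    u = rank_sum - n1 * (n1 + 1) / 2.0
    return u / (n1 * n0)


without, with_dz = simulate_risk_scores()
print(c_statistic(without, with_dz))
```

Because the c statistic depends only on the ordering of the scores, the value computed on raw scores should match the one from a logistic model that is monotone in the score, up to simulation noise.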

Figure 3:

Optimal Risk Cutoff for Adjunct Screening Using a Model with High Discriminatory Accuracy. Figure 3 illustrates a) the population distribution of risk scores from a risk assessment tool with better predictive accuracy than the Breast Cancer Surveillance Consortium risk tool, b) the probability of disease given the risk score and c) the risk cutoff which maximizes the AUC for a stratified screening strategy using the better risk assessment tool.

Using logistic regression on the simulated data, we fit the probability of disease given the risk score Pr{D+|Z = z}. The model fit yielded β0 = −7.73, and β1 = 29.64, using Equation 17. The model had good predictive accuracy with a c statistic of 0.947, much better than the c statistic for the BCSC risk assessment tool [7]. The probability of disease given the risk score is plotted in Figure 3b.

The joint distribution of test scores for mammography and MRI was obtained as described above.

3. Results

3.1. Optimal Risk Score Cutoff Based on the Breast Cancer Surveillance Consortium (BCSC) Model

As shown in Figure 2c, the optimal risk score cutoff for maximizing the AUC of a stratified screening strategy that uses the BCSC risk assessment model is λ* = 0.0031, which corresponds to a five-year breast cancer probability of 0.0008. The optimal risk cutoff appears in Figure 2c as a vertical line occurring at the risk score where the AUC is maximized. Using this risk cutoff value yields an AUC for the strategy of 0.885.

A stratified screening strategy with a risk cutoff of λ* = 0.0031 would mean that every woman in the screening population with a BCSC 5-year risk score greater than 0.0031 would be screened with both digital mammography and contrast-enhanced screening breast MRI. Based on the BCSC data this would result in more than 99% of the general screening population receiving both tests, an absurd result. The BCSC model lacks the discriminatory accuracy to differentiate between women at high risk, who require screening with both mammography and MRI, and women at low risk, who require mammography alone.

3.2. Optimal Risk Cutoff Based on a Hypothetical Risk Model with High Discriminatory Accuracy

As shown in Figure 3c, the optimal risk score cutoff for maximizing the AUC of a stratified screening strategy that uses the hypothetical risk assessment model is λ* = 0.16. Again, the optimal risk cutoff appears in Figure 3c as a vertical line and occurs at the point where the AUC is maximized. Using this risk cutoff value yields an AUC for the strategy of 0.90.

The results indicate that in order to optimize the AUC of a stratified screening strategy based on the hypothetical risk assessment tool considered in this section, every woman with a risk score greater than 0.16 on the hypothetical risk model should be screened with both digital mammography and contrast-enhanced screening breast MRI. This would result in 20% of the population receiving both tests, with the rest screened only with digital mammography. With a risk model with excellent discriminatory accuracy, the choice of a cutoff is clear. Using that cutoff maximizes the accuracy of the stratified screening strategy for the entire population.

4. Discussion

We demonstrate a single analytic approach for identifying an optimal cutoff for a risk-based stratified screening strategy. The approach maximizes the AUC of the stratified screening approach in a population. The approach uses mathematical criteria and empirical data, rather than expert opinion, to identify a risk threshold for adjunct screening with a secondary test. The work presented in this manuscript has the potential to inform cancer screening recommendations for a variety of disease sites, in addition to the breast cancer case considered in the examples.

If the only risk models available have poor discriminatory accuracy, using a risk model to determine the intensity of screening may not be a good approach, a finding that mirrors that of Wald et al. [40]. The discriminatory accuracy of a risk model measures the probability that a risk model will correctly differentiate between those who will develop disease and those who will not. Using a risk assessment model with low discriminatory accuracy means that the risk model often provides incorrect classification. The risk model cannot discriminate between those who will and those who will not develop disease. Because the risk model is often wrong, there is essentially no improvement between a strategy where all women are screened with both tests (shown on the far left of the horizontal axis in Figure 2c), and a stratified screening strategy where women are first risk-assessed and then receive screening based on their personal risk.

Using a risk model with poor discriminatory accuracy, any algorithm seeking to choose an optimal risk cutoff for a stratified screening strategy will produce unacceptable results. Using the BCSC risk model, we obtained a risk cutoff that suggested 99% of women presenting for breast cancer screening should be screened with both contrast enhanced breast MRI and mammography. Yet such a strategy would never be accepted by women and their physicians, or third-party payers. It could be clinically implemented with fast breast MRI used everywhere, but costs would be enormous.

Our results indicate that matching screening intensity to risk is a good strategy only when the risk assessment tool has good discriminatory accuracy. This result agrees with those of other authors, who used alternative rationales to achieve the same conclusion [13,6]. While no current risk model for breast cancer has a c-statistic as good as the model we posit in Section 3.2 [2], it is the hope that future models that incorporate genetic and epigenetic information may perform better. With better risk models, stratified screening strategies using our optimization strategy would be practical and would improve cancer screening.

One limitation of our work is that the real data example we chose uses the BCSC risk model. It is important to note that the ACS [33] recommended evaluating lifetime risk of breast cancer using models that are largely dependent on detailed family history, such as the BRCAPRO, Claus, or Tyrer-Cuzick models [10, 8,39], not the BCSC model. However, evaluation of the performance of any one of these models is not currently possible. Our method requires, as an input, the probability of disease (prevalent screen detectable disease), given the risk score. Yet this distribution for the BRCAPRO, BOADICEA, Claus, or Tyrer-Cuzick models [10, 8,39,3] is not readily available. Although Amir et al. [2] evaluated the probability of breast cancer incidence given risk assessment with either the Tyrer-Cuzick or Claus models [2, Figure 1, p 812], their data are not publicly accessible.

Another possible limitation is that we assumed that the risk score was independent of the performance of the two screening tests. Yet the BCSC model includes breast density as an input [7]. In addition, breast density is associated with the sensitivity and specificity of mammography [27]. This contravenes our assumption of the independence of the risk assessment model and the performance of the screening tests. We chose to keep the BCSC example, since we could find no other published, freely available data set containing the population distribution of risk scores.

One other potential limitation is our use of the Kriege et al. [18] data to estimate the joint distribution of the mammography and MRI scores. The Kriege et al. [18] study used mostly screen-film mammography. Since the study, there has been widespread adoption of digital mammography. In addition, since the Kriege et al. [18] study occurred, radiologist experience and skill with breast MRI have increased. Even with these potential limitations, it is unlikely that the results of our analysis would change much with updated data. In addition, the goal of this manuscript is to demonstrate the applicability of our method. In the future, our results could be updated should new data on the diagnostic accuracy of MRI and mammography become available.

A potential bias occurs in our results because Kriege et al. [18] only enrolled women at high lifetime risk. We used these data under our assumption that the screening test score is independent of the risk, and hence, any estimate, even an estimate from a high-risk population, would be valid. This in fact may not be true. If we could obtain data from a general, low-risk population on the joint distribution of mammography and MR scores, we could evaluate the validity of the assumption. Such data are difficult to obtain.

Some authors have suggested using partial AUC or Youden's index instead of the full area under the receiver operating characteristic curve [24,23], the metric used in this manuscript. Our rationale for using full area under the curve follows. In most breast cancer papers [27,18,19,20,22], the full area under the curve is used as the metric, due to the nature of the detection task. For a continuous biomarker, typically follow-up testing is only done for extreme values of the biomarker, i.e. for parts of the curve where sensitivity is high and specificity is low. When radiologists review mammography or breast MRI images, sometimes a radiologist will see a salient detail that 99 out of 100 other readers would miss. Thus detection of cancer may occur even in cases where the sensitivity is low, and the specificity is high. Thus, considering the full curve reflects the true clinical picture. In many cases other than breast cancer screening [41], a partial area under the curve, both for the test and for the population, may have merit. An extension of our method could be achieved by changing the numerical algorithm to use partial AUC or Youden's index as the metric.
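The two alternative metrics mentioned here can be computed from the same ROC points as the full AUC. The sketch below is illustrative: it defines Youden's index as the maximum of sensitivity plus specificity minus one, and a partial AUC as the trapezoidal area for false positive rates up to a chosen limit, with linear interpolation at the limit. The function names and the limit are assumptions of this sketch.

```python
def youden_index(sens, spec):
    """Maximum over cutoffs of sensitivity + specificity - 1."""
    return max(se + sp - 1.0 for se, sp in zip(sens, spec))


def partial_auc(sens, spec, fpr_max):
    """Trapezoidal area under the ROC curve for false positive rates in [0, fpr_max]."""
    points = {(1.0 - sp, se) for se, sp in zip(sens, spec)}
    points |= {(0.0, 0.0), (1.0, 1.0)}
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= fpr_max:
            break
        if x1 > fpr_max:
            # Interpolate linearly to the point where the curve crosses fpr_max.
            y1 = y0 + (y1 - y0) * (fpr_max - x0) / (x1 - x0)
            x1 = fpr_max
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


sens, spec = [0.8], [0.7]
print(youden_index(sens, spec), partial_auc(sens, spec, fpr_max=0.15))
```

Swapping either function in for the full AUC would leave the search over risk cutoffs unchanged; only the quantity being maximized differs.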

Extensions to situations with multiple screening tests could certainly be considered in the framework we have set up in this paper. Another extension would be to derive similar probabilistic approaches for more complex stratified screening strategies, such as those with more than one risk assessment tool or more than two possible screening approaches. Radiologists conducting breast cancer screening might be interested in guidelines for adding either whole-breast ultrasound or breast MRI to mammography. Finally, the method could easily apply to models which predict risk for short fixed time horizons (≤10 years), rather than for the remaining lifetime. The advantage of using short time horizon risk models is that most risk models are only validated for short horizons, and are therefore more accurate in the short term [28].

In this manuscript, we provide an approach for optimizing the diagnostic accuracy of a stratified screening strategy by choosing an appropriate risk cutoff. Yet in the end, diagnostic accuracy should not be the only factor used to determine how one should screen for cancer. The most important factor in deciding whether to screen, and with what kind of screening, is whether the screening program reduces mortality and by how much. Many countries have suggested adding MRI to mammography screening programs for high-risk groups [33,31]. The argument for adding MRI to the screening regimen has mostly been driven by expert opinion, in turn motivated by data that suggest that screening with MRI may reduce cancer stage [18,42].

While our approach only considers diagnostic accuracy, and not mortality reduction, our manuscript provides a first step. The next step is to use our approach to choose a risk cutoff, and then use simulation-based models that predict mortality reduction and the risk of false positive screens to evaluate the effect on mortality, morbidity and cumulative false positive rate.

Although this paper applied our method to breast cancer screening with mammography or MRI, the methods could also be applied to evaluate the utility of other modalities in breast cancer screening. In addition, randomized controlled clinical trials have demonstrated that screening yields mortality reduction in colon [5], lung [1] and oral [30] cancer. The methods of this manuscript could be used to find appropriate risk cutoffs to optimize stratified screening strategies for these other disease sites. In fact, the approach has the potential to perform better at other sites, since other cancers may have single, very strong risk factors, such as the odds ratio of 33.6 for oropharyngeal cancer among those who are HPV-16 L1 seropositive, non-smokers and non-drinkers [11].

The methods presented in this paper fulfill Dr. Brawley’s call for transparent processes for developing risk cutoffs for cancer screening [9]. Instead of using expert opinion to choose a cutoff, standards-setting bodies like the American Cancer Society could use this approach to optimize the diagnostic accuracy of a screening strategy.

Acknowledgements

This manuscript was submitted to the Department of Biostatistics and Informatics in the Colorado School of Public Health, University of Colorado Denver, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biostatistics for JTB. Partial funding for DHG was provided by a generous grant from the Lundbeck Foundation, who provided a visiting professorship to the University of Copenhagen. We thank the BCSC investigators, participating mammography facilities, and radiologists for the data they have provided for this study. A list of the BCSC investigators and procedures for requesting BCSC data for research purposes are provided at: http://breastscreening.cancer.gov/. Data collection and sharing for the BCSC was supported by the National Cancer Institute (U01CA63740, U01CA86076, U01CA86082, U01CA63736, U01CA70013, U01CA69976, U01CA63731, U01CA70040, HHSN261201100031C).

Footnotes

Publisher's Disclaimer: This Author Accepted Manuscript is a PDF file of an unedited peer-reviewed manuscript that has been accepted for publication but has not been copyedited or corrected. The official version of record that is published in the journal is kept up to date and so may therefore differ from this version.

Contributor Information

John T. Brinton, Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO.

R. Edward Hendrick, Department of Radiology, School of Medicine, University of Colorado Denver, Aurora, CO.

Brandy M. Ringham, Lifecourse Epidemiology of Adiposity and Diabetes (LEAD) Center, University of Colorado Denver, Aurora, CO.

Mieke Kriege, Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, Netherlands.

Deborah H. Glueck, Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO.

References

  • 1.Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, Fagerstrom RM, Gareen IF, Gatsonis C, Marcus PM, Sicks JD: Reduced lung-cancer mortality with low-dose computed tomographic screening. The New England Journal of Medicine 365(5), 395–409 (2011) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Amir E, Evans DG, Shenton A, Lalloo F, Moran A, Boggis C, Wilson M, Howell A: Evaluation of breast cancer risk assessment packages in the family history evaluation and screening programme. Journal of Medical Genetics 40(11), 807–814 (2003) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Antoniou AC, Pharoah PPD, Smith P, Easton DF: The BOADICEA model of genetic susceptibility to breast and ovarian cancer. British Journal of Cancer 91(8), 1580–1590 (2004). DOI 10.1038/sj.bjc.6602175 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Armstrong AC, Evans GD: Management of women at high risk of breast cancer. BMJ 348, g2756 (2014). DOI 10.1136/bmj.g2756 [DOI] [PubMed] [Google Scholar]
  • 5.Atkin WS, Edwards R, Kralj-Hans I, Wooldrage K, Hart AR, Northover JMA, Parkin DM, Wardle J, Duffy SW, Cuzick J, UK Flexible Sigmoidoscopy Trial Investigators: Once-only flexible sigmoidoscopy screening in prevention of colorectal cancer: a multicentre randomised controlled trial. Lancet (London, England) 375(9726), 1624–1633 (2010). DOI 10.1016/S0140-6736(10)60551-X [DOI] [PubMed] [Google Scholar]
  • 6.Baker SG: Identifying Combinations of Cancer Markers for Further Study as Triggers of Early Intervention. Biometrics 56(4), 1082–1087 (2000) [DOI] [PubMed] [Google Scholar]
  • 7.Barlow WE, White E, Ballard-Barbash R, Vacek PM, Titus-Ernstoff L, Carney PA, Tice JA, Buist DSM, Geller BM, Rosenberg R, Yankaskas BC, Kerlikowske K: Prospective breast cancer risk prediction model for women undergoing screening mammography. Journal of the National Cancer Institute 98(17), 1204–1214 (2006). DOI 10.1093/jnci/djj331 [DOI] [PubMed] [Google Scholar]
  • 8.Berry DA, Iversen ES Jr, Gudbjartsson DF, Hiller EH, Garber JE, Peshkin BN, Lerman C, Watson P, Lynch HT, Hilsenbeck SG, Rubinstein WS, Hughes KS, Parmigiani G: BRCAPRO validation, sensitivity of genetic testing of BRCA1/BRCA2, and prevalence of other breast cancer susceptibility genes. Journal of Clinical Oncology 20(11), 2701–2712 (2002) [DOI] [PubMed] [Google Scholar]
  • 9.Brawley O, Byers T, Chen A, Pignone M, Ransohoff D, Schenk M, Smith R, Sox H, Thorson AG, Wender R: New American Cancer Society process for creating trustworthy cancer screening guidelines. Journal of the American Medical Association 306(22), 2495–2499 (2011) [DOI] [PubMed] [Google Scholar]
  • 10.Claus E: Risk models in genetic epidemiology. Statistical Methods in Medical Research 9(6), 589–601 (2000) [DOI] [PubMed] [Google Scholar]
  • 11.D’Souza G, Pawlita M, Westra WH: Case-Control Study of Human Papillomavirus and Oropharyngeal Cancer. The New England Journal of Medicine p. 13 (2007) [DOI] [PubMed] [Google Scholar]
  • 12.Elmore JG, Barton MB, Moceri VM, Polk S, Arena PJ, Fletcher SW: Ten-year risk of false positive screening mammograms and clinical breast examinations. The New England Journal of Medicine 338(16), 1089–1096 (1998) [DOI] [PubMed] [Google Scholar]
  • 13.Gail MH, Pfeiffer RM: On criteria for evaluating models of absolute risk. Biostatistics 6(2), 227–239 (2005). DOI 10.1093/biostatistics/kxi005 [DOI] [PubMed] [Google Scholar]
  • 14.Hagen AI, Kvistad KA, Maehle L, Holmen MM, Aase H, Styr B, Vabø A, Apold J, Skaane P, Møller P: Sensitivity of MRI versus conventional screening in the diagnosis of BRCA-associated breast cancer in a national prospective series. Breast (Edinburgh, Scotland) 16(4), 367–374 (2007). DOI 10.1016/j.breast.2007.01.006 [DOI] [PubMed] [Google Scholar]
  • 15.Hartman AR, Daniel BL, Kurian AW, Mills MA, Nowels KW, Dirbas FM, Kingham KE, Chun NM, Herfkens RJ, Ford JM, Plevritis SK: Breast magnetic resonance image screening and ductal lavage in women at high genetic risk for breast carcinoma. Cancer 100(3), 479–489 (2004). DOI 10.1002/cncr.11926 [DOI] [PubMed] [Google Scholar]
  • 16.Hendrick RE, Smith RA, Rutledge JH, Smart CR: Benefit of screening mammography in women aged 40–49: A new meta-analysis of randomized controlled trials. Journal of the National Cancer Institute. Monographs (22), 87–92 (1997) [DOI] [PubMed] [Google Scholar]
  • 17.Hosmer DW, Lemeshow S: Applied Logistic Regression (Wiley Series in Probability and Statistics), 2nd edn. Wiley-Interscience (2000) [Google Scholar]
  • 18.Kriege M, Brekelmans CTM, Boetes C, Besnard PE, Zonderland HM, Obdeijn IM, Manoliu RA, Kok T, Peterse H, Tilanus-Linthorst MMA, Muller SH, Meijer S, Oosterwijk JC, Beex LVAM, Tollenaar RAEM, de Koning HJ, Rutgers EJT, Klijn JGM: Efficacy of MRI and mammography for breast-cancer screening in women with a familial or genetic predisposition. The New England Journal of Medicine 351(5), 427–437 (2004) [DOI] [PubMed] [Google Scholar]
  • 19.Kuhl CK, Schrading S, Leutner CC, Morakkabati-Spitz N, Wardelmann E, Fimmers R, Kuhn W, Schild HH: Mammography, breast ultrasound, and magnetic resonance imaging for surveillance of women at high familial risk for breast cancer. Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology 23(33), 8469–8476 (2005) [DOI] [PubMed] [Google Scholar]
  • 20.Leach MO, Boggis CRM, Dixon AK, Easton DF, Eeles RA, Evans DGR, Gilbert FJ, Griebsch I, Hoff RJC, Kessar P, Lakhani SR, Moss SM, Nerurkar A, Padhani AR, Pointon LJ, Thompson D, Warren RML: Screening with magnetic resonance imaging and mammography of a UK population at high familial risk of breast cancer. Lancet 365(9473), 1769–1778 (2005) [DOI] [PubMed] [Google Scholar]
  • 21.Lehman CD: Diffusion weighted imaging (DWI) of the breast: Ready for clinical practice. European Journal of Radiology 81 Suppl 1, S80–81 (2012). DOI 10.1016/S0720-048X(12)70032-3 [DOI] [PubMed] [Google Scholar]
  • 22.Lehman CD, Blume JD, Weatherall P, Thickman D, Hylton N, Warner E, Pisano E, Schnitt SJ, Gatsonis C, Schnall M, DeAngelis GA, Stomper P, Rosen EL, O’Loughlin M, Harms S, Bluemke DA: Screening women at high risk for breast cancer with mammography and magnetic resonance imaging. Cancer 103(9), 1898–1905 (2005) [DOI] [PubMed] [Google Scholar]
  • 23.Ma H, Bandos AI, Gur D: On the use of partial area under the ROC curve for comparison of two diagnostic tests. Biometrical Journal. Biometrische Zeitschrift 57(2), 304–320 (2015). DOI 10.1002/bimj.201400023 [DOI] [PubMed] [Google Scholar]
  • 24.Ma H, Bandos AI, Rockette HE, Gur D: On use of partial area under the ROC curve for evaluation of diagnostic performance. Statistics in Medicine 32(20), 3449–3458 (2013). DOI 10.1002/sim.5777 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.McFarland EG, Levin B, Lieberman DA, Pickhardt PJ, Johnson CD, Glick SN, Brooks D, Smith RA: Revised colorectal screening guidelines: Joint effort of the American Cancer Society, U.S. Multisociety Task Force on Colorectal Cancer, and American College of Radiology. Radiology 248(3), 717–720 (2008) [DOI] [PubMed] [Google Scholar]
  • 26.Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P: Limitations of the Odds Ratio in Gauging the Performance of a Diagnostic, Prognostic, or Screening Marker. American Journal of Epidemiology 159(9), 882–890 (2004). DOI 10.1093/aje/kwh101 [DOI] [PubMed] [Google Scholar]
  • 27.Pisano ED, Gatsonis C, Hendrick E, Yaffe M, Baum JK, Acharyya S, Conant EF, Fajardo LL, Bassett L, D’Orsi C, Jong R, Rebner M: Diagnostic performance of digital versus film mammography for breast-cancer screening. The New England Journal of Medicine 353(17), 1773–1783 (2005) [DOI] [PubMed] [Google Scholar]
  • 28.Quante AS, Whittemore AS, Shriver T, Hopper JL, Strauch K, Terry MB: Practical problems with clinical guidelines for breast cancer prevention based on remaining lifetime risk. Journal of the National Cancer Institute 107(7) (2015). DOI 10.1093/jnci/djv124 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Ross S: First Course in Probability, 2nd edn. Macmillan Publishing Company, New York (1984) [Google Scholar]
  • 30.Sankaranarayanan R, Ramadas K, Thomas G, Muwonge R, Thara S, Mathew B, Rajan B: Effect of screening on oral cancer mortality in Kerala, India: A cluster-randomised controlled trial. Lancet 365(9475), 1927–1933 (2005) [DOI] [PubMed] [Google Scholar]
  • 31.Sardanelli F, Aase HS, Álvarez M, Azavedo E, Baarslag HJ, Balleyguier C, Baltzer PA, Beslagic V, Bick U, Bogdanovic-Stojanovic D, Briediene R, Brkljacic B, Camps Herrero J, Colin C, Cornford E, Danes J, de Geer G, Esen G, Evans A, Fuchsjaeger MH, Gilbert FJ, Graf O, Hargaden G, Helbich TH, Heywang-Köbrunner SH, Ivanov V, Jónsson Á, Kuhl CK, Lisencu EC, Luczynska E, Mann RM, Marques JC, Martincich L, Mortier M, Müller-Schimpfle M, Ormandi K, Panizza P, Pediconi F, Pijnappel RM, Pinker K, Rissanen T, Rotaru N, Saguatti G, Sella T, Slobodníková J, Talk M, Taourel P, Trimboli RM, Vejborg I, Vourtsis A, Forrai G: Position paper on screening for breast cancer by the European Society of Breast Imaging (EUSOBI) and 30 national breast radiology bodies from Austria, Belgium, Bosnia and Herzegovina, Bulgaria, Croatia, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Israel, Lithuania, Moldova, The Netherlands, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Spain, Sweden, Switzerland and Turkey. European Radiology 27(7), 2737–2743 (2017). DOI 10.1007/s00330-016-4612-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Sardanelli F, Podo F, Santoro F, Manoukian S, Bergonzi S, Trecate G, Vergnaghi D, Federico M, Cortesi L, Corcione S, Morassut S, Di Maggio C, Cilotti A, Martincich L, Calabrese M, Zuiani C, Preda L, Bonanni B, Carbonaro LA, Contegiacomo A, Panizza P, Di Cesare E, Savarese A, Crecco M, Turchetti D, Tonutti M, Belli P, Maschio AD, High Breast Cancer Risk Italian 1 (HIBCRIT-1) Study: Multicenter surveillance of women at high genetic breast cancer risk using mammography, ultrasonography, and contrast-enhanced magnetic resonance imaging (the high breast cancer risk italian 1 study): final results. Investigative Radiology 46(2), 94–105 (2011). DOI 10.1097/RLI.0b013e3181f3fcdf [DOI] [PubMed] [Google Scholar]
  • 33.Saslow D, Boetes C, Burke W, Harms S, Leach MO, Lehman CD, Morris E, Pisano E, Schnall M, Sener S, Smith RA, Warner E, Yaffe M, Andrews KS, Russell CA: American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography. CA: A Cancer Journal for Clinicians 57(2), 75–89 (2007) [DOI] [PubMed] [Google Scholar]
  • 34.Sickles EA, D’Orsi CJ, Bassett LW: ACR BI-RADS — Mammography (2013)
  • 35.Smith RA, Andrews K, Brooks D, DeSantis CE, Fedewa SA, Lortet-Tieulent J, Manassaram-Baptiste D, Brawley OW, Wender RC: Cancer screening in the United States, 2016: A review of current American Cancer Society guidelines and current issues in cancer screening. CA: A cancer journal for clinicians 66(2), 96–114 (2016). DOI 10.3322/caac.21336 [DOI] [PubMed] [Google Scholar]
  • 36.Smith RA, Cokkinides V, Brawley OW: Cancer screening in the United States, 2012: A review of current American Cancer Society guidelines and current issues in cancer screening. CA: a cancer journal for clinicians (2012) [DOI] [PubMed] [Google Scholar]
  • 37.Thisted RA: Elements of Statistical Computing: Numerical Computation, 1st edn. Chapman and Hall/CRC (1988) [Google Scholar]
  • 38.Tosteson ANA, Stout NK, Fryback DG, Acharyya S, Herman BA, Hannah LG, Pisano ED, DMIST Investigators: Cost-effectiveness of digital mammography breast cancer screening. Annals of Internal Medicine 148(1), 1–10 (2008) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Tyrer J, Duffy SW, Cuzick J: A breast cancer prediction model incorporating familial and personal risk factors. Statistics in Medicine 23(7), 1111–1130 (2004) [DOI] [PubMed] [Google Scholar]
  • 40.Wald NJ, Hackshaw AK, Frost CD: When can a risk factor be used as a worthwhile screening test? BMJ 319(7224), 1562–1565 (1999). DOI 10.1136/bmj.319.7224.1562 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Wang Z, Luo X, Chang YCI: Assessing the predictive power of newly added biomarkers. Biometrical Journal. Biometrische Zeitschrift 57(5), 797–807 (2015). DOI 10.1002/bimj.201400210 [DOI] [PubMed] [Google Scholar]
  • 42.Warner E: The role of magnetic resonance imaging in screening women at high risk of breast cancer. Topics in magnetic resonance imaging: TMRI 19(3), 163–169 (2008). DOI 10.1097/RMR.0b013e31818bc994 [DOI] [PubMed] [Google Scholar]
  • 43.Warner E, Plewes DB, Hill KA, Causer PA, Zubovits JT, Jong RA, Cutrara MR, DeBoer G, Yaffe MJ, Messner SJ, Meschino WS, Piron CA, Narod SA: Surveillance of BRCA1 and BRCA2 mutation carriers with magnetic resonance imaging, ultrasound, mammography, and clinical breast examination. JAMA: the journal of the American Medical Association 292(11), 1317–1325 (2004). DOI 10.1001/jama.292.11.1317 [DOI] [PubMed] [Google Scholar]
  • 44.Yabuuchi H, Matsuo Y, Sunami S, Kamitani T, Kawanami S, Setoguchi T, Sakai S, Hatakenaka M, Kubo M, Tokunaga E, Yamamoto H, Honda H: Detection of non-palpable breast cancer in asymptomatic women by using unenhanced diffusion-weighted and T2-weighted MR imaging: comparison with mammography and dynamic contrast-enhanced MR imaging. European Radiology 21(1), 11–17 (2011). DOI 10.1007/s00330-010-1890-8 [DOI] [PubMed] [Google Scholar]
