Abstract
Background:
Microsimulation models are increasingly being used to inform colorectal cancer (CRC) screening recommendations. MISCAN-Colon is an example of such a model, used to inform the Dutch CRC screening program and United States Preventive Services Task Force guidelines. Assessing the validity of these models is essential to provide transparency regarding their performance. In this study we tested the external and predictive validity of MISCAN-Colon.
Methods:
We validated MISCAN-Colon using the Norwegian Colorectal Cancer Prevention (NORCCAP) trial, a randomized controlled trial that examined the effectiveness of once-only flexible sigmoidoscopy (FS) screening. We simulated the study population and design of the NORCCAP trial in MISCAN-Colon and compared 10- to 12-year model predicted hazard ratios (HRs) for overall and distal CRC incidence and mortality to those observed. In addition, we compared the numbers of screen-detected neoplasia. Finally, we predicted the trial’s future results to allow for the assessment of predictive validity.
Results:
MISCAN-Colon predicted HRs of 0.85 for overall CRC incidence, 0.82 for distal CRC incidence, 0.68 for overall CRC mortality, and 0.62 for distal CRC mortality. All were within the 95% confidence intervals of the NORCCAP trial results. The predicted number of screen-detected cancers was likewise consistent with the trial. The model significantly underestimated the number of screen-detected adenomas. Model-predicted HRs for overall CRC incidence and mortality at 15 to 17 years of follow-up were 0.84 and 0.72, respectively.
Conclusion:
Although the underestimation of screen-detected adenomas requires further investigation, MISCAN-Colon is able to make a valid replication of the CRC incidence and mortality reduction of an FS screening trial, which suggests that it can be considered a useful tool to support decision making on CRC screening.
Keywords: computer simulation, colorectal neoplasm, cancer screening, microsimulation modeling
INTRODUCTION
Governments and health organizations aim to offer cancer screening programs that are effective, affordable and have a low burden for participants.1, 2 Deciding which cancer screening strategy is most suitable for these programs is a complex task that involves making decisions in terms of type of screening test(s), frequency of testing and age range, type of follow-up test(s), and risk stratification for high-risk populations. The health benefits of cancer screening must be as large as possible and substantially exceed potential harms or patient burden. All these aspects can be incorporated into microsimulation models, which can predict the population-level impact of screening strategies in an affordable, timely and ethical manner.3, 4
The Microsimulation Screening Analysis-Colon (MISCAN-Colon) model is an example of a well-established microsimulation model that has been used to inform decision making on colorectal cancer (CRC) screening, including the design of the Dutch CRC screening program and the United States Preventive Services Task Force (USPSTF) guidelines on CRC screening.3, 4 MISCAN-Colon simulates the development of adenomas, which may or may not progress to clinical CRC.5 In order to simulate the sequence from adenoma formation to clinical CRC, the model incorporates parameter values that are derived from published data such as adenoma prevalence and lifetime CRC incidence.6, 7
However, some other essential parameter values, which are crucial to simulating the adenoma-carcinoma sequence and highly relevant to estimating the effectiveness of screening, are not available from existing evidence. Some characteristics of the sequence from adenoma to clinical cancer are difficult or impossible to observe in an ethically acceptable manner. For instance, the duration from adenoma formation to clinical CRC cannot be observed. Parameter estimates for these durations must therefore be inferred from data on adenoma prevalence and (interval) cancer incidence. In the case of MISCAN-Colon, this inference is performed by calibration using results available from randomized controlled trials (RCTs) that investigate the effectiveness of CRC screening.4
To ensure that model calibration is correct and model predictions are valid, regular assessment of model validity is essential. Model validation is an important process in model development. In the literature, several levels of model validity have been proposed: face, internal, cross, external, and predictive.8, 9 The most robust levels of validity can be established through external and predictive validation; these validations entail comparing model results with real-world results and comparing model results with prospectively observed events. Although we found examples of other publications in which microsimulation models were externally validated, we did not find any examples of predictive validations.
MISCAN-Colon has been validated externally before, using the results of the United Kingdom Flexible Sigmoidoscopy Screening (UKFSS) trial, which involved once-only screening for CRC with flexible sigmoidoscopy (FS) with follow-up over a 10-year period. This validation was published by Rutter et al. (2016).5 As a consequence the MISCAN model was re-calibrated using the UKFSS trial data, resulting in a longer average duration of adenoma progression to cancer. Reassessing the performance of the recalibrated MISCAN model now requires re-validation.
In this study, we aimed to establish two types of validity of MISCAN-Colon. First, we aimed to reassess its external validity following the recalibration on the UKFSS trial. Second, we aimed to establish the predictive validity of MISCAN-Colon. For these validations, we used the results of the Norwegian Colorectal Cancer Prevention (NORCCAP) trial, which involved once-only screening for CRC with FS.
METHODS
We used MISCAN-Colon to simulate NORCCAP trial outcomes and compared predictions with those observed. Primary validation targets were the relative reductions in overall and distal CRC incidence and mortality observed by Holme et al. (2014), who described the 10- to 12-year follow-up results of the NORCCAP trial.10 These relative reductions were presented as the hazard ratio (HR) of an event in the intervention group relative to the same event in the control group. We calculated four HRs: overall and distal CRC incidence, and overall and distal CRC mortality. In order to simulate the NORCCAP trial, we adjusted MISCAN-Colon to the demography and screening behavior of the NORCCAP trial population.
NORCCAP trial
In the NORCCAP trial, individuals aged 50 to 65 years from two Norwegian regions were randomly assigned to either a control group (n=78,220), or an intervention group that consisted of two arms (n=10,283 and n=10,289). Since there was no screening program in place in Norway during the study period, the control group did not receive routine colorectal cancer screening.10 Baseline characteristics of the selected individuals are shown in Supplement 1, Table 1. In one intervention arm, individuals were offered a once-only FS (n=10,283). In the other intervention arm, individuals were offered an additional qualitative fecal occult blood test (FOBT) before FS (n=10,289), and 86.7% of the adherers to FS made use of this opportunity.10–13 A positive FS was defined as any polyp with a diameter of >10 mm or any histologically verified adenoma or carcinoma.13 Individuals with a positive FS or FOBT were referred for follow-up colonoscopy.
The trial was carried out in two phases; individuals born from 1935 to 1945 were selected and randomized to undergo screening in 1999 and 2000 (i.e. 53–65 years old at time of screening) and individuals born from 1946 to 1950 were selected and randomized to undergo screening in 2001 (i.e. 49–54 years old at the time of screening). The latest NORCCAP trial publication covered all CRC-related events until December 31st, 2011 (follow-up of 10 to 12 years).10 In this latest publication, no distinction was made between the two different intervention arms regarding the results relevant for the current validation studies. Therefore, we compared model outcomes with the overall results of the intervention arms. In the remainder of this article, we will use the term intervention group when referring to both intervention arms.
MISCAN-Colon
MISCAN-Colon is a microsimulation model for CRC developed at the Department of Public Health of the Erasmus University Medical Center (Rotterdam, the Netherlands). The model’s structure, underlying assumptions, and calibration have been described in previous publications14–16 and in Supplement 2. Briefly, MISCAN-Colon simulates the life histories of a large population of individuals from birth till death. As each simulated person ages, one or more adenomas may develop. These adenomas can progress from small (≤5mm), to medium (6–9mm), to large size (≥10mm). Some adenomas can develop into preclinical cancer, which may progress through stages I to IV. During each stage, CRC may be diagnosed because of symptoms. Survival after clinical diagnosis is determined by the stage at diagnosis, the localization of the cancer, and the person’s age.17
Screening will alter some of the simulated life histories: some cancers will be prevented by the detection and removal of adenomas; other cancers will be detected in an earlier stage with a more favorable survival. However, screening can also result in serious complications, over-diagnosis and over-treatment (i.e. the detection and treatment of adenomas or cancers that would not have been diagnosed in the absence of screening). By comparing life histories with screening with the corresponding life histories without screening, MISCAN-Colon quantifies the effectiveness of screening, as well as the associated costs.
MISCAN-Colon was calibrated to the age-, stage-, and localization-specific incidence and survival of CRC as observed in Norway during the timeframe of the NORCCAP trial (1999–2011).2 Data was provided by the Norwegian Cancer Registry. The age-specific prevalence and multiplicity distribution of adenomas was calibrated using the observations of autopsy studies.7, 18–27 The preclinical duration of CRC and the adenoma dwell-time were calibrated to the rates of interval- and surveillance-detected cancers observed in RCTs evaluating screening using guaiac FOBTs and the once-only sigmoidoscopy UKFSS trial.28–31
Adjustment of MISCAN-Colon to the NORCCAP trial
We used MISCAN-Colon to simulate a population with an age distribution comparable to the NORCCAP trial (personal communication with research leader G. Hoff). Lifetables for 2005 (i.e. middle of the study period) were retrieved from Statistics Norway.32
CRC incidence in the NORCCAP control group was 11% lower than incidence in the whole of Norway. We therefore adjusted the model accordingly by lowering the age-specific onset of adenomas by 11% for all ages.
Comparing incidence rates observed in the NORCCAP trial, we assumed that non-adherers had a slightly higher age-specific onset of adenomas for all ages than individuals in the control group (relative risk of 1.05). In addition, age-specific onset in adherers was lowered for all ages to ensure that the overall CRC risk in the intervention group did not differ from the CRC risk in the control group, taking participation rate into account (relative risk of 0.97).
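The balancing of the two risk multipliers described above can be sketched as follows. This is a minimal illustration, not the MISCAN-Colon implementation: it assumes that the group-level relative risk is a participation-weighted average of the adherer and non-adherer relative risks, and the participation rate of 62.5% is a hypothetical value chosen for illustration rather than a figure taken from this article.

```python
def adherer_relative_risk(participation: float,
                          non_adherer_rr: float,
                          target_rr: float = 1.0) -> float:
    """Solve p * rr_adherer + (1 - p) * rr_non_adherer = target_rr for rr_adherer."""
    if not 0.0 < participation <= 1.0:
        raise ValueError("participation must be in (0, 1]")
    return (target_rr - (1.0 - participation) * non_adherer_rr) / participation


# Hypothetical participation rate (an assumption, not reported in this text).
ASSUMED_PARTICIPATION = 0.625
# Relative risk of non-adherers versus the control group, as stated in the text.
NON_ADHERER_RR = 1.05

rr_adherers = adherer_relative_risk(ASSUMED_PARTICIPATION, NON_ADHERER_RR)
print(f"Adherer relative risk: {rr_adherers:.2f}")
```

With this assumed participation rate, the weighted average reproduces the adherer relative risk of 0.97 given in the text; a different participation rate would give a different multiplier.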
The control group was simulated for 18 years without intervention. Outcomes for CRC incidence and mortality were evaluated after 10 to 12 years (i.e. consistent with published results of the trial) and for each subsequent year until 18 years of follow-up. Individuals with negative screening were followed for 18 years without further intervention, while for those with adenomas detected we simulated surveillance consistent with the Norwegian recommendations at the time of the trial.33
We assumed age-specific participation rates for FS and FOBT as observed in the NORCCAP trial (Supplement 1 Table 2, personal communication with research leader G. Hoff).10 Adherence at follow-up colonoscopy was derived from Holme et al. (2014).10 Adherence for surveillance colonoscopies was not reported in trial publications; it was assumed to be 80%.
Since FOBT characteristics can differ due to varying cut-off levels and manufacturers and the characteristics of the FOBT used in the NORCCAP trial are unknown, test sensitivity and specificity of the one-sample FOBT could not be estimated from the literature. We therefore fitted sensitivity and specificity to the observed positivity rate, and to detection rates of non-advanced adenomas, advanced adenomas, and carcinomas as observed in the NORCCAP trial (Supplement 1, Tables 3, 4 and 5). We assumed that test characteristics of FS and colonoscopy do not differ greatly between settings, and therefore, test sensitivity of FS and follow-up colonoscopy and specificity of follow-up colonoscopy were based on the literature.34 The test specificity of FS was adjusted based on the number of referrals after a negative test in the NORCCAP trial (Table 4). We simulated complete visualization of the recto-sigmoid colon in 97% of individuals, of the descending colon in 23%, and of the cecum in less than 1% of individuals (personal communication with G. Hoff). We simulated that colonoscopy examinations completely visualized the sigmoid in more than 99% of the cases and completely visualized the entire colon in 89% of the cases (personal communication with G. Hoff).
Table 4.
Follow-up years* | End of data retrieval** | Overall CRC mortality: HR | Overall CRC mortality: control group | Overall CRC mortality: screen group | Overall CRC incidence: HR | Overall CRC incidence: control group | Overall CRC incidence: screen group | Distal CRC mortality: HR | Distal CRC mortality: control group | Distal CRC mortality: screen group | Distal CRC incidence: HR | Distal CRC incidence: control group | Distal CRC incidence: screen group
---|---|---|---|---|---|---|---|---|---|---|---|---|---
10–12 | 2011 | 0.68 | 40.5 | 27.8 | 0.85 | 142 | 120 | 0.62 | 21.6 | 13.4 | 0.82 | 78.2 | 64.0
11–13 | 2012 | 0.69 | 43.2 | 29.8 | 0.84 | 147 | 123 | 0.63 | 23.0 | 14.4 | 0.81 | 81.4 | 65.8
12–14 | 2013 | 0.70 | 45.8 | 31.9 | 0.83 | 153 | 127 | 0.63 | 24.4 | 15.5 | 0.80 | 84.5 | 67.8
13–15 | 2014 | 0.70 | 48.4 | 34.1 | 0.84 | 159 | 133 | 0.64 | 25.8 | 16.6 | 0.80 | 87.7 | 70.5
14–16 | 2015 | 0.71 | 51.1 | 36.4 | 0.84 | 164 | 138 | 0.65 | 27.1 | 17.7 | 0.81 | 90.8 | 73.2
15–17 | 2016 | 0.72 | 53.8 | 38.7 | 0.84 | 170 | 143 | 0.66 | 28.6 | 18.9 | 0.81 | 93.9 | 75.8
16–18 | 2017 | 0.73 | 56.5 | 41.0 | 0.84 | 175 | 147 | 0.67 | 30.1 | 20.1 | 0.81 | 96.9 | 78.1
Abbreviations: CRC, colorectal cancer; NORCCAP, Norwegian Colorectal Cancer Prevention; MISCAN, Microsimulation Screening Analysis; HR, hazard ratio
Numbers under control group and screen group are presented per 100,000 person-years.
* The screening intervention was performed in 1999, 2000 and 2001. Since the closure date for data retrieval is the same for all participants, the number of follow-up years differs among the participants.
** Last day of the year.
Using MISCAN-Colon, we simulated four cohorts of 10 million individuals each, differing by study arm (control group, intervention group) and age group (50–54 and 55–64), to rule out any distortion caused by the stochastic nature of the model. Model predictions were then rescaled to the size of the NORCCAP trial population.
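The rescaling step can be illustrated with a short sketch. It is a simplification under stated assumptions: it rescales a single group as a whole (the age-group sizes needed to rescale each of the four cohorts separately are not given here), and the simulated event count is a hypothetical number used only to show the arithmetic.

```python
SIMULATED_COHORT_SIZE = 10_000_000  # individuals per simulated cohort, as stated in the text

# Group sizes reported for the NORCCAP trial.
CONTROL_SIZE = 78_220
INTERVENTION_SIZE = 10_283 + 10_289


def rescale_to_trial(simulated_events: float,
                     trial_group_size: int,
                     simulated_size: int = SIMULATED_COHORT_SIZE) -> float:
    """Scale an event count from a simulated cohort to a trial group of the given size."""
    return simulated_events * trial_group_size / simulated_size


# Hypothetical simulated event count (illustration only, not a model output).
hypothetical_events = 25_000
print(rescale_to_trial(hypothetical_events, INTERVENTION_SIZE))
```

Simulating far more individuals than the trial enrolled reduces random variation in the predicted counts, after which a proportional rescaling of this kind makes them comparable with the observed numbers.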
Validation targets
Our primary validation targets were the overall and distal CRC incidence and mortality rate and HRs of overall and distal CRC incidence and mortality at 10- to 12-year follow-up (depending on the year of trial inclusion), in the intervention group relative to the control group. We defined the rectum, rectosigmoid, and sigmoid colon as distal locations, consistent with reported NORCCAP trial results. In addition, to enable the predictive validity of MISCAN-Colon after publication of the next NORCCAP trial results to be tested, we calculated the expected HRs of overall and distal CRC incidence and mortality up to 18 years of follow-up.
In addition, we considered several secondary validation targets. We computed the cumulative probability of overall and distal CRC incidence and mortality during the study period for intervention and control group. We computed the yearly risk ratios (RRs) of overall CRC incidence and mortality in the intervention group relative to the control group, and yearly RRs for CRC incidence and distal CRC incidence in the adherers relative to the control group during the follow-up of the trial. We also compared the number of screen-detected cancers and adenomas and the number of follow-up colonoscopies. In addition, we explored the model-predicted stage distribution of all diagnosed cancers. Proportions per stage, localized (Dukes A and B) versus advanced (Dukes C and D), were calculated and compared using a chi-squared test.
Model outcomes were considered consistent when predicted within 95% confidence intervals (95% CI) of the corresponding NORCCAP trial targets. Mathematical formulas for these outcomes are provided in Table 6 of Supplement 1.
RESULTS
Overall and distal CRC incidence and mortality rates and hazard ratios
During the 10- to 12-year follow up of the NORCCAP trial, 141.0 CRC cases per 100,000 person-years occurred in the control group (95% CI: 132.8–149.7) and 112.6 in the intervention group (95% CI: 99.3–127.7), resulting in a lower risk of CRC incidence in those invited to FS screening (HR=0.80, 95% CI: 0.70–0.92).10 Using MISCAN-Colon, we predicted an overall CRC incidence of 141.8 cases per 100,000 person-years in the control group and 120.1 cases in the intervention group (Table 1A, Figure 1). These predicted CRC incidence rates were similar to the trial results and resulted in a HR for CRC incidence with once-only FS screening (with or without once-only FOBT) versus no screening of 0.85, consistent with the NORCCAP trial results.
Table 1.
A. Overall CRC mortality and incidence
Outcome | Source | HR (95% CI) | Control, per 100,000 person-years (95% CI) | Screened, per 100,000 person-years (95% CI)
---|---|---|---|---
CRC mortality | NORCCAP trial | 0.73 (0.56, 0.94) | 43.1 (38.7, 48.1) | 31.4 (24.8, 39.7)
CRC mortality | MISCAN-Colon | 0.68 | 40.5 | 27.8
CRC incidence | NORCCAP trial | 0.80 (0.70, 0.92) | 141.0 (132.8, 149.7) | 112.6 (99.3, 127.7)
CRC incidence | MISCAN-Colon | 0.85 | 141.8 | 120.1
B. Distal CRC mortality and incidence
Outcome | Source | HR (95% CI) | Control, per 100,000 person-years (95% CI) | Screened, per 100,000 person-years (95% CI)
---|---|---|---|---
Distal CRC mortality | NORCCAP trial | 0.79 (0.55, 1.11) | 21.8 (18.7, 25.4) | 17.2 (12.6, 23.5)
Distal CRC mortality | MISCAN-Colon | 0.62 | 21.6 | 13.4
Distal CRC incidence | NORCCAP trial | 0.76 (0.63, 0.92) | 80.1 (74, 86.7) | 60.9 (51.4, 72.2)
Distal CRC incidence | MISCAN-Colon | 0.82 | 78.2 | 64.0
Abbreviations: CRC, colorectal cancer; NORCCAP, Norwegian Colorectal Cancer Prevention Trial; MISCAN, Microsimulation Screening Analysis; HR, hazard ratio
During the same time frame in the NORCCAP trial, 43.1 CRC deaths per 100,000 person-years were reported in the control group (95% CI: 38.7–48.1) and 31.4 in the intervention group (95% CI: 24.8–39.7), indicating a lower probability of dying of CRC in those invited to once-only FS screening (HR=0.73, 95% CI: 0.56–0.94).10 MISCAN-Colon predicted 40.5 CRC deaths per 100,000 person-years in the control group and 27.8 in the intervention group, similar to the trial results. In addition, among those invited to FS screening, MISCAN-Colon predicted a lower probability of dying of CRC (HR=0.68), consistent with trial results (Table 1A, Figure 1).
When considering only trial results on distal CRC incidence and mortality, MISCAN-Colon performance was similar: model-predicted distal CRC incidence and mortality rates, and HRs of distal CRC incidence and mortality, were all consistent with the observed trial results (Table 1B, Figure 1).
Cumulative probability of overall and distal CRC incidence and mortality
The majority of MISCAN-Colon predictions of the cumulative probability of overall CRC incidence and mortality and distal CRC mortality in the control and intervention group were consistent with the NORCCAP trial results (Figure 2). MISCAN-Colon underestimated some of the cumulative probabilities in the first half of the trial follow-up. In the last years of follow-up, the predicted cumulative probability of overall CRC incidence in control and intervention group increased more than expected based on trial results, leading to a small but significant difference in the final years.
Yearly risk ratios of CRC incidence and mortality
The majority of the MISCAN-Colon predictions were consistent with the NORCCAP trial results regarding yearly RRs for overall CRC incidence of the intervention group relative to the control group (Figure 3). MISCAN-Colon significantly overestimated relative overall CRC incidence risk in the intervention group in years 1 and 10 and significantly underestimated this risk in year 8. Regarding yearly RRs for overall CRC mortality of the intervention group relative to the control group, the MISCAN-Colon predictions were consistent with the NORCCAP trial results.
Similar patterns were observed when comparing the RRs for overall CRC incidence and distal CRC incidence of the adherers to screening relative to the control group (Figure 4). MISCAN-Colon overestimated relative overall CRC incidence risk in year 1 and underestimated this risk in year 8. The predictions of relative distal CRC incidence risk in adherers were all within the confidence intervals of the NORCCAP trial results.
Disease detection at screening
MISCAN-Colon predicted 52 screen-detected CRCs, which was consistent with the 41 (95% CI: 28–54) observed in the NORCCAP trial (Table 2). For the number of follow-up colonoscopies and screen-detected adenomas, the MISCAN-Colon predictions were significantly lower than what was actually observed. While in the NORCCAP trial 2524 (95% CI: 2432–2616) colonoscopies were performed and 2210 (95% CI: 2123–2297) adenomas were detected, MISCAN-Colon predicted 2408 colonoscopies performed and 2105 adenomas detected (Table 2).
Table 2.
Outcome | Source | Number | 95% interval
---|---|---|---
Diagnostic colonoscopies | NORCCAP trial | 2524 | (2432, 2616)
Diagnostic colonoscopies | MISCAN-Colon | 2408 |
CRC detected at screening | NORCCAP trial | 41 | (28, 54)
CRC detected at screening | MISCAN-Colon | 52 |
Adenomas detected at colonoscopy, total | NORCCAP trial | 2210 | (2123, 2297)
Adenomas detected at colonoscopy, total | MISCAN-Colon | 2105 |
Advanced adenomas | NORCCAP trial | 582 | (535, 629)
Advanced adenomas | MISCAN-Colon | 519 |
Non-advanced adenomas | NORCCAP trial | 1628 | (1552, 1704)
Non-advanced adenomas | MISCAN-Colon | 1586 |
Abbreviations: CRC, colorectal cancer; NORCCAP, Norwegian Colorectal Cancer Prevention Trial; MISCAN, Microsimulation Screening Analysis; HR, hazard ratio
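The consistency criterion from the Methods (a prediction counts as consistent when it falls within the 95% CI of the trial result) can be applied to the values in Tables 1 and 2 with a few lines of code. This is a sketch rather than the authors' procedure; the `Target` type and the treatment of the interval bounds as inclusive are assumptions made for illustration.

```python
from typing import NamedTuple


class Target(NamedTuple):
    name: str
    predicted: float  # MISCAN-Colon prediction
    ci_low: float     # lower 95% limit of the NORCCAP result
    ci_high: float    # upper 95% limit of the NORCCAP result


def is_consistent(target: Target) -> bool:
    """True when the prediction lies within the trial's 95% interval (bounds inclusive)."""
    return target.ci_low <= target.predicted <= target.ci_high


# Values copied from Table 1 (hazard ratios) and Table 2 (counts).
TARGETS = [
    Target("HR overall CRC mortality", 0.68, 0.56, 0.94),
    Target("HR overall CRC incidence", 0.85, 0.70, 0.92),
    Target("HR distal CRC mortality", 0.62, 0.55, 1.11),
    Target("HR distal CRC incidence", 0.82, 0.63, 0.92),
    Target("Diagnostic colonoscopies", 2408, 2432, 2616),
    Target("CRC detected at screening", 52, 28, 54),
    Target("Adenomas, total", 2105, 2123, 2297),
    Target("Advanced adenomas", 519, 535, 629),
    Target("Non-advanced adenomas", 1586, 1552, 1704),
]

for target in TARGETS:
    verdict = "consistent" if is_consistent(target) else "outside 95% interval"
    print(f"{target.name}: {verdict}")
```

Applied to these values, the four hazard ratios, the number of screen-detected cancers and the number of non-advanced adenomas fall within the trial intervals, whereas the colonoscopy count, the total adenoma count and the advanced adenoma count fall below them.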
Stage distribution
For stage distribution, MISCAN-Colon predictions of both intervention and control group were similar to those observed in the trial (Table 3).
Table 3.
Group and stage | NORCCAP trial*, No. (%) | MISCAN-Colon, No. (%) | P value
---|---|---|---
Control group: localized CRC | 470 (45.5%) | 489 (47.3%) |
Control group: advanced CRC | 562 (54.5%) | 545 (52.7%) | 0.45
Intervention group: localized CRC | 117 (49.4%) | 155 (52.6%) |
Intervention group: advanced CRC | 120 (50.6%) | 139 (47.4%) | 0.50
Abbreviations: CRC, colorectal cancer; NORCCAP, Norwegian Colorectal Cancer Prevention Trial; MISCAN, Microsimulation Screening Analysis
* Unclassified cancers in the control group (N=16) and in the intervention group (N=4) were excluded from this table.
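The stage-distribution comparison in Table 3 can be illustrated with a chi-squared test on each 2×2 table (trial versus model, localized versus advanced). The sketch below uses Yates' continuity correction; whether the authors applied this correction is not stated, so that choice is an assumption, as is the helper name `chi2_yates_2x2`.

```python
import math


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Continuity-corrected chi-squared statistic and p value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: NORCCAP trial, MISCAN-Colon; columns: localized, advanced (counts from Table 3).
chi2_control, p_control = chi2_yates_2x2(470, 562, 489, 545)
chi2_intervention, p_intervention = chi2_yates_2x2(117, 120, 155, 139)

print(f"Control group: chi2={chi2_control:.3f}, p={p_control:.2f}")
print(f"Intervention group: chi2={chi2_intervention:.3f}, p={p_intervention:.2f}")
```

Under these assumptions the sketch yields p values that round to 0.45 for the control group and 0.50 for the intervention group, in line with Table 3.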
Prediction of future follow-up results
For the 15- to 17- year follow-up of the NORCCAP trial, MISCAN-Colon predicted a HR of 0.84 for overall CRC incidence, a HR of 0.72 for overall CRC mortality, a HR of 0.81 for distal CRC incidence and a HR of 0.66 for distal CRC mortality (Table 4). NORCCAP trial results for these years are not yet available.
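As a rough cross-check on Table 4, the ratio of the screen-group rate to the control-group rate can be compared with the tabulated HR. This crude rate ratio is only an approximation: the formulas actually used for the HRs are given in Supplement 1 and are not reproduced here, and the tabulated rates are rounded. The sketch uses the overall CRC mortality columns.

```python
# (follow-up years, tabulated HR, control rate, screen rate) for overall CRC mortality,
# copied from Table 4; rates are per 100,000 person-years.
OVERALL_CRC_MORTALITY = [
    ("10-12", 0.68, 40.5, 27.8),
    ("11-13", 0.69, 43.2, 29.8),
    ("12-14", 0.70, 45.8, 31.9),
    ("13-15", 0.70, 48.4, 34.1),
    ("14-16", 0.71, 51.1, 36.4),
    ("15-17", 0.72, 53.8, 38.7),
    ("16-18", 0.73, 56.5, 41.0),
]


def rate_ratio(screen_rate: float, control_rate: float) -> float:
    """Crude ratio of event rates (screen group relative to control group)."""
    return screen_rate / control_rate


for years, hazard_ratio, control, screen in OVERALL_CRC_MORTALITY:
    ratio = rate_ratio(screen, control)
    print(f"{years} years: rate ratio {ratio:.3f}, tabulated HR {hazard_ratio:.2f}")
```

For every row the crude rate ratio lies within 0.01 of the tabulated HR, which suggests that the rates and HRs in the table are mutually coherent.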
DISCUSSION
In this study, we tested the validity of the MISCAN-Colon model using data from the NORCCAP trial. Regarding our primary validation targets, we showed that MISCAN-Colon can accurately estimate the impact of a once-only FS screening trial on CRC incidence and mortality. In addition, we expect the follow-up results of the NORCCAP trial to be published in the near future, at which point we will be able to compare our predictions with these results to test the predictive validity of MISCAN-Colon. Regarding our secondary validation targets, MISCAN-Colon predictions for cumulative probabilities of incidence and mortality and yearly RRs of CRC incidence and mortality in the intervention group relative to the control group were largely in line with the NORCCAP trial results as well. The predicted number of screen-detected CRCs was also similar to the number found in the NORCCAP trial, but the model significantly underestimated the number of screen-detected adenomas.
It is essential that microsimulation models, such as MISCAN-Colon, are validated regularly to provide transparency regarding their performance. External validation requires data of (large) clinical trials with sufficient follow-up.8, 9 Previously, MISCAN-Colon was validated externally using the results of another once-only FS screening trial, namely the UKFSS trial.5 MISCAN-Colon then underestimated CRC incidence reduction due to screening, and overestimated the screen-detection of adenomas and cancers in the intervention arm. These outcomes suggested that the assumed values for the duration of adenoma formation to symptomatic CRC were too short in MISCAN-Colon. As a consequence of these validation findings, MISCAN-Colon was re-calibrated using the UKFSS trial data, resulting in a longer average duration of adenoma progression to symptomatic CRC. In the current validation study, MISCAN-Colon predictions were highly similar to the NORCCAP trial results, suggesting that the re-calibrated MISCAN-Colon allows for accurate predictions of CRC incidence and mortality reduction of FS screening.
Since age-specific CRC incidence, stage distribution and survival differ per country, region and timeframe, we adjusted the model to the Norwegian population during the NORCCAP trial period. The adjustments specific for Norway were independent of the trial data. In two instances we decided to adjust the model inputs based on the control group of the NORCCAP trial, which makes this external validation partially dependent (as described in Eddy et al. (2012)9). First, we noticed that CRC incidence in the NORCCAP trial control group was 11% lower than in the Norwegian CRC registry, which may be attributed to regional differences in CRC incidence. We therefore lowered the age-specific adenoma onset for all ages by 11%. This adjustment may impact some absolute outcomes such as CRC incidence and mortality rate in intervention and control group, but not the relative impact of the screening intervention. Second, since CRC incidence in non-adherers may be higher due to ‘healthy screenee bias’, we compared the control group CRC incidence to the CRC incidence in non-adherers in the intervention group. Consequently, we raised the age-specific adenoma onset for all ages in the non-adherers by 5% and lowered this multiplier for adherers to screening such that the modelled overall CRC risk in the intervention group was equal to the CRC risk in the control group. This adjustment had no substantial impact on the validation targets. Supplement 3 shows the results without these corrections based on CRC incidence in the control group and in the non-adherers; indeed, some absolute outcomes are different but the relative outcomes are largely the same. These types of adjustments are needed to ensure appropriate external validation of any screening simulation model. Importantly, we did not use any information on screening participants of the NORCCAP trial in our model adjustments. Therefore, we consider this analysis to be an external validation of screening effectiveness and unobservable parameter values in MISCAN-Colon.
Despite the encouraging findings of well-predicted HRs, some secondary outcomes were not consistent with the NORCCAP trial. We observed three discrepancies between the simulated and observed data. First, we observed that the number of screen-detected adenomas predicted by MISCAN-Colon was lower than the actual number of screen-detected adenomas in the NORCCAP trial, while incidence (reduction) was correctly predicted. We have three possible explanations for these seemingly conflicting outcomes. First, the outcomes may not be as conflicting as they seem. Having too few adenomas in the model implies that we may have overestimated progression of distal adenomas to match distal cancer incidence, which is consistent with the slightly underestimated distal CRC incidence reduction as simulated by MISCAN-Colon compared to that observed. Second, we lowered the CRC risk in the model to reflect the lower incidence in the control group compared to the Norwegian incidence rate. However, we do not know whether the lower risk holds for all ages, or just for those ages included in the trial. Consequently, we may have underestimated the CRC risk at older ages, and thus the prevalence of adenomas in the ages before that (i.e. in the ages being screened). Finally, the NORCCAP trial is just one trial with data on adenoma detection rates. In previous validations against other studies, model-predicted adenoma detection rates have been close to those observed. The NORCCAP data may be an outlier in this respect, as for instance, the distal adenoma detection rate was 12.1% in the UKFSS trial, and 17.4% in the NORCCAP trial.10, 35 MISCAN-Colon predicted a distal adenoma detection rate of 15.4%.
The second discrepancy we found was that not all predicted yearly RRs of overall and distal CRC incidence and mortality of the intervention group relative to the control group were within the confidence intervals of the NORCCAP trial results. We suggest two explanations for this. First, performance dates of surveillance colonoscopies were not registered in the study. We suspect that incorrect predictions of yearly RRs in the intervention group relative to the control group (as shown in Figure 3) are related to the adherence to surveillance after a positive colonoscopy. MISCAN-Colon simulated surveillance at exactly 5 and 10 years after initial screening, which is consistent with Norwegian screening guidelines.33 However, the RRs of CRC incidence in the intervention group of the NORCCAP trial showed peaks 1 to 3 years earlier than we would expect if surveillance had been performed at 5 and 10 years after a positive colonoscopy. It seems plausible that some of the participants underwent surveillance 1 to 3 years earlier. Second, the observed yearly RRs of mortality in the NORCCAP trial fluctuate. Therefore, rather than reflecting an underestimation by MISCAN-Colon of the RR of mortality in the intervention group compared to the control group in year 7 (as shown in Figure 3), the high mortality in the intervention group in that year may have been the result of chance.
Last, although all the MISCAN-Colon predictions of the primary targets are within the 95% confidence intervals, the prediction for distal CRC mortality deviates considerably from that observed. Although this deviation could be interpreted as a lack of fit, one should be careful with such an assessment. Confidence intervals reflect the precision of each estimate. If many trials similar to NORCCAP were performed and a 95% CI were computed for each, about 95% of those intervals would contain the true distal CRC mortality reduction. From an inference point of view, the 95% CI represents the interval for which we are 95% confident that the true value falls within its limits. Since the numbers of distal CRC deaths occurring in both intervention and control group in the NORCCAP trial are very low (substantially less than the number of overall and distal CRC cases and the number of overall CRC deaths), this wide confidence interval reflects the uncertainty of the results. Therefore, it is too early to conclude whether our MISCAN-Colon predictions are correct or incorrect. Validation against the pooled results of several sigmoidoscopy trials, such as those published in Holme et al. (2017),36 is an obvious next step to assess model fit against distal CRC mortality.
Despite the increased use of simulation models to inform cancer screening programs, very few of those models have been extensively validated. We searched for other publications on external validation of microsimulation models used to predict cancer screening effectiveness. We found that publications explicitly demonstrating external validation of cancer screening microsimulation models are scarce and, to the best of our knowledge, publications on predictive validation are non-existent. A systematic review by Koleva et al. (2015)37 concluded that none of the models used for breast cancer screening were externally validated. However, this finding may be nuanced, because external validation is sometimes performed without publishing the results.38 In addition to the review of Koleva et al., we found two external validation studies of models of ovarian cancer screening39, 40 and several of lung cancer screening.41–44 Although these models are designed to predict the impact of interventions, only one of them was validated for important screening effectiveness outcomes such as incidence and mortality reduction. In the current study, we validated a variety of intermediate outcomes in addition to mortality and incidence reduction. These intermediate outcomes are also highly relevant for the validity of a model predicting cancer screening effectiveness, as they may lead to very different predictions of the cost-effectiveness of screening. In our opinion, this elaborate validation is an important strength of the current work. In addition, the assessment of predictive validity is a novel feature in the validation of cancer screening simulation models.
Irrespective of these strengths, two limitations are noteworthy. First, we did not vary the sensitivity of screening and follow-up tests by location of adenomas. Although studies indicate that the sensitivity of FOBT and follow-up colonoscopy for right-sided premalignant lesions in the colon may differ from the sensitivity for left-sided premalignant lesions, there is not yet consensus on this topic.45–47 Second, although this study offers promising evidence of the validity of our model, it does not directly imply that MISCAN-Colon predictions are also valid for other settings, such as screening trials with different screening tests. Validation of MISCAN-Colon is a continuous process that will be repeated whenever important new results from CRC screening RCTs are published. In this continuous process, we have already validated our model using 5 of the 9 RCTs included in the Cochrane Library review on the benefits of CRC screening:48 three out of four guaiac FOBT trials49 and, including this study, two out of five FS trials.5, 50 Model validation using two of the remaining FS trials may not be feasible or useful, because the interpretation of one of the trials may be affected by the frequent occurrence of opportunistic screening among the trial population51 and the other includes only a small number of participants.48 We are currently validating the model against the remaining FS trial50 as well as data from the Italian fecal immunochemical test (FIT) screening program performed in Florence during 1993–2008 (mean follow-up: 11 years). The first results are promising, further indicating the validity of MISCAN-Colon for FS and FIT screening effectiveness.
In conclusion, this study demonstrates that the MISCAN-Colon model can accurately estimate the main outcomes of a trial that measures the effectiveness of once-only FS CRC screening. These findings, in combination with our other validation results, suggest that MISCAN-Colon is a useful decision-making tool for public health organizations and governments involved in CRC screening. Furthermore, we made predictive validation possible by presenting our model outcomes before publication of trial results. Finally, by publishing the results of this validation study we can provide more transparency regarding the performance of modelling in general, which is crucial for the role of modelling in public health decision making.
Acknowledgments:
The authors thank the leaders of the NORCCAP trial for providing data regarding essential model inputs: Ø. Holme, M. Løberg, M. Bretthauer, and G. Hoff.
In addition, we thank the Norwegian Cancer Registry for providing data regarding CRC incidence and mortality in Norway.
Funding: Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number U01CA199335. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. In addition, this publication was made possible by the National Institute for Public Health and the Environment in the Netherlands, which supported part of the development of the MISCAN-Colon model.
Footnotes
Competing interests: All authors disclose no conflicts of interest.
References
- 1. U.S. Preventive Services Task Force, Bibbins-Domingo K, Grossman DC, et al. Screening for Colorectal Cancer: US Preventive Services Task Force Recommendation Statement. JAMA. 2016;315: 2564–2575.
- 2. European Commission. European Guidelines for Quality Assurance in Colorectal Cancer Screening and Diagnosis. 1 ed: Luxembourg: Publications Office of the European Union, 2010.
- 3. Van Hees F, Zauber AG, Van Veldhuizen H, et al. The value of models in informing resource allocation in colorectal cancer screening: the case of the Netherlands. Gut. 2015;64: 1985–1997.
- 4. Knudsen AB, Zauber AG, Rutter CM, et al. Estimation of Benefits, Burden, and Harms of Colorectal Cancer Screening Strategies: Modeling Study for the US Preventive Services Task Force. JAMA. 2016;315: 2595–2609.
- 5. Rutter CM, Knudsen AB, Marsh TL, et al. Validation of Models Used to Inform Colorectal Cancer Screening Guidelines: Accuracy and Implications. Med Decis Making. 2016;36: 604–614.
- 6. Holme O, Loberg M, Kalager M, et al. Long-Term Effectiveness of Sigmoidoscopy Screening on Colorectal Cancer Incidence and Mortality in Women and Men: A Randomized Trial. Ann Intern Med. 2018.
- 7. Clark JC, Collan Y, Eide TJ, et al. Prevalence of polyps in an autopsy series from areas with varying incidence of large-bowel cancer. Int J Cancer. 1985;36: 179–186.
- 8. Caro JJ, Eddy DM, Hollingworth W, et al. ISPOR-SMDM task force’s recommendations for good modeling practices-reply to letter to the editor by Corro Ramos. Value Health. 2013;16: 1108.
- 9. Eddy DM, Hollingworth W, Caro JJ, et al. Model transparency and validation: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-7. Value Health. 2012;15: 843–850.
- 10. Holme O, Loberg M, Kalager M, et al. Effect of flexible sigmoidoscopy screening on colorectal cancer incidence and mortality: a randomized clinical trial. JAMA. 2014;312: 606–615.
- 11. Bretthauer M, Gondal G, Larsen K, et al. Design, organization and management of a controlled population screening study for detection of colorectal neoplasia: attendance rates in the NORCCAP study (Norwegian Colorectal Cancer Prevention). Scand J Gastroenterol. 2002;37: 568–573.
- 12. Gondal G, Grotmol T, Hofstad B, Bretthauer M, Eide TJ, Hoff G. The Norwegian Colorectal Cancer Prevention (NORCCAP) screening study: baseline findings and implementations for clinical work-up in age groups 50–64 years. Scand J Gastroenterol. 2003;38: 635–642.
- 13. Hoff G, Grotmol T, Skovlund E, Bretthauer M, Norwegian Colorectal Cancer Prevention Study G. Risk of colorectal cancer seven years after flexible sigmoidoscopy screening: randomised controlled trial. BMJ. 2009;338: b1846.
- 14. Loeve F, Boer R, van Ballegooijen M, van Oortmarssen GJ, Habbema JDF. Final Report MISCAN-COLON, Microsimulation Model for Colorectal Cancer: Report to the National Cancer Institute Project. Rotterdam, The Netherlands: Department of Public Health, Erasmus University, 1998.
- 15. Loeve F, Boer R, Zauber AG, et al. National Polyp Study data: evidence for regression of adenomas. Int J Cancer. 2004;111: 633–639.
- 16. van Hees F, Habbema JD, Meester RG, Lansdorp-Vogelaar I, van Ballegooijen M, Zauber AG. Should colorectal cancer screening be considered in elderly persons without previous screening? A cost-effectiveness analysis. Ann Intern Med. 2014;160: 750–759.
- 17. Rutter CM, Johnson EA, Feuer EJ, Knudsen AB, Kuntz KM, Schrag D. Secular Trends in Colon and Rectal Cancer Relative Survival. Journal of the National Cancer Institute. 2013.
- 18. Blatt LJ. Polyps of the colon and rectum. Diseases of the Colon & Rectum. 1961;4: 277–282.
- 19. Arminski TC, McLean DW. Incidence and distribution of adenomatous polyps of the colon and rectum based on 1,000 autopsy examinations. Diseases of the Colon & Rectum. 1964;7: 249–261.
- 20. Bombi JA. Polyps of the colon in Barcelona, Spain. An autopsy study. Cancer. 1988;61: 1472–1476.
- 21. Chapman I. Adenomatous Polypi of Large Intestine: Incidence and Distribution. Annals of Surgery. 1963;157: 223–226.
- 22. Jass JR, Young PJ, Robinson EM. Predictors of presence, multiplicity, size and dysplasia of colorectal adenomas. A necropsy study in New Zealand. Gut. 1992;33: 1508–1514.
- 23. Johannsen LGK, Momsen O, Jacobsen NO. Polyps of the Large Intestine in Aarhus, Denmark. Scandinavian Journal of Gastroenterology. 1989;24: 799–806.
- 24. Rickert RR, Auerbach O, Garfinkel L, Hammond EC, Frasca JM. Adenomatous lesions of the large bowel: an autopsy survey. Cancer. 1979;43: 1847–1857.
- 25. Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat Database: Incidence - SEER 9 Regs Limited-Use, Nov 2002 Sub (1973–2002). National Cancer Institute, Division of Cancer Control and Population Sciences, Surveillance Research Program, Cancer Statistics Branch. Released April 2003, based on the November 2002 submission. Bethesda, MD: National Cancer Institute, 2003.
- 26. Vatn MH, Stalsberg H. The prevalence of polyps of the large intestine in Oslo: An autopsy study. Cancer. 1982;49: 819–825.
- 27. Williams AR, Balasooriya BA, Day DW. Polyps and cancer of the large bowel: a necropsy study in Liverpool. Gut. 1982;23: 835–842.
- 28. Hardcastle JD, Chamberlain JO, Robinson MHE, et al. Randomised controlled trial of faecal-occult-blood screening for colorectal cancer. The Lancet. 1996;348: 1472–1477.
- 29. Jørgensen OD, Kronborg O, Fenger C. A randomised study of screening for colorectal cancer using faecal occult blood testing: results after 13 years and seven biennial screening rounds. Gut. 2002;50: 29–32.
- 30. Mandel JS, Church TR, Ederer F, Bond JH. Colorectal Cancer Mortality: Effectiveness of Biennial Screening for Fecal Occult Blood. Journal of the National Cancer Institute. 1999;91: 434–437.
- 31. Lansdorp-Vogelaar I, van Ballegooijen M, Boer R, Zauber A, Habbema JDF. A novel hypothesis on the sensitivity of the fecal occult blood test. Cancer. 2009;115: 2410–2419.
- 32. Hardcastle JD, Chamberlain JO, Robinson MH, et al. Randomised controlled trial of faecal-occult-blood screening for colorectal cancer. Lancet. 1996;348: 1472–1477.
- 33. Hoff G, Sauar J, Hofstad B, Vatn MH. The Norwegian guidelines for surveillance after polypectomy: 10-year intervals. Scand J Gastroenterol. 1996;31: 834–836.
- 34. Van Rijn JC, Reitsma JB, Stoker J, Bossuyt PM, Van Deventer SJ, Dekker E. Polyp miss rate determined by tandem colonoscopy: a systematic review. The American Journal of Gastroenterology. 2006;101: 343–350.
- 35. Atkin WS, Cook CF, Cuzick J, et al. Single flexible sigmoidoscopy screening to prevent colorectal cancer: baseline findings of a UK multicentre randomised trial. Lancet. 2002;359: 1291–1300.
- 36. Holme O, Schoen RE, Senore C, et al. Effectiveness of flexible sigmoidoscopy screening in men and women and different age groups: pooled analysis of randomised trials. BMJ. 2017;356: i6673.
- 37. Koleva-Kolarova RG, Zhan Z, Greuter MJ, Feenstra TL, De Bock GH. Simulation models in population breast cancer screening: A systematic review. Breast. 2015;24: 354–363.
- 38. de Koning HJ, Alagoz O, Schechter CB, van Ravesteyn NT. Reply to Koleva-Kolarova et al. The Breast. 2016;27: 182–183.
- 39. Urban N, Drescher C, Etzioni R, Colby C. Use of a stochastic simulation model to identify an efficient protocol for ovarian cancer screening. Control Clin Trials. 1997;18: 251–270.
- 40. Havrilesky LJ, Sanders GD, Kulasingam S, et al. Development of an ovarian cancer screening decision model that incorporates disease heterogeneity: implications for potential mortality reduction. Cancer. 2011;117: 545–553.
- 41. Raji OY, Duffy SW, Agbaje OF, et al. Predictive accuracy of the Liverpool Lung Project risk model for stratifying patients for computed tomography screening for lung cancer: a case-control and cohort validation study. Ann Intern Med. 2012;157: 242–250.
- 42. Tammemagi CM, Pinsky PF, Caporaso NE, et al. Lung cancer risk prediction: Prostate, Lung, Colorectal And Ovarian Cancer Screening Trial models and validation. J Natl Cancer Inst. 2011;103: 1058–1068.
- 43. Ten Haaf K, Jeon J, Tammemagi MC, et al. Risk prediction models for selection of lung cancer screening candidates: A retrospective validation study. PLoS Med. 2017;14: e1002277.
- 44. Katki HA, Kovalchik SA, Berg CD, Cheung LC, Chaturvedi AK. Development and Validation of Risk Models to Select Ever-Smokers for CT Lung Cancer Screening. JAMA. 2016;315: 2300–2311.
- 45. Brenner H, Niedermaier T, Chen H. Strong subsite-specific variation in detecting advanced adenomas by fecal immunochemical testing for hemoglobin. Int J Cancer. 2017;140: 2015–2022.
- 46. de Wijkerslooth TR, Stoop EM, Bossuyt PM, et al. Immunochemical fecal occult blood testing is equally sensitive for proximal and distal advanced neoplasia. Am J Gastroenterol. 2012;107: 1570–1578.
- 47. Brenner H, Hoffmeister M, Arndt V, Stegmaier C, Altenhofen L, Haug U. Protection From Right- and Left-Sided Colorectal Neoplasms After Colonoscopy: Population-Based Study. JNCI: Journal of the National Cancer Institute. 2010;102: 89–95.
- 48. Holme O, Bretthauer M, Fretheim A, Odgaard-Jensen J, Hoff G. Flexible sigmoidoscopy versus faecal occult blood testing for colorectal cancer screening in asymptomatic individuals. Cochrane Database Syst Rev. 2013: CD009259.
- 49. Lansdorp-Vogelaar I, van Ballegooijen M, Boer R, Zauber A, Habbema JD. A novel hypothesis on the sensitivity of the fecal occult blood test: Results of a joint analysis of 3 randomized controlled trials. Cancer. 2009;115: 2410–2419.
- 50. Schoen RE, Pinsky PF, Weissfeld JL, et al. Colorectal-cancer incidence and mortality with screening flexible sigmoidoscopy. N Engl J Med. 2012;366: 2345–2357.
- 51. Doubeni CA. The Impact of Colorectal Cancer Screening on the United States Population: is it time to celebrate? Cancer. 2014;120: 2810–2813.