The use of magnetic resonance imaging to screen women with high risk mutations in the genes associated with breast cancer has raised debate on what constitutes sufficient evidence for the efficacy of new screening tests.1-4 The gold standard is evidence from randomised trials that early detection reduces mortality, as is the case for mammography and breast cancer,5 but how should we evaluate new tests that might detect cancer earlier?
Showing that a new test is more sensitive than others suggests that it has promise as a possible screening test,6 but detecting more apparent cases does not necessarily mean that using the test routinely will lead to a further reduction in breast cancer deaths. To fulfil the criteria for an effective screening test, the additional cancers detected must include ones that would both progress during the patient's lifetime and be curable by earlier treatment. The extra cancers picked up by new tests may count as cancers histopathologically but might not progress to cause symptoms in the women's lifetime: thus, a new test might lead to more overdetection rather than improved outcomes (figure). Overdetection of ductal carcinoma in situ (DCIS) is well documented,7 but it may also occur with cancers that seem, histologically, “invasive.”8,9 Overdetection may cause harm through unnecessary labelling and treatment of patients as having a cancer that, without screening, might never have been diagnosed.
Figure 1.
Balance between deaths averted and overdetection by more sensitive test
Overdetection can be identified best in a randomised controlled trial. Screening for several years should yield a higher average incidence of cancer in the screened group than in an unscreened control group during the years of screening. Once screening stops, the annual incidence of cancer in the screened group should drop below that in the unscreened group, and the eventual total number of cancers detected in the groups should equalise.10 A persisting excess of cancers in the screened group represents overdetection, as shown in the Malmö mammographic screening trial, for which the estimate of overdetection in women aged 55-69 at randomisation, followed for 15 years after the end of the trial, was 10% for all breast cancers and 7% for invasive breast cancers.11
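The arithmetic behind such an estimate can be sketched in a few lines. This is only an illustration, not the method used in any particular trial: it assumes two groups of equal size, expresses the persisting excess relative to the control group (one possible choice of denominator), and uses made-up counts. The function name `overdetection_fraction` is hypothetical.

```python
def overdetection_fraction(screened_cumulative, control_cumulative):
    """Persisting excess of cancers in the screened group, relative to control.

    Both arguments are cumulative cancer counts over the whole period of
    screening plus follow-up, for groups assumed to be of equal size.
    Using the control group as the denominator is an assumption of this sketch.
    """
    if control_cumulative <= 0:
        raise ValueError("control_cumulative must be positive")
    return (screened_cumulative - control_cumulative) / control_cumulative


# Hypothetical cumulative counts per 10,000 women; not taken from any trial.
excess = overdetection_fraction(screened_cumulative=550, control_cumulative=500)
print(f"Estimated overdetection: {excess:.0%}")
```

If the two groups had equalised after screening stopped, the function would return zero; any positive value is the excess that follow-up has failed to erase.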
There are three main research designs for evaluating new screening tests for cancer (table). Firstly, randomised controlled trials with long term follow-up provide the best evidence for comparing mortality from cancer among patients having different screening tests. They are unwarranted, however, once there is evidence that early detection confers benefit. Secondly, randomised controlled trials with short term follow-up can be used to compare interval cancer rates, the proportion of women whose screening yields negative results but who then present with cancer before the next scheduled screening test.12 A reduction in the interval cancer rate is the crucial measure, because it represents the potential benefit of early detection rather than overdetection. To assess the impact of early detection further, such studies can also compare the rates of advanced cancers detected by subsequent screening rounds.
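As a rough sketch of the second design, the interval cancer rate in each trial arm can be computed directly from the definition given above. The function name, the arm labels, and all counts here are hypothetical and serve only to show the comparison.

```python
def interval_cancer_rate(screen_negative_women, interval_cancers):
    """Proportion of screen-negative women who present with cancer
    before the next scheduled screening test."""
    if screen_negative_women <= 0:
        raise ValueError("screen_negative_women must be positive")
    if not 0 <= interval_cancers <= screen_negative_women:
        raise ValueError("interval_cancers must lie between 0 and screen_negative_women")
    return interval_cancers / screen_negative_women


# Hypothetical trial arms: (women with a negative screen, interval cancers).
arms = {"existing test": (10_000, 20), "new test": (10_000, 12)}
rates = {name: interval_cancer_rate(n, k) for name, (n, k) in arms.items()}

# A positive difference means fewer interval cancers with the new test.
difference = rates["existing test"] - rates["new test"]
for name, rate in rates.items():
    print(f"{name}: {rate * 1000:.1f} interval cancers per 1000 screen-negative women")
```

A real analysis would also need confidence intervals and a sample size large enough to detect the difference, as the table indicates.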
Table 1.
Methods of evaluating new and existing technology in cancer screening
| Measures compared between tests | Randomised controlled trial of mortality benefit | Randomised controlled trial of interval cancer | Comparison of test accuracy (cross sectional) |
|---|---|---|---|
| Sensitivity | Yes | Yes | Yes |
| Stage (size and nodal status) | Yes | Yes | Yes |
| Interval cancer rate | Yes | Yes | No |
| Mortality | Yes | Not directly | Not directly |
| Duration of follow-up | Decades (10-20 years) | 2-3 years | None |
| Sample size requirement | Large | Large | Small |
| Avoids overestimate of benefit because of overdetection | Yes | Yes | No |
Thirdly, cross sectional studies can be used to compare the sensitivities of different tests by comparing cancer detection rates in people randomised to one or other test, or by paired studies in which people have both tests. In paired studies, sensitivities are estimated as the number of cancers each test detects divided by the total number of cancers detected by either test. Assessing relative sensitivities is valid, despite the fact that cross sectional studies do not provide data on follow-up, because missed cancers will be common to both tests.13 But, contrary to others' suggestions,1 the interval cancer rate for each screening test cannot be obtained from such paired studies, even if followed over time, because cancers detected by either test will be treated: the only interval cancers are those which neither test detects.
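The paired calculation can be illustrated as follows. This is a minimal sketch with invented counts and a hypothetical function name. It also shows why the ratio of sensitivities is unaffected by cancers that both tests miss: the unknown total cancels out of the ratio.

```python
import math


def paired_relative_sensitivity(both, only_new, only_old):
    """Estimate sensitivities from a paired study.

    The denominator is the number of cancers detected by either test,
    because cancers missed by both tests are not observed.
    Returns (sensitivity_new, sensitivity_old, ratio_new_to_old).
    """
    detected_by_either = both + only_new + only_old
    detected_by_old = both + only_old
    if detected_by_either <= 0 or detected_by_old <= 0:
        raise ValueError("counts must give a positive number of detected cancers")
    sens_new = (both + only_new) / detected_by_either
    sens_old = detected_by_old / detected_by_either
    return sens_new, sens_old, (both + only_new) / detected_by_old


# Hypothetical counts: 30 cancers found by both tests, 15 by the new test
# only, and 5 by the old test only.
sens_new, sens_old, ratio = paired_relative_sensitivity(both=30, only_new=15, only_old=5)

# Suppose 10 further cancers were missed by both tests. This number is
# unknowable in a real paired study and is assumed here for illustration.
missed_by_both = 10
true_total = 30 + 15 + 5 + missed_by_both
true_ratio = ((30 + 15) / true_total) / ((30 + 5) / true_total)

print(f"new: {sens_new:.2f}, old: {sens_old:.2f}, ratio: {ratio:.2f}")
print("ratio unchanged by missed cancers:", math.isclose(ratio, true_ratio))
```

The absolute sensitivities are overestimates whenever some cancers are missed by both tests, but their ratio is not, which is why only the relative comparison is valid.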
A new screening test will seem more sensitive than an older one when it detects a larger proportion of the cancers detected by either test. At one extreme this apparently greater sensitivity could be due simply to overdetection. At the other extreme, all the extra cancers detected might have progressed if undetected and their earlier detection should lower mortality. The increase in cancers detected by a new screening test may reflect the fact that the test detects cancers which are smaller or of a lower grade of malignancy. Without comparing interval cancer rates, the extent to which such a shift in size or grade reflects clinically relevant earlier detection or overdetection will remain unclear.
Studies comparing new breast screening tests with mammography2-6 tend to use paired cross sectional designs. This may be adequate if the tests being compared are similar (for example, digital mammography versus film mammography14) or if the aim of the comparison is to establish equivalence. More sensitive screening tests which differ substantially from the comparator tests should be evaluated in randomised studies with short term follow-up over two or three years. Randomised controlled trials of the accuracy of alternative screening tests, such as those evaluating different tests for faecal occult blood in a programme for bowel cancer screening, are also practical and realistic.15
Critics may consider trials to detect interval cancer rates unnecessary or even unethical in people who are at substantially increased risk of developing cancer—for example, women at high risk of breast cancer because of gene mutations. But rigorous scientific evaluation is both ethical and essential to establish that a test does more good than harm, whether for the general population of women or for those with a greater risk of breast cancer.
References
- 1. Yaffe M. What should the burden of proof be for acceptance of a new breast-cancer screening technique? Lancet 2004;364:1111-2.
- 2. Robson ME, Offit K. Breast MRI for women with hereditary cancer risk. JAMA 2004;292:1368-70.
- 3. Liberman L. Breast cancer screening with MR—what are the data for patients at high risk? N Engl J Med 2004;351:497-500.
- 4. Elmore JG, Armstrong K, Lehman CD, Fletcher SW. Screening for breast cancer. JAMA 2005;293:1245-56.
- 5. International Agency for Research on Cancer. IARC handbooks of cancer prevention, volume 7: breast cancer screening. Lyon: IARC Press, 2002.
- 6. Irwig L, Houssami N, van Vliet C. New technologies in screening for breast cancer: a systematic review of their accuracy. Br J Cancer 2004;90:2118-22.
- 7. Ernster V, Barclay J, Kerlikowske K, Grady D, Henderson I. Incidence of and treatment for ductal carcinoma in situ of the breast. JAMA 1996;275:913-8.
- 8. Zahl PH, Strand BH, Maehlen J. Incidence of breast cancer in Norway and Sweden during introduction of nationwide screening: prospective cohort study. BMJ 2004;328:921-4.
- 9. Paci E, Warwick J, Falini P, Duffy SW. Overdiagnosis in screening: is the increase in breast cancer incidence rates a cause for concern? J Med Screen 2004;11:23-7.
- 10. Boer R, Warmerdam P, de Koning H, van Oortmarssen G. Letter: Extra incidence caused by mammographic screening. Lancet 1994;343:979.
- 11. Zackrisson S, Andersson I, Janzon L, Manjer J, Garne JP. Rate of over-diagnosis of breast cancer 15 years after end of Malmö mammographic screening trial: follow-up study. BMJ 2006;332:689-92.
- 12. Taylor R, Supramaniam R, Rickard M, Estoesta J, Moreira C. Interval breast cancers in New South Wales, Australia, and comparisons with trials and other mammographic screening programmes. J Med Screen 2002;9:20-5.
- 13. Chock C, Irwig L, Berry G, Glasziou P. Comparing dichotomous screening tests when individuals negative on both tests are not verified. J Clin Epidemiol 1997;50:1211-7.
- 14. Pisano ED, Gatsonis C, Hendrick E, Yaffe M, Baum JK, Acharyya S, et al. Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med 2005;353:1773-83.
- 15. Australian Government Department of Health and Ageing. Bowel cancer screening pilot program. www.cancerscreening.gov.au/bowel/bcaust/pilot.htm (accessed 15 Nov 2005).

