Canadian Family Physician. 2022 Nov;68(11):815–822. doi: 10.46747/cfp.6811815

Assessing new screening tests

Panacea or profligate?

James A Dickinson 1, Guylène Thériault 2, Roland Grad 3, Neil R Bell 4, Olga Szafran 5
PMCID: PMC9833156  PMID: 36376046

Enthusiastic early adopters of new screening tests or questionnaires often recommend that physicians start using these as soon as possible. Sometimes patients return from a visit with another health practitioner with the result of a new test purporting, for example, to reveal multiple food intolerances.1 The public is also bombarded with direct-to-consumer advertising for “new” screening tests. We strive to be at the forefront of medicine and ensure that patients benefit from the latest science, but new screening tests should be assessed cautiously. This article explores how physicians should respond to new proposals to screen for disease.

Case descriptions

Case 1. A resident in your practice suggests that you should be doing anal Papanicolaou tests for men who have ever had receptive anal intercourse. He had a student placement in a clinic that does these tests on all such patients.

Case 2. Your patient comes in with a newspaper report discussing a new blood test for circulating cell-free DNA that can detect multiple cancers long before they will manifest clinically.2 She has seen advertisements from a US company selling such a test and wants to know how to get it.

Case 3. One of your patients attended a private medical clinic for an executive medical assessment, paid for by her employer, where they did a series of tests and recommended that she undergo whole-body magnetic resonance imaging (MRI), which they offer at a special discount. She asks you what you think about this opportunity.

Case 4. A young man was having difficulty in his coursework at university. He went to the student support service for advice on studying. They asked him to complete a series of questionnaires and told him that he had scored high on his depression screen, so he would likely benefit from antidepressants and should ask you for a prescription.

How to assess a new screening test

Screening for disease is complex. It uses a test—essentially a sieve—to separate individuals with a higher chance of having a condition from the majority who are at lower risk. Sometimes screening is not focused on current disease but attempts to predict the development of a disease that will kill or disable the person at some time in the future—the “risk factor” approach. Generally, except for cardiovascular or metabolic disease, any specific disease will occur only in a small proportion of the population. Most of those for whom screening tests are used have no prospect of benefit; however, those with a positive result owing to a false positive or an overdiagnosis may endure the workup and treatment that follow, even though the disease would not progress to cause harm during their life. Therefore, we need to carefully weigh the potential benefits (ie, the chance of being 1 of the small minority with the disease) against potential harms (ie, the severity and likelihood of side effects from the cascade that follows).

A seminal 1968 paper from the World Health Organization reported on the principles of screening.3 These ideas have stood the test of time and were recently updated by a team in Ontario.4 Box 1 summarizes 12 consolidated principles that emphasize screening is not just a test, but a complex process that must be applied in the hope of improving outcomes—whether for a population (population screening) or a specified subgroup, usually identified during clinical contacts (case finding).4 We start the process with an individual who is not yet a patient, at least in regard to the issue screened for, but who may become one as a result.

Box 1. Summary of consolidated screening principles.

Disease or condition principles

  1. Epidemiology of the disease should be adequately understood and the disease should be an important health problem

  2. Natural history should be understood and the disease must have a detectable preclinical phase when treatment has a better outcome than after clinical presentation

  3. Target population for screening must be defined

Test or intervention principles

  4. Screening test performance characteristics: accurate, safe, acceptable, affordable

  5. Interpretation of screening test results: clear thresholds

  6. Postscreening test options: agreed-upon course of action for follow-up and treatment to improve outcomes. Effect of false-positive and false-negative test results should be minimal

Program or system principles

  7. Screening program infrastructure: adequate existing resources or plan to develop sufficient resources for all eligible participants

  8. Screening program coordination and integration: coordinated and integrated into broader health system

  9. Screening program acceptability and ethics: all components should be ethically acceptable to participants and professionals, and there should be methods to ensure informed choice

  10. Screening program benefits and harms: benefits (eg, increased function and quality of life, decreased mortality) should be greater than harms (eg, overdiagnosis, overtreatment)

  11. Economic evaluation of screening program: economic evaluation should assess full costs of operating the screening program compared with opportunity costs of allocating resources to alternatives

  12. Screening program quality and performance management: screening program should have clear goals and objectives and monitor for quality control and performance targets

Adapted with permission from Joule Inc.4 Copyright © 2018.

These criteria form a chain of logic, not a checklist. Thus, it is insufficient to identify that most of the criteria are met or mostly met. The whole chain is only as strong as its weakest link. If even one of the links (or criteria) is weak, so is the whole chain, and the program fails. When evaluated against these criteria, most newly suggested screening tests are missing evidence about some of the critical issues, and so they clearly fail. That should spur more research on the screening tests to fill the evidence gaps. Ideally, because screening is so complex and the odds of a new test being successful are so low, new programs should generally be evaluated by a randomized controlled trial.5 For example, ovarian cancer screening with transvaginal ultrasound and cancer antigen 125 seemed promising, but a trial showed no mortality benefit, with substantial harms from interventions.6 However, trials take an enormous amount of resources and a long time, so when a new or better test is substituted for one that had previously been studied, nontrial evidence may be used to extrapolate from what is already shown by a trial; for example, fecal immunochemical tests have better test characteristics than the older fecal occult blood tests.7 Rarely, other proposed screening tests (eg, for congenital diseases) may be so clearly effective that programs can be established based on the logic chain, but even those need evaluation to confirm their value in real-world practice.
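The difference between a checklist and a chain of logic can be sketched in a few lines. The example below is a hypothetical illustration only: the ratings are invented for demonstration, and the function names are not drawn from the article or from any screening guideline.

```python
# Hypothetical illustration: the ratings below are invented, not taken from the article.

def checklist_score(criteria: dict[str, bool]) -> float:
    """Fraction of criteria met (the 'checklist' view the text warns against)."""
    return sum(criteria.values()) / len(criteria)


def chain_holds(criteria: dict[str, bool]) -> bool:
    """Chain-of-logic view: the program holds only if every single link holds."""
    return all(criteria.values())


# Assume eleven of the twelve principles are satisfied, but evidence that
# benefits exceed harms (principle 10) is missing.
example = {f"principle_{i}": True for i in range(1, 13)}
example["principle_10"] = False

print(f"checklist score: {checklist_score(example):.0%}")
print(f"chain holds: {chain_holds(example)}")
```

A proposal can score highly on the checklist view and still fail on the chain view, which is the point the paragraph above makes about the weakest link.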


Many would argue that principle 10 (Box 1), assessing benefits against harms,4 is the most important; unfortunately, the harms are seldom measured or discussed in medical literature.8 Harms usually appear soon after testing, while the benefits are uncommon and often come years later.9 Childhood neuroblastoma screening is an example that apparently fulfilled the criteria, but there was no trial. When evaluated in practice, screening caused considerable harm and gave no benefit.5 An often-forgotten harm of screening is labeling: finding an abnormality that may be of little importance and likely does not need treatment. This may happen with biochemistry, imaging, or genetic testing. Once the person knows about it, they become a “patient” and may worry about it with no prospect of any value from the knowledge. A costly consequence of labeling is that insurance (life, travel, or mortgage) may become more expensive or unobtainable.

The history of screening shows a long series of failures with occasional successes, such as cardiovascular or metabolic disease screening, although depending on whom you screen the benefits may be marginal.10 The balance of harms against potential benefits is always a concern. Too often, tests that work moderately well for diagnosis of people with symptoms or a population at high risk are then used for screening in a lower-risk setting. The chance of benefit is much lower, while the chance of harm remains.5 For example, in Australia, a 43-year-old asymptomatic woman at low risk of cardiovascular disease was offered cardiac computed tomography (CT) screening by her well-meaning employer, then died of anaphylactic shock in reaction to the contrast media.11 If she had been informed about the risk (albeit low) compared with the minimal chance of benefit, would she have accepted the test?

Even when screening works, too often we apply these tests to the wrong population or use them too frequently on people at low risk. This produces an opportunity cost, as time and resources are diverted away from those with illness who are more likely to benefit from medical care.12 There is often a drive to expand screening to ensure every possible case is caught, which raises the danger of making “perfect” the enemy of “good.” For example, some women in Canada are still getting annual cervical screening at young ages, although many people at higher risk are screened irregularly or not at all. Some people with severe comorbidities and low life expectancy are still screened for breast or colon cancer, when their chance of benefit is small because the lag time to benefit is at least 5 years and often longer, although the potential for harm remains.13 Furthermore, too many people make the logical leap that if screening works it will eliminate mortality from the disease, when it may reduce it only by a fraction (eg, estimated as 15% for breast cancer for women between 50 and 74 years of age14). This means an absolute mortality reduction of less than 1 per 1000 women over 7 years of screening.15
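The gap between relative and absolute effects can be made concrete with a little arithmetic. The sketch below is illustrative only: the 15% relative reduction comes from the text, whereas the baseline risk of 5 deaths per 1000 women over the screening period is an assumed round number chosen for demonstration and is not taken from the cited studies.

```python
def absolute_reduction_per_1000(baseline_deaths_per_1000: float,
                                relative_reduction: float) -> float:
    """Deaths prevented per 1000 people screened, given a relative reduction."""
    return baseline_deaths_per_1000 * relative_reduction


def number_needed_to_screen(reduction_per_1000: float) -> float:
    """People who must be screened to prevent one death."""
    return 1000 / reduction_per_1000


RELATIVE_REDUCTION = 0.15        # relative mortality reduction quoted in the text
ASSUMED_BASELINE_PER_1000 = 5.0  # ASSUMPTION: hypothetical baseline risk, for illustration only

arr = absolute_reduction_per_1000(ASSUMED_BASELINE_PER_1000, RELATIVE_REDUCTION)
nns = number_needed_to_screen(arr)

print(f"absolute reduction: {arr:.2f} deaths per 1000 screened")
print(f"number needed to screen: about {nns:.0f}")
```

Under this assumed baseline the absolute reduction stays below 1 per 1000, in line with the order of magnitude the text reports; a different baseline would change the figures but not the logic.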

The quality of every part of the screening process must be better than usual diagnostic practice because, if there is substantial error, the small benefits of screening will be lost among the harms caused. This has been addressed in many situations, such as the emphasis on best practices for measuring blood pressure,16 on appropriate techniques for neonatal screening,17 on standardization of pathology in cervical screening,18 and on mammography quality—in both capturing the image19 and reading it.20,21 For proposed new screening tests, such measurement issues should be worked through before they are widely used.5 Unfortunately, some screening programs are instituted for political reasons and then run by those who believe in the mission and are unwilling to undertake rigorous ongoing quality assessment and improvement programs.

Population screening versus case finding

Some make a categorical distinction between population screening and case finding (ie, focusing on at-risk patients), but these concepts are not binary—they are on a continuum. Nearly all screening should be reserved for selected populations at higher risk to improve the balance of potential benefit over potential harm. For example, tuberculosis screening is no longer appropriate for most Canadians, but those who have lived for a substantial period in Nunavut are potentially at high risk.22 Cervical cancer screening should be started only in women who have been sexually active for some years, usually sometime in their mid-20s or even 30s.23 Women who have emigrated from sub-Saharan Africa and some Caribbean, Central American, and South American countries are at higher risk,24 so we should make special efforts to ensure they are screened.

Choosing who should be screened sometimes leads to conflict between different recommendations. For example, hepatitis C virus is largely transmitted by intravenous blood transfer. As such, the Canadian Task Force on Preventive Health Care recommends screening for it in those at high risk,25 whereas members of the Canadian Association for the Study of the Liver, who see more end-stage cases and are anxious to prevent them all, suggest screening the birth cohort of individuals born between 1945 and 1975.26 This would find a few more cases, but the cost in effort and health care resources is high. Thus, clear and justifiable delineation of who should enter the screening algorithm is critical.27

Screening for renal disease could be done universally in developed countries. However, false-positive results occur often, while preventable renal disease largely occurs among those with other known conditions—hypertension, diabetes, and chronically infected or obstructed renal tracts. Therefore, it makes more sense to focus regular renal function testing on case finding among people who are at higher risk.28

Box 2 provides a list of key questions to ask about a purported new screening test.

Box 2. Key questions to ask about a purported new screening test.

  • What is the evidence? Were the data on screening (ie, from a population without symptoms) or on investigating a symptom (ie, in a higher-prevalence population with symptomatic disease)?

  • What are the potential benefits of the screening and subsequent treatment? What proportion of the population receives these?

  • What are the potential harms, both direct and indirect? What are the frequency, magnitude, and severity of potential harms?

  • How were the screening tests studied? Can the results of those studies be extrapolated to your patient’s situation (external validity)?

Case resolutions

We performed a literature search on each topic for recent review articles that would provide evidence about the case patients’ suitability for screening. Table 1 evaluates each of the situations in the examples against the screening criteria.20,21,29-36

Table 1. Consolidated principles for population-based screening applied to 4 examples

For diseases or conditions

1. Epidemiology of the disease must be understood and it must be an important health problem

  • Anal HPV screening: Anal cancer is a rare disease: Canada has 670 new cases/y and 181 deaths.29 Anal cancer is partly understood; is HPV related; and has higher prevalence in those having anal intercourse and in PLWHIV
  • Cancer blood test: Cancers overall are common past middle age, but outside of breast, colon, prostate, and lung cancers, new cases of each type are rare. Each has different characteristics and must be evaluated separately
  • Whole-body MRI: MRI will find a variety of diseases, some of which may be serious. It also finds many variations that are best left alone (“incidentalomas”). Even if some incidentalomas are cancers, we do not know if treating these actually changes outcomes
  • Depression screening test: Moderate prevalence, but new serious depression is uncommon

2. Natural history should be understood and a preclinical phase must be detectable

  • Anal HPV screening: Like other HPV infections, some progress, others may resolve. Treatment of PLWHIV with HSIL (mean age 51 y) reduces 4-y progression to cancer from 0.95% to 0.40%30
  • Cancer blood test: For some cancers this is true. Maybe this test will prove able to detect other early-stage cancers
  • Whole-body MRI: For some detectable conditions, yes, but many incidentalomas are better not investigated
  • Depression screening test: Natural history of most depression is for resolution within 3-6 mo.31 Much of what is found by screening is unhappiness with a life crisis. Many cases are self-resolving, with “talk therapy”

3. Target population for screening must be defined

  • Anal HPV screening: PLWHIV and other immunosuppressed populations are at higher risk
  • Cancer blood test: Insufficient information to judge at what age to start and stop, and frequency for testing
  • Whole-body MRI: Insufficient information to judge at what age to start and stop, and frequency for testing
  • Depression screening test: Groups at high risk are identifiable (eg, postpartum women who are immigrants)

For tests or interventions

4. Screening test performance characteristics: accurate, safe, acceptable, affordable

  • Anal HPV screening: Not yet clear. Specificity low: acceptable to some members of at-risk populations, but follow-up rates low32
  • Cancer blood test: No information yet. Likely acceptable
  • Whole-body MRI: Too much information is possible: minute variations are found and lead to further investigations. Interreader variations are likely high
  • Depression screening test: PHQ-9 often used in clinical situations. Safe and acceptable. Sensitivity and specificity of 85%, but when prevalence is low (eg, 10%) predictive value is low (eg, 40%). Unclear whether effective33

5. Interpretation of screening test results: clear thresholds

  • Anal HPV screening: Not yet clear34
  • Cancer blood test: Not yet fully described. Will need careful calibration
  • Whole-body MRI: Not clear
  • Depression screening test: Threshold score of 10 often used for PHQ-9, but then needs clinical assessment to make diagnosis

6. Postscreening test options: agreed-upon course of action for follow-up and treatment to improve outcomes

  • Anal HPV screening: High-resolution anoscopy and treatment of HSILs30
  • Cancer blood test: Some may follow standard protocols, others not yet worked through
  • Whole-body MRI: Variable, depending on anatomic location and features
  • Depression screening test: If clinical assessment confirms the diagnosis, treat with counseling, antidepressant medication, or both

For programs or systems

7. Screening program infrastructure: adequate existing resources or plan to develop resources sufficient for all

  • Anal HPV screening: Anoscopy assessment and treatment clinics not widely available; pathologists not widely trained in reading these tests
  • Cancer blood test: No for most of the proposed screening target diseases. For some, infrastructure is available but it is unclear how to use it purposefully with earlier detection
  • Whole-body MRI: Not enough access for patients with clear indications requiring MRI investigation. Private MRI available in large centres
  • Depression screening test: Family physician treatment, referral to mental health staff or psychologist: not readily available

8. Screening program coordination and integration: coordinated and integrated into broader health system

  • Anal HPV screening: No
  • Cancer blood test: No
  • Whole-body MRI: Done privately, ≥$2000. Any findings must be dealt with in standard facilities
  • Depression screening test: Usually

9. Screening program acceptability and ethics: all components should be ethically acceptable to participants and professionals, and methods should exist to ensure informed choice

  • Anal HPV screening: No
  • Cancer blood test: Not worked through
  • Whole-body MRI: Without understanding possible risks, cannot engage in informed consent
  • Depression screening test: Most patients want dialogue about how they feel, not checklists like PHQ-9

10. Screening program benefits and harms: benefits should be greater than harms

  • Anal HPV screening: Unclear
  • Cancer blood test: Not yet demonstrated
  • Whole-body MRI: Few benefits, harms likely greater35
  • Depression screening test: Not yet demonstrated36

11. Economic evaluation of screening program: economic evaluation should assess full costs of operating screening program compared with opportunity costs of allocating resources to alternatives

  • Anal HPV screening: Unknown currently. May be useful among patients at high risk age >40 y
  • Cancer blood test: Unknown
  • Whole-body MRI: Profitable for private facilities but effort and cost needed to follow up abnormalities most likely shift the balance toward more opportunity costs
  • Depression screening test: High opportunity cost36

12. Screening program quality and performance management: screening program should have clear goals and objectives and monitor for quality control and performance targets

  • Anal HPV screening: Not yet available
  • Cancer blood test: Not yet possible
  • Whole-body MRI: Radiology quality improvement is variable in Canada20,21
  • Depression screening test: Not applicable

HPV—human papillomavirus, HSIL—high-grade squamous intraepithelial lesion, MRI—magnetic resonance imaging, PHQ-9—Patient Health Questionnaire–9, PLWHIV—people living with HIV.
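The predictive value quoted for the PHQ-9 in Table 1 follows from Bayes' theorem. The sketch below reproduces that arithmetic using the sensitivity, specificity, and prevalence given in the table; the function name is ours, and the 50% prevalence used for contrast is an assumed figure meant to represent a symptomatic clinical population, not a value from the article.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Probability that a person with a positive test truly has the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Figures from Table 1: sensitivity and specificity of 85%, prevalence of 10%.
ppv_screening = positive_predictive_value(0.85, 0.85, 0.10)

# ASSUMPTION: 50% prevalence is a hypothetical symptomatic population, for contrast only.
ppv_symptomatic = positive_predictive_value(0.85, 0.85, 0.50)

print(f"PPV at 10% prevalence: {ppv_screening:.0%}")
print(f"PPV at 50% prevalence: {ppv_symptomatic:.0%}")
```

At 10% prevalence the result is close to the 40% cited in the table, so most positive screens are false positives; the same test performs much better when the prior probability is higher.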

Case 1. In this assessment, anal human papillomavirus testing will require more evidence before we can decide whether to use it. While anal cancer is rare overall, it is more frequent among specific definable groups, particularly HIV-positive men who have sex with men. The incidence rises with age and other factors, especially immune deficiency.37 A trial of treatment for high-grade lesions in people living with HIV reduced the number of invasive cancer cases from about 10 per 1000 to 4 per 1000 over about 2 years.30 It is not yet clear whether or how much this reduces morbidity and mortality. Before we accept this test in practice, populations at high risk need to be better defined, possibly as people living with HIV (men and women) older than 40 years of age, or those with other immunosuppression for more than 20 years34; the best test (whether human papillomavirus or cytology) needs to be determined; pathologists need greater agreement on criteria for dysplastic lesions34; and we need data on best treatments (efficacy and tolerability) for different stages of disease.
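The trial figures quoted above can be restated as an absolute risk reduction and a number needed to treat. This is a rough sketch that uses only the rounded rates given in the text (about 10 versus 4 cancers per 1000 over the follow-up period); the function names are ours, and the output should not be read as the trial's own reported estimates.

```python
def absolute_risk_reduction_per_1000(control_per_1000: float,
                                     treated_per_1000: float) -> float:
    """Events avoided per 1000 people treated."""
    return control_per_1000 - treated_per_1000


def relative_risk_reduction(control_per_1000: float,
                            treated_per_1000: float) -> float:
    """Proportional reduction in events compared with the control group."""
    return (control_per_1000 - treated_per_1000) / control_per_1000


def number_needed_to_treat(arr_per_1000: float) -> float:
    """People who must be treated to avoid one event."""
    return 1000 / arr_per_1000


# Rounded rates from the text: about 10 vs 4 invasive cancers per 1000.
arr_case1 = absolute_risk_reduction_per_1000(10, 4)
rrr_case1 = relative_risk_reduction(10, 4)
nnt_case1 = number_needed_to_treat(arr_case1)

print(f"absolute reduction: {arr_case1} per 1000")
print(f"relative reduction: {rrr_case1:.0%}")
print(f"number needed to treat: about {nnt_case1:.0f}")
```

A large relative reduction here still corresponds to a small absolute reduction, which is why the text asks how much morbidity and mortality actually change.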

Case 2. The same caution applies to the new cell-free circulating DNA cancer blood test.2 This is actually a package of multiple tests targeting more than 50 cancers. Some cancers have a preclinical phase when the treatment outcomes might be better than treatment at a later stage, but others do not, so their discovery will not change outcomes. The sensitivity of the test is low for most early cancers, so few will be diagnosed at this stage. The chance of false-positive results is unclear. More research is needed to elucidate how to follow up positive results without embarking on a diagnostic odyssey. Testing for circulating DNA is an exciting new idea that may prove valuable for some cancers. However, even the authors who developed the test point out that it is not yet ready for practice. Research is ongoing and the UK National Health Service is conducting a randomized controlled trial. If this trial shows evidence of the test’s usefulness, physicians will need education to understand the test’s complexities and to properly interpret the results for our patients.38

Case 3. Full-body MRI has not been properly assessed as a screening test. While full-body MRI may reveal unexpected important findings, it has a high rate of incidental findings that would be better left alone. A systematic review of such “incidentaloma” findings on imaging studies showed that up to 48% of chest CT (including cardiac), 22% of brain CT, and 38% of CT colonography found something unexpected, and these rates increase with age.39 Some lesions were new cancers, especially in breasts, kidneys, and thyroid glands, where much overdiagnosis is known to occur. Others were non-malignant; for example, many pituitary microadenomas. While 12% of people with adrenal incidentalomas had subclinical Cushing syndrome, less than 1% actually had Cushing syndrome. While that review focused on CT screening, MRI is likely to show similar results. Thus, a whole-body scan is much more likely to find an incidentaloma than something for which treatment will be beneficial.35 These findings lead to further worry and investigation with biopsies, subsequent surgery, or other treatment, with a very uncertain and minimal probability of any benefit. On balance, this test is much more likely harmful than beneficial for asymptomatic people.

Case 4. Universal depression screening has not yet been proven valuable, even among populations at high risk such as postpartum women.36,40 The test often used, the Patient Health Questionnaire–9, may have some value when people are symptomatic, but we need to take into account that measurement variability is high.41 In a situation of moderate prior probability, as in this case, the predictive value of a positive test is low. Thus, the test alone cannot be used for diagnosis. If used, a subsequent clinical interview is critical. With this patient, it was used as a case-finding test and produced a false-positive result. That was useful in a way, since careful history taking and consultation with the psychologist to whom he had also been referred suggested that he was failing his courses because of difficulty concentrating. His depressive symptoms disappeared when he was diagnosed with attention deficit hyperactivity disorder and commenced treatment. He has since done well.

Conclusion

The history of medicine frequently demonstrates unwarranted enthusiasm for screening tests followed by disheartening findings that the tests offer minimal value while causing harms. We need to be particularly aware when there is potential for substantial financial benefit to the proponents. Therefore, any new purported screening test should be very cautiously assessed. None of the 4 tests described in this article are ready for widespread use. New tests should be performed for screening only in the context of a research program. Even well-known measurements, such as Patient Health Questionnaire–9, should not be used thoughtlessly for screening, since proper diagnostic assessment is required for accurate diagnosis. Rather than undertaking unproven activities, physician time is better spent improving the care of people who present with symptoms.12

Acknowledgment

We thank Bruce Perreault for photographing the chain link.

Key points

  • New screening tests are appealing, but before use they must be assessed to ensure benefits outweigh harms for the appropriate target group.

  • Criteria for screening tests are rigorous, requiring a strong chain of evidence and usually needing evidence of benefit in a randomized controlled trial before adoption.

  • The authors explain how to assess proposed screening and case-finding tests to determine which are truly beneficial.

  • Ordering new and unproven tests is poor use of the health care system’s limited clinical time, energy, and resources.

Footnotes

Competing interests

None declared

This article is eligible for Mainpro+ certified Self-Learning credits. To earn credits, go to https://www.cfp.ca and click on the Mainpro+ link.

The French translation of this article can be found at https://www.cfp.ca in the table of contents of the November 2022 issue on page e310.
