International Journal of Epidemiology. 2020 Jul 2;49(4):1392–1396. doi: 10.1093/ije/dyaa090

Common misconceptions about validation studies

Matthew P Fox1,2, Timothy L Lash3, Lisa M Bodnar4
PMCID: PMC7750925  PMID: 32617564

Abstract

Information bias is common in epidemiology and can substantially diminish the validity of study results. Validation studies, in which an investigator compares the accuracy of a measure with a gold standard measure, are an important way to understand and mitigate this bias. More attention has been paid to the importance of validation studies in recent years, yet they remain rare in epidemiologic research and, in our experience, they remain poorly understood. Many epidemiologists have had no experience with validation studies, either in the classroom or in their work. We present an example of misclassification of a dichotomous exposure to elucidate some important misunderstandings about how to conduct validation studies that generate valid information. We demonstrate that careful attention to the design of a validation study is central to determining how the bias parameters (e.g. sensitivity and specificity or positive and negative predictive values) can be used in quantitative bias analyses to appropriately correct for misclassification. Whether sampling is based on the true gold standard measure, on the misclassified measure or done at random determines which parameters can be validly estimated and with what precision. Whether the validation study is stratified by other key variables (e.g. by the exposure) also determines the validity of those estimates. We also present sample questions that can be used to teach these concepts. Increasing the presence of validation studies in the classroom could have a positive impact on their use and improve the validity of estimates of effect in epidemiologic research.

Keywords: Information bias, misclassification, validation studies, sensitivity, specificity


Key Messages

  • Information bias is common in epidemiology and can greatly impact the validity of study results, yet validation studies, which could help us to understand and mitigate this bias, are rarely done.

  • It has been our experience that students of epidemiology receive limited training in how to conduct a validation study and, as such, many misconceptions about validation studies persist.

  • Proper understanding of how to implement a validation study could have a strong impact on the validity of study results.

Introduction

Information bias is common in epidemiology1 and can substantially diminish the validity of study results.2–5 Validation studies, in which an investigator compares the accuracy of a measure with a gold standard measure,6 are an important way to understand and mitigate this bias. Validation data can be combined with quantitative bias analysis methods7–10 to compute bias-adjusted estimates that account for the systematic error and yield uncertainty intervals that represent total uncertainty better than conventional confidence intervals. Further, a validation study's estimates of the bias parameters, such as sensitivity and specificity, may be transported to other studies that use the same measurement instrument, which improves the overall state of the science in the topic area.11 As with other epidemiologic studies, validation studies must be carefully designed and implemented to be useful.

More attention is being paid to the importance of validation studies, and several journals12,13 have even created submission categories for validation studies, yet they remain rare in epidemiologic research. To address this gap, we pose a series of questions that could be used on an exam about validation studies or in teaching validation concepts, and use the explanation of the answers to dispel misconceptions as well as to provide teaching examples that can be used to prevent these misconceptions from taking hold in the first place.

Validation study designs

We will work with an example of misclassification of a dichotomous exposure, though the principles we discuss apply equally to misclassification of dichotomous outcomes and covariates. Suppose we conducted an observational study of the relationship between self-reported human papillomavirus (HPV) vaccination and cancer precursor conditions among 7100 respondents, of whom 2400 were classified as exposed and 4700 were classified as unexposed (Table 1, Panel A). The self-reported history of HPV vaccination was likely misclassified compared with a gold standard, such as comprehensive medical record review. The questions below pertain to whether we would need to conduct an internal validation study, or whether we could instead apply validation study estimates from the existing literature.

Table 1.

Full study population (Panel A) and three possible validation studies of 200 people, with the bias parameters that would be estimated under each of the three validation study designs. Although all four bias parameters are shown for every design, not all are valid; estimates that are not valid are marked '(not valid)'. Note: E is HPV vaccination; the respondents sampled by design are the fixed totals in each panel (the classified totals in Panel B, the truth totals in Panel C and the overall total in Panel D).

Panel A. Full study population

                        Truth
                  E+      E−      Total
Classified E+     2000    400     2400    PPV = 0.83
Classified E−     100     4600    4700    NPV = 0.98
Total             2100    5000    7100
                  Se = 0.95    Sp = 0.92    Prevalence = 0.30

Panel B. Validation study design 1: 100 classified as exposed, 100 classified as unexposed

                        Truth
                  E+      E−      Total
Classified E+     83      17      100     PPV = 0.83
Classified E−     2       98      100     NPV = 0.98
Total             85      115     200
                  Se = 0.98 (not valid)    Sp = 0.85 (not valid)    Prevalence = 0.43

Panel C. Validation study design 2: 100 truly exposed, 100 truly unexposed

                        Truth
                  E+      E−      Total
Classified E+     95      8       103     PPV = 0.92 (not valid)
Classified E−     5       92      97      NPV = 0.95 (not valid)
Total             100     100     200
                  Se = 0.95    Sp = 0.92    Prevalence = 0.50

Panel D. Validation study design 3: 200 respondents chosen at random

                        Truth
                  E+      E−      Total
Classified E+     56      11      68      PPV = 0.83
Classified E−     3       130     132     NPV = 0.98
Total             59      141     200
                  Se = 0.95    Sp = 0.92    Prevalence = 0.30
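As a quick arithmetic check on the table, a minimal Python sketch reproducing the Panel A bias parameters from its cell counts:

```python
# 2x2 cell counts from Table 1, Panel A (rows: classified, columns: truth)
tp, fp = 2000, 400   # classified E+ who are truly E+ / truly E-
fn, tn = 100, 4600   # classified E- who are truly E+ / truly E-

se = tp / (tp + fn)                            # 2000/2100 ~ 0.95
sp = tn / (tn + fp)                            # 4600/5000 = 0.92
ppv = tp / (tp + fp)                           # 2000/2400 ~ 0.83
npv = tn / (tn + fn)                           # 4600/4700 ~ 0.98
prevalence = (tp + fn) / (tp + fp + fn + tn)   # 2100/7100 ~ 0.30

print(f"Se={se:.2f} Sp={sp:.2f} PPV={ppv:.2f} NPV={npv:.2f} p={prevalence:.2f}")
```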

Question 1. True/False: It is usually valid to apply estimates of positive and negative predictive value found from external sources (e.g. the literature) to another study population to conduct a bias analysis that will adjust study estimates for misclassification.

Question 2. True/False: It is usually valid to apply estimates of sensitivity and specificity found from external sources (e.g. the literature) to another study population to conduct a bias analysis that will adjust study estimates for misclassification.

We consider the first statement to be false and the second to be true, though with caution. Predictive values depend on sensitivity and specificity as well as on the true prevalence (p) of the variable being measured. The formulas relating sensitivity (Se), specificity (Sp) and prevalence to the positive predictive value (PPV) and negative predictive value (NPV) are as follows:

PPV = (Se × p) / [Se × p + (1 − Sp) × (1 − p)]

NPV = [Sp × (1 − p)] / [(1 − Se) × p + Sp × (1 − p)]

Whereas Se and Sp can vary between populations, PPV and NPV can vary more strongly because the prevalence of the variable is likely to change between populations. Furthermore, if the exposure is associated with the outcome, then the prevalence of exposure will differ between outcome groups, and estimates of PPV and NPV from an external source would have to be available within the outcome categories. In our example, transporting PPV and NPV from an external source to our study population would require that the prevalence of HPV vaccination is the same in the external population as in ours, that the association between HPV vaccination and cancer precursor conditions is the same in the external population as in ours, and that estimates of PPV and NPV are available within categories of cancer precursor conditions defined in the same way. These conditions are difficult to achieve, which illustrates why PPV and NPV are ordinarily considered less transportable between populations than Se and Sp, which do not require the same conditions for transportability.
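To make the prevalence dependence concrete, here is a minimal Python sketch applying the formulas above with the Panel A bias parameters (Se = 0.95, Sp = 0.92); the prevalence of 0.10 for a second population is a hypothetical value chosen for illustration:

```python
def predictive_values(se, sp, p):
    """Compute (PPV, NPV) from sensitivity, specificity and true prevalence."""
    ppv = se * p / (se * p + (1 - sp) * (1 - p))
    npv = sp * (1 - p) / ((1 - se) * p + sp * (1 - p))
    return ppv, npv

se, sp = 0.95, 0.92  # bias parameters from Table 1, Panel A

# Same instrument, two prevalences: 0.30 (our population) vs a
# hypothetical external population with prevalence 0.10
for p in (0.30, 0.10):
    ppv, npv = predictive_values(se, sp, p)
    print(f"p = {p:.2f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With Se and Sp held fixed, lowering the prevalence from 0.30 to 0.10 reduces the PPV from about 0.84 to about 0.57, which is exactly why predictive values transport poorly between populations.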

Without sensitivity and specificity estimates from the literature to apply to our population, there are three main approaches to sampling participants into an internal validation study.6 However, only certain parameters can be validly estimated from each design.

Parameters that can be estimated from each validation study design

Question 3. In validation design 1, we sample respondents conditional on their imperfectly classified measure (e.g. sample 100 respondents who self-reported as vaccinated and 100 respondents who self-reported as unvaccinated) (Table 1, Panel B). What parameters can be validly calculated? Check all that apply.

  •    i. Se of exposure classification.

  •    ii. Sp of exposure classification.

  •   iii. PPV of exposure classification.

  •    iv. NPV of exposure classification.

The sample is taken within strata of those classified as exposed and unexposed. Sampling based on classified status changes the marginal exposure prevalence in the validation sample (43%) vs the study population (30%). The estimated predictive values within strata of classified exposure status will be valid because they do not rely on the marginal prevalence, but estimates of Se and Sp will be biased due to the change in exposure prevalence on the margins. With design 1, investigators can validly estimate predictive values, but not Se and Sp (at least not directly). Accordingly, the results generated are unlikely to be transportable to another study population, because predictive values depend on prevalence, as discussed above. The correct answer to question 3 should be PPV and NPV only [i.e. (iii) and (iv)].
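The parenthetical 'at least not directly' can be made concrete: when the classified margins of the full study are known, Se and Sp can be back-calculated from the validated predictive values. A minimal Python sketch (the function is ours, for illustration), using the design 1 estimates from Panel B together with the classified totals from Panel A:

```python
def se_sp_from_predictive_values(ppv, npv, n_clas_pos, n_clas_neg):
    """Back-calculate Se and Sp from validated predictive values plus the
    full-study counts of classified exposed and classified unexposed."""
    tp = ppv * n_clas_pos         # expected true positives
    fp = (1 - ppv) * n_clas_pos   # expected false positives
    tn = npv * n_clas_neg         # expected true negatives
    fn = (1 - npv) * n_clas_neg   # expected false negatives
    return tp / (tp + fn), tn / (tn + fp)

# PPV/NPV from Panel B; classified margins (2400, 4700) from Panel A
se, sp = se_sp_from_predictive_values(0.83, 0.98, 2400, 4700)
print(f"Se = {se:.2f}, Sp = {sp:.2f}")  # ~0.95 and ~0.92, matching Panel A
```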

In validation design 2, we sample respondents conditional on their gold standard measure (e.g. sample 100 respondents who have evidence of vaccination in the medical record and 100 respondents who do not) (Table 1, Panel C). This approach allows valid calculation of Se and Sp, but not of the predictive values, again because the sampling changes the marginal prevalence and thereby biases the estimates of the predictive values. Because of this sampling approach, other investigators may be able to apply the resulting estimates of Se and Sp to their own studies. However, this design is seldom feasible because investigators rarely have the true gold standard measure on everyone in the study from which to sample, so design 2 is unlikely to be implemented in practice. It is sometimes possible that a subset of the study population has both the gold standard measure and the misclassified measure. For example, a subset of the 7100 respondents might receive healthcare from a healthcare system that maintains a high-quality vaccine registry, and this subset could be included in a validation study that estimates sensitivity and specificity. Such a design does not truly sample conditional on the gold standard, however; it is a convenience sampling design.

In design 3, we take a random sample of the study population (e.g. sample 200 respondents independent of their true or classified vaccination status) (Table 1, Panel D). Because the sample is drawn independent of both classification and truth, Se, Sp, PPV and NPV can all be validly estimated, and the estimates of Se and Sp can be useful both to the investigators and to other stakeholders using the same misclassified variable. However, this design allows no control over the expected sample size in each cell, which is determined by the distribution of the classified and true variables in the population. As a result, the bias parameters may be imprecisely estimated.14
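A small Monte Carlo sketch of that precision cost, assuming simple random sampling from the Panel A population: each 200-person sample contains only about 59 truly exposed respondents on average, whereas design 2 would fix that number at 100, so the Se estimates scatter more widely.

```python
import random

# Full-study population from Table 1, Panel A as (classified, truth) pairs
population = ([('+', '+')] * 2000 + [('+', '-')] * 400 +
              [('-', '+')] * 100 + [('-', '-')] * 4600)

random.seed(1)
se_estimates = []
for _ in range(5000):
    sample = random.sample(population, 200)  # design 3: simple random sample
    tp = sum(1 for clas, truth in sample if clas == '+' and truth == '+')
    fn = sum(1 for clas, truth in sample if clas == '-' and truth == '+')
    se_estimates.append(tp / (tp + fn))      # denominator ~59, not fixed at 100

se_estimates.sort()
print(f"median Se = {se_estimates[2500]:.2f}, "
      f"2.5th to 97.5th percentile: {se_estimates[125]:.2f} to {se_estimates[4875]:.2f}")
```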

Importance of stratification in validation studies

In our study of HPV vaccine and cancer precursor conditions, should we stratify estimates of exposure classification by a second variable, in this case, the outcome?

Question 4. In design 1, we validate self-reported vaccination status using a gold standard in a random sample of those classified as exposed and a random sample of those classified as unexposed. What assumption did we make about exposure misclassification with respect to the outcome? Check all that apply. We assume that the exposure misclassification is as follows.

  1. Non-differential with respect to the outcome.

  2. Differential with respect to the outcome.

  3. Independent with respect to outcome classification.

  4. Dependent with respect to outcome classification.

By ‘non-differential’ we mean that the Se and Sp of exposure classification do not differ by outcome status. By ‘differential’ we mean that either the Se or the Sp of exposure classification differs by outcome status. By independent misclassification we mean that errors in classifying the exposure do not depend on errors in classifying the outcome (i.e. the errors are uncorrelated).

As noted above, design 1 only allows us to calculate predictive values, because sampling is conditional on the self-reported vaccination status. If vaccination is associated with cancer precursor conditions, then the prevalence of vaccination will differ between cases and non-cases, and so will the predictive values, even when Se and Sp do not differ by outcome. When estimating predictive values, therefore, it is imperative to stratify the estimates within outcome groups. Failure to do so forces the predictive values to be equal across outcome groups, which, given different exposure prevalences, implicitly assumes differential misclassification (answer 2), quite possibly in error. The lesson is simple: when conducting a validation study, stratify results by other key variables (e.g. with exposure misclassification, stratify by the outcome; with outcome misclassification, stratify by the exposure). This, of course, requires more study resources, but it is essential to generating valid estimates of the classification parameters that inform bias-adjusted estimates of exposure effects.
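A short numeric sketch of why stratification matters; the vaccination prevalences among cases (0.15) and non-cases (0.32) are hypothetical values chosen for illustration. Even with identical, non-differential Se and Sp in both outcome groups, the PPV differs substantially between them, so a single unstratified PPV would misstate both:

```python
def ppv(se, sp, p):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    return se * p / (se * p + (1 - sp) * (1 - p))

se, sp = 0.95, 0.92  # non-differential: identical in cases and non-cases

# Hypothetical exposure prevalences within outcome groups (illustration only)
p_cases, p_noncases = 0.15, 0.32

print(f"PPV among cases:     {ppv(se, sp, p_cases):.2f}")     # ~0.68
print(f"PPV among non-cases: {ppv(se, sp, p_noncases):.2f}")  # ~0.85
```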

Biases in validation studies

Question 5. Validation studies can suffer from which of the following sources of error? Check all that apply:

  1. random error

  2. confounding

  3. selection bias

  4. measurement error

Random error (because the validation study includes only a sample of the full study population), selection bias (if selection into the validation sample is related to classification accuracy) and measurement error (if the gold standard is not a true gold standard15) can all occur in validation studies, so answers (1), (3) and (4) all apply. Confounding is more complicated. Confounding occurs when two variables (say, exposure and outcome) share a common cause, whose effects mix with the effect of the exposure on the outcome. In a validation study, we have a single variable measured in two different ways. It is theoretically possible to represent a validation study in a way that would present a common cause of both measures, but we do not see this as confounding in the sense typically meant in epidemiology. This is not to say that there are no factors that predict the rates of misclassification,16 only that we do not consider this confounding in the usual structural sense of the bias.

Conclusions

We recommend adding the design and analysis of validation studies to the curriculum of graduate epidemiology programmes. This addition is recommended at the master’s level and is essential for anyone training at the doctoral level. Further, hands-on training in validation studies as part of graduate epidemiology programmes would allow students to see the real-world challenges that arise in conducting validation studies. In our experience, few epidemiologists have had formal training in validation study design and analysis, many lack confidence in their ability to conduct such studies, and predictable errors result. Asking doctoral students to conduct a validation study as part of their thesis would have the added benefit of ensuring that students have some experience with primary data collection, which not all doctoral programmes require.

There is more work to be done to educate epidemiologists on the need for high-quality validation studies and the optimal design of these studies. Given the ubiquity of misclassification within epidemiologic research and the known implications it can have for study results, it is essential that we increase competence on this topic and encourage students and training programmes to consider validation studies as part of original research.

Funding

This work was supported in part by the US National Library of Medicine [R01LM013049] awarded to T.L.L.

Conflict of interest

M.P.F. and T.L.L. have both published a textbook on bias analysis for which they receive royalties. T.L.L. provides epidemiologic methods consulting services to the Amgen Methods Council, including services on the topic of quantitative bias analysis. He receives <$5000 per year in consulting fees and travel support.

Contributor Information

Matthew P Fox, Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA; Department of Global Health, Boston University School of Public Health, Boston, MA, USA.

Timothy L Lash, Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA.

Lisa M Bodnar, Department of Epidemiology, University of Pittsburgh School of Public Health, Pittsburgh, PA, USA.

References

1. Jurek AM, Greenland S, Maldonado G, Church TR. Proper interpretation of non-differential misclassification effects: expectations vs observations. Int J Epidemiol 2005;34:680–87.
2. Marshall JR, Hastrup JL. Mismeasurement and the resonance of strong confounders: uncorrelated errors. Am J Epidemiol 1996;143:1069–78.
3. Kim MY, Goldberg JD. The effects of outcome misclassification and measurement error on the design and analysis of therapeutic equivalence trials. Stat Med 2001;20:2065–78.
4. Jurek AM, Greenland S, Maldonado G. How far from non-differential does exposure or disease misclassification have to be to bias measures of association away from the null? Int J Epidemiol 2008;37:382–85.
5. Brenner H, Savitz DA. The effects of sensitivity and specificity of case selection on validity, sample size, precision, and power in hospital-based case-control studies. Am J Epidemiol 1990;132:81–92.
6. Marshall RJ. Validation study methods for estimating exposure proportions and odds ratios with misclassified data. J Clin Epidemiol 1990;43:941–47.
7. Lash TL, Fox MP, Fink AK. Applying Quantitative Bias Analysis to Epidemiologic Data. New York: Springer, 2009.
8. Greenland S. Basic methods for sensitivity analysis of biases. Int J Epidemiol 1996;25:1107–16.
9. Greenland S, Lash T. Bias analysis. In: Rothman K, Greenland S, Lash T (eds). Modern Epidemiology. 3rd edn. Philadelphia: Lippincott Williams and Wilkins, 2008:345–80.
10. Greenland S. Multiple bias modelling for analysis of observational data. J R Stat Soc Ser A 2005;168:1–25.
11. Ioannidis JPA. Why most published research findings are false. PLoS Med 2005;2:e124.
12. Ehrenstein V, Petersen I, Smeeth L, et al. Helping everyone do better: a call for validation studies of routinely recorded health data. Clin Epidemiol 2016;8:49–51.
13. Lash TL, Olshan AF. EPIDEMIOLOGY announces the ‘validation study’ submission category. Epidemiology 2016;27:613–14.
14. Holcroft CA, Spiegelman D. Design of validation studies for estimating the odds ratio of exposure–disease relationships when exposure is misclassified. Biometrics 1999;55:1193–201.
15. Wacholder S, Armstrong B, Hartge P. Validation studies using an alloyed gold standard. Am J Epidemiol 1993;137:1251.
16. Banack HR, Stokes A, Fox MP, et al. Stratified probabilistic bias analysis for body mass index–related exposure misclassification in postmenopausal women. Epidemiology 2018;29:604–13.
