Epidemiologic Reviews. 2021 Oct 19;43(1):94–105. doi: 10.1093/epirev/mxab011

The Measurement Error Elephant in the Room: Challenges and Solutions to Measurement Error in Epidemiology

Gabriel K Innes, Fiona Bhondoekhan, Bryan Lau, Alden L Gross, Derek K Ng, Alison G Abraham
PMCID: PMC9005058  PMID: 34664648

Abstract

Measurement error, although ubiquitous, is uncommonly acknowledged and rarely assessed or corrected in epidemiologic studies. This review offers a straightforward guide to common problems caused by measurement error in research studies and a review of several accessible bias-correction methods for epidemiologists and data analysts. Although most correction methods require criterion validation including a gold standard, there are also ways to evaluate the impact of measurement error and potentially correct for it without such data. Technical difficulty ranges from simple algebra to more complex algorithms that require expertise, fine tuning, and computational power. However, at all skill levels, software packages and methods are available and can be used to understand the threat to inferences that arises from imperfect measurements.

Keywords: bias correction, epidemiologic methods, epidemiologic review, measurement error, sensitivity analyses

Abbreviations

CI: confidence interval
MIME: multiple imputation for measurement error
SIMEX: simulation-extrapolation
RC: regression calibration

INTRODUCTION

Epidemiology has few universal truths; one may be that all measurements have error. As a result, likely all epidemiologic studies suffer from some degree of bias, loss of power, coarsening of relationships, or all 3, as a result of measurement error (1). Although epidemiologists are often diligent in accounting for confounding bias and at least consider the possibility of selection bias, all too frequently the error of our measurements passes with little to no scrutiny (2–4). Our aim for this review is not to provide a comprehensive summary of the vast literature on measurement error, but rather to provide a gentle introduction to the theory with a focus on classical measurement error and a practical guide to implementing a few more accessible measurement error–correction tools. We focus on observational studies primarily, though it is important to note that no studies are immune to measurement error.

WHY DO WE OFTEN FAIL TO DO SOMETHING ABOUT MEASUREMENT ERROR?

A primary barrier to addressing measurement error may lie in our inability to recognize that we have a general problem with measurement. Much of the literature on construct validity (5, 6) and measurement theory was written by sociologists and psychometricians, who wrestled with highly abstract constructs like “intelligence” or “quality of life” that were more obviously latent or not directly measurable.

As epidemiologists, we recognize the challenge of measuring nondirectly observable characteristics, but we often fail to acknowledge that other common measures, such as blood pressure or the level of a biomarker, are equally prone to error in measurement. Such measures seem tangible and concrete, but as Sechrest (5) pointed out, constructs like blood pressure can be just as poorly defined as constructs like intelligence. Casual blood pressure measurement in the office or clinic with a sphygmomanometer—the standard in research studies—has long been recognized as error prone (7). Biomarker measurements are subject to myriad potential sources of variability, including errors in specimen collection, processing, and storage; within- and between-batch laboratory error; differences in the sample matrix that affect the assay signal; variability due to different assays or different laboratories if reference standards do not exist; and biological variability (8, 9). These errors can add up to a high degree of imprecision and bias. In fact, we should consider most measures to be error prone and assume that all measurements require careful consideration of possible sources and magnitudes of error. This is analogous to our assumption that all associations are likely confounded, potentially by things we did not capture in our data. As with unmeasured confounding, the extent to which unquantified measurement errors influence our inferences is difficult to predict.

In discussions of measurement error, it is important to differentiate reliability from validity. Validity pertains to whether an instrument measures what it purports to measure (10) and has little to do with the measure’s precision or reliability. Reliability is often used as a surrogate for establishing validation evidence for an instrument. However, because of correlated errors, a measure can be highly reliable (i.e., repeatable) without being valid. Therefore, whereas low reliability often hints at poor validation evidence, high reliability does not imply high validity. Nevertheless, researchers often fall back on reliability when validation evidence was not collected or when a gold standard simply does not exist.

Another key point is that validity is often confused with the absence of bias. Epidemiologists take validity to mean the overall quality of the measurement. However, bias is something quantifiable with methodology (the bias-variance trade-off is testable in simulations). Validity, on the other hand, is a substantive problem with a yes or no answer: Is my tool measuring the attribute that I think it is measuring? To quote Ebel (11, p. 646), “The term valid is not to be made synonymous with the term ‘good.’”

As discussed in the following sections, most methods that correct for measurement error require validation data—ideally internal validation data, which are collected within the parent study, but with certain assumptions, validation data can be external to the study—to quantify the error and appropriately correct for it. Regardless, a lack of validation data should not be an excuse to ignore measurement error, because there remain easily implementable options for correcting for or, at the minimum, evaluating the potential impact of measurement error.

FORMALIZING OUR THINKING ABOUT MEASUREMENT ERROR

Classical measurement error

Most tools that correct measurement error rely on a classical measurement error model, which assumes that the measured value of a variable A* varies around the true value, A, such that A* = A + UA, where the error, UA, is often assumed to be normally distributed with mean 0 and variance $\sigma_U^2$ (Figure 1A). The classical error model assumes the error is independent of the true A and additive on some scale, which means that the variance of the measured variable will always be greater than that of the true variable, A. Furthermore, A* is assumed to be unbiased for A (i.e., random error as opposed to systematic error) and often the error variance is assumed to be constant (homoscedastic). However, one can easily think of extensions to this model (e.g., heteroscedastic, as opposed to homoscedastic, variance of UA; UA as a function of another variable; an added bias coefficient such as might occur as the result of faulty equipment) or alternative error models (e.g., multiplicative error). Therefore, it is important to consider the measurement error mechanism and the inherent assumptions of a particular correction tool. Classical error will lead to bias in associations with a few exceptions; we discuss those in the following sections.

Figure 1. Classical and Berkson measurement error models. This figure demonstrates 2 measurement error models, highlighting the relationship between the true underlying exposure, A, the measured value A*, and outcome, Y. A) In the classical measurement error model, A* is influenced by both the exposure construct and some error factor, UA. Conversely, in (B) the Berkson error model, A varies around A* with some error function, UA.

Misclassification

The classical measurement error model, which arises from classical test theory, is most appropriately applied to measurement error in a continuous variable. However, measurement error can also occur in discrete variables, and this is termed misclassification. Error in discrete variables is not additive, thus methods for measurement error correction that assume additive errors should not be used without rescaling. Rather, models for misclassification use probabilities: the probability of A* = 1 given A = 0 and the probability of A* = 0 given A = 1. These error probabilities are the complements of the familiar specificity (Pr(A* = 0|A = 0)) and sensitivity (Pr(A* = 1|A = 1)), respectively, of a given diagnostic test.

Berkson error model

Although the classical error model is the most recognized measurement error mechanism in epidemiologic studies, a second model exists that is more appropriate for clinical trials and many environmental studies. The Berkson error model posits a fixed value of the measured variable, A*, around which the true value, A, varies such that A = A* + UA (Figure 1B). Thus, the direction of causality is now flipped, with the true value a function of the measured value and error. Heid et al. (12) provide an example in a study of radon exposure and lung cancer. Radon exposure levels were assessed through measurements from a monitor in participants’ homes. Individual radon exposure, however, is influenced by factors such as the amount of time spent in the home and proximity to ventilation in the home; thus, the true exposure, A, will vary about the monitor measurement, A*. Berkson error in an exposure or covariate does not lead to bias but only to increased imprecision of the estimated association as long as the errors are independent of the outcome variable and other covariates in the model (12). Tools have been developed to account for Berkson error (e.g., Bayesian methods) (12–16). However, in this review, we focus on methods that are suited for classical error correction.
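The claim that Berkson error leaves a linear slope unbiased while adding noise can be checked with a small simulation. The sketch below is illustrative only: the true slope, the error standard deviation, the normal distributions, and the sample size are all assumptions chosen for the example and do not come from the radon study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 2.0      # assumed true slope of Y on A (illustrative)
sigma_u = 1.0   # assumed standard deviation of the Berkson error

# Berkson error: the true exposure varies around the recorded measurement.
a_star = rng.normal(0.0, 1.0, n)               # e.g., a monitor reading
a_true = a_star + rng.normal(0.0, sigma_u, n)  # A = A* + U
y = beta * a_true + rng.normal(0.0, 1.0, n)


def fit_line(x, y):
    """Return (slope, residual variance) from a least squares fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return slope, resid.var(ddof=2)


slope_true, resvar_true = fit_line(a_true, y)
slope_berkson, resvar_berkson = fit_line(a_star, y)

print(f"slope using A : {slope_true:.3f} (residual variance {resvar_true:.2f})")
print(f"slope using A*: {slope_berkson:.3f} (residual variance {resvar_berkson:.2f})")
```

Under these assumptions the slope estimated from A* should sit close to the true slope, while the residual variance is noticeably larger than when the true exposure is used.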

PICTURING MEASUREMENT ERROR MECHANISMS

A first step to an etiologic model and causal inference is to consider causal relationships among the exposure, the outcome, and covariates. Directed acyclic graphs are typically, although not exclusively, used to posit and graphically depict these relationships. Developing an understanding of the measurement error mechanism using directed acyclic graphs can help guide us toward an appropriate correction tool. Figure 1A illustrates a simple exposure-outcome relationship between the true exposure, A, and true outcome Y, where A* is the measured value of the exposure, which is a function of A and the error UA (i.e., a classical error model). Note in this diagram there are no covariates and Y is perfectly captured.

We can now extend this simple directed acyclic graph to examine scenarios of systematic error. Differential error (Figure 2A and 2B) occurs when the errors in 1 variable (e.g., the exposure) may be a function of another variable (e.g., the outcome). Practically, this means that A* contains information about the outcome Y above and beyond what is available in A (or in confounders Z). Differential error is common when knowledge of the outcome can influence measurement of the exposure, A, like in a case–control or a cross-sectional study. Consider the scenario pictured in Figure 3, where the hypothesis is that opioid use increases risk of nonadherence to antiretroviral regimens among people living with HIV. An individual’s opioid use could affect recall of their antiretroviral use if therapy adherence is self-reported through, for example, a phone interview with a participant. Prevention of differential measurement error is the reason why investigators often mask exposure status when ascertaining outcome status. The direction and magnitude of the bias resulting from differential measurement error will be unique to the situation and can only be predicted if the errors in each stratum are known.

Figure 2. Classical measurement error and extensions to the model. The directed acyclic graphs in the figure illustrate 4 types of measurement error defined based on the ideas of differential versus nondifferential and dependent versus independent error: A) dependent and differential; B) independent and differential; C) dependent and nondifferential; and D) independent and nondifferential. In each example, relationships between true exposure, A, and outcome, Y, are shown, where * represents the measured value, affected by the truth and some error term, U, specific to exposure (Ua), outcome (Uy), or both (Uay).

Figure 3. Dependent, differential measurement error. Consider the research question, “Does opioid use increase risk of nonadherence to antiretroviral regimen among people infected with HIV?” In this example, both opioid and antiretroviral use are measured through the same self-reported survey (e.g., a phone call); because the survey is the same measurement tool, this leads to dependent measurement error. In this example, recollection of antiretroviral use (Y) may be systematically different among opioid users and opioid nonusers. Images by various artists from the Noun Project.

Dependent error (shown in Figure 2A and 2C) occurs when errors in 2 or more variables are correlated with each other (e.g., the error in the exposure is correlated with the error in the outcome). Returning to our scenario in Figure 3, what would happen if both opioid use and antiretroviral adherence were assessed through a phone interview? Individuals may experience cognitive effects from opioid use that influence their reporting of both the outcome and the exposure in similar ways. Or participants may systematically over-report adherence and under-report opioid use as a result of stigma around their behavior. This is commonly referred to as same-source bias and is an example of dependent measurement error. Similar to differential measurement error, the direction and magnitude of the bias that results from dependent measurement error are unique to the situation and very hard to predict.

THE MANY WAYS IN WHICH MEASUREMENT ERROR CAUSES PROBLEMS

Bias, imprecision, and random error

Measurement error in exposure variables, except in special cases with truly null associations, will always cause bias in measures of association. This includes error that is purely random, that is, error that is nondifferential and independent of other factors where E(A*|A) = A. In the case of random error in a single covariate consistent with the classical additive error model, regression dilution bias, which has been written about extensively (1, 17–19), generally biases associations toward the null with the degree of attenuation dependent upon the variability of the error, $\sigma_U^2$. Looking at a slope estimate, $\hat{\beta}$, from a univariate least squares regression, we can see that replacing the true exposure, A, with a measured exposure, A*, that includes some error UA results in the following estimate for $\hat{\beta}^*$:

$$\hat{\beta}^* = \frac{\mathrm{Cov}(A^*, Y)}{\mathrm{Var}(A^*)} = \frac{\mathrm{Cov}(A + U_A, Y)}{\mathrm{Var}(A + U_A)} = \frac{\mathrm{Cov}(A, Y)}{\sigma_A^2 + \sigma_U^2} \quad (1)$$

where $\sigma_A^2 = \mathrm{Var}(A)$ and $\sigma_U^2 = \mathrm{Var}(U_A)$. From this we can quantify the degree of bias and infer that it will always move toward the null for the univariate case:

$$\hat{\beta}^* = \hat{\beta}\,\frac{\sigma_A^2}{\sigma_A^2 + \sigma_U^2} = \lambda\hat{\beta} \quad (2)$$

The bias factor, $\lambda$, is a function of the variance of the true exposure, A, and the error, $U_A$, and will be between 0 and 1. It should be noted that random error in the outcome will not result in bias. In the multivariate case, random error in an exposure and correlated covariates will cause bias but no longer predictably toward the null.
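A short simulation can make the two univariate results above concrete: classical error in the exposure shrinks the slope by the bias factor (the variance of A divided by the variance of A*), whereas random error in the outcome leaves the slope unchanged on average. This is a minimal sketch; the true slope, the variances, and the normal distributions are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta = 1.5               # assumed true slope (illustrative)
var_a, var_u = 1.0, 0.5  # assumed variances of A and of the random error

a = rng.normal(0.0, np.sqrt(var_a), n)
y = beta * a + rng.normal(0.0, 1.0, n)


def ols_slope(x, y):
    """Slope from a univariate least squares regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


# Classical random error added to the exposure: A* = A + U.
a_star = a + rng.normal(0.0, np.sqrt(var_u), n)
# Random error added to the outcome instead: Y* = Y + U.
y_star = y + rng.normal(0.0, np.sqrt(var_u), n)

bias_factor = var_a / (var_a + var_u)  # ratio Var(A) / Var(A*)
slope_exposure_error = ols_slope(a_star, y)
slope_outcome_error = ols_slope(a, y_star)

print(f"expected attenuated slope : {bias_factor * beta:.3f}")
print(f"slope with exposure error : {slope_exposure_error:.3f}")
print(f"slope with outcome error  : {slope_outcome_error:.3f}")
```

The outcome-error slope is still less precisely estimated than it would be without error, so the absence of bias should not be read as the absence of a cost.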

In the case of systematic error (i.e., all other flavors of measurement error), the direction of bias in effect estimates that result from error in the exposure is less certain. That is because these error structures necessitate a departure from the classic additive error model. For example, Carroll et al (1) considered 2 different departures: one where A* is no longer unbiased for A, and a second where A* contains differential measurement error. To show how these error structures affect the bias of the slope, we must extend the classic measurement error model to

$$A^* = \gamma_0 + \gamma_1 A + U_A \quad (3)$$

where $\gamma_0$ and $\gamma_1$ express the degree of bias in A*, and specify that $U_A$ is correlated with $\varepsilon$, the residuals from the univariate least squares regression (i.e., $Y = \beta_0 + \beta A + \varepsilon$), with correlation $\rho_{\varepsilon U}$. The expression of $\hat{\beta}^*$ now becomes

$$\hat{\beta}^* = \frac{\hat{\beta}\,\gamma_1\sigma_A^2 + \rho_{\varepsilon U}\,\sigma_\varepsilon\sigma_U}{\gamma_1^2\sigma_A^2 + \sigma_U^2} \quad (4)$$

We can see that, depending on the values of $\gamma_1$ and $\rho_{\varepsilon U}$, $\hat{\beta}^*$ may be either farther away or closer to the null compared with $\hat{\beta}$.

Beyond bias in associations, measurement error also obscures epidemiologic relationships. From equation 4, we note that, whereas the variance of A is $\sigma_A^2$, the variance of A* under classical error is $\sigma_A^2 + \sigma_U^2$. This means the data from measures with error are noisier than the true underlying data, and the resulting residual error of the regression becomes $\sigma_\varepsilon^2 + \beta^2\sigma_A^2\sigma_U^2/(\sigma_A^2 + \sigma_U^2)$, which is greater than $\sigma_\varepsilon^2$, the residual error of the regression using the true exposure A. One may infer that this extra noise in the data results in imprecision of the estimated slope, $\hat{\beta}^*$. However, this is not necessarily true (20). The estimate of the slope may even be more precise (i.e., have a narrower confidence interval (CI)) than the estimated slope from a regression on the true exposure, A, meaning that one could estimate the wrong thing (as a result of bias) with inappropriately greater confidence. However, the study will still suffer a loss of power, making it harder to detect true associations. This loss of power happens because the power (or, equivalently, sample size) required to detect a specific magnitude of effect is a function of the residual error and the variance of the exposure. Devine (21) demonstrated this loss of statistical power in a study in which the relationship between exercise and change in body mass index was investigated. With perfect capture of daily exercise, 235 participants were needed to achieve 90% power to detect a specified association. However, when a measured value of daily exercise was used with a ρ = 0.8 correlation with the truth, 586 participants were needed to detect the same association, 2.5 times the sample size (21).

Measurement error in discrete variables

Misclassification occurs when discrete variables are imperfectly assessed. The impact of imperfect sensitivity and specificity, which quantify the degree and nature of the error, depends on the prevalence of the condition (e.g., exposure or outcome) (1, 17, 19, 22, 23). For rare exposures, for example, poor sensitivity (i.e., the inability to detect the exposure when it has truly occurred) will do little to bias an association, because the misclassification of a few exposed as unexposed (i.e., false negatives) will not markedly change the numbers of observed unexposed (true negatives plus false negatives). However, poor specificity (i.e., the inability to detect the absence of exposure when it has truly not occurred) will produce many additional exposed (i.e., false positives), greatly shifting the number of observed exposed (true positives plus false positives) and biasing observed associations. Additional investigation to evaluate and describe misclassification is needed to accurately define how a particular study may be affected. In extreme cases, misclassification has been shown to reduce the desired power of the study (21), distort effect estimates and sometimes qualitatively alter interpretations (24), as well as misrepresent precision and uncertainty with narrow, yet inaccurate, CIs (1, 19, 25). Measurement error in this setting, particularly for a 2-level variable, is readily visualized with a 2 × 2 table. However, the complexity of visualizing the misclassification mechanism increases when discrete variables involve more than 2 levels or tables are stratified across levels of an adjustment variable.
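The contrast between poor sensitivity and poor specificity for a rare exposure can be seen by computing the expected observed 2 × 2 table directly. The sketch below uses a hypothetical case-control table (the counts, sensitivity, and specificity values are invented for illustration) and assumes nondifferential misclassification of the exposure only.

```python
def misclassify_exposure(exposed, unexposed, se, sp):
    """Expected observed (exposed, unexposed) counts given true counts."""
    obs_exposed = se * exposed + (1 - sp) * unexposed    # true + false positives
    obs_unexposed = (1 - se) * exposed + sp * unexposed  # false + true negatives
    return obs_exposed, obs_unexposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Exposure odds ratio from the four cells of a 2 x 2 table."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true table with a rare exposure: 1,000 cases and 1,000 controls.
cases = (40, 960)     # (exposed, unexposed)
controls = (20, 980)

or_true = odds_ratio(*cases, *controls)

# Poor sensitivity, perfect specificity.
or_low_se = odds_ratio(*misclassify_exposure(*cases, se=0.7, sp=1.0),
                       *misclassify_exposure(*controls, se=0.7, sp=1.0))

# Perfect sensitivity, imperfect specificity.
or_low_sp = odds_ratio(*misclassify_exposure(*cases, se=1.0, sp=0.9),
                       *misclassify_exposure(*controls, se=1.0, sp=0.9))

print(f"true OR: {or_true:.2f}")
print(f"OR with Se = 0.7, Sp = 1.0: {or_low_se:.2f}")
print(f"OR with Se = 1.0, Sp = 0.9: {or_low_sp:.2f}")
```

In this hypothetical table the odds ratio barely moves when sensitivity is poor, but it falls most of the way to the null when specificity drops to 0.9, because false positives from the large unexposed group swamp the few truly exposed.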

METHODS FOR CORRECTION

Methods for measurement error correction

Approaches to measurement error correction can be loosely classified into 3 categories. The first approach is to adjust the effect estimate given information on the magnitude of error in a variable. This is the aim of algebraic correction, regression calibration, and simulation-extrapolation (SIMEX). A second approach is to correct measurement error in the measured values themselves before proceeding to analysis. This approach is exemplified by multiple imputation for measurement error (MIME). Last, one can model the measurement error mechanism simultaneously with the association of interest using a full likelihood approach, which is beyond the scope of this review (1, 3, 26–30). It should be noted that most of the methods described do not deal with differential error or dependent errors. Table 1 summarizes the assumptions, error mechanisms, and available software for the methods described in the following sections.

Table 1. Summary of Measurement Error Correction Tools

| ME Correction Method | Type of Data | Validation Data Needed? | What It Does | Characteristics | Software Packages |
|---|---|---|---|---|---|
| Algebraic adjustment | Categorical (binary) | Yes | Adjusts the modeled effect estimate given information from a validation study | Computationally simple; accommodates differential ME; can use external data for specificity and sensitivity estimates; difficult to adjust for confounding (requires estimates to be pooled across many contingency tables, e.g., with the Mantel–Haenszel method); cannot accommodate dependent ME | STATA: “roctab”; SAS^a: “SENSPEC” option in “TABLES”; R: “Caret” |
| Regression calibration | Continuous | Yes^b | Adjusts the modeled effect estimate given information from a validation study^c | Intuitive and straightforward to implement; ideally suited to a classical error model, but extensions to multiplicative error are possible; only accommodates nondifferential, independent ME; most suited to linear models but can also handle misclassification in binary variables; for nonlinear models, output is only an approximation | STATA: “rcal”; SAS: “%binplus”; R: “investr” package |
| Simulation-extrapolation | Categorical and continuous | Not necessary | Adjusts the modeled effect estimate given information from a validation study | Can be applied in settings with covariate error and outcome error; can be applied even if no information on the magnitude of error is available from a validation study; can be used in cases of both homoscedastic and heteroscedastic error; although suited to additive measurement error, it can be adapted to more general error mechanisms; highly dependent on the extrapolation model; corrections are approximate | STATA: “simex”; R: “SIMEX” or “SIMEXaft” package |
| Multiple imputation for measurement error | Categorical and continuous | Yes | Corrects measurement error in the measured values prior to modeling | Accommodates differential ME; can be done once and used with a multitude of analyses and models, because the correction is applied to the individual-level data; appropriate for missing data mechanisms of MCAR or MAR for the gold standard; can propagate bias if relationship between gold standard and mismeasured covariate is poorly estimated | STATA: “mi impute”; SAS: “PROC MI” and “PROC MIANALYZE”; R: “MICE” and “smcfcs” package |

Abbreviations: MAR, missing at random; MCAR, missing completely at random; ME, measurement error.

a SAS Institute, Cary, NC.

b Regression calibration can also be implemented with internal replicate data under certain assumptions in the absence of validation data.

c Regression calibration can also be implemented in a manner similar to multiple imputation for measurement error, using estimates of the gold standard value for each individual in the outcome regression.

Validation data and study planning

The term validation data generally refers to a set of data in which both the measured variable, A*, and a gold standard, A′, are captured. Notice that A′ is used instead of A to indicate that the gold standard is not necessarily the truth (i.e., the unobservable construct of interest). Rather, A′ is a better measurement that we believe has less error than the mismeasured A* (i.e., A′ is a better proxy for A). Some methods for measurement error correction require a validation data set to impute missing gold standard values, whereas others need validation data to estimate the error variance or a calibration parameter. Ideally, researchers use a subset nested within the study sample (e.g., a 10% or 20% random sample) to collect A′. However, this requires a priori study-design planning, because internal validation data are difficult to obtain after the primary data collection phase. If an internal validation subset is not available, external validation data (i.e., validation data from outside the study sample) can be used under the assumption that the relationship between A* and the gold standard A′ is transportable across samples or contexts (31–35). There is no way to test that assumption in the absence of internal validation data, and different settings, populations, and periods may introduce heterogeneity in the relationship between A* and A′ that limits the usefulness of the external validation data (36). For simplicity, we assume in the following sections that the gold standard, A′, perfectly captures the truth, A.

Algebraic correction of misclassification

Algebraic correction adjusts measures of association derived from contingency tables including odds ratios, risk ratios, and incidence rate ratios. Therefore, algebraic correction is limited to binary variables (i.e., cases and controls, or exposed and unexposed categorizations). This method requires validation data from which estimates of sensitivity and specificity relative to a gold standard are derived that describe how the measurement, A*, misclassifies the number of truly positive individuals, $A_1$, such that some truly positive individuals are incorrectly classified as negative (this is the false-negative rate (Fn): Fn = 1 − sensitivity) and misclassifies the number of truly negative individuals, $A_0$, such that some truly negative individuals are incorrectly categorized as positive (this is the false-positive rate (Fp): Fp = 1 − specificity). Writing $A_1^*$ and $A_0^*$ for the observed numbers classified as positive and negative, the estimates of sensitivity and specificity can be used to undo the misclassification in a contingency table using the formulas:

$$A_1 = \frac{A_1^* - (1 - Sp)\,N}{Se + Sp - 1} \quad (5)$$

and

$$A_0 = \frac{A_0^* - (1 - Se)\,N}{Se + Sp - 1} \quad (6)$$

where N is the total study sample, Sp is specificity, Se is sensitivity. From the corrected contingency table, one can calculate the corrected measure of association. An example of algebraic correction can be found in the report by Greenland et al. (37). This method can be used with differential misclassification (where sensitivity and specificity differ across strata of another covariate, C) by stratifying the contingency table by levels of C and performing algebraic correction in each stratum using the stratum specific estimates of sensitivity and specificity. However, algebraic correction is not appropriate in cases of dependent measurement error. Although it is possible to use algebraic correction and still control for confounding, the contingency table must again be stratified by levels of all confounders. Correction and estimation of the association is then performed for each stratum, with estimates recombined using the Mantel–Haenszel method to obtain a final corrected measure of association (38). This can be a tedious process and may result in coarse adjustment for confounders or unstable strata-specific estimates of the measure of association, due to sample-size limitations. Software packages for implementation can be found in Table 1.
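The back-calculation can be sketched in a few lines for a single unstratified table. The helper name `correct_counts`, the observed counts, and the sensitivity and specificity values below are hypothetical; the formula is the standard inversion of the misclassification equations and assumes nondifferential, independent misclassification of exposure with Se + Sp > 1.

```python
def correct_counts(obs_pos, obs_neg, se, sp):
    """Back-calculate expected true (positive, negative) counts from observed counts."""
    n = obs_pos + obs_neg
    denom = se + sp - 1.0
    if denom <= 0:
        raise ValueError("Se + Sp must exceed 1 for the correction to be defined")
    true_pos = (obs_pos - (1.0 - sp) * n) / denom
    true_neg = n - true_pos
    if true_pos < 0 or true_neg < 0:
        raise ValueError("corrected cell is negative; Se/Sp are incompatible with the data")
    return true_pos, true_neg


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Exposure odds ratio from the four cells of a 2 x 2 table."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical observed table (exposed, unexposed) and validation estimates.
obs_cases = (210, 790)
obs_controls = (130, 870)
se, sp = 0.85, 0.95

naive_or = odds_ratio(*obs_cases, *obs_controls)
corrected_or = odds_ratio(*correct_counts(*obs_cases, se, sp),
                          *correct_counts(*obs_controls, se, sp))

print(f"naive OR: {naive_or:.2f}, corrected OR: {corrected_or:.2f}")
```

For differential misclassification, the same function could be called with different `se` and `sp` values for cases and controls. The sketch does not propagate the uncertainty in the sensitivity and specificity estimates, which a real analysis would need to address.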

Regression calibration for measurement error

Regression calibration (RC) is a clever and intuitive method suitable for nondifferential independent measurement error that corrects point and variance estimates for bias due to measurement error in 1 or more continuous covariates (but not the outcome) with a continuous or dichotomous outcome (39). The most straightforward implementation of RC requires 3 steps to correction: first, the outcome (Y) is regressed on the measured covariate (A*) and confounders (Z, which are assumed to be error free) as follows:

$$g\{E(Y \mid A^*, Z)\} = \beta_0^* + \beta^* A^* + \beta_Z^* Z \quad \text{(Step 1)}$$

where g is the link function of the outcome model (e.g., identity for a continuous outcome, logit for a dichotomous outcome). The resulting $\hat{\beta}^*$ is the biased effect estimate for the relationship between the measured exposure, A*, and Y. Second, A′, the gold standard, is regressed on A* and all confounders from step 1 in a calibration model:

$$E(A' \mid A^*, Z) = \lambda_0 + \lambda A^* + \lambda_Z Z \quad \text{(Step 2)}$$

where $\hat{\lambda}$ is the calibration parameter estimate, which describes the adjusted relationship between the measured A* and the gold standard A′. Finally, the biased $\hat{\beta}^*$ from step 1 is divided by the calibration parameter $\hat{\lambda}$ from step 2 to yield the corrected parameter estimate, $\hat{\beta}$:

$$\hat{\beta} = \frac{\hat{\beta}^*}{\hat{\lambda}} \quad \text{(Step 3)}$$

Standard errors for the corrected $\hat{\beta}$ can be derived from a bootstrap variance estimator or an asymptotic variance estimator (30, 40). An example of the application of RC can be found in the study by Freedman et al. (41) in which they used data from the National Institutes of Health–AARP Diet and Health Study. The investigators examined the association between percentage of calories from fat in the diet and death in women, initially finding a harmful association using the naïve estimator (odds ratio = 2.03) and a stronger association using the regression calibration estimator, drawn from an internal calibration study (odds ratio = 4.66).

To correctly estimate the variance, the validation data set should be independent of the analytic data set (in which β* is estimated). In practice, the internal validation set is often excluded from the analytic data set; however, there are more efficient estimators that can use the validation data in the estimation of β (42). RC works best in cases where errors are small to moderate in magnitude. RC is most suited to problems in generalized linear models. For dichotomous outcomes, RC works best in contexts of rare outcomes and smaller relative measures of association when the logit scale is approximately linear; however, approximations exist for more highly nonlinear cases. RC can also be used with internal replicate data (i.e., repeated measures of A*) under the assumption that the errors have mean zero and constant variance (i.e., classical error model) (40, 43–45). In general, the basic implementation assumes additive errors from a classical error model, but extensions to multiplicative errors are possible through scale transformation or using a regression calibration function that is nonlinear in A* (1). RC is relatively straightforward to implement; several good resources are available (1) along with software adapted for a variety of models (30). It should be noted that an alternate implementation of RC replaces A* in step 1 by its estimate from step 2 (1), which turns the procedure into a version of imputation for measurement error, as described in a subsequent section.
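The 3-step version of RC can be sketched with ordinary least squares for a continuous outcome. Everything in the simulation is an assumption made for illustration: the data-generating coefficients, the single confounder, the normal classical error, and the use of a separate validation sample that is independent of the main study. Variance estimation (bootstrap or asymptotic) is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)


def simulate(n):
    """Simulate (gold standard A', measured A*, confounder Z, outcome Y)."""
    z = rng.normal(size=n)                      # error-free confounder
    a = 0.5 * z + rng.normal(size=n)            # gold standard exposure
    a_star = a + rng.normal(scale=0.8, size=n)  # classical additive error
    y = 1.0 * a + 0.7 * z + rng.normal(size=n)  # assumed true beta = 1.0
    return a, a_star, z, y


def ols(y, *cols):
    """Least squares coefficients, intercept first, for y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Main study: the gold standard is treated as unobserved and discarded.
_, a_star_main, z_main, y_main = simulate(50_000)
# Independent validation sample: both A' and A* are observed.
a_val, a_star_val, z_val, _ = simulate(5_000)

beta_star = ols(y_main, a_star_main, z_main)[1]  # Step 1: biased slope
lam = ols(a_val, a_star_val, z_val)[1]           # Step 2: calibration slope
beta_rc = beta_star / lam                        # Step 3: corrected slope

print(f"naive: {beta_star:.3f}, calibration: {lam:.3f}, corrected: {beta_rc:.3f}")
```

In this linear setting the corrected slope should land near the assumed true value of 1.0; for a logistic outcome model the same division is only an approximation, as noted above.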

Simulation-extrapolation

SIMEX is a functional method proposed by Cook and Stefanski (46) to correct for measurement error using a simulation-based approach. Conveniently for the researcher, this method does not necessarily require an internal validation data set; SIMEX does require, however, a good estimate of the error variance from some source, such as transportable external validation data, a priori knowledge from the literature, or expert opinion. The SIMEX method assumes a smooth relationship between the error and the (biased) effect estimate, $\hat{\beta}^*$, and further assumes that the effect of the error on the estimator can be simulated. Recall that the classical error model has the form A* = A + U. In SIMEX, the error term, U, is described by $\mathrm{Var}(U) = \theta\sigma_u^2$, where $\theta \geq 1$ can be used to increase the amount of measurement error. When $\theta = 1$, the error variance corresponds to the actual error in A*, $\sigma_u^2$, which is assumed to be known or estimated from validation data, replicate data, or other sources. We can then use $\theta$ to add error in units of $\sigma_u^2$ until the shape of the relationship between the error and the resulting effect estimate can be approximated. Then, we extrapolate back to a point of zero error, which corresponds to an unbiased effect estimate, $\hat{\beta}$. SIMEX does not assume any distribution for the true variable, A. An application of SIMEX can be found in a report by Ngantcha et al. (47) in which the association between hospital process indicators and hospital standardized mortality ratios was evaluated. Relying on the assumption that hospital process indicators are prone to classical additive error, the authors simulated hospital process indicator data with increasing measurement error and estimated the association with hospital standardized mortality ratios to obtain 4 additional points relating the error to the association of interest. The relationship was fit using quadratic regression, and the corrected estimate of the association was determined through extrapolation back to the theoretical point of zero measurement error (47).
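The simulation and extrapolation steps can be sketched on simulated data as follows. This is a minimal illustration rather than the procedure used in (47): the data, the error standard deviation `sigma_u`, the grid `thetas`, and the number of simulations `B` are all hypothetical, and the sketch assumes a parameterization in which θ multiplies the known error variance, so that θ = 1 is the observed data and θ = 0 is zero error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (all values hypothetical).
n = 5000
beta_true = 0.5
sigma_u = 0.6                                  # error SD, assumed known externally
A = rng.normal(0.0, 1.0, n)
A_star = A + rng.normal(0.0, sigma_u, n)       # classical error: A* = A + U
Y = 1.0 + beta_true * A + rng.normal(0.0, 1.0, n)

def slope(x, y):
    """Least-squares slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Assumed parameterization: total error variance is theta * sigma_u**2.
thetas = np.array([1.0, 1.5, 2.0, 2.5, 3.0])   # observed data plus 4 added-error points
B = 200                                        # simulated data sets per value of theta

mean_slopes = []
for theta in thetas:
    extra_sd = np.sqrt(theta - 1.0) * sigma_u  # extra error needed to reach theta
    sims = [slope(A_star + rng.normal(0.0, extra_sd, n), Y) for _ in range(B)]
    mean_slopes.append(np.mean(sims))
mean_slopes = np.array(mean_slopes)

# Extrapolation step: quadratic in theta, evaluated at theta = 0 (zero error).
coefs = np.polyfit(thetas, mean_slopes, deg=2)
beta_simex = np.polyval(coefs, 0.0)
beta_naive = mean_slopes[0]

print(f"naive: {beta_naive:.3f}  SIMEX: {beta_simex:.3f}  truth: {beta_true}")
```

Because the true attenuation curve here is not exactly quadratic, the quadratic extrapolant removes most but not all of the bias, which illustrates the method's dependence on the assumed extrapolation function.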

SIMEX depends heavily on how the simulated data are fit (i.e., the assumed functional form that describes the effect of the error on the estimate). Furthermore, when measurement error affects more than 1 variable, SIMEX can become less reliable because there is even more reliance on the extrapolation model (46). However, SIMEX has several advantages. It is versatile and can handle measurement error in multiple covariates as well as the outcome (both continuous and dichotomous) (48, 49); it is applicable to general estimation methods (50–52); it can be used with a heteroscedastic error model (i.e., error that is not constant across A) (53); and, although suited to additive measurement error, it can be generalized to any error mechanism that can be simulated using Monte Carlo methods (54). Table 1 lists SIMEX software packages in R (R Foundation for Statistical Computing, Vienna, Austria) and Stata (StataCorp LP, College Station, Texas).

Multiple imputation for measurement error correction

Multiple imputation for measurement error (MIME) treats measurement error as a missing data problem. This method, which requires internal validation data, can be used with both differential and nondifferential independent measurement error. In MIME, the gold standard, A′, is the variable we need to impute for those outside the validation subset. The validation data set is ideally a random subset; therefore, the missing data mechanism is missing completely at random within the full sample. However, differential missingness of the gold standard by levels of observed covariates can also be accommodated. MIME draws on the relationship between A′ and A* in the validation subset, as well as the relationships between A′ and other covariates (including the intended outcome, Y), to impute the missing A′ for the full sample. As with any imputation procedure, all variables to be used in the analysis should be included in the imputation model. This relates to the concept of congeniality and helps ensure that the imputation and analysis models are compatible (i.e., arise from the same joint model of the data) (55). One didactic example of MIME can be found in a study by Cole et al. (57) in which they used MIME to estimate the hazard ratio for end-stage renal disease in a simulated data set of 600 children with chronic kidney disease. For the exposure of low glomerular filtration rate at study entry, they estimated the true hazard ratio as 2.0 (95% CI: 1.4, 2.8). Misclassification in this exposure yielded a naïve estimate of 1.5 (95% CI: 1.1, 2.1), but following MIME, the corrected estimate was 2.0 (95% CI: 1.2, 3.3). In this example, MIME outperformed RC in simulations where the validation substudy sample was small or the measurement error magnitude was relatively large. MIME can correct for differential measurement error in observed variables by including interaction terms in the imputation model. A further advantage is that, because the measurement error correction is performed at the data level rather than correcting the estimated measure of association, MIME’s application is not limited to specific models or data types. In other words, it is as flexible as multiple imputation for missing data (56). The main limitation is that if the relationship between A′ and A* is uncertain (e.g., due to a small validation set), MIME will propagate this uncertainty by increasing the estimated variance, resulting in wide CIs (57). Table 1 lists software packages for MIME.
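The imputation logic can be sketched for a continuous exposure and a linear outcome model as follows. This is a minimal illustration under stated assumptions, not the implementation in (57): the data are simulated, all numeric values and names (such as `fit_ols` and the number of imputations `M`) are hypothetical, the imputation model is assumed to be a normal linear regression of the gold standard on A* and Y, and estimates are combined with Rubin's rules.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data (all values hypothetical).
n, n_val, M = 3000, 600, 30
beta_true = 0.5
A = rng.normal(0.0, 1.0, n)                    # gold standard exposure
A_star = A + rng.normal(0.0, 0.8, n)           # error-prone exposure
Y = 1.0 + beta_true * A + rng.normal(0.0, 1.0, n)
val = np.zeros(n, dtype=bool)                  # A is treated as missing outside val
val[rng.choice(n, n_val, replace=False)] = True

def fit_ols(X, y):
    """Return coefficients, residual variance, (X'X)^-1, and residual df."""
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    df = X.shape[0] - X.shape[1]
    return coef, resid @ resid / df, XtX_inv, df

# Imputation model fit in the validation subset; it includes the outcome Y.
X_imp = np.column_stack([np.ones(n), A_star, Y])
g_hat, s2_hat, V, df = fit_ols(X_imp[val], A[val])

betas, variances = [], []
for _ in range(M):
    # Draw imputation-model parameters to reflect their uncertainty.
    s2 = s2_hat * df / rng.chisquare(df)
    g = rng.multivariate_normal(g_hat, s2 * V)
    # Keep observed gold-standard values; impute the rest with residual noise.
    A_imp = np.where(val, A, X_imp @ g + rng.normal(0.0, np.sqrt(s2), n))
    # Analysis model on the completed data: Y on A.
    X_an = np.column_stack([np.ones(n), A_imp])
    coef, s2_an, V_an, _ = fit_ols(X_an, Y)
    betas.append(coef[1])
    variances.append(s2_an * V_an[1, 1])

# Rubin's rules: pooled estimate and total variance.
beta_mime = np.mean(betas)
within = np.mean(variances)
between = np.var(betas, ddof=1)
se_mime = np.sqrt(within + (1.0 + 1.0 / M) * between)

beta_naive = fit_ols(np.column_stack([np.ones(n), A_star]), Y)[0][1]
print(f"naive: {beta_naive:.3f}  MIME: {beta_mime:.3f} (SE {se_mime:.3f})")
```

The between-imputation component of the variance is how uncertainty about the relationship between the gold standard and A* enters the final standard error; shrinking `n_val` in this sketch makes that component, and hence the interval, larger.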

SENSITIVITY ANALYSES FOR MEASUREMENT ERROR

There will be cases where none of the methods we have described thus far are suitable, either because knowledge of the magnitude of the error is unavailable or because the assumptions of the methods are violated. In those cases, sensitivity analysis is always an option. A sensitivity analysis for measurement error uses a range of plausible error mechanisms and magnitudes, elucidating the scenarios in which measurement error would completely explain an observed exposure–outcome association. There have been numerous calls for use of sensitivity analyses as standard practice to push the boundaries of our inferences with regard to systematic (as opposed to random) error (28, 29, 58). Methods for implementing sensitivity analyses vary, ranging from deriving mathematical relationships between the magnitude of the error and the effect size (59) to more complicated Bayesian approaches that incorporate prior probability distributions bounding the likely magnitude of errors (60). Some software has been developed for easy implementation of misclassification sensitivity analyses using a Monte Carlo simulation in the case of dichotomous variables (61). Given that analysts will often encounter studies in which validation data are not available, sensitivity analysis methods that assess the potential or plausible impact of measurement error may be the most useful tools in the measurement error toolbox. Directed acyclic graphs can be helpful to uncover bias structures resulting from measurement error that resemble confounding, thereby revealing possibilities to experiment with sensitivity analysis tools for confounding and evaluate measurement error bias (62–64).
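A simple algebraic version of such an analysis, for a misclassified dichotomous exposure in a case-control table, can be sketched as follows. This is an illustration only and not the software described in (61): the cell counts and the sensitivity and specificity scenarios are hypothetical, and the function name `corrected_or` is introduced here for the example. Observed exposed counts are back-corrected within cases and controls using assumed sensitivity and specificity, which are allowed to differ by outcome status.

```python
def corrected_or(a, b, c, d, se_case, sp_case, se_ctrl, sp_ctrl):
    """Odds ratio after back-correcting exposure misclassification.

    a, b: observed exposed and unexposed cases.
    c, d: observed exposed and unexposed controls.
    Returns None when the assumed error rates are incompatible with the data.
    """
    if se_case + sp_case <= 1 or se_ctrl + sp_ctrl <= 1:
        return None
    n_case, n_ctrl = a + b, c + d
    # Expected true exposed count: (observed exposed - false positives) / (Se + Sp - 1).
    exp_case = (a - (1 - sp_case) * n_case) / (se_case + sp_case - 1)
    exp_ctrl = (c - (1 - sp_ctrl) * n_ctrl) / (se_ctrl + sp_ctrl - 1)
    unexp_case, unexp_ctrl = n_case - exp_case, n_ctrl - exp_ctrl
    if min(exp_case, exp_ctrl, unexp_case, unexp_ctrl) <= 0:
        return None
    return (exp_case * unexp_ctrl) / (unexp_case * exp_ctrl)

# Hypothetical observed table: 150/350 exposed/unexposed cases, 100/400 controls.
table = (150, 350, 100, 400)

# Hypothetical scenarios: (Se cases, Sp cases, Se controls, Sp controls).
scenarios = {
    "no error": (1.0, 1.0, 1.0, 1.0),
    "nondifferential": (0.8, 0.9, 0.8, 0.9),
    "differential specificity": (0.9, 0.85, 0.9, 0.95),
}

for label, params in scenarios.items():
    result = corrected_or(*table, *params)
    shown = "incompatible" if result is None else f"{result:.2f}"
    print(f"{label}: corrected OR = {shown}")
```

Scanning a grid of such scenarios shows which error magnitudes, if any, would move the corrected odds ratio to the null, which is the question a sensitivity analysis of this kind is meant to answer.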

DISCUSSION

Epidemiologists are often ill prepared to deal with measurement error. It is, in some ways, the Achilles’ heel in our studies, despite its ubiquitous nature. Programs may not provide sufficient training in measurement theory or tools for correcting measurement error; in a recent survey of 57 epidemiology doctoral program directors and representatives, approximately 30% of respondents reported they felt developing competency in sound methods of measurement for primary data collection was not very or extremely important (65). Furthermore, there are often limited resources for implementing a validation substudy to quantify the error in our measurements. If there is a trade-off between adding another longitudinal visit or conducting an internal validation study, many researchers would rather include another study visit. There are also reasons why an internal validation study may not be feasible: The cost of measuring the gold standard is too high or the gold standard may be an invasive test that could increase participant burden and risk. Examination of liver biopsy specimens has long been the gold standard (albeit alloyed) for diagnosing liver disease, but the procedure involves potential risks (although rare), including bleeding, damage to other organs, and infection (66). These barriers limit our understanding of the variability in many of our measures, to the detriment of our research.

Devoting resources and time to collecting validation data and better designing our studies to deal with measurement error would improve the robustness of our evidence base. Gold standards in internal validation studies rarely are truly flawless, and correlation with an established standard (i.e., criterion validity) or consistency with other indicators of the underlying construct (e.g., as assessed with Cronbach α) does not guarantee freedom from bias and imprecision (67). However, there is still much that can be learned from validation studies in terms of variability.

Even without internal validation data, there are tools to help quantify the effect of measurement error and even correct it. Many simple strategies can be used with error information from external sources (e.g., algebraic correction or SIMEX). More advanced methods like Bayesian adjustment allow the user to make educated guesses about the magnitude of error (1, 26, 68). Full-likelihood and quasi-likelihood methods can be used to construct joint distributions to incorporate a model of the error mechanism with the outcome model (1, 3, 26–30). Lastly, one can always perform sensitivity analyses using worst-case and best-case scenarios to bound the possible effect of measurement error. Sensitivity analyses should be standard in scientific research to evaluate the impact of quantitative biases, including measurement error (28, 29, 58).

Measurement error has important implications for understanding average causal treatment effects and contextualizing magnitude of associations in many epidemiologic applications. As Big Data continues to provide a wealth of opportunities for research, epidemiologists must be aware that statistically significant associations with high precision (as a result of large samples) may be less meaningful at the individual level if the associations are within the bounds of measurement error. Conversely, measurement error will not always affect inferences (e.g., measurement error in individual confounders may not lead to meaningful bias in exposure-outcome relationships) (69). However, there is little available guidance for when to worry about and when to ignore measurement error. A look at the literature on the impact of measurement error indicates the answer is “it depends” (70, 71).

This review is by no means exhaustive in either theory or technical detail; there is a wealth of literature on measurement and practical applications of measurement error–correction tools as well as good, general technical sources within causal and association frameworks where measurement error may plague exposure or outcome variables (1, 19, 23, 72). Our objective was to provide an approachable description of the problem of measurement error and offer guidance for implementing several accessible methods for epidemiologists regardless of statistical background. We should all spend more time digging into the problem of measurement error and learning the tools to address it. Doctoral programs in epidemiology could begin to encourage students, through curriculum, to spend as much time thinking about measurement error as we currently do confounding bias. Unlike in the case of confounding, randomization does not protect us from measurement error. This is a problem that will continue to plague our studies and cause a meaningful share of the noise in the evidence base. However, correction methods with strong theoretical and statistical underpinnings are accessible to deal with measurement error and obtain less-biased estimates if epidemiologists are willing to incorporate them into their analytic repertoire.

ACKNOWLEDGMENTS

Author affiliations: Department of Epidemiology, Rutgers School of Public Health, Piscataway, New Jersey, United States (Gabriel K. Innes); Department of Epidemiology, The Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States (Fiona Bhondoekhan, Bryan Lau, Alden L. Gross, Derek K. Ng, Alison G. Abraham); and Department of Epidemiology, University of Colorado, Anschutz Medical Campus, Denver, Colorado, United States (Alison G. Abraham).

This work was funded by National Institutes of Health grants U01-AA020793; R01-CA250851; P30-AI094189; R24-AI067039; U24-OD023382; U01-HL146193; 1R01AG052412; U01-DK-66143, U01-DK-66174, U01-DK-082194, and U01-DK-66116.

Conflict of interest: none declared.

REFERENCES

  • 1. Carroll RJ, Ruppert D, Stefanski LA, et al. Measurement Error In Nonlinear Models: A Modern Perspective. 2nd ed. Boca Raton, FL: Chapman and Hall; 2006.
  • 2. Brakenhoff TB, Mitroiu M, Keogh RH, et al. Measurement error is often neglected in medical literature: a systematic review. J Clin Epidemiol. 2018;98:89–97.
  • 3. Shaw PA, Deffner V, Keogh RH, et al. Epidemiologic analyses with error-prone exposures: review of current practice and recommendations. Ann Epidemiol. 2018;28(11):821–828.
  • 4. Fosgate GT. Non-differential measurement error does not always bias diagnostic likelihood ratios towards the null. Emerg Themes Epidemiol. 2006;3(1):7.
  • 5. Sechrest L. Validity of measures is no simple matter. Health Serv Res. 2005;40(5p2):1584–1604.
  • 6. Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychol Bull. 1955;52(4):281–302.
  • 7. Pickering TG, Hall JE, Appel LJ, et al. Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans - a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on high blood pressure research. Circulation. 2005;111(5):697–716.
  • 8. Tworoger SS, Hankinson SE. Use of biomarkers in epidemiologic studies: minimizing the influence of measurement error in the study design and analysis. Cancer Causes Control. 2006;17(7):889–899.
  • 9. White E. Measurement error in biomarkers: sources, assessment, and impact on studies. IARC Sci Publ. 2011;(163):143–161.
  • 10. Kelley TL. Interpretation of Educational Measurements. New York, NY: Macmillan; 1927:14.
  • 11. Ebel RL. Must all tests be valid? Am Psychol. 1961;16(10):640–647.
  • 12. Heid IM, Küchenhoff H, Miles J, et al. Two dimensions of measurement error: classical and Berkson error in residential radon exposure assessment. J Expo Anal Environ Epidemiol. 2004;14(5):365–377.
  • 13. Hoffmann S, Rage E, Laurier D, et al. Accounting for Berkson and classical measurement error in radon exposure using a Bayesian structural approach in the analysis of lung cancer mortality in the French cohort of uranium miners. Radiat Res. 2017;187(2):196–209.
  • 14. Szpiro AA, Paciorek CJ. Measurement error in two-stage analyses, with application to air pollution epidemiology. Environmetrics. 2013;24(8):501–517.
  • 15. Hoffmann S, Guihenneuc C, Ancelet S. A cautionary comment on the generation of Berkson error in epidemiological studies. Radiat Environ Biophys. 2018;57(2):189–193.
  • 16. Haber G, Sampson J, Graubard B. Bias due to Berkson error: issues when using predicted values in place of observed covariates. Biostatistics. 2021;22(4):858–872.
  • 17. Berglund L. Regression dilution bias: tools for correction methods and sample size calculation. Ups J Med Sci. 2012;117(3):279–283.
  • 18. Frost C, Thompson SG. Correcting for regression dilution bias: comparison of methods for a single predictor variable. J R Stat Soc Ser A Stat Soc. 2000;163(2):173–189.
  • 19. Gustafson P. Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. Boca Raton, FL: CRC Press; 2003.
  • 20. Buzas JS, Stefanski LA, Tosteson TD. Measurement error. In: Ahrens W, Pigeot I, eds. Handbook of Epidemiology. New York, NY: Springer New York; 2014:1241–1282.
  • 21. Devine O. The impact of ignoring measurement error when estimating sample size for epidemiologic studies. Eval Health Prof. 2003;26(3):315–339.
  • 22. Hutcheon JA, Chiolero A, Hanley JA. Random measurement error and regression dilution bias. BMJ. 2010;340:c2289.
  • 23. Fuller WA. Measurement Error Models. New York, NY: John Wiley and Sons; 2006.
  • 24. Copeland KT, Checkoway H, McMichael AJ, et al. Bias due to misclassification in the estimation of relative risk. Am J Epidemiol. 1977;105(5):488–495.
  • 25. Funk MJ, Landi SN. Misclassification in administrative claims data: quantifying the impact on treatment effect estimates. Curr Epidemiol Rep. 2014;1(4):175–185.
  • 26. Rice K. Full-likelihood approaches to misclassification of a binary exposure in matched case-control studies. Stat Med. 2003;22(20):3177–3194.
  • 27. McNeece G, Naughton V, Woodward MJ, et al. Array based detection of antibiotic resistance genes in gram negative bacteria isolated from retail poultry meat in the UK and Ireland. Int J Food Microbiol. 2014;179:24–32.
  • 28. Greenland S. Basic methods for sensitivity analysis of biases. Int J Epidemiol. 1996;25(6):1107–1116.
  • 29. Fox MP, Lash TL. On the need for quantitative bias analysis in the peer-review process. Am J Epidemiol. 2017;185(10):865–868.
  • 30. Keogh RH, Shaw PA, Gustafson P, et al. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: part 1—basic theory and simple methods of adjustment. Stat Med. 2020;39(16):2197–2231.
  • 31. Nieboer D, van der Ploeg T, Steyerberg EW. Assessing discriminative performance at external validation of clinical prediction models. PLoS One. 2016;11(2):e0148820.
  • 32. Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med. 1999;130(6):515–524.
  • 33. Luijken K, Groenwold RHH, Van Calster B, et al. Impact of predictor measurement heterogeneity across settings on the performance of prediction models: a measurement error perspective. Stat Med. 2019;38(18):3444–3459.
  • 34. Siddique J, Daniels MJ, Carroll RJ, et al. Measurement error correction and sensitivity analysis in longitudinal dietary intervention studies using an external validation study. Biometrics. 2019;75(3):927–937.
  • 35. Guo Y, Little RJ, McConnell DS. On using summary statistics from an external calibration sample to correct for covariate measurement error. Epidemiology. 2012;23(1):165–174.
  • 36. Riley RD, Ensor J, Snell KIE, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016;353:i3140.
  • 37. Greenland S, Salvan A, Wegman DH, et al. A case-control study of cancer mortality at a transformer-assembly facility. Int Arch Occup Environ Health. 1994;66(1):49–54.
  • 38. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22(4):719–748.
  • 39. Spiegelman D, McDermott A, Rosner B. Regression calibration method for correcting measurement-error bias in nutritional epidemiology. Am J Clin Nutr. 1997;65(4 Suppl):1179S–1186S.
  • 40. Carroll RJ, Stefanski LA. Approximate quasi-likelihood estimation in models with surrogate predictors. J Am Stat Assoc. 1990;85(411):652–663.
  • 41. Freedman LS, Midthune D, Carroll RJ, et al. A comparison of regression calibration, moment reconstruction and imputation for adjusting for covariate measurement error in regression. Stat Med. 2008;27(25):5195–5216.
  • 42. Spiegelman D, Carroll RJ, Kipnis V. Efficient regression calibration for logistic regression in main study/internal validation study designs with an imperfect reference instrument. Stat Med. 2001;20(1):139–160.
  • 43. Gesler LJ. Statistical Analysis of Measurement Error Models and Applications. Providence, RI: American Mathematical Society; 1990.
  • 44. Wang N, Carroll RJ, Liang KY. Quasilikelihood and variance functions in measurement error models with replicates. Biometrics. 1996;52(2):401–432.
  • 45. Liu X, Liang K-Y. Efficacy of repeated measures in regression models with measurement error. Biometrics. 1992;48(2):645–654.
  • 46. Cook JR, Stefanski LA. Simulation-extrapolation estimation in parametric measurement error models. J Am Stat Assoc. 1994;89(428):1314–1328.
  • 47. Ngantcha M, Le-Pogam M-A, Calmus S, et al. Hospital quality measures: are process indicators associated with hospital standardized mortality ratios in French acute care hospitals? BMC Health Serv Res. 2017;17(1):578.
  • 48. Küchenhoff H, Mwalili SM, Lesaffre E. A general method for dealing with misclassification in regression: the misclassification SIMEX. Biometrics. 2006;62(1):85–96.
  • 49. Spiegelman D, Rosner B, Logan R. Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs. J Am Stat Assoc. 2000;95(449):51–61.
  • 50. Greene WF, Cai J. Measurement error in covariates in the marginal hazards model for multivariate failure time data. Biometrics. 2004;60(4):987–996.
  • 51. Oh EJ, Shepherd BE, Lumley T, et al. Considerations for analysis of time-to-event outcomes measured with error: bias and correction with SIMEX. Stat Med. 2018;37(8):1276–1289.
  • 52. He W, Yi GY, Xiong J. Accelerated failure time models with covariates subject to measurement error. Stat Med. 2007;26(26):4817–4832.
  • 53. Wang X-F, Sun J, Fan Z. Deconvolution density estimation with heteroscedastic errors using SIMEX. Electron J Stat. 2009;1–21.
  • 54. Slate EH, Bandyopadhyay D. An investigation of the MC-SIMEX method with application to measurement error in periodontal outcomes. Stat Med. 2009;28(28):3523–3538.
  • 55. Meng X-L. Multiple-imputation inferences with uncongenial sources of input. Stat Sci. 1994;9(4):538–558.
  • 56. Livingston MD, Cannell B, Muller K, et al. Comparing methods of misclassification correction for studies of adolescent alcohol use. Am J Drug Alcohol Abuse. 2018;44(2):160–166.
  • 57. Cole SR, Chu H, Greenland S. Multiple-imputation for measurement-error correction. Int J Epidemiol. 2006;35(4):1074–1081.
  • 58. Lash TL, Fox MP, MacLehose RF, et al. Good practices for quantitative bias analysis. Int J Epidemiol. 2014;43(6):1969–1985.
  • 59. VanderWeele TJ, Li Y. Simple sensitivity analysis for differential measurement error. Am J Epidemiol. 2019;188(10):1823–1829.
  • 60. Agogo GO, van der Voet H, van’t Veer P, et al. A method for sensitivity analysis to assess the effects of measurement error in multiple exposure variables using external validation data. BMC Med Res Methodol. 2016;16(1):139.
  • 61. Fox MP, Lash TL, Greenland S. A method to automate probabilistic sensitivity analyses of misclassified binary variables. Int J Epidemiol. 2005;34(6):1370–1376.
  • 62. Arah OA. Bias analysis for uncontrolled confounding in the health sciences. Annu Rev Public Health. 2017;38:23–38.
  • 63. Vanderweele TJ, Arah OA. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology. 2011;22(1):42–52.
  • 64. Groenwold RHH, Nelson DB, Nichol KL, et al. Sensitivity analyses to estimate the potential impact of unmeasured confounding in causal research. Int J Epidemiol. 2010;39(1):107–117.
  • 65. Hlaing WM, Schmidt RD, Ahn S, et al. A snapshot of doctoral training in epidemiology: positioning us for the future. Am J Epidemiol. 2020;189(10):1154–1162.
  • 66. Mehta SH, Lau B, Afdhal NH, et al. Exceeding the limits of liver histology markers. J Hepatol. 2009;50(1):36–41.
  • 67. Dunn TJ, Baguley T, Brunsden V. From alpha to omega: a practical solution to the pervasive problem of internal consistency estimation. Br J Psychol. 2014;105(3):399–412.
  • 68. Guolo A, Brazzale AR. A simulation-based comparison of techniques to correct for measurement error in matched case–control studies. Stat Med. 2008;27(19):3755–3775.
  • 69. Chesher A. The effect of measurement error. Biometrika. 1991;78(3):451–462.
  • 70. Carroll RJ, Gail MH, Lubin JH. Case-control studies with errors in covariates. J Am Stat Assoc. 1993;88(421):185–199.
  • 71. Zeger SL, Thomas D, Dominici F, et al. Exposure measurement error in time-series studies of air pollution: concepts and consequences. Environ Health Perspect. 2000;108(5):419–426.
  • 72. Buonaccorsi JP. Measurement Error and Misclassification: Models, Methods and Applications. Florence, KY: Chapman & Hall/CRC; 2009.

Articles from Epidemiologic Reviews are provided here courtesy of Oxford University Press
