Abstract
Investigators sometimes use information about a given variable obtained from multiple informants. We focus on estimating the effect of a predictor on a continuous outcome, when that predictor cannot be observed directly but is measured by two informants. We describe various approaches to using information from two informants to estimate a regression or correlation coefficient for the effect of the (true) predictor on the outcome. These approaches include methods we refer to as single informant, simple average, optimal weighted average, principal components analysis, and classical measurement error. Each of these five methods effectively uses a weighted average of the informants' reports as a proxy for the true predictor in calculating the correlation or regression coefficient. We compare the performance of these methods in simulation experiments that assume a rounded congeneric measurement model for the relationship between the informants' reports and a true predictor that is a mixture of zeros and positively-distributed continuous values. We also compare the methods' performance in a real data example: the relationship between vigorous physical activity (the predictor) and body mass index (the continuous outcome). The results of the simulations and the example suggest that the simple average is a reasonable choice when there are only two informants.
Investigators interested in measuring behavioral or psychological constructs, especially in children or adolescents, often collect information from multiple informants or sources. For example, investigators studying children may obtain reports from parents, teachers, and the child himself if he is old enough. The resulting multiple informant data are generally discrepant. In meta-analyses examining cross-informant correlations, the mean correlation between child report and other report of child psychopathology was 0.22,1 and the mean correlations for cross-informant reports of adult psychopathology based on the same instrument ranged between 0.40 and 0.70.2 Discrepancies between informants are thought to reflect random measurement error, systematic measurement bias, and the different perspectives of the informants (e.g., home versus school). In fact, discrepancy is the very reason for seeking reports from multiple sources: when information from multiple informants is combined, it is believed that different contexts and perspectives will be represented.1
A considerable amount of research has focused on using multiple informant data as outcomes3–5, but we focus here on using them as predictors.6 More specifically, we focus on the situation where it is assumed that there is an unobserved variable (the “true predictor”) underlying the informants' reports (which themselves are not perfect or near perfect (“gold-standard”) measures of the underlying variable), and where interest lies in estimating the effect of the true predictor on a continuous outcome. Our focus on this situation is motivated by a real data example7 where mother and child reports of the child's daily vigorous physical activity are used to estimate the effect of such physical activity on the child's body mass index (BMI). In situations where interest lies in modeling the effect of the informants' reports themselves on the outcome (e.g., when informants' perceptions of the true predictor are thought to be informative about the outcome8), it is common to perform separate regressions or include all the reports simultaneously as predictors in a regression model.6 In contrast, modeling the effect of the true predictor on the outcome is more complicated because the multiple informants' reports must be combined into a single measure of the true predictor. Further, the reports will ideally be combined in a way that corrects for measurement error. It is well known that failing to correct for measurement error in predictors can result in biased estimates and, when there are other predictors, invalid confidence intervals that do not have the nominal coverage properties.9
We compare five ways to combine multiple informant data to estimate a linear regression or correlation coefficient for the effect of the true predictor on the continuous outcome. First, we describe in detail how to implement these approaches. Then, we describe simulation experiments that compare the performance of these approaches in terms of bias and mean squared error in situations where the true predictor is a mixture of zeros and continuously-distributed positive values, as in the physical activity and BMI example. (This type of predictor is a commonly encountered example of a “semicontinuous” variable.) Finally, we compare the estimates produced by these approaches for the aforementioned physical activity and BMI example.
Overview of Approaches
We assume the following model for the relationship of interest:
$$Y_i = \alpha + \beta T_i + \varepsilon_i \qquad (1)$$
where Ti is the true predictor and Yi is the continuous outcome for the ith individual, and the εis are independently and identically distributed with mean 0 and constant variance $\sigma_\varepsilon^2$. In some instances, the parameter of interest will be β. In the example of vigorous physical activity and BMI, we might be interested in the expected change in BMI for one extra hour of vigorous physical activity. In other instances, the parameter of interest will be ρ, the (Pearson) correlation coefficient between Y and T. For example, if Y and T are not measured in meaningful units, ρ might be preferred because it is scale-invariant.
However, because T cannot be observed directly, information about β or ρ must be inferred from the relationship between Y and the informants' reports on T. The true predictor Ti is measured with error by each of J (here, 2) informants, and the resulting measurements of (or reports on) the true predictor are referred to as Xi1,…, XiJ. Figure 1 illustrates the relationship between the outcome, the true predictor, and the multiple informants' reports, and also introduces the concept of a gold standard measurement (a measurement of the true predictor with minimal measurement error, which we refer to as Gi) because it is relevant to the physical activity and BMI example.
Figure 1.
Illustration of Key Terminology. Variables that appear in black are observed, and variables that appear in gray are unobserved. The effect of interest is the effect of T on Y. This effect can be described by β or ρ, which are the parameters of interest.
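We now describe some of the approaches commonly used to estimate β or ρ from the Xs and Ys. (Estimates of ρ and β will be referred to as $\hat{\rho}$ and $\hat{\beta}$, respectively.) We refer to each approach either by its popular name or by a name that we have chosen to be reasonably descriptive. The approaches all amount to using a weighted average of the Xs (with weights that may not necessarily sum to one) as a proxy for T in calculating β or ρ. For method M, $\hat{\beta}_M$ thus takes the form
$$\hat{\beta}_M = \frac{\sum_{i=1}^{n} (W_i^M - \bar{W}^M)(Y_i - \bar{Y})}{\sum_{i=1}^{n} (W_i^M - \bar{W}^M)^2} \qquad (2)$$
and $\hat{\rho}_M$ takes the form
$$\hat{\rho}_M = \frac{\sum_{i=1}^{n} (W_i^M - \bar{W}^M)(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (W_i^M - \bar{W}^M)^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \qquad (3)$$
where $W_i^M = w_1^M X_{i1} + w_2^M X_{i2}$, with $w_1^M$ and $w_2^M$ being method-specific weights and $\bar{W}^M$ and $\bar{Y}$ denoting sample means. There are a number of ways to determine the “best” weights for the two informants,10,11 as illustrated by the following methods.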
In the single informant (SI) method, information from only one informant is used, typically because that informant is thought to provide information about T that is more valid or less often missing. Supposing without loss of generality that Informant 1 is the preferred informant, then $\hat{\beta}_{SI}$ and $\hat{\rho}_{SI}$ can be obtained by using Equations (2) and (3), respectively, with $w_1^{SI} = 1$ and $w_2^{SI} = 0$.
In the simple average (SA) method, information from both informants is averaged using equal weights: $\hat{\beta}_{SA}$ and $\hat{\rho}_{SA}$ are obtained by using Equations (2) and (3), respectively, with $w_1^{SA} = w_2^{SA} = 1/2$. The simple average method is easy to implement and has been used in a variety of applications, e.g., reference 12.
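The weighted-proxy idea behind the single informant and simple average methods can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `proxy_estimates` and the toy data (a gamma-distributed predictor with additive normal reporting error) are assumptions made for the example and are not taken from the study.

```python
import numpy as np


def proxy_estimates(y, x1, x2, w1, w2):
    """Slope and Pearson correlation of y on the proxy w1*x1 + w2*x2."""
    y, x1, x2 = (np.asarray(a, dtype=float) for a in (y, x1, x2))
    proxy = w1 * x1 + w2 * x2
    pc = proxy - proxy.mean()   # centered proxy
    yc = y - y.mean()           # centered outcome
    beta_hat = (pc @ yc) / (pc @ pc)
    rho_hat = (pc @ yc) / np.sqrt((pc @ pc) * (yc @ yc))
    return beta_hat, rho_hat


# Hypothetical toy data (not the study data): two noisy reports of t.
rng = np.random.default_rng(0)
n = 250
t = rng.gamma(shape=2.0, scale=0.5, size=n)
x1 = t + rng.normal(0.0, 0.5, size=n)
x2 = t + rng.normal(0.0, 0.5, size=n)
y = 1.0 + 1.0 * t + rng.normal(0.0, 1.0, size=n)

beta_si, rho_si = proxy_estimates(y, x1, x2, 1.0, 0.0)    # single informant
beta_sa, rho_sa = proxy_estimates(y, x1, x2, 0.5, 0.5)    # simple average
beta_sum, rho_sum = proxy_estimates(y, x1, x2, 1.0, 1.0)  # weights sum to 2
```

In this sketch, doubling both weights halves the slope but leaves the correlation unchanged, which is why weights that do not sum to one matter for estimates of β but not for estimates of ρ.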
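In what we refer to as the optimal weighted average (OWA) method, the weights used to calculate $\hat{\beta}_{OWA}$ and $\hat{\rho}_{OWA}$ minimize the sum of squared errors from the linear regression (or identically, maximize the absolute value of the correlation coefficient) for Y and a weighted average of the Xs:
$$w_1^{OWA} = \arg\min_{w_1} \sum_{i=1}^{n} \left[ Y_i - \hat{\alpha}_{w_1} - \hat{\beta}_{w_1} (w_1 X_{i1} + w_2 X_{i2}) \right]^2 \qquad (4)$$
where $\hat{\alpha}_{w_1}$ and $\hat{\beta}_{w_1}$ are estimates of the intercept and slope from the linear regression of Yi on (w1Xi1 + w2Xi2) and where w2 = 1 − w1.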
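One way to compute such weights is sketched below. It assumes that the weights are left unconstrained (they may fall outside the interval from 0 to 1). Under that assumption, a slope b applied to w1·X1 + (1 − w1)·X2 is a reparametrization of the ordinary regression of Y on X1 and X2 with coefficients c1 = b·w1 and c2 = b·(1 − w1), so the minimizing weight is c1/(c1 + c2) and the slope is c1 + c2. The function name `owa_fit` and the toy data are hypothetical.

```python
import numpy as np


def owa_fit(y, x1, x2):
    """Return (w1, w2, slope) minimizing the SSE of y on w1*x1 + w2*x2, w1 + w2 = 1."""
    y, x1, x2 = (np.asarray(a, dtype=float) for a in (y, x1, x2))
    design = np.column_stack([np.ones(len(y)), x1, x2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    c1, c2 = coef[1], coef[2]
    slope = c1 + c2
    if np.isclose(slope, 0.0):
        raise ValueError("weights are undefined when the coefficients sum to zero")
    return c1 / slope, c2 / slope, slope


# Hypothetical toy data (not the study data): informant 2 is noisier.
rng = np.random.default_rng(1)
n = 250
t = rng.gamma(shape=2.0, scale=0.5, size=n)
x1 = t + rng.normal(0.0, 0.3, size=n)
x2 = t + rng.normal(0.0, 0.9, size=n)
y = 1.0 + 1.0 * t + rng.normal(0.0, 1.0, size=n)

w1, w2, beta_owa = owa_fit(y, x1, x2)
rho_owa = np.corrcoef(w1 * x1 + w2 * x2, y)[0, 1]
```

A grid search over w1 would give the same answer up to the grid resolution; the closed form simply avoids the search.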
In the principal components analysis (PCA) method, the first principal component from the multiple informant data is used as the predictor. This method, along with the factor analysis method (see the Conclusions section below), is advocated by Kraemer and colleagues for use with three or more multiple informants carefully selected to cover multiple perspectives and contexts.13 In practice, however, principal components analysis is often used with any two or more informants. Principal components analysis is an eigen-decomposition of the covariance (or correlation) matrix for the data, in this case the n × 2 matrix containing the information from the multiple informants. The first principal component (or “score”) is calculated by multiplying the centered data matrix by the “loadings,” the scaled eigenvector corresponding to the largest eigenvalue from the eigen-decomposition. The eigenvector is usually scaled so that its squared elements sum to one, but here we re-scale it so that its (un-squared) elements sum to one in order to preserve the scale of the predictor construct. Because applying the principal components analysis method to the correlation matrix is identical to the simple average method when there are only two informants, we focus on the principal components analysis method as applied to the covariance matrix. In that case, $\hat{\beta}_{PCA}$ and $\hat{\rho}_{PCA}$ are obtained by using Equations (2) and (3), respectively, with $w_1^{PCA} = v_1/(v_1 + v_2)$ and $w_2^{PCA} = v_2/(v_1 + v_2)$, where
$$(v_1, v_2) \propto (s_{12},\; l_1 - s_{11}), \qquad l_1 = \frac{s_{11} + s_{22} + \sqrt{(s_{11} - s_{22})^2 + 4 s_{12}^2}}{2} \qquad (5)$$
with $s_{jj} = \widehat{Var}(X_{ij})$ and $s_{12} = \widehat{Cov}(X_{i1}, X_{i2})$. (Note that in Equations (2) and (3), we do not center Xi1 and Xi2 before multiplying them by their respective weights, as is usually done in principal components analysis, because doing so has no effect on $\hat{\beta}_{PCA}$ and $\hat{\rho}_{PCA}$.)
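The re-scaled loadings can be computed directly from a 2 × 2 covariance matrix, as in the following sketch. The function name `pca_weights` is hypothetical. The inputs in the usage lines are the rounded summary statistics reported for the physical activity example (variances of 0.46 and 0.91 for mother and child report, and a correlation of 0.23 between them), so the resulting weights are only approximate.

```python
import numpy as np


def pca_weights(cov):
    """First-eigenvector loadings of a covariance matrix, re-scaled to sum to one."""
    cov = np.asarray(cov, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    first = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
    total = first.sum()
    if np.isclose(total, 0.0):
        raise ValueError("loadings sum to zero and cannot be re-scaled")
    return first / total                    # dividing by the sum also fixes the sign


# Rounded summary statistics reported for mother and child report.
var_mother, var_child, corr = 0.46, 0.91, 0.23
cov_mc = corr * np.sqrt(var_mother * var_child)
weights_cov = pca_weights([[var_mother, cov_mc], [cov_mc, var_child]])

# Same data on the correlation scale: both informants get equal weight.
weights_corr = pca_weights([[1.0, corr], [corr, 1.0]])
```

With these inputs the covariance-based weights come out at roughly 0.23 for the mother and 0.77 for the child, so the informant with the larger variance dominates, whereas the correlation-based weights reduce to the simple average.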
The classical measurement error (CME) method is used to handle errors in predictors in environmental and nutritional epidemiology and in econometrics, but also has roots in classical test theory for psychometric tests (see Bollen14). The method is the same as the simple average method, except that the resulting slope is then divided by an “attenuation factor” (here, λCME) to correct for attenuation due to random measurement error. The attenuation factor is derived under the “classical measurement model” for the relationship between the informants' reports and the true predictor, which assumes that the informants' reports equal the true predictor plus an additive random measurement error that has constant variance, σ², across informants (see next section for more on the classical measurement model). The formula for the attenuation factor is:
$$\lambda_{CME} = \frac{Var(T_i)}{Var(T_i) + \sigma^2/2}$$
(see Carroll et al.9 for the derivation). The parameters Var(Ti) and σ² are unknown and, if only one attenuated measure of Ti were available, would have to be determined based on external information from theory or previous studies. However, because we have more than one measurement of Ti, we can estimate λCME using:
$$\hat{\lambda}_{CME} = \frac{\widehat{Var}(\bar{X}_i) - \hat{\sigma}^2/2}{\widehat{Var}(\bar{X}_i)}$$
where $\bar{X}_i = (X_{i1} + X_{i2})/2$ and $\hat{\sigma}^2 = \frac{1}{2n} \sum_{i=1}^{n} (X_{i1} - X_{i2})^2$.15 Then, $\hat{\beta}_{CME}$ can be estimated by using Equation (2) with $w_1^{CME} = w_2^{CME} = \hat{\lambda}_{CME}/2$, and
$$\hat{\rho}_{CME} = \frac{\hat{\rho}_{SA}}{\sqrt{\hat{\lambda}_{CME}}}$$
In situations that violate the assumptions of the classical measurement model, $\hat{\lambda}_{CME}$ can be very small (in which case $\hat{\rho}_{CME}$ may be greater than 1 in magnitude) or can even be negative (in which case the classical measurement error method fails because $\hat{\beta}_{CME}$ and $\hat{\rho}_{CME}$ should not and cannot, respectively, be calculated).
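The correction and its failure mode can be illustrated with the sketch below. It rests on assumptions that should be checked against the original derivation: the attenuation factor is estimated with a standard replicate-based estimator (the variance of the two-report average minus half of a within-pair error variance based on uncentered differences, divided by the variance of the average), and the correlation is corrected by dividing by the square root of that factor. The function name `cme_estimates` and the toy data are hypothetical.

```python
import numpy as np


def cme_estimates(y, x1, x2):
    """Attenuation-corrected slope and correlation from two reports (a sketch)."""
    y, x1, x2 = (np.asarray(a, dtype=float) for a in (y, x1, x2))
    xbar = 0.5 * (x1 + x2)
    var_xbar = np.var(xbar, ddof=1)
    # Assumed error-variance estimate from uncentered within-pair differences.
    sigma2_hat = np.mean((x1 - x2) ** 2) / 2.0
    lam_hat = (var_xbar - sigma2_hat / 2.0) / var_xbar
    if lam_hat <= 0.0:
        raise ValueError("estimated attenuation factor is not positive; method fails")
    beta_sa = np.cov(xbar, y)[0, 1] / var_xbar
    rho_sa = np.corrcoef(xbar, y)[0, 1]
    return beta_sa / lam_hat, rho_sa / np.sqrt(lam_hat), lam_hat


# Hypothetical toy data that satisfy the classical measurement model.
rng = np.random.default_rng(2)
n = 20000
t = rng.gamma(shape=2.0, scale=0.5, size=n)          # Var(t) = 0.5
x1 = t + rng.normal(0.0, np.sqrt(0.5), size=n)       # error variance 0.5
x2 = t + rng.normal(0.0, np.sqrt(0.5), size=n)
y = 1.0 + 1.0 * t + rng.normal(0.0, 1.0, size=n)

beta_cme, rho_cme, lam_hat = cme_estimates(y, x1, x2)
```

When the two reports differ by a systematic offset (for example, x1 = t + 2 and x2 = t − 2), the within-pair differences inflate the error-variance estimate, the estimated attenuation factor turns negative, and the sketch raises an error rather than returning an estimate.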
It is possible to analytically compare some, but not all, of the aforementioned estimators for β. For example, it is easy to show that
| (6) |
(see eAppendix 1A [http://links.lww.com] for a proof).
Simulation Experiments
Setup
We conducted simulation experiments to compare how the five approaches perform in terms of bias and mean squared error (MSE) for multiple informant data generated from a rounded version of the congeneric measurement model first described by Jöreskog.16 This model for the relationship between the multiple informants' reports and the true predictor takes the form:
| (7) |
where i indexes the individuals being studied, j indexes the informant types (e.g., mother = 1, child = 2), and ϵij is a random measurement error that comes from a distribution with mean 0 and variance $\sigma_j^2$. The parameter $\sigma_j^2$ reflects the variation in measurement reliability across informant types. In contrast, the parameters δj and λj reflect the variation in measurement validity across informant types, with δj and λj representing, respectively, additive and multiplicative systematic measurement bias. (Note that λj reflects only multiplicative bias because Ti is centered by its mean, ETi, before multiplying by λj.) The model in (7) is similar to other models that have been proposed for multiple informant data in the context of psychiatric research13 and marketing research,17,18 except that the above model takes the form of a unidimensional rather than multidimensional factor analysis model and also involves rounding to reflect our focus on a true predictor that is a mixture of zeros and continuously-distributed positive values. Finally, note that the “classical measurement model” is a variant of the model in Equation (7) in which there is no rounding and where δ1 = δ2 = 0, λ1 = λ2 = 1, and $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
In our simulation experiments, we investigated how the bias and MSE of the approaches described above vary with different values of the parameters in the model in (7): −0.50, −0.25, 0, 0.25, or 0.50 for δ1 and δ2; 0.40, 0.75, 1, 1.10, or 1.33 for λ1 and λ2; and 0.0, 0.1, or 0.5 for $\sigma_1^2$ and $\sigma_2^2$. We chose these values either because they were realistic based on actual applications19 of previously-proposed models17,18 for multiple informant data or because they could aid us in understanding how the other parameters affected results (e.g., ). We also made the following assumptions about the variables in the measurement model (Equation 7) and regression model (Equation 1): Cor(Ti, ϵij) = 0; Cor(ϵi1, ϵi2) = 0; and Cor(ϵij, εi) = 0 (referred to as non-differential measurement error). Further, we assumed that α = 1, β = 1, and , with the value of β based on the physical activity and BMI example. We generated 1000 datasets of size n = 250 for every combination of the above parameter values; then used the single informant method (arbitrarily based on informant 1) and the simple average, optimal weighted average, principal components analysis, and classical measurement error methods (by definition, based on both informants) to calculate $\hat{\beta}$ and $\hat{\rho}$ for each dataset; and finally used the resulting estimates and true values of β (= 1) or ρ (= 0.5) to calculate the average bias and average MSE. Readers interested in the details of the simulation experiments can consult eAppendix 1B (http://links.lww.com), which describes the simulation experiments more fully, and can download the code for the simulation experiments (eAppendix 2, http://links.lww.com).
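The overall structure of one simulation cell can be sketched as follows. Several ingredients here are assumptions made only for illustration and are not the study's actual design: the true predictor is 40% zeros and otherwise gamma-distributed, "rounding" is taken to mean that negative reports are set to zero, the outcome error variance is set to three times the variance of T (which gives ρ = 0.5 when β = 1), and only the simple average slope is evaluated, over 200 rather than 1000 datasets. All function and parameter names are hypothetical.

```python
import numpy as np


def simulate_once(rng, n, delta, lam, sigma2, p_zero=0.4, shape=2.0, scale=0.5):
    """Generate (y, x1, x2) under an assumed rounded congeneric model."""
    positive = rng.gamma(shape, scale, size=n)
    t = np.where(rng.random(n) < p_zero, 0.0, positive)
    # Exact mean and variance of the assumed zero/gamma mixture.
    mean_t = (1.0 - p_zero) * shape * scale
    second_moment = (1.0 - p_zero) * (shape * scale**2 + (shape * scale) ** 2)
    var_t = second_moment - mean_t**2
    # alpha = beta = 1; error variance 3 * Var(T) gives a true correlation of 0.5.
    y = 1.0 + 1.0 * t + rng.normal(0.0, np.sqrt(3.0 * var_t), size=n)
    reports = []
    for j in range(2):
        error = rng.normal(0.0, np.sqrt(sigma2[j]), size=n)
        raw = mean_t + delta[j] + lam[j] * (t - mean_t) + error
        reports.append(np.maximum(0.0, raw))  # assumed form of the rounding
    return y, reports[0], reports[1]


def sa_bias_and_mse(delta, lam, sigma2, n=250, reps=200, seed=1):
    """Average percent bias and MSE of the simple average slope (true beta = 1)."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        y, x1, x2 = simulate_once(rng, n, delta, lam, sigma2)
        xbar = 0.5 * (x1 + x2)
        estimates[r] = np.cov(xbar, y)[0, 1] / np.var(xbar, ddof=1)
    percent_bias = 100.0 * (estimates.mean() - 1.0)
    mse = np.mean((estimates - 1.0) ** 2)
    return percent_bias, mse


bias_default, mse_default = sa_bias_and_mse((0.0, 0.0), (1.0, 1.0), (0.0, 0.0))
bias_noisy, mse_noisy = sa_bias_and_mse((0.0, 0.0), (1.0, 1.0), (0.5, 0.5))
```

Under these assumptions the default setting reproduces the true predictor exactly, so the simple average slope should be close to unbiased, while adding random error to both reports should attenuate it; the other methods and parameter combinations would be evaluated in the same loop.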
Results
Tables 1 and 2 summarize how changes in the measurement model parameters affect the average percent bias in $\hat{\rho}$ and $\hat{\beta}$, respectively, for each method. In addition, readers interested in seeing the method-specific average bias and MSE for each combination of measurement model parameters can view the figures in eAppendix 1B (http://links.lww.com) or download the results from the simulation experiment (eAppendix 3, http://links.lww.com).
Table 1.
Effect of Rounded Congeneric Measurement Model Parameters on Average Percent Bias in $\hat{\rho}$ (entries are the range of average percent bias in $\hat{\rho}$ for each method)
| Parameter Values^a | Single Informant^b | Simple Average | Optimal Weighted Average | Principal Components Analysis | Classical Measurement Error |
|---|---|---|---|---|---|
| Default^a parameter values | 0% | 0% | 0% | 0% | 0% |
| δ1 < 0, δ2 < 0 | −4% to −2% | −4% to −2% | −4% to −2% | −4% to −2% | −4% to −2% |
| δ1 > 0, δ2 < 0 | 0% | 0% | 0% | 0% | 6% to 54%d |
| δ1 > 0, δ2 > 0 | 0% | 0% | 0% | 0% | 0% to 2% |
| λ1 < 1, λ2 < 1 | 0% | 0% | 0% | 0% | 0% to 6% |
| λ1 > 1, λ2 < 1 | 0% | 0% | 0% | 0% | 2% to 18% |
| λ1 > 1, λ2 > 1 | 0% | 0% | 0% | 0% | 0% |
| $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −32% to −10% | −20% to −4% | −20% to −4% | −20% to −4% | 0% |
| δ1 < 0, δ2 < 0 and λ1 < 1, λ2 < 1 | −8% to 0% | −8% to 0% | −8% to 0% | −8% to 0% | −8% to 20% |
| δ1 > 0, δ2 < 0 and λ1 < 1, λ2 < 1 | 0% | −2% to 0% | 0% | −2% to 0% | 18% to >100%c,d |
| δ1 > 0, δ2 > 0 and λ1 < 1, λ2 < 1 | 0% | 0% | 0% | 0% | 0% to 18% |
| δ1 < 0, δ2 < 0 and λ1 > 1, λ2 < 1 | −4% to −2% | −6% to 0% | −4% to 0% | −4% to −2% | −2% to 48% |
| δ1 > 0, δ2 < 0 and λ1 > 1, λ2 < 1 | 0% | 0% | 0% | 0% | 14% to >100%c,d |
| δ1 > 0, δ2 > 0 and λ1 > 1, λ2 < 1 | 0% | 0% | 0% | 0% | 2% to 24% |
| δ1 < 0, δ2 < 0 and λ1 > 1, λ2 > 1 | −4% to −2% | −4% to −2% | −4% to −2% | −4% to −2% | −4% to 0% |
| δ1 > 0, δ2 < 0 and λ1 > 1, λ2 > 1 | 0% | 0% | 0% | 0% | 2% to 36% |
| δ1 > 0, δ2 > 0 and λ1 > 1, λ2 > 1 | 0% | 0% | 0% | 0% | 0% to 2% |
| δ1 < 0, δ2 < 0 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −32% to −8% | −22% to −6% | −22% to −4% | −22% to −6% | −4% to 2% |
| δ1 > 0, δ2 < 0 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −34% to −10% | −22% to −6% | −20% to −4% | −26% to −6% | 4% to 86%^c,d |
| δ1 > 0, δ2 > 0 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −34% to −10% | −22% to −6% | −20% to −4% | −22% to −6% | 0% to 4% |
| λ1 < 1, λ2 < 1 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −68% to −16% | −56% to −10% | −54% to −8% | −62% to −10% | 0% to 22%^c,d |
| λ1 > 1, λ2 < 1 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −28% to >% | −32% to −4% | −26% to −4% | −26% to −4% | 2% to 26%^d |
| λ1 > 1, λ2 > 1 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −28% to −4% | −18% to −2% | −16% to −2% | −18% to −2% | 8% to 2% |
^a Parameters δ1 and δ2 (additive bias) can equal: −0.5 or −0.25 (< 0), 0 (default value), 0.25 or 0.5 (> 0).
Parameters λ1 and λ2 (multiplicative bias) can equal: 0.4 or 0.75 (< 1), 1 (default value), 1.1 or 1.33 (> 1).
Parameters $\sigma_1^2$ and $\sigma_2^2$ (random error) can equal: 0 (default value), 0.1 or 0.5 (> 0).
Non-specified parameters are equal to default values, which correspond to no measurement bias/error.
^b Informants 1 and 2 are interchangeable for all methods except SI, which uses Informant 1 only.
^c The classical measurement error method failed for some datasets.
^d The classical measurement error method produced correlations greater than 1 in magnitude for some datasets.
Table 2.
Effect of Rounded Congeneric Measurement Model Parameters on Average Percent Bias in $\hat{\beta}$ (entries are the range of average percent bias in $\hat{\beta}$ for each method)
| Parameter Values^a | Single Informant^b | Simple Average | Optimal Weighted Average | Principal Components Analysis | Classical Measurement Error |
|---|---|---|---|---|---|
| Default^a parameter values | 0% | 0% | 0% | 0% | 0% |
| δ1 < 0, δ2 < 0 | 17% to 35% | 17% to 35% | 17% to 35% | 17% to 35% | 17% to 35% |
| δ1 > 0, δ2 < 0 | 0% | 8% to 18% | 5% to 7% | 7% to 14% | 24% to >100% |
| δ1 > 0, δ2 > 0 | 0% | 0% | −2% to 0% | 0% | 0% to 5% |
| λ1 < 1, λ2 < 1 | 33% to >100% | 33% to >100% | 33% to >100% | 33% to >100% | 33% to >100% |
| λ1 > 1, λ2 < 1 | −20% to −7% | 0% to 36% | 13% to 90% | −6% to 12% | 8% to 72% |
| λ1 > 1, λ2 > 1 | −20% to −7% | −20% to −7% | −20% to −7% | −20% to −7% | −20% to −7% |
| $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −44% to −10% | −24% to −2% | −24% to −2% | −24% to −2% | 8% to 22% |
| δ1 < 0, δ2 < 0 and λ1 < 1, λ2 < 1 | 50% to >100% | 51% to >100% | 51% to >100% | 51% to >100% | 51% to >100% |
| δ1 > 0, δ2 < 0 and λ1 < 1, λ2 < 1 | 33% to >100% | 42% to >100% | 39% to >100% | 41% to >100% | 96% to >100%c |
| δ1 > 0, δ2 > 0 and λ1 < 1, λ2 < 1 | 33% to >100% | 33% to >100% | 30% to >100% | 33% to >100% | 34% to >100% |
| δ1 < 0, δ2 < 0 and λ1 > 1, λ2 < 1 | −12% to 23% | 11% to 83% | −8% to >100% | 4% to 49% | 21% to >100% |
| δ1 > 0, δ2 < 0 and λ1 > 1, λ2 < 1 | −25% to −8% | 0% to 50% | −7% to 71% | −14% to 10% | 33% to >100%c |
| δ1 > 0, δ2 > 0 and λ1 > 1, λ2 < 1 | −26% to −9% | −4% to 34% | −6% to 45% | −11% to 10% | 4% to 91% |
| δ1 < 0, δ2 < 0 and λ1 > 1, λ2 > 1 | −11% to 22% | −10% to 22% | −11% to 22% | −10% to 22% | −10% to 22% |
| δ1 > 0, δ2 < 0 and λ1 > 1, λ2 > 1 | −25% to −9% | −18% to 7% | −21% to −3% | −19% to 4% | −13% to >100% |
| δ1 > 0, δ2 > 0 and λ1 > 1, λ2 > 1 | −25% to −9% | −25% to −9% | −26% to −9% | −25% to −9% | −25% to −5% |
| δ1 < 0, δ2 < 0 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −37% to 19% | −15% to 28% | −15% to 28% | −15% to 28% | 20% to 52% |
| δ1 > 0, δ2 < 0 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −53% to −17% | −26% to 7% | −23% to 18% | −36% to 1% | 25% to >100%^c |
| δ1 > 0, δ2 > 0 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −54% to −18% | −35% to −9% | −36% to −9% | −36% to −9% | 1% to 16% |
| λ1 < 1, λ2 < 1 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −68% to 2% | −39% to 47% | −39% to 47% | −54% to 47% | 43% to >100%^c |
| λ1 > 1, λ2 < 1 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −44% to −15% | −24% to 19% | −32% to 6% | −33% to 4% | 14% to >100% |
| λ1 > 1, λ2 > 1 and $\sigma_1^2$ > 0, $\sigma_2^2$ > 0 | −45% to −15% | −31% to −9% | −31% to −9% | −31% to −9% | −18% to 10% |
^a Parameters δ1 and δ2 (additive bias) can equal: −0.5 or −0.25 (< 0), 0 (default value), 0.25 or 0.5 (> 0).
Parameters λ1 and λ2 (multiplicative bias) can equal: 0.4 or 0.75 (< 1), 1 (default value), 1.1 or 1.33 (> 1).
Parameters $\sigma_1^2$ and $\sigma_2^2$ (random error) can equal: 0 (default value), 0.1 or 0.5 (> 0).
Non-specified parameters are equal to default values, which correspond to no measurement bias/error.
^b Informants 1 and 2 are interchangeable for all methods except SI, which uses Informant 1 only.
^c The classical measurement error method failed for some datasets.
Regarding ρ, the single informant, simple average, optimal weighted average, and principal components analysis estimates follow similar patterns and are often very similar in value. They are never exaggerated (biased away from 0), but in some situations they are attenuated (biased toward 0). This attenuation bias becomes worse as $\sigma_1^2$ (and $\sigma_2^2$, for the simple average, optimal weighted average, and principal components analysis methods) increases, and as λ1 (and λ2, for the simple average, optimal weighted average, and principal components analysis methods) becomes increasingly less than 1. In contrast, the classical measurement error estimates have at most small (< 10%) attenuation bias, but can become very exaggerated (in some cases, to the point where they are greater than 1) and can even fail. The exaggeration bias becomes worse as δ1 and δ2 become increasingly discrepant, and as λ1 and λ2 become increasingly discrepant or become increasingly less than 1. As for comparisons between methods, the optimal weighted average method performs better than (in terms of a smaller magnitude of bias of $\hat{\rho}$) or at least as well as the single informant, simple average, and principal components analysis methods for all combinations of parameters. Optimal weighted average also performs better than classical measurement error for some combinations of parameters, especially when δ1 and δ2 differ considerably or λ1 and λ2 differ considerably. However, classical measurement error performs considerably better than optimal weighted average (and, by extension, the single informant, simple average, and principal components analysis methods) for other combinations of parameters, especially when the difference between δ1 and δ2 is small and both $\sigma_1^2$ and $\sigma_2^2$ are large.
Turning to β, the estimates from all five methods generally have a greater magnitude of bias than the corresponding estimates of ρ. The single informant, simple average, optimal weighted average, and principal components analysis estimates of β again follow a similar pattern but, unlike the corresponding estimates of ρ, can be either exaggerated or attenuated. The exaggeration bias becomes worse as δ1 (and δ2, for the simple average, optimal weighted average, and principal components analysis methods) becomes increasingly less than 0, and as λ1 (and λ2, for the simple average, optimal weighted average, and principal components analysis methods) becomes increasingly less than 1. The attenuation bias becomes worse as λ1 (and λ2, for the simple average, optimal weighted average, and principal components analysis methods) becomes increasingly more than 1, and as $\sigma_1^2$ (and $\sigma_2^2$, for the simple average, optimal weighted average, and principal components analysis methods) increases. The classical measurement error estimates of β, like the corresponding estimates of ρ, have at most moderate attenuation bias (< 25%), but can become very exaggerated or even fail. The exaggeration bias becomes worse as δ1 and δ2 become increasingly discrepant or increasingly less than 0, as λ1 and λ2 become increasingly less than 1, and as $\sigma_1^2$ and $\sigma_2^2$ increase. The attenuation bias becomes worse as λ1 and λ2 become increasingly more than 1. As for comparisons between methods, each method outperforms all of the other methods for some combinations of parameters, as can be seen in Table 3.
Table 3.
Best-performing^a Method(s) for Various Combinations of Rounded Congeneric Measurement Model Parameters
| Parameter Values | Single Informant^b | Simple Average | Optimal Weighted Average | Principal Components Analysis | Classical Measurement Error |
|---|---|---|---|---|---|
| δ1 = −0.50; δ2 = −0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.0 | X | X | X | X | X |
| δ1 = 0.50; δ2 = −0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.0 | X | ||||
| δ1 = −0.50; δ2 = 0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.0 | X | ||||
| δ1 = 0.50; δ2 = 0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.0 | X | X | X | X | X |
| δ1 = 0; δ2 = 0; λ1 = 0.40; λ2 = 0.40; σ2 = 0.0 | X | X | X | X | X |
| δ1 = 0; δ2 = 0; λ1 = 1.33; λ2 = 0.40; ; σ2 = 0.0 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 0.40; λ2 = 1.33; σ2 = 0.0 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 1.33; λ2 = 1.33; ; σ2 = 0.0 | X | X | X | X | X |
| δ1 = 0; δ2 = 0; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.0 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.5 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.5 | X | ||||
| δ1 = −0.50; δ2 = −0.50; λ1 = 0.40; λ2 = 0.40; ; σ2 = 0.0 | X | X | X | X | X |
| δ1 = 0.50; δ2 = −0.50; λ1 = 0.40; λ2 = 0.40; ; σ2 = 0.0 | X | ||||
| δ1 = −0.50; δ2 = 0.50; λ1 = 0.40; λ2 = 0.40; ; σ2 = 0.0 | X | ||||
| δ1 = 0.50; δ2 = 0.50; λ1 = 0.40; λ2 = 0.40; ; σ2 = 0.0 | X | X | X | X | X |
| δ1 = −0.50; δ2 = −0.50; λ1 = 1.33; λ2 = 0.40; ; σ2 = 0.0 | X | ||||
| δ1 = 0.50; δ2 = −0.50; λ1 = 1.33; λ2 = 0.40; ; σ2 = 0.0 | X | ||||
| δ1 = −0.50; δ2 = 0.50; λ1 = 1.33; λ2 = 0.40; ; σ2 = 0.0 | X | ||||
| δ1 = 0.50; δ2 = 0.50; λ1 = 1.33; λ2 = 0.40; ; σ2 = 0.0 | X | ||||
| δ1 = −0.50; δ2 = −0.50; λ1 = 0.40; λ2 = 1.33; ; σ2 = 0.0 | X | ||||
| δ1 = 0.50; δ2 = −0.50; λ1 = 0.40; λ2 = 1.33; σ2 = 0.0 | X | ||||
| δ1 = −0.50; δ2 = 0.50; λ1 = 0.40; λ2 = 1.33; = 0.0; σ2 = 0.0 | X | ||||
| δ1 = 0.50; δ2 = 0.50; λ1 = 0.40; λ2 = 1.33; ; σ2 = 0.0 | X | ||||
| δ1 = −0.50; δ2 = −0.50; λ1 = 1.33; λ2 = 1.33; ; σ2 = 0.0 | X | X | X | X | X |
| δ1 = 0.50; δ2 = −0.50; λ1 = 1.33; λ2 = 1.33; ; σ2 = 0.0 | X | ||||
| δ1 = −0.50; δ2 = 0.50; λ1 = 1.33; λ2 = 1.33; ; σ2 = 0.0 | X | ||||
| δ1 = 0.50; δ2 = 0.50; λ1 = 1.33; λ2 = 1.33; ; σ2 = 0.0 | X | X | X | X | X |
| δ1 = −0.50; δ2 = −0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.0 | X | ||||
| δ1 = 0.50; δ2 = −0.50; λ1 = 1.00; λ2 = 1.00; σ2 = 0.0 | X | ||||
| δ1 = −0.50; δ2 = 0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.0 | X | ||||
| δ1 = 0.50; δ2 = 0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.0 | X | ||||
| δ1 = −0.50; δ2 = −0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.5 | X | ||||
| δ1 = 0.50; δ2 = −0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.5 | X | ||||
| δ1 = −0.50; δ2 = 0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.5 | X | ||||
| δ1 = 0.50; δ2 = 0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.5 | X | X | |||
| δ1 = −0.50; δ2 = −0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.5 | X | X | |||
| δ1 = 0.50; δ2 = −0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.5 | X | ||||
| δ1 = −0.50; δ2 = 0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.5 | X | ||||
| δ1 = 0.50; δ2 = 0.50; λ1 = 1.00; λ2 = 1.00; ; σ2 = 0.5 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 0.40; λ2 = 0.40; ; σ2 = 0.0 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 1.33; λ2 = 0.40; ; σ2 = 0.0 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 0.40; λ2 = 1.33; ; σ2 = 0.0 | X | X | |||
| δ1 = 0; δ2 = 0; λ1 = 1.33; λ2 = 1.33; σ2 = 0.0 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 0.40; λ2 = 0.40; ; σ2 = 0.5 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 1.33; λ2 = 0.40; ; σ2 = 0.5 | X | X | |||
| δ1 = 0; δ2 = 0; λ1 = 0.40; λ2 = 1.33; ; σ2 = 0.5 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 1.33; λ2 = 1.33; ; σ2 = 0.5 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 0.40; λ2 = 0.40; σ2 = 0.5 | X | X | |||
| δ1 = 0; δ2 = 0; λ1 = 1.33; λ2 = 0.40; ; σ2 = 0.5 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 0.40; λ2 = 1.33; ; σ2 = 0.5 | X | ||||
| δ1 = 0; δ2 = 0; λ1 = 1.33; λ2 = 1.33; ; σ2 = 0.5 | X |
^a X indicates method with smallest magnitude of bias for given parameters. Multiple Xs per row indicate ties.
^b Informants 1 and 2 are interchangeable for all methods except single informant, which uses Informant 1 only.
Vigorous Physical Activity and Body Mass Index Example
To illustrate the five approaches, we consider a validation study20 used to design a larger study7 of the relationship between physical activity and obesity in children in two towns of Mexico City in 1996. In the validation study, which included 114 ten- to fourteen-year-old students, two informants (child and mother) completed a questionnaire designed to assess the child's physical activity and inactivity. Physical activity was also assessed by 24-hour recall, which was provided by the child in an interview with a trained nutritionist on three separate days; the three measurements of physical activity were averaged, with different weights for weekend days versus weekdays. Here, we focus on hours of daily vigorous physical activity (Ti) as a predictor for body mass index (Yi). We treat the weighted average of the 24-hour recall measurements of vigorous physical activity as the gold standard measure of vigorous physical activity (Gi). We treat the mother reports of vigorous physical activity as Xi1 and the child reports of vigorous physical activity as Xi2, and we use them to estimate β and ρ. For the sake of simplicity, we do not include any of the available covariates (e.g., child sex, age, etc.) in our analyses.
We focus on cases with complete data (n = 81). Figure 2 displays frequency plots for 24-hour recall and mother and child report of vigorous physical activity, as well as scatterplots of BMI versus those measures. For 24-hour recall, mother report, and child report, the means (variances) are 0.48 (0.40), 0.80 (0.46), and 1.00 (0.91), respectively, and the percentages of values that are 0 are 43%, 9%, and 1%, respectively. Both mother and child, and especially the child, overestimate vigorous physical activity relative to 24-hour recall. The Pearson correlations of the mother and child report with 24-hour recall are 0.41 and 0.24, respectively, and the Pearson correlation of the mother and child report with each other is 0.23. These numbers suggest that the mother report may better capture the relationship between vigorous physical activity and BMI.
Figure 2.
Frequency and Scatterplots (Versus Body Mass Index (BMI)) for the 24-Hour Recall Gold Standard Measurement and Mother and Child Reports of Vigorous Physical Activity.
Table 4 displays estimates of ρ and β from all five methods applied to the mother and child reports of vigorous physical activity, as well as estimates of ρ and β based on 24-hour recall of vigorous physical activity, the gold standard measurement. The estimates reveal that the optimal weighted average method (which gives weights of approximately 3/4 and 1/4 to the mother report and child report, respectively) produces estimates identical to the gold standard estimates of both ρ and β. The single informant, simple average, and principal components analysis methods all underestimate both ρ and β; the next-closest estimates to the gold standard estimates are given by the simple average method (which, of course, weights both reports equally), followed by the single informant method based on mother report only, then the principal components analysis method (which gives weights of approximately 1/4 and 3/4 to the mother report and child report, respectively), and then the single informant method based on child report only. Last, the classical measurement error method overestimates both ρ and β by a considerable amount. These results show that the mother report is a better proxy for 24-hour recall of vigorous physical activity in terms of ability to predict BMI, but that combining mother report with child report (via the optimal weighted average or simple average methods) produces better estimates than using mother report alone.
Conclusions
We have described five methods of estimating the regression or correlation coefficient for the effect of a predictor on a continuous outcome, when that predictor cannot be observed directly but is reported on by two informants. We then compared the performance of these methods in situations where the true predictor is a mixture of zeros and continuously-distributed positive values.
Regarding the correlation coefficient, the simulation experiments suggest that estimates obtained by the single informant, simple average, optimal weighted average, and principal components analysis methods become attenuated when random error is present, which is unsurprising given that these methods do not correct for attenuation due to random measurement error. However, these estimates are relatively unaffected by additive or multiplicative measurement bias on its own. (Without the rounding in the congeneric measurement model, the estimates would be completely unaffected by additive or multiplicative bias.) In contrast, the classical measurement error method assumes that the measurements contain random error, but not additive or multiplicative bias. Unsurprisingly, then, the classical measurement error estimates become exaggerated (or even fail) when the additive or multiplicative bias differs between informants, because the second term in the numerator of the estimator becomes too large. Likewise, the classical measurement error estimates are unaffected by random error on its own, since the method corrects for attenuation due to random error. In terms of comparisons between methods, the optimal weighted average estimates are never more biased than the single informant, simple average, and principal components analysis estimates, which is to be expected given that the optimal weighted average estimates weight the two informants optimally. For instance, in the physical activity and BMI example, the optimal weighted average estimates give more weight to the better informant (the mother), in contrast to the simple average estimates, which weight both informants equally, and the principal components analysis estimates, which give more weight to the worse informant (the child) because the child report has more variance. The single informant estimate (based on mother report) gives all the weight to the better informant, but suffers more from attenuation due to random measurement error because it is based on only one informant. Finally, the optimal weighted average estimates can be more biased than the classical measurement error estimates when the random error is large and the additive and multiplicative bias do not differ greatly between informants.
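The two qualitative points in this paragraph (no effect of additive or multiplicative bias on the correlation when there is no rounding, and attenuation under random error that is milder for an average of two reports) can be checked with a small simulation. This is only a sketch under assumed settings: the mixture distribution, sample size, error variances, and bias values below are hypothetical choices, not the simulation design used in the paper, and rounding is deliberately omitted.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000

# Hypothetical true predictor: 40% zeros, otherwise a positive gamma draw.
x = np.where(rng.random(n) < 0.4, 0.0, rng.gamma(2.0, 0.5, n))
# Hypothetical continuous outcome with a negative slope on the true predictor.
y = 20.0 - 1.0 * x + rng.normal(0.0, 2.0, n)

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

rho_true = corr(x, y)

# Additive (0.5) and multiplicative (2.0) bias, no random error, no rounding.
rho_biased = corr(0.5 + 2.0 * x, y)

# Two informants with independent random error and no bias.
report_1 = x + rng.normal(0.0, 1.0, n)
report_2 = x + rng.normal(0.0, 1.0, n)
rho_single = corr(report_1, y)
rho_average = corr(0.5 * (report_1 + report_2), y)

print(rho_true, rho_biased, rho_single, rho_average)
```

In this setup the biased report reproduces the true correlation up to floating-point error, while both error-prone proxies are attenuated toward zero, the single report more so than the simple average.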
Estimates of the regression coefficients are typically more biased than corresponding estimates of the correlation coefficients. The effects of random measurement error and additive and multiplicative measurement bias on the estimates are similar to those described in the preceding paragraph, with two exceptions. First, for all the methods, upward or downward multiplicative bias results in attenuated or exaggerated estimates, respectively, because multiplicative bias changes the scale of the predictor. Second, for the classical measurement error method, random error on its own results in small-to-moderate exaggeration in the estimates of the regression coefficients, which is due to the rounding in the congeneric measurement model. (Without the rounding, the classical measurement error estimates would be unbiased.) In terms of comparisons among methods, each method performs best in some situations.
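The first exception can be seen from a short derivation. The notation here is an assumption made for illustration: a single report $W = a + bX + e$ with additive bias $a$, multiplicative bias $b > 0$, and random error $e$ (variance $\sigma_e^2$) uncorrelated with the true predictor $X$ (variance $\sigma_X^2$) and the outcome $Y$; rounding is ignored.

```latex
\begin{align*}
\beta_W &= \frac{\operatorname{Cov}(Y, W)}{\operatorname{Var}(W)}
         = \frac{b\,\operatorname{Cov}(Y, X)}{b^2\sigma_X^2 + \sigma_e^2}
         = \frac{\beta}{b}\cdot\frac{b^2\sigma_X^2}{b^2\sigma_X^2 + \sigma_e^2},
\qquad \beta = \frac{\operatorname{Cov}(Y, X)}{\sigma_X^2}, \\[4pt]
\rho_W  &= \frac{\operatorname{Cov}(Y, W)}{\sigma_Y\sqrt{\operatorname{Var}(W)}}
         = \rho\cdot\sqrt{\frac{b^2\sigma_X^2}{b^2\sigma_X^2 + \sigma_e^2}}.
\end{align*}
```

With no random error ($\sigma_e^2 = 0$), the slope reduces to $\beta_W = \beta / b$, so $b > 1$ attenuates and $b < 1$ exaggerates the regression coefficient, whereas $\rho_W = \rho$ regardless of $a$ and $b$. The additive bias $a$ drops out of both expressions.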
Overall, when there are only two informants, the simple average method is a reasonable choice. The simple average method, although not always optimal, rarely performs much worse than the single informant, optimal weighted average, and principal components analysis methods, and often performs similarly to the best of those methods, as in the physical activity and BMI example and as found in previous studies.21 Further, with the simple average method, it is straightforward to compare results across samples because this method uses the same (equal) weights for informants in every sample. Although the classical measurement error method performs better than the other methods (including the simple average method) when there is a large amount of random error and little difference in additive or multiplicative bias between informants, it should be avoided in other situations because the estimate can be very exaggerated (as in the physical activity and BMI example).
Of course, it is preferable to have more than two informants, or validation data, because this allows the use of more sophisticated methods such as latent variable models (e.g., factor analysis) or extensions of the classical measurement error method that allow for measurement bias or correlated measurement errors (see Cheng and Van Ness15).
Supplementary Material
Table 4.
Results for Vigorous Physical Activity as a Predictor of Body Mass Index
| Method | Estimate of ρ | Estimate of β |
|---|---|---|
| Estimate based on 24-hour recall of vigorous physical activity: | −0.18 | −1.0 |
| Estimates based on mother and child report of vigorous physical activity: | | |
| Single informant method (mother report only) | −0.17 | −0.86 |
| Single informant method (child report only) | −0.11 | −0.39 |
| Simple average method (both reports) | −0.17 | −0.90 |
| Optimal weighted average method (both reports) | −0.18 | −1.0 |
| Principal components analysis method (both reports) | −0.13 | −0.59 |
| Classical measurement error method (both reports) | −0.28 | −2.8 |
Acknowledgments
We thank Bernardo Hernández for generously allowing the use of his data on vigorous physical activity and body mass index, and Matt Vendlinski for helpful comments on an earlier draft of this manuscript.
Financial Support: Support for the first and third authors was provided by NIH grant 5R01-MH054693-11, and additional support for the first author was provided by NIH grants P50-MH069315-04, P50-MH069315-05, P50-MH084051-01, and R01-MH43454. Support for the second author was provided by NIH grant R01-MH59785.
References
- 1. Achenbach TM, McConaughy SH, Howell CT. Child/adolescent behavioral and emotional problems: implications of cross-informant correlations for situational specificity. Psychological Bulletin. 1987;101:213–232.
- 2. Achenbach TM, Krukowski RA, Dumenci L, et al. Assessment of adult psychopathology: meta-analyses and implications of cross-informant correlations. Psychological Bulletin. 2005;131:361–382. doi: 10.1037/0033-2909.131.3.361.
- 3. Fitzmaurice GM, Laird NM, Zahner GEP, et al. Bivariate logistic regression analysis of childhood psychopathology ratings using multiple informants. American Journal of Epidemiology. 1995;142:1194–1203. doi: 10.1093/oxfordjournals.aje.a117578.
- 4. Kuo M, Mohler B, Raudenbush SL, et al. Assessing exposure to violence using multiple informants: application of hierarchical linear model. Journal of Child Psychology and Psychiatry. 2000;41:1049–1056.
- 5. Goldwasser MA, Fitzmaurice GM. Multivariate linear regression analysis of childhood psychopathology using multiple informant data. International Journal of Methods in Psychiatric Research. 2001;10:1–10.
- 6. Horton NJ, Laird NM, Zahner GEP. Use of multiple informant data as a predictor in psychiatric epidemiology. International Journal of Methods in Psychiatric Research. 1999;8:6–18.
- 7. Hernández B, Gortmaker SL, Colditz GA, et al. Association of obesity with physical activity, television programs and other forms of video viewing among children in Mexico City. International Journal of Obesity. 1999;23:845–854. doi: 10.1038/sj.ijo.0800962.
- 8. O'Malley JA, Landon BE, Guadagnoli E. Analyzing multiple informant data from an evaluation of the Health Disparities Collaboratives. Health Services Research. 2007;42:146–164. doi: 10.1111/j.1475-6773.2006.00597.x.
- 9. Carroll RJ, Ruppert D, Stefanski LA. Measurement Error in Nonlinear Models. Chapman & Hall/CRC; London: 1995.
- 10. Piacentini JC, Cohen P, Cohen J. Combining discrepant diagnostic information from multiple sources: Are complex algorithms better than simple ones? Journal of Abnormal Child Psychology. 1992;20:51–63. doi: 10.1007/BF00927116.
- 11. van Bruggen GH, Lilien GL, Kacker M. Informants in organizational marketing research: Why use multiple informants and how to aggregate responses. Journal of Marketing Research. 2002;39:469–478.
- 12. Allen JP, Kuperminc G, Philliber S, et al. Programmatic prevention of adolescent problem behaviors: The role of autonomy, relatedness, and volunteer service in the Teen Outreach Program. American Journal of Community Psychology. 2005;22:617–638. doi: 10.1007/BF02506896.
- 13. Kraemer HC, Measelle JR, Ablow JC, et al. A new approach to integrating data from multiple informants in psychiatric assessment and research: mixing and matching contexts and perspectives. American Journal of Psychiatry. 2003;160:1566–1577. doi: 10.1176/appi.ajp.160.9.1566.
- 14. Bollen KA. Structural Equations with Latent Variables. John Wiley & Sons; New York: 1989.
- 15. Cheng CL, Van Ness J. Statistical Regression with Measurement Error. Arnold Publishers; London: 1999.
- 16. Jöreskog KG. Statistical analysis of sets of congeneric tests. Psychometrika. 1971;36:109–133.
- 17. Anderson JC. A measurement model to assess measurement-specific factors in multiple-informant research. Journal of Marketing Research. 1985;22:86–92.
- 18. Phillips LW. Assessing measurement error in key informant reports: A methodological note on organizational analysis in marketing. Journal of Marketing Research. 1981;18:395–415.
- 19. Kumar A, Dillon WR. On the use of confirmatory measurement models in the analysis of multiple informant reports. Journal of Marketing Research. 1990;27:102–111.
- 20. Hernández B, Gortmaker SL, Laird NM, et al. Validity and reproducibility of a physical activity and inactivity questionnaire for Mexico City's schoolchildren. Salud Publica de Mexico. 2000;42(4):315–323.
- 21. Schmidt FL. The relative efficiency of regression and simple unit predictor weights in applied differential psychology. Educational and Psychological Measurement. 1971;31:699–714.