Abstract
Many applications of biomedical science involve unobservable constructs, from measurement of health states to severity of complex diseases. The primary aim of measurement is to identify relevant pieces of observable information that thoroughly describe the construct of interest. Validation of the construct is often performed separately. Noting the increasing popularity of latent variable methods in biomedical research, we propose a Multiple Indicator Multiple Cause (MIMIC) latent variable model that combines item reduction and validation. Our joint latent variable model accounts for the bias that occurs in the traditional 2-stage process. The methods are motivated by an example from the Physical Activity and Lymphedema clinical trial in which the objectives were to describe lymphedema severity through self-reported Likert scale symptoms and to determine the relationship between symptom severity and a “gold standard” diagnostic measure of lymphedema. The MIMIC model identified 1 symptom as a potential candidate for removal. We present this paper as an illustration of the advantages of joint latent variable models and as an example of the applicability of these models for biomedical research.
Keywords: Factor analysis, Latent variable models, Lymphedema, Multiple Indicator Multiple Cause models
1. INTRODUCTION
Concepts such as quality of life, pain, or symptom severity cannot be measured by a blood test. These unobservable concepts must be quantified and measured by combining several relevant pieces of information. Measurement is frequently carried out through a scale administered to the study participant, whose item responses are summed to produce a score. From a statistical perspective, the goal of measurement is often to combine important pieces of information in a way that thoroughly describes an unobservable construct. In the scale development process, item selection or reduction is an important step in determining which items best describe the construct under study. It is undesirable to include “junk” items, those that do not contribute useful information about the hypothetical construct, because they obscure the final scale score. The process of removing unnecessary items ultimately improves scale accuracy, reduces burden to participants, and decreases research costs. In this manuscript, we propose a joint latent variable model to differentiate items that validate well from those that do not.
Methods for item reduction originated in psychometrics and education, where constructs such as depression and intelligence motivated the development of measurement techniques to quantify and explain these entities via multiple items in a scale. The literature in these fields offers a variety of methods, many involving latent variables, that utilize the correlation among items to define a construct. Classical test theory (CTT) is a widely used item selection framework based on these correlations. Although there are several metrics in CTT used to judge items, item-total correlations and Cronbach's alpha are the most common (Nunnally and others, 1967; Clark and Watson, 1995). Item-total correlation measures the correlation of a particular item with the scale total when that item is omitted from the scale. High item-total correlations are desired, and it is advised that items whose item-total correlation is less than ρ = 0.2 be dropped from the scale (Streiner and Norman, 1994). Pearson's product-moment correlation is most often used to calculate item-total correlation. Although this technique is intended for continuous data, it is commonly applied to categorical data, and some argue that it is robust enough for such data (Havlicek and Peterson, 1977). Cronbach's alpha is another measure of reliability, and it is advised that alpha fall between 0.7 and 0.9. Values of alpha below 0.7 indicate that the items may not be homogeneous, and values above 0.9 can signify several problems with the scale, such as item redundancy or the presence of more than 1 distinct construct.
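As a concrete illustration of these two CTT quantities, the following sketch computes Cronbach's alpha and corrected item-total correlations from an item-response matrix. The data frame, its column names, and the simulated responses are hypothetical placeholders, not data from the trial analyzed below.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def corrected_item_total_correlations(items: pd.DataFrame) -> pd.Series:
    # correlation of each item with the scale total computed with that item omitted
    return pd.Series({col: items[col].corr(items.drop(columns=col).sum(axis=1))
                      for col in items.columns})

# Hypothetical 5-point Likert responses (0 = none, ..., 4 = very severe), for illustration only
rng = np.random.default_rng(0)
items = pd.DataFrame(rng.integers(0, 5, size=(141, 7)),
                     columns=[f"symptom_{j}" for j in range(1, 8)])

print(cronbach_alpha(items))                      # compare against the 0.7-0.9 guideline
print(corrected_item_total_correlations(items))   # compare against the rho = 0.2 guideline
```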
Perhaps the most ubiquitous statistical model for item reduction with continuous data is the factor analysis model (Spearman, 1904). This model separates the variance among a set of observed variables into variance due to a common factor, or unobserved latent construct, and variance due to the individual observed variables. Extensions to the classical factor analysis model have been formulated for categorical data (Christoffersson, 1975; Muthén, 1978). Item reduction in factor analysis is accomplished by evaluating rotated factor loadings across multiple factors or by evaluating the loadings within a particular factor. Models with more than 1 factor are not unique; therefore, decisions about which items should be included in a factor can be ambiguous. Factor loading parameters provide information about the strength of the relationship between an individual item and the underlying construct.
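In the single-factor case with continuous items, the decomposition described above can be written in its standard textbook form (notation ours, anticipating Section 2):

$$ y_{ij} = \mu_j + \lambda_j f_i + \epsilon_{ij}, \qquad f_i \sim N(0,1), \quad \epsilon_{ij} \sim N(0, \psi_j), $$

so that the implied covariance matrix of the items separates into a common part and an item-specific part, $\operatorname{Cov}(\mathbf{y}_i) = \boldsymbol{\lambda}\boldsymbol{\lambda}^{\top} + \boldsymbol{\Psi}$. Items with small loadings $\lambda_j$ contribute little shared variance and are candidates for removal.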
An extension of unidimensional factor analysis for dichotomous or ordered categorical items, item response theory (IRT), is widely used in educational testing and is becoming more prevalent in health measurement (Rasch, 1993). IRT models use a latent variable framework to explain the probability of “correctly” answering test items. Similar to factor loadings for continuous outcome factor analysis, discrimination parameters in IRT models are often used as metrics for item reduction.
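For a dichotomous item $j$, the 2-parameter logistic IRT model referred to here has the standard form (notation ours)

$$ P(y_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}, $$

where $\theta_i$ is the latent trait, $a_j$ is the discrimination parameter, and $b_j$ is the difficulty parameter; items with low discrimination are natural candidates for removal.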
Once the selection of important items is complete, a process of validation is essential to determine how well the scale measures the intended construct. Validity in scale development can be established in a number of ways, including comparing the proposed construct or score to other constructs or comparing a single construct across different samples. In our example, criterion validity was desired, which is described as the correlation of a scale with another measure of the trait, such as a “gold standard” that has been previously studied or is accepted in the field (Streiner and Norman, 1994). Although item selection and validation are frequently performed in 2 separate stages in biomedical applications, Sammel and Ryan (1996) illustrated how the 2-stage procedure ignores additional measurement error inherent in the estimation of the latent factor. Compared with approaches that model the items and the gold standard jointly, such as a multivariate linear mixed model, there is evidence that the 2-stage approach leads to biased estimates of the association between the latent variable and the gold standard.
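The following simulation sketch, which is ours and uses arbitrary parameter values with continuous items for simplicity, illustrates the source of this bias: items are generated from a latent variable that depends on a gold standard, the latent variable is estimated in stage 1 by a simple sum score, and the score is then validated against the gold standard in stage 2. The stage-2 correlation is attenuated relative to the correlation with the true latent variable because the estimation error from stage 1 is ignored.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000                                   # large n so sample correlations are close to their limits

# Hypothetical generating model (continuous items for simplicity)
beta = 0.5                                    # association of the latent variable with the gold standard z
lam = np.array([0.9, 0.8, 0.7, 0.6, 0.5])     # factor loadings
psi = np.full(lam.size, 0.5)                  # item-specific error variances

z = rng.normal(0.0, 1.0, n)                   # gold standard
b = beta * z + rng.normal(0.0, 1.0, n)        # latent construct, error variance fixed at 1
y = b[:, None] * lam + rng.normal(0.0, np.sqrt(psi), (n, lam.size))  # observed items

score = y.sum(axis=1)                         # stage 1: estimate the latent variable with a sum score

# Stage 2: validate the estimated score against z, and compare with the true latent variable
print("corr(z, true latent variable):", np.corrcoef(z, b)[0, 1])
print("corr(z, stage-1 sum score):   ", np.corrcoef(z, score)[0, 1])
# The second correlation equals the first multiplied by corr(b, score) < 1,
# i.e. it is attenuated by the measurement error carried into stage 2.
print("attenuation factor corr(b, score):", np.corrcoef(b, score)[0, 1])
```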
Multitrait multimethod (MTMM) models offer an alternative approach to evaluating measurement error (Campbell and Fiske, 1959). The most basic method, the MTMM matrix, consists of correlations of several concepts or traits measured by each of several methods. This correlation matrix provides estimates of reliability for each trait and method pair as well as estimates of validity for different measures of the same trait. While the MTMM matrix allows for useful estimates of reliability and validity, problems with the MTMM matrix include the lack of rigorous statistical tests associated with the correlation coefficients and the inability of the MTMM matrix to separate method variance from random error. Analysis of variance (ANOVA) and latent variable models like confirmatory factor analysis were proposed to deal with these issues. ANOVA partitions variance into groups defined by person, method, and trait and provides a global estimate of each type of variance (Guilford, 1954). Repeated measures are needed to estimate trait–method interactions. The confirmatory factor analysis model provides estimates of correlation among observed traits and methods through factor loadings (Werts and Linn, 1970). The model contains several components: a trait component, a method component, and a random error component. Overall, MTMM models provide an additional approach to assessing validity and can be useful in evaluating a gold standard.
The literature in psychometric and education research is rich with latent variable models that can be used for item selection and validation, and these models are increasingly utilized in biomedical research. Rabe-Hesketh and Skrondal (2007) introduced and provided examples of the following classical latent variable models for the biomedical audience: factor analysis, IRT, latent class models, and structural equation models. In addition to classical models, unifying model frameworks bridge analysis techniques familiar to biomedical researchers, like mixed-effects models, and more sophisticated latent variable models, like multilevel factor analysis (Skrondal and Rabe-Hesketh, 2004; Muthén and Muthén, 1998). Unifying frameworks illustrate the relationship between the classical factor analysis model and IRT models and show, for example, that the factor analysis model for binary data can also be specified as a 2-parameter IRT model. Although the interpretation of factor analysis and IRT models may differ, the unifying latent variable model framework elucidates the statistical theory of the models, making them easier to understand and apply.
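For example, under a probit link with the residual variance of the underlying continuous response fixed at 1, the binary factor analysis model and the 2-parameter (normal-ogive) IRT model describe the same response probability under a simple reparameterization (a standard identity, written in our notation):

$$ P(y_{ij} = 1 \mid f_i) = \Phi(\lambda_j f_i - \tau_j) = \Phi\{a_j(f_i - b_j)\}, \qquad a_j = \lambda_j, \quad b_j = \tau_j/\lambda_j, $$

with the logistic 2-parameter model differing only in the choice of link function.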
In addition to introducing the biomedical arena to latent variable models, recent literature also focuses on more sophisticated latent variable models that allow for not only multiple observed outcome types but also multiple latent variable types. General structural equation models accommodate many types of multivariate outcomes, including continuous, ordered categorical, and dichotomous (Muthén, 1984). Sammel and others (1997) showed that these models may also accommodate a mixture of outcome types. Classical latent variable models such as factor analysis, growth models, and latent class models motivated more complex latent variable hybrids such as factor mixture models and growth mixture models that allow for both continuous and categorical latent variables in a single model (Muthén, 2008). Fortunately, standard software is available (e.g. the CALIS procedure in SAS; the GLLAMM program in Stata, Rabe-Hesketh and others, 2004; the sem package in R, Fox, 2006; and Mplus, Muthén and Muthén, 1998) to fit these complex latent variable models.
To explore the efficiency of performing item selection and validation in 1 model, we employed a MIMIC model (Jöreskog and Goldberger, 1975). As in factor analysis, factor loading estimates from the MIMIC model provide information on the strength of the association between each item and the latent construct. Information about the relationship between the items and the validation metric is conveyed through a regression parameter that captures the association between the validation metric and the latent variable. An extension of this model for categorical items assumes that ordinal items originate from underlying unobserved continuous, normally distributed items and relates the observed items to the underlying unobserved items through a series of threshold relationships (Muthén, 1984). Validation is incorporated directly into model development, as opposed to CTT, IRT, or simple factor analysis models, in which validation generally occurs in a separate step.
Data from the Physical Activity and Lymphedema (PAL) clinical trial were considered as a motivating example (Schmitz, Troxel, and others, 2009). The MIMIC model allowed for both the identification of lymphedema symptoms that were ultimately important contributors to a latent measure of lymphedema severity and the comparison of this core set of symptoms to a gold standard diagnostic measure. The primary aim was to identify important and clinically relevant symptoms and to demonstrate that this set of symptoms had a strong association with the gold standard arm volume difference.
The remainder of the article is outlined as follows. Section 2 illustrates the MIMIC model and its formulation for Likert scale data, Section 3 describes the PAL clinical trial, Section 4 presents results of the example, and in Section 5, we draw some conclusions.
2. METHODS
The MIMIC model consists of a system of structural equations that include both observed indicators and observed causes of a hypothesized latent variable. Observed indicators are random variables assumed to have been generated by the latent outcome, similar to the items or manifest variables in a factor analysis. Observed causes are either fixed or random variables that influence the latent outcome, like covariates in a regression setting. The single latent variable is measured by the set of observed indicators and is regressed on the set of observed causes (Zellner, 1970; Hauser and Goldberger, 1971; Sammel and Ryan, 1996). Sammel and Ryan (2002) demonstrated that a test of the regression parameter, which measures the association between the latent variable and the gold standard, is a global test of the observed cause on all observed indicators. In the motivating example, the MIMIC model was formulated such that the single latent variable is interpretable as a continuum of lymphedema severity. As a means of comparison, lymphedema symptoms were analyzed using standard psychometric techniques, such as item-total correlation and Cronbach's alpha. The relationship between individual symptoms and the gold standard diagnostic measure of lymphedema was assessed with cumulative probit models.
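For reference, the cumulative probit model fit to each individual symptom can be written as below; accumulating the probability from the severe end of the scale matches how the results are described in Section 4, although the exact parameterization used by the software is our assumption:

$$ P(y_{ij} \geq c \mid z_i) = \Phi(\beta_j z_i - \alpha_{jc}), \qquad c = 1, \ldots, 4, $$

so that a positive $\beta_j$ indicates that larger volume differences shift responses toward the more severe categories of symptom $j$.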
2.1. MIMIC model specification
There are 2 components to the general MIMIC model: a measurement model that specifies the factor analysis model and a structural model that specifies the regression of the latent variable on the observed causes (see Figure 1). Consider a sample of n individuals from whom a set of m ordinal Likert scale outcome measurements (items), yi1,…,yim, and 1 continuous validation measurement (gold standard), zi, are taken. The ordinal items are assumed to have originated from continuous normally distributed items yi1*,…,yim*. The MIMIC model is written as follows. For subject i = 1,…,n and outcome measurements j = 1,…,m, the measurement model is specified as
$$ \mathbf{y}_i^{*} = \boldsymbol{\mu} + \boldsymbol{\lambda}\, b_i + \boldsymbol{\epsilon}_i \qquad (2.1) $$
where μ is a vector of means, λ is a vector of factor loading parameters, bi is the latent variable (lymphedema severity), and ϵi is a vector of item-specific error terms with specific variances. In the classical MIMIC model, yi* is assumed to be directly observed, but here, it is unobserved. Instead, the relationship between the unobserved yi* and the observed yi is specified by
$$ y_{ij} = c \quad \text{if} \quad \tau_{j,c} < y_{ij}^{*} \leq \tau_{j,c+1}, \qquad c = 0, 1, \ldots, C_j, \quad \tau_{j,0} = -\infty, \; \tau_{j,C_j+1} = +\infty, $$
where the τ's are threshold parameters defining category intervals on yi*. The structural model with a single gold standard is specified as
$$ b_i = \beta z_i + \delta_i \qquad (2.2) $$
where β is the regression coefficient measuring the association between the latent variable and the gold standard. The random variable δi is an error term for the latent variable. Error terms in the measurement model are assumed to be uncorrelated with the observed causes in the structural model and with the error term in the structural model. Additionally, the error term in the structural model is uncorrelated with the observed causes. We assumed Var(δ) = 1, allowing the latent variable to be interpreted on a standardized scale. Estimation of the model was performed using the mean and variance adjusted weighted least squares estimator and Delta parameterization in Mplus Version 6.1. Example code for the ordinal version of the MIMIC model is provided in the supplementary material available at Biostatistics online.
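Substituting the structural model (2.2) into the measurement model (2.1) makes the global test explicit:

$$ \mathbf{y}_i^{*} = \boldsymbol{\mu} + \boldsymbol{\lambda}\beta z_i + (\boldsymbol{\lambda}\delta_i + \boldsymbol{\epsilon}_i), $$

so the gold standard affects each item only through the product λjβ, and a single test of β = 0 is simultaneously a test of no association between zi and all m items (Sammel and Ryan, 2002).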
Fig. 1.
Path diagram for general MIMIC model with m indicators and 1 observed cause.
3. EXAMPLE
A prominent fear for many breast cancer survivors, lymphedema is a debilitating chronic disease that results from surgical excision of lymph nodes as part of breast cancer treatment. In addition to swelling, lymphedema can cause skin changes, reduction of limb function, and loss of sensation, as well as depression, decreased quality of life, decreased physical self-esteem, and other physical and psychological morbidities (Ahmed and others, 2008; Cormier and others, 2009; Shih and others, 2009).
Until recently, women at risk for or diagnosed with lymphedema have been encouraged to limit physical activity, even such mundane tasks as lifting grocery bags. This guidance inhibits everyday activities and may even slow physical recovery from cancer. However, results of the PAL trial contradicted these guidelines, showing that a progressive weight-training program was safe for breast cancer survivors. Among women diagnosed with lymphedema, the trial showed that a weight-training intervention was not only safe, in that it did not significantly affect lymphedema severity, but also effective in reducing the number and severity of arm and hand symptoms and the incidence of lymphedema exacerbations (Schmitz and others, 2009). A subsequent follow-up study indicated that in breast cancer survivors who were at risk for lymphedema but who had not yet been diagnosed, a much larger group, the same weight-training intervention was not associated with an increased incidence of lymphedema (Schmitz and others, 2010). The demonstration that weight-lifting is safe both for women at risk for lymphedema and for women already diagnosed with it was revolutionary, because increased physical activity confers substantial benefits such as increased muscle strength, decreased weight gain, and improved quality of life (Schmitz, Courneya, and others, 2010).
Reported incidence of lymphedema varies widely in the literature, with estimates between 6% and 70% (Schmitz, Courneya, and others, 2010). One factor contributing to this variation is the criteria used for diagnosis of lymphedema. Diagnostic measures include water displacement volumetry, extracellular water in the arm measured by multifrequency bioelectrical impedance analysis, and serial circumference measurements and truncated cone volumetry. Initial evidence indicates that self-reported lymphedema symptoms can be as useful as objective diagnostic measures in discriminating women with lymphedema from those without (Norman and others, 2001). Self-report of symptoms could be a useful measure in diagnosing lymphedema because patients are more aware of acute changes in swelling, skin tone, and function. Furthermore, it is argued that it is important to take into account patient pain or distress in the diagnosis of lymphedema and that swelling alone is not sufficient for diagnosis. We explored the relationship between the symptoms and the volume difference to determine the utility of this information in summarizing lymphedema severity.
Outcome measures in our analysis included self-reported lymphedema symptoms and an objective measure of lymphedema severity. The self-reported symptoms were measured using the validated Norman Lymphedema Survey (Norman and others, 2001). The severity of 13 symptoms was assessed: rings too tight, watch too tight, bracelets too tight, clothing too tight, puffiness, knuckles not visible, veins not visible, skin feels leathery, arm feels tired, pain, pitting, swelling after exercise, and difficulty writing. Symptoms were measured on a 5-point Likert scale with responses ranging from 0 (no symptom) to 4 (very severe). Water displacement volumetry was chosen as the gold standard and was measured as the percent difference in volume between the lymphedema-affected versus unaffected arms.
Exploratory factor analysis for ordinal data revealed 3 distinct factors from the Norman Lymphedema Survey items. One symptom, swelling after exercise, did not load strongly on any factor. This symptom was flagged as a junk item, potentially eligible for removal from the scale. The factor representing tissue, swelling, and function was chosen for further investigation. This factor included the following symptoms: clothing too tight, puffiness, skin feels leathery, arm feels tired, pain, and difficulty writing. In the MIMIC model, the Norman lymphedema symptoms were the observed indicators and volume difference was the observed cause.
4. RESULTS
Data for the example came from a subset of the PAL trial. The sample comprised n = 141 women diagnosed with lymphedema at baseline. The average volume difference, defined as the percent difference in arm volume between the affected and unaffected arms, was 16.11% (95% confidence interval: 13.59–18.65). Figure 2 illustrates the association between individual symptom response categories and the average percent volume difference, along with the sample size in each category. For clothing too tight, puffiness, skin feels leathery, and arm feels tired, there was a general increase in mean volume difference with increasing levels of symptom severity. Note that because of the small sample sizes in the response category very severe for pain and difficulty writing (n = 2 and n = 3, respectively), the responses severe and very severe were combined. There did not appear to be a trend in the association between mean volume difference and symptom severity for pain, difficulty writing, or swelling after exercise.
Fig. 2.
Plot of mean volume difference with error bars by symptom.
Polychoric correlations among the symptoms in the swelling/function factor were generally moderate to strong, ranging from ρPain, Clothing = 0.222 to ρPuffiness, Clothing = 0.742. Correlations between the junk item and the other items were much smaller, ranging from ρPuffiness, Swelling = − 0.002 to ρWriting, Exercise = 0.220. These correlations provided initial evidence that swelling after exercise could be a candidate for removal, since it did not correlate highly with the other items. Correlations among all symptoms are available in the supplementary material at Biostatistics online.
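Here the polychoric correlation between two symptoms j and k is the correlation of their underlying continuous responses in the threshold model of Section 2.1,

$$ \begin{pmatrix} y_{ij}^{*} \\ y_{ik}^{*} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_{jk} \\ \rho_{jk} & 1 \end{pmatrix} \right), $$

with $\rho_{jk}$ estimated from the observed cross-classification of the two ordinal items.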
Standard psychometric techniques also identified swelling after exercise as potential junk (see Table 1). Item-total correlations when all items were included in the scale ranged from ρ = 0.111 for swelling after exercise to ρ = 0.666 for arm feels tired. According to the ρ = 0.2 guideline (Streiner and Norman, 1994), swelling after exercise could be considered for removal. The overall Cronbach's alpha with all 7 items in the scale was α = 0.727; because this value lies within the advised 0.7–0.9 range, alpha alone does not indicate that removal of swelling after exercise is warranted.
Table 1.
Standard psychometric measures
| Symptom | Item-total correlation | Alpha if item deleted |
| Clothing too tight | 0.474 | 0.687 |
| Puffiness | 0.636 | 0.647 |
| Skin feels leathery | 0.495 | 0.682 |
| Arm feels tired | 0.666 | 0.639 |
| Pain | 0.382 | 0.709 |
| Difficulty writing | 0.358 | 0.715 |
| Swelling after exercise | 0.111 | 0.769 |
Results from the univariate cumulative probit models are presented in Table 2. According to these models, there was a statistically significant relationship between volume difference and increasing item severity for clothing too tight and puffiness (p < 0.001 for both). In both of these models, the cumulative probability starting at the severe end of the scale increased with higher levels of volume difference (βClothing too tight = 0.030, standard error (SE) = 0.006; βPuffiness = 0.029, SE = 0.007). In other words, symptom severity for clothing too tight and puffiness tended to be more intense as volume difference increased. Neither the other items identified in the exploratory factor analysis nor the junk item was significantly associated with volume difference when considered individually.
Table 2.
Univariate cumulative probit models
| Symptom | Estimate | SE | p |
| Clothing too tight | |||
| Intercept 1 | 0.356 | 0.150 | 0.017 |
| Intercept 2 | 0.723 | 0.153 | < 0.001 |
| Intercept 3 | 1.746 | 0.199 | < 0.001 |
| Intercept 4 | 2.797 | 0.396 | < 0.001 |
| Beta | 0.030 | 0.006 | < 0.001 |
| Puffiness | |||
| Intercept 1 | – 0.503 | 0.156 | 0.001 |
| Intercept 2 | 0.106 | 0.146 | 0.467 |
| Intercept 3 | 1.260 | 0.176 | < 0.001 |
| Intercept 4 | 2.363 | 0.259 | < 0.001 |
| Beta | 0.029 | 0.007 | < 0.001 |
| Skin feels leathery | |||
| Intercept 1 | 0.431 | 0.157 | 0.006 |
| Intercept 2 | 0.796 | 0.169 | < 0.001 |
| Intercept 3 | 1.452 | 0.206 | < 0.001 |
| Intercept 4 | 1.940 | 0.255 | < 0.001 |
| Beta | 0.007 | 0.006 | 0.260 |
| Arm feels tired | |||
| Intercept 1 | – 0.394 | 0.151 | 0.009 |
| Intercept 2 | – 0.032 | 0.146 | 0.829 |
| Intercept 3 | 0.963 | 0.159 | < 0.001 |
| Intercept 4 | 1.771 | 0.211 | < 0.001 |
| Beta | 0.007 | 0.006 | 0.243 |
| Pain | |||
| Intercept 1 | 0.138 | 0.148 | 0.352 |
| Intercept 2 | 0.399 | 0.151 | 0.008 |
| Intercept 3 | 1.140 | 0.170 | < 0.001 |
| Intercept 4 | 2.161 | 0.286 | < 0.001 |
| Beta | – 0.002 | 0.006 | 0.751 |
| Difficulty writing | |||
| Intercept 1 | 0.905 | 0.178 | < 0.001 |
| Intercept 2 | 1.014 | 0.180 | < 0.001 |
| Intercept 3 | 1.503 | 0.206 | < 0.001 |
| Intercept 4 | 2.227 | 0.320 | < 0.001 |
| Beta | 0.002 | 0.008 | 0.771 |
| Swelling after exercise | |||
| Intercept 1 | 0.632 | 0.175 | < 0.001 |
| Intercept 2 | 0.847 | 0.192 | < 0.001 |
| Intercept 3 | 1.497 | 0.219 | < 0.001 |
| Intercept 4 | NA | NA | NA |
| Beta | – 0.002 | 0.008 | 0.842 |
When symptoms were considered jointly, factor loadings from the ordinal MIMIC model indicated a strong relationship between the candidate items and the latent measure of lymphedema severity (see Table 3). For the items identified by exploratory factor analysis, factor loadings exceeded 0.5 and ranged from λPain = 0.512 to λPuffiness = 0.906. All factor loadings for candidate items were statistically significantly different from zero (p < 0.001). The factor loading for swelling after exercise was small (λSwelling = 0.145), and the test for the factor loading was not statistically significant (p = 0.132), indicating that this item did not contribute to the underlying latent lymphedema severity. The coefficient for the regression of the latent lymphedema severity on volume difference was β = 0.020 and was statistically significant (p = 0.001). This represents a global test of the significance of the relationship between volume difference and all items in the MIMIC model. The regression coefficient can be interpreted as follows: every 1% change in volume difference corresponds to a 0.020 change in latent lymphedema severity on the standard normal scale. In other words, a clinically meaningful change in volume difference of 5% would correspond to a change of 5 × 0.020 = 0.10, an effect size of 0.1 standard deviation units in the latent measure of lymphedema severity.
Table 3.
Ordinal MIMIC model
| Symptom | Estimate | SE | p |
| Factor loadings | |||
| Clothing too tight | 0.709 | 0.051 | < 0.001 |
| Puffiness | 0.906 | 0.038 | < 0.001 |
| Skin feels leathery | 0.631 | 0.057 | < 0.001 |
| Arm feels tired | 0.779 | 0.038 | < 0.001 |
| Pain | 0.512 | 0.069 | < 0.001 |
| Difficulty writing | 0.526 | 0.075 | < 0.001 |
| Swelling after exercise | 0.145 | 0.096 | 0.132 |
| Regression coefficient | |||
| Beta | 0.020 | 0.006 | 0.001 |
| Thresholds | |||
| Clothing too tight | |||
| Intercept 1 | 0.356 | 0.150 | 0.017 |
| Intercept 2 | 0.723 | 0.153 | < 0.001 |
| Intercept 3 | 1.746 | 0.199 | < 0.001 |
| Intercept 4 | 2.798 | 0.396 | < 0.001 |
| Puffiness | |||
| Intercept 1 | – 0.503 | 0.156 | 0.001 |
| Intercept 2 | 0.106 | 0.146 | 0.467 |
| Intercept 3 | 1.260 | 0.176 | < 0.001 |
| Intercept 4 | 2.363 | 0.259 | < 0.001 |
| Skin feels leathery | |||
| Intercept 1 | 0.431 | 0.157 | 0.006 |
| Intercept 2 | 0.796 | 0.169 | < 0.001 |
| Intercept 3 | 1.452 | 0.206 | < 0.001 |
| Intercept 4 | 1.940 | 0.255 | < 0.001 |
| Arm feels tired | |||
| Intercept 1 | – 0.394 | 0.151 | 0.009 |
| Intercept 2 | – 0.032 | 0.146 | 0.829 |
| Intercept 3 | 0.963 | 0.159 | < 0.001 |
| Intercept 4 | 1.771 | 0.211 | < 0.001 |
| Pain | |||
| Intercept 1 | 0.138 | 0.148 | 0.352 |
| Intercept 2 | 0.399 | 0.151 | 0.008 |
| Intercept 3 | 1.140 | 0.170 | < 0.001 |
| Intercept 4 | 2.161 | 0.286 | < 0.001 |
| Difficulty writing | |||
| Intercept 1 | 0.905 | 0.178 | < 0.001 |
| Intercept 2 | 1.014 | 0.180 | < 0.001 |
| Intercept 3 | 1.503 | 0.206 | < 0.001 |
| Intercept 4 | 2.228 | 0.320 | < 0.001 |
| Swelling after exercise | |||
| Intercept 1 | 0.632 | 0.175 | < 0.001 |
| Intercept 2 | 0.847 | 0.192 | < 0.001 |
| Intercept 3 | 1.497 | 0.219 | < 0.001 |
| Intercept 4 | NA | NA | NA |
5. DISCUSSION
The objective of this paper was to evaluate items under consideration for inclusion in a scale while simultaneously comparing the scale to a gold standard measure. The use of joint latent variable models such as the MIMIC model is advocated over traditional psychometric procedures for several reasons. These models allow for item reduction and validation in 1 model, eliminating the bias inherent in a traditional 2-stage procedure. Joint latent variable models also exploit the correlation among items and provide a global test of significance between the items and the gold standard. Finally, it has been shown that removing junk items leads to a stronger association between the latent variable and the gold standard (Sammel and others, 1999). With the advent of latent variable packages in standard software and literature introducing model frameworks that situate latent variable models among familiar analysis techniques, the use of latent variable models like the MIMIC model in biomedical research is feasible for many applications.
In terms of the motivating example, standard psychometric techniques, univariate models, and the MIMIC model provided somewhat conflicting results. Item-total correlations identified swelling after exercise as a potential junk item. While the psychometric techniques accounted for the relationships among symptoms, they did not account for the relationship between the items and the gold standard volume difference. Cumulative probit models showed a significant relationship between volume difference and only 2 of the items: clothing too tight and puffiness. These models were useful for describing the relationship between volume difference and item severity scores in the univariate setting but could not account for the correlation among the items or for how all the items together are associated with volume difference. The MIMIC model showed that 6 of the 7 items in the model were significant components of lymphedema severity. The MIMIC model is advocated because it takes into account the correlation among the items as well as the relationship between the latent measure of severity and the gold standard. Unlike the standard psychometric techniques, which assume an underlying normal distribution of the items, the categorical formulation of the MIMIC model is an appropriate choice for ordinal data. The results of our evaluation of the Norman symptoms showed that the MIMIC model can be useful to inform item reduction. The MIMIC model confirmed that our potential junk item was not statistically significant and could be considered for removal. For this particular example, factor loadings from the measurement-only model did not differ significantly from those of the MIMIC model. Simulations are needed to determine which elements, either correlations among items or associations between individual items and the gold standard, influence the statistical tests of the factor loadings.
Item reduction and validation are complementary components of scale development. Rather than performing these processes separately, it has been demonstrated that it is advantageous to combine them into a single latent variable model. This model eliminates the potential bias induced by using separate item selection and validation procedures by incorporating the measurement error associated with the estimation of the latent variable. While latent variable models such as the MIMIC model are more complex than simple factor analysis or psychometric measures such as Cronbach's alpha, software is available to fit them. An impartial method of analyzing items that is justifiable from a statistical perspective could add credibility to the usually subjective process of item selection, and adding a validation metric to the same model allows for estimation of the relationship between the items and a gold standard measure.
SUPPLEMENTARY MATERIAL
Supplementary material is available at http://biostatistics.oxfordjournals.org.
FUNDING
National Cancer Institute (T32 CA93283 to S.M.H., R01-CA106851 to K.H.S.).
Acknowledgments
Conflict of Interest: None declared.
References
- Ahmed RL, Prizment A, Lazovich DA, Schmitz KH, Folsom AR. Lymphedema and quality of life in breast cancer survivors: the Iowa Women's Health Study. Journal of Clinical Oncology. 2008;26:5689–5696. doi: 10.1200/JCO.2008.16.4731.
- Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin. 1959;56:81–105.
- Christoffersson A. Factor analysis of dichotomized variables. Psychometrika. 1975;40:5–32.
- Clark LA, Watson D. Constructing validity: basic issues in objective scale development. Psychological Assessment. 1995;7:309–319.
- Cormier JN, Xing Y, Zaniletti I, Askew RL, Stewart BR, Armer JM. Minimal limb volume change has a significant impact on breast cancer survivors. Lymphology. 2009;42:161–175.
- Fox J. Teacher's corner: structural equation modeling with the sem package in R. Structural Equation Modeling: A Multidisciplinary Journal. 2006;13:465–486.
- Guilford JP. Psychometric Methods. New York: McGraw-Hill; 1954.
- Hauser RM, Goldberger AS. The treatment of unobservable variables in path analysis. Sociological Methodology. 1971;3:81–117.
- Havlicek LL, Peterson NL. Effect of the violation of assumptions upon significance levels of the Pearson r. Psychological Bulletin. 1977;84:373–377.
- Jöreskog KG, Goldberger AS. Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association. 1975;70:631–639.
- Muthén B. Contributions to factor analysis of dichotomous variables. Psychometrika. 1978;43:551–560.
- Muthén B. A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika. 1984;49:115–132.
- Muthén B. Latent variable hybrids: overview of old and new models. In: Hancock GR, Samuelsen KM, editors. Advances in Latent Variable Mixture Models. Charlotte, NC: Information Age Publishing, Inc.; 2008. pp. 1–24.
- Muthén LK, Muthén BO. Mplus User's Guide. Los Angeles, CA: Muthén & Muthén; 1998.
- Norman SA, Miller LT, Erikson HB, Norman MF, McCorkle R. Development and validation of a telephone questionnaire to characterize lymphedema in women treated for breast cancer. Physical Therapy. 2001;81:1192–1205.
- Nunnally JC, Bernstein IH, Berge JMF. Psychometric Theory. New York: McGraw-Hill; 1967.
- Rabe-Hesketh S, Skrondal A. Classical latent variable models for medical research. Statistical Methods in Medical Research. 2007;17:5–32. doi: 10.1177/0962280207081236.
- Rabe-Hesketh S, Skrondal A, Pickles A. GLLAMM manual. UC Berkeley Division of Biostatistics Working Paper Series. 2004;160:1–138.
- Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago, IL: MESA Press; 1993.
- Sammel M, Lin X, Ryan L. Multivariate linear mixed models for multiple outcomes. Statistics in Medicine. 1999;18:2479–2492. doi: 10.1002/(sici)1097-0258(19990915/30)18:17/18<2479::aid-sim270>3.0.co;2-f.
- Sammel MD, Ryan LM. Latent variable models with fixed effects. Biometrics. 1996;52:650–663.
- Sammel MD, Ryan LM. Effects of covariance misspecification in a latent variable model for multiple outcomes. Statistica Sinica. 2002;12:1207–1222.
- Sammel MD, Ryan LM, Legler JM. Latent variable models for mixed discrete and continuous outcomes. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 1997;59:667–678.
- Schmitz KH, Ahmed RL, Troxel AB, Cheville A, Lewis-Grant L, Smith R, Bryan CJ, Williams-Smith CT, Chittams J. Weight lifting for women at risk for breast cancer-related lymphedema. JAMA: The Journal of the American Medical Association. 2010;304:2699–2705. doi: 10.1001/jama.2010.1837.
- Schmitz KH, Ahmed RL, Troxel A, Cheville A, Smith R, Lewis-Grant L, Bryan CJ, Williams-Smith CT, Greene QP. Weight lifting in women with breast-cancer-related lymphedema. New England Journal of Medicine. 2009;361:664–673. doi: 10.1056/NEJMoa0810118.
- Schmitz KH, Courneya KS, Matthews C, Demark-Wahnefried W, Galvão DA, Pinto BM, Irwin ML, Wolin KY, Segal RJ, Lucia A, and others. American College of Sports Medicine roundtable on exercise guidelines for cancer survivors. Medicine and Science in Sports and Exercise. 2010;42:1409–1426. doi: 10.1249/MSS.0b013e3181e0c112.
- Schmitz KH, Troxel AB, Cheville A, Grant LL, Bryan CJ, Gross C, Lytle LA, Ahmed RL. Physical activity and lymphedema (the PAL trial): assessing the safety of progressive strength training in breast cancer survivors. Contemporary Clinical Trials. 2009;30:233–245. doi: 10.1016/j.cct.2009.01.001.
- Shih YCT, Xu Y, Cormier JN, Giordano S, Ridner SH, Buchholz TA, Perkins GH, Elting LS. Incidence, treatment costs, and complications of lymphedema after breast cancer among women of working age: a 2-year follow-up study. Journal of Clinical Oncology. 2009;27:2007–2014. doi: 10.1200/JCO.2008.18.3517.
- Skrondal A, Rabe-Hesketh S. Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: CRC Press; 2004.
- Spearman C. General intelligence objectively determined and measured. American Journal of Psychology. 1904;15:201–293.
- Streiner DL, Norman GR. Health Measurement Scales. Oxford: Oxford University Press; 1994.
- Werts CE, Linn RL. Path analysis: psychological examples. Psychological Bulletin. 1970;74:193–212.
- Zellner A. Estimation of regression relationships containing unobservable independent variables. International Economic Review. 1970;11:441–454.