Abstract
Profile instruments are frequently used to assess health-related quality of life and other patient-reported outcomes. Preference-based measures are required for health-economic cost-utility evaluations. Although regression-based approaches are commonly used to map from profile measures to preference measures, we show that this results in biased estimates because of regression to the mean. Linking (scale-aligning) is proposed as an alternative.
Keywords: Mapping outcomes, Linking scales, Test equating, Scale aligning, Profile instruments, Preference-based measures, Patient-reported outcomes
Introduction
A variety of health-related quality of life (HRQoL) measures are used in clinical trials and observational studies. While diversity of approaches is welcomed because measures need to be fit for purpose, it makes comparing and combining results challenging. A major advantage of preference-based measures is that they yield the single summary score needed to estimate quality-adjusted life years (QALYs) for cost-utility evaluations. But it is frequently recommended that clinical trials use disease-targeted instruments that are sensitive to clinically-relevant changes, while at the same time minimizing patient burden and thereby maximizing survey response and item completion rates [1]. Thus, many studies include only generic [e.g., 2] and disease-targeted [3] profile measures, and there has been great interest in mapping profile to preference-based measures to enable the calculation of QALYs [e.g., 4–8]. “Mapping” is the equating (or “linking”) of values from a source instrument to equivalent values on a target instrument. We emphasize that for health-economic evaluations the purpose of mapping is to obtain group-averaged estimates of QALYs with corresponding standard deviations, to enable comparison of interventions (e.g., treatments or management policies) in clinical trials, observational studies and meta-analyses.
Brazier et al. reviewed 30 studies that reported 119 different models mapping profile to preference-based measures [6]. The most common target measure was the EQ-5D, and the most widely used starting measures were the SF-12 or SF-36 profile measures. Brazier et al. comment that the performance of the mapping functions in terms of goodness-of-fit and prediction is variable, and so it is impossible to generalize across instruments. Most importantly, the majority of mapping functions were estimated using ordinary least squares. Some studies explored generalized linear models with random effects, adjusted least squares regression models, weighted least squares and other approaches but, one way or another, they all used regression methods. We explain why these least-squares regression-based approaches are problematic for mapping.
Regression to the mean
Predictions from regression models result in attenuated estimates. Indeed, the very term “regression” is short for “regression to the mean,” and was defined by Francis Galton in 1886 in his seminal paper, “Regression towards mediocrity in hereditary stature” [9]. (The Galton paper was sullied by the pejorative term “mediocrity” and the eugenic beliefs that are now considered reprehensible.) Later, it was shown that the “mediocre” value towards which regression estimates tend is the central, or mean, value (“regression to the mean”). What triggered Galton’s paper was his observation that although tall fathers tend to have tall sons, these sons are usually less tall than their fathers; and short fathers similarly have sons who, while usually short, are also less extreme than their fathers. Superficially, this may appear to imply that over time everyone will have the same height, a conclusion that is manifestly untrue. It is the presence of additional random variation that suffices to ensure that there are always new people at the two extreme ends of the distribution. What it does mean, however, is that for any individual the best predicted (“true”) score is less extreme—that is, regressed towards the mean.
Another way to think of regression to the mean is that if someone has a better than average score, their score is likely to be partly the consequence of an above-average underlying or “true” level and partly luck, so that their true score is likely to be closer to the mean value. If each person had two assessments, a test and a retest, the retest score is likely to be closer to the mean. The same effect is observed in reverse, too: someone with a higher than average retest score is likely to have had a score nearer the mean on their original test. This effect arises whenever there is less than perfect correlation between the two assessments. Prediction models shrink estimates towards the mean.
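This test–retest account of regression to the mean is easy to demonstrate by simulation. The sketch below uses purely illustrative means, SDs and noise levels (NumPy assumed): people in the top 5% on a first test score closer to the mean on retest.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical model: observed score = underlying "true" level + independent
# "luck" noise, so test and retest correlate imperfectly (here r is about 0.5).
true_level = rng.normal(50, 10, n)
test = true_level + rng.normal(0, 10, n)
retest = true_level + rng.normal(0, 10, n)

# Select the people in the top 5% on the first test...
top = test >= np.quantile(test, 0.95)

# ...their retest scores are still above average, but regressed toward 50,
# because part of their first-test score was luck.
print(test[top].mean())
print(retest[top].mean())
```

The same shrinkage appears, symmetrically, if the group is instead selected on the retest scores.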
Mapping versus prediction
The distinction between mapping (using scale-alignment) and prediction (using regression) has been recognized in the educational field for almost a century [10], as has the “fallacy of using regression lines to show a true correspondence” [11]. In brief, when mapping educational exam scores, one is not interested in predicting the score a student might have obtained on another examination. Rather, one wants to know what score is equivalent for the second exam, such that students of a particular ranking on one exam are assigned the same ranking on the other exam. For example, suppose students are randomly assigned to take either examination X or examination Y. Because the assignment is random, the students in the two groups should, on average, be of similar ability. Let us now assume that the two exams assess the same underlying construct, and that we wish to convert all scores to be on a single metric. One simple approach, known as equipercentile linking, is to ensure that a student in the top 5% for test X will also be in the top 5% when the X-test scores are converted to Y-test scores. This is in contrast to predicting scores from regression analyses, when regression to the mean results in predicted scores that are closer to the mean value, with the best students who completed test X therefore unfairly receiving a lower predicted score on test Y than they deserve (and the less able students receiving the advantage of a higher than deserved score). When converting to Y-scores using regression-based methods, the scores of students who completed test X with low or high results become unfairly biased towards the mean Y-score.
The aim of prediction is typically to predict the most likely true score based on information that is known about the respondent. Thus other factors such as socioeconomic status, age and gender might also be included if predictive. In contrast, mapping, or scale-aligning, does not predict scores for one instrument from another. Instead, it aims to align the scales so that the distributions are matched and an individual with a particular score on one scale can be compared with similar individuals assessed on the other scale [12]. Regression does not achieve this: as discussed below, the predicted Y-scores for individuals assessed using X will be less extreme than the observed scores of similar individuals who were assessed using Y, and therefore the overall ranking of individuals who took X will be biased relative to those who took Y.
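The contrast between equipercentile linking and regression prediction can be sketched with simulated exam scores (all distributions and values below are illustrative; NumPy assumed). The linked scores of the top-ranked X-takers match the top of the Y distribution, whereas the regression predictions are pulled toward the mean:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Two tests of the same construct on different metrics, correlated ~0.7.
ability = rng.normal(0, 1, n)
x = 60 + 12 * (ability + rng.normal(0, 0.7, n))  # test X
y = 70 + 8 * (ability + rng.normal(0, 0.7, n))   # test Y

def equipercentile(x_scores, x_ref, y_ref):
    """Map each X-score to the Y-score at the same percentile rank."""
    pct = np.searchsorted(np.sort(x_ref), x_scores) / len(x_ref)
    return np.quantile(y_ref, np.clip(pct, 0.0, 1.0))

linked = equipercentile(x, x, y)

# Regression prediction shrinks the extremes toward the mean of Y.
r = np.corrcoef(x, y)[0, 1]
predicted = y.mean() + r * (y.std() / x.std()) * (x - x.mean())

top = x >= np.quantile(x, 0.95)
print(np.sort(y)[-int(0.05 * n):].mean())  # observed top 5% of Y-scores
print(linked[top].mean())                  # equipercentile: comparable
print(predicted[top].mean())               # regression: pulled toward the mean
```

In practice equipercentile linking requires smoothing of the discrete score distributions, as discussed later; this sketch ignores that complication.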
Regression is a robust technique that is assuredly the most appropriate approach whenever the aims are to predict outcomes, evaluate explanatory factors or explore potentially causal relationships. Scale-aligning has a very specific and different objective.
Shrinkage and variance of the predicted scores
Brazier et al. [6, p. 221] comment that “These papers also found that the predicted values from the mapping functions tend to have lower levels of variance than the original observed values.” Because of regression to the mean, this is hardly surprising and, in fact, is what is to be expected. In the simplest case of linear regression between two normally distributed variables, we have:
σ²Ŷ = r² σ²Y   (1)

where r is the correlation between X and Y, σ²Y is the variance of the Y-scores, and σ²Ŷ is the variance of the predicted Y-scores based on the observed values of X. Thus the variance of the predicted scores is shrunk by the factor r². This tells us that as r tends towards 1.0, there is little or no loss of variability. Crucially, as the correlation between X and Y decreases (i.e., as r tends towards zero), the variance of the predicted values becomes smaller because the regression-based predictions are increasingly shrunk towards the mean. As might be expected, in the extreme case of zero correlation the predictor variable provides no information; the best predicted value is then simply the mean Y-score, and the variance of the predicted values becomes zero.
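Equation (1) can be checked empirically. A minimal simulation sketch (the correlation of 0.6 is an illustrative choice; NumPy assumed) confirms that the least-squares predictions have variance r² times that of Y:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Bivariate normal X and Y with correlation r = 0.6 (an illustrative value).
r = 0.6
x = rng.normal(0, 1, n)
y = r * x + np.sqrt(1 - r**2) * rng.normal(0, 1, n)

# Ordinary least-squares prediction of Y from X.
beta = np.cov(x, y, bias=True)[0, 1] / np.var(x)
y_hat = y.mean() + beta * (x - x.mean())

# Equation (1): var(Y_hat) = r^2 * var(Y); the two prints agree closely.
print(np.var(y_hat))
print(r**2 * np.var(y))
```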
Another effect of regression to the mean and the consequent reduced variance of the predicted scores is that the “misfit” should be most apparent for the highest-scoring individuals, whose scores will be underestimated relative to subjects who were assessed using instrument Y, and the lowest-scoring individuals, whose scores will be overestimated. This has frequently been noted in mapping studies that use regression-based approaches, where the cumulative distribution function of the predicted scores is shrunk at the tails in comparison with the observed values of the target distribution [6,13,14]. Finally, it is also worth noting that X and Y are interchangeable: if we instead use the values of Y to predict X-scores, the variance of the predicted values of X is likewise shrunk by the factor r².
Consequences for mapping profile to preference measures
In the HRQoL setting, instrument X is usually a profile instrument such as the SF-12 and SF-36 or the condition-specific EORTC QLQ-C30; the target instrument, Y, is a preference-based measure such as the EQ-5D or SF-6D. To estimate QALYs for a clinical trial that only used a profile instrument X, the X-scores are first mapped to preference scores Y using a published mapping equation that is typically based on linear regression [6], and then combined with patient survival times. One review found that many studies report the variance of Y that is explained by X to be generally above 50% for the generic profile instruments, but lower for the condition-specific instruments [6], while another review reported 33 comparisons with a median of 49% [7]. In such studies, the Y-scores predicted by regression will have a variance that is 50% or less of the variance that would have been observed if instrument Y had been used for direct measurements. The falsely low variances may result in optimistic claims of precision, with unduly narrow confidence intervals (CIs). For example, if r is 0.7 (corresponding to 49% variance explained), the CIs for the estimated Y-scores will appear to be only 70% of the true width. When mapping profile to preference measures, few studies report r=0.8 or greater, although when equating patient-reported outcomes for two instruments measuring the same domains it is reported that correlations are commonly 0.8 or greater [15]. We suggest that only when r>0.9 can regression to the mean be ignored, which is rarely the case when mapping to preference measures, and even then variances are shrunk to 81% and CIs to 90% of the true value.
Within a single clinical trial, many statistical tests (e.g., t-test) are unaffected by linear transformations and so if X is statistically significant then Y will be, too, despite regression to the mean. Also, in the simple single-study case, it may be possible to use compensatory adjustment to correct the shrunken CIs. When the estimates for individual patients are then used for calculating QALYs, however, the distribution of the QALY estimates will have a variance that is substantially shrunken in some unspecified manner that will depend on survival times, and it is unclear how this will affect the estimated QALYs and their CIs. In theory it should be possible to adjust the regression-shrunk individual-patient estimates, but scale-aligning is a more direct and simpler approach. Care should also be taken with meta-analyses that combine results from several trials. If Y-scores are derived from linear regression in some clinical trials while other trials directly used instrument Y, the regression-predicted Y-scores and the directly observed Y-scores will not be on scales with comparable frequency distributions, potentially invalidating significance tests and confidence intervals unless compensatory adjustment is made. Nonlinear regression is even more complex. Scale-aligning preserves the mean and variance, thus avoiding these problems.
Methods for mapping scores
As mentioned above, test-linking and aligning have a long tradition in educational testing. Mapping and equating of exams is “high stakes” because it determines the future prospects of students, and it is essential to be fair and unbiased when comparing those who have taken different examinations. Educational research has developed methods for comparison of students who have taken different exams, and these techniques have been evaluated and applied to large samples covering students from a wide range of abilities [16–19]. Thus we turn to education for details of suitable methodology for scale-aligning, while bearing in mind that, unlike the educational setting, clinical trials and meta-analyses are group-based and we are more concerned with (a) estimating group effects than making precise estimates of scores for individuals, and (b) preserving the properties of the estimated means and avoiding variance shrinkage.
Five requirements have been proposed for equating of scores to be valid [20], although these are intended for the equating of individuals rather than generating group-based statistics. Angoff used the term “calibration” for linking scores that have differing reliability (relaxing requirement b) or different difficulty, and also described the scale-alignment of tests measuring different constructs as providing “comparable scores” [21]. Kolen described the linking of different, but similar, constructs using a common population of respondents as “battery scaling” [22]. We use the term “scale-aligning” and suggest that the same conditions are applicable when the focus is scale-aligning for group comparisons, with the exception of (b) which is not applicable for scale-alignment [12].
(a) Equal constructs: The tests should measure the same constructs.
(b) Equal reliability: The tests should have the same reliability.
(c) Symmetry: The function for linking the scores of Y to those of X should be the inverse of the function for linking the scores of X to those of Y.
(d) Equity: It should be a matter of indifference as to which of the two equated tests is used.
(e) Population invariance: The choice of subpopulation used to estimate the linking function between the scores of tests X and Y should not matter; the linking function should be population invariant.
Approaches that may be applied include:
(i) Simple linear equating, based on equating the mean and SD of the two scales.
(ii) Equipercentile equating, which matches the two cumulative distribution functions to each other either via smooth functions or in a nonparametric manner [12,17].
(iii) Item response theory (IRT)-based methods that map onto logistic scales, possibly together with equipercentile equating [23].
We focus here on (i), the linear equating approach. Equipercentile equating is non-trivial as it requires pre-smoothing of the X and Y distributions and/or post-smoothing of the equipercentile relationship [12,17] because of the discrete categories used in many HRQoL instruments. Equipercentile equating and IRT have been used for mapping unidimensional patient-reported outcomes [15]. Linear equating, however, is the simplest method, and is the most analogous to linear regression. Most HRQoL mapping studies use a single group approach, in which all respondents complete both the profile instrument and the target preference-based instrument; this is rarely feasible in educational settings, where more complex designs are frequently used.
Equipercentile equating, mentioned above, provides a nonparametric approach that matches the entire cumulative distribution. To derive a linear scale-aligning function that is comparable to linear regression, the equipercentile requirement can be applied by ensuring X-scores and Y-scores correspond to the same number of standard deviations above or below the mean. That leads to the following linking function that transforms the X-scores to have the same mean and standard deviation as the Y-scores:
Y′ = μY + (σY/σX)(X − μX)   (2)

where μX and μY are the mean values of X and Y, and σX, σY the standard deviations. Note that, in contrast to the linear linking of equation (2), the linear regression function (3) does involve the correlation, r, as the slope of the regression line is β = r(σY/σX):

Ŷ = μY + r(σY/σX)(X − μX)   (3)
Equation (2) is symmetrical, so it does not matter whether we convert from X to Y or from Y to X, and the linear relationship represents a single line. Equation (3), on the other hand, results in two different regression lines according to whether X or Y is regarded as the dependent (outcome) variable, and as the correlation tends towards zero these lines increasingly diverge. Geometrically, for r=0.70 the regression lines of Y on X and X on Y subtend an angle of approximately 20° [24], with the scale-aligning line roughly midway between the two; as r becomes smaller, the two regression lines diverge further, regression to the mean increases, and the shrinkage of variance in the predicted values becomes greater.
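The practical difference between equations (2) and (3) is that linking preserves the target variance while regression shrinks it. A sketch under assumed illustrative distributions (NumPy; the profile and preference scales below are invented, not taken from any cited instrument):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Invented profile scores X and preference scores Y, correlated ~0.7.
latent = rng.normal(0, 1, n)
x = 50 + 20 * (latent + rng.normal(0, 0.7, n))
y = 0.70 + 0.15 * (latent + rng.normal(0, 0.7, n))

r = np.corrcoef(x, y)[0, 1]

# Equation (2): linear equating -- match mean and SD; r does not enter the slope.
y_link = y.mean() + (y.std() / x.std()) * (x - x.mean())

# Equation (3): linear regression -- the slope is attenuated by r.
y_pred = y.mean() + r * (y.std() / x.std()) * (x - x.mean())

print(np.std(y), np.std(y_link))  # linking reproduces the SD of Y
print(np.std(y_pred))             # regression shrinks the SD by the factor r
```

Inverting the linking function recovers the original X-scores exactly, illustrating the symmetry requirement; inverting the regression function does not.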
Although we used a simple scale-aligning argument to derive equation (2), it can be shown that this equation also represents a form of regression in which random errors are assumed to occur not only in Y but also in X; this “geometric mean regression” is explained in the Statistical Appendix in the Supplemental Materials at: XXX (although called “regression”, the estimated values no longer “regress” to the mean). Lu et al., using a different approach, also conclude that ordinary least squares regression is not coherent and that geometric mean regression is preferable [25]. In practice, many HRQoL mapping studies make use of multiple subscales from the generic measure, and both the regression equation (3) and the linking equation (2) can be extended to these more general models.
Validity and goodness-of-fit
Goodness-of-fit is frequently assessed in terms of the root-mean-square error (RMSE), which is the square root of the mean of the squared differences between the observed and predicted values. If the parameters for linear regression have been estimated by ordinary least squares, linear regression is by definition optimal in terms of a linear relationship that yields the smallest RMSE. Thus all other linear scale-aligning, linking or mapping functions will inevitably show poorer fit in terms of RMSE statistics. For mapping, it is therefore inappropriate to define goodness-of-fit in terms of predictive ability. The role of mapping or scale-alignment is to determine equivalent scores such that respondents taking either test will achieve the same overall rank as if they had taken the other test, with, as a consequence, equivalent cumulative distribution functions, means and standard deviations for the observed and estimated scores.
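That RMSE necessarily favours regression, even when the linked scores better reproduce the target distribution, can be illustrated directly (simulated scores under illustrative assumptions; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Simulated scores on two instruments measuring the same construct.
latent = rng.normal(0, 1, n)
x = latent + rng.normal(0, 0.7, n)
y = latent + rng.normal(0, 0.7, n)

r = np.corrcoef(x, y)[0, 1]
y_pred = y.mean() + r * (y.std() / x.std()) * (x - x.mean())  # regression, eq. (3)
y_link = y.mean() + (y.std() / x.std()) * (x - x.mean())      # linking, eq. (2)

def rmse(obs, est):
    return np.sqrt(np.mean((obs - est) ** 2))

# Regression "wins" on RMSE by construction...
print(rmse(y, y_pred), rmse(y, y_link))
# ...but only linking reproduces the standard deviation of Y.
print(np.std(y_pred), np.std(y_link), np.std(y))
```

The RMSE comparison is guaranteed to favour the regression line, which is exactly why RMSE is the wrong criterion for judging a mapping function.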
Methods other than goodness-of-fit are required for comparing different approaches. Longworth and Rowen review methods of validating and evaluating mapping studies [26]. They suggest assessing the performance and validity of a linking function by predictive ability and by elements that include (a) content validity and the extent to which the tests measure similar constructs; (b) strength of association between the scores; (c) the quality of the linking data and the mapping study (for example, qualitative and descriptive review); (d) comparison of the distributions and cumulative distributions of the variables; and (e) studies to evaluate the population invariance [20] of the linking functions.
Content validity and similarity of constructs
The validity of a mapping depends on the assumption that the two instruments assess the same or closely similar constructs. This may for example be assessed using qualitative methods in which patients and experts formally compare the wording and meaning of items. Blome et al. (2012) map a skin-targeted HRQoL instrument for psoriasis to the EQ-5D [27]. The authors found poor association between the constructs which “seem to be too different to be equivalent to each other, because the two instruments assess largely different aspects of patient impairment.” They acknowledged that this mapping “has severe limitations in validity and clinical relevance”, and postulate that “comparable results could be derived from studies on other skin diseases. Consequently, the EQ-5D or comparable instruments should be implemented in studies aiming to measure utilities, because utilities cannot reliably be estimated from other study variables”. This conclusion is likely to apply whenever condition-targeted profile scales are mapped to generic preference scales. Although similarity of constructs is essentially a qualitative judgment, high similarity may be expected to lead to strong correlation between the scales.
Strength of association between the scores
Association between X and Y scores can be measured by the correlation coefficient, r, or when more complex models with covariates are used the multiple correlation coefficient, R. Although the correlation coefficient does not appear explicitly in the scale-aligning function (2), for both regression and scale-aligning the correlation should be high, and if the two instruments really are assessing similar constructs it will be. We have been impressed, however, by how low the correlations are in many studies mapping disease-specific profiles to preference measures. As observed above, reviews have reported that in as many as half of the mapping studies R2 (the variance explained) fails to reach 50%. Thus Blome et al. reported R2 was only 0.24 [27]. So, what value is acceptable for mapping and scale-aligning? In educational settings, it has been suggested that correlations must exceed 0.87 for adequate scale-aligning of individuals [28,29]. For group-based estimates, as when comparing the average number of QALYs from clinical trials in health economic assessments, the magnitude of the correlation need not be as high but it should still be reasonably large. By analogy with the widely used threshold of correlations for reliability of group-level comparisons, and based on our experience, we suggest a threshold of 0.70 as the lowest acceptable. As many as half the published studies are rejected by this criterion. This level of correlation, however, still represents poor agreement between the observed scores X and Y, and implies that the proportion of variance explained will only be 49%; thus many would argue that even 0.70 is too low a threshold.
Quality of the linking data and the mapping study
Longworth and Rowen provide guidance about the requirements for a mapping study [26]. Generally, the mapping study will either comprise data from one or more clinical trials, or will be a purpose-designed mapping study. To be confident about the generalizability of the mapping function to future samples, the characteristics of the estimation sample should be as similar as possible to the characteristics of the sample to which the mapping algorithm will be applied.
Similarity of distribution functions
Similarity of distribution functions is a prerequisite for linear alignment of scales; the linear linking functions merely adjust for different means and standard deviations. If the content of the scales is similar, this requirement is likely to follow. Problems arise, however, when linking profile measures to the EQ-5D, as the EQ-5D returns scores that follow a bimodal (or even trimodal) distribution, with respondents grouped into low and high values and very few in the middle [6,13,14,30]. The SF-6D, on the other hand, follows a smooth unimodal continuum – as is commonly observed with scales from health profile instruments such as the EORTC QLQ-C30 [31], and which seems more plausible for the samples of patients in clinical trials. This disparity is also seen with the EQ-5D cumulative distribution function. Clearly these indexes measure different things. Possible causes are that their underlying constructs are fundamentally different, or their scoring algorithms are inconsistent. Arguably, no mapping is likely to compensate for one measure smoothly covering the continuum while the other is strangely bimodal. At the very least, nonlinear (possibly nonparametric) solutions must be considered.
Population invariance
Dorans and Holland show that an effective way of confirming the validity of linking scores is to assess whether the linking function is invariant across diverse subpopulations: differences in the constructs or the reliability of the instruments are manifested as a lack of population invariance, as are nonlinearity and other departures from the model [20]. For HRQoL instruments, possible subgroups to explore might be disease type, disease severity, age, gender and race/ethnicity. Dorans and Holland describe and illustrate suitable tests.
Conclusion
Regression, which is a method for predicting outcomes and is normally quite justifiably the method of choice, differs from scale-alignment, which is appropriate when mapping between instruments. The differences are largely attributable to regression to the mean, which is a frequently overlooked and misunderstood phenomenon. For simplicity of exposition, we have focused on the simple case of linear regression and linear linking functions in single group designs. In HRQoL research, the term “mapping” has usually been implemented using regression-based prediction. The use of regression models, however, is inappropriate for that task, and results in biased estimates. Approaches such as non-parametric equipercentile methods or parametric linking functions should instead be used for mapping profile to preference-based measures. Other options include IRT [23], but care should still be taken to distinguish between prediction of individual scores (when uncertainty shrinks estimated values towards the mean), and mapping.
Acknowledgments
Source of financial support: Ron D. Hays was supported in part by grants from the NIA (P30-AG021684) and the NIMHD (P20MD000182).
References
1. Fayers PM, Machin D. Quality of Life: The Assessment, Analysis and Interpretation of Patient-Reported Outcomes. 2nd ed. Chichester: Wiley & Sons Ltd; 2007.
2. Hunt SM, McEwan J. The development of a subjective health indicator. Sociol Health Illness. 1980;2:231–46. doi:10.1111/1467-9566.ep11340686.
3. Hays RD, Mangione CM, Ellwein L, et al. Psychometric properties of the National Eye Institute – Refractive Error Quality of Life Instrument. Ophthalmology. 2003;110:2292–301. doi:10.1016/j.ophtha.2002.07.001.
4. Revicki DA, Kawata AK, Harnam N, et al. Predicting EuroQol (EQ-5D) scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) Global items and domain item banks in a United States sample. Qual Life Res. 2009;18:783–91. doi:10.1007/s11136-009-9489-8.
5. Kowalski JW, Rentz AM, Walt JG, et al. Rasch analysis in the development of a simplified version of the National Eye Institute Visual-Function Questionnaire-25 for utility estimation. Qual Life Res. 2011;21:323–34. doi:10.1007/s11136-011-9938-z.
6. Brazier JE, Yang Y, Tsuchiya A, Rowen DL. A review of studies mapping (or cross walking) non-preference based measures of health to generic preference-based measures. Eur J Health Econ. 2010;11:215–25. doi:10.1007/s10198-009-0168-z.
7. Mortimer D, Segal L. Comparing the incomparable? A systematic review of competing techniques for converting descriptive measures of health status into QALY-weights. Med Decis Making. 2008;28:66–89. doi:10.1177/0272989X07309642.
8. Chuang L-H, Whitehead SJ. Mapping for economic evaluation. Br Med Bull. 2012;101:1–15. doi:10.1093/bmb/ldr049.
9. Galton F. Regression towards mediocrity in hereditary stature. J Anthrop Inst Great Britain. 1886;15:246–63.
10. Thorndike EL. On finding equivalent schools in tests of intelligence. J Appl Psychol. 1922;6:29–33.
11. Otis AS. The method for finding the correspondence between scores in two tests. J Educ Psychol. 1922;13:529–45.
12. Dorans NJ. Linking scores from multiple health outcome instruments. Qual Life Res. 2007;16:85–94. doi:10.1007/s11136-006-9155-3.
13. Versteegh MM, Rowen D, Brazier JE, Stolk EA. Mapping onto EQ-5D for patients in poor health. Health Qual Life Outcomes. 2010;8:141. doi:10.1186/1477-7525-8-141.
14. Rowen D, Brazier J, Roberts J. Mapping SF-36 onto the EQ-5D index: how reliable is the relationship? Health Qual Life Outcomes. 2009;7:27. doi:10.1186/1477-7525-7-27.
15. Choi SW, Podrabsky T, McKinney N, Schalet BD, Cook KF, Cella D. PROSetta Stone® Analysis Report: A Rosetta Stone for Patient Reported Outcomes. Vol. 1. Chicago, IL: Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University; 2012.
16. Brennan RL, editor. Educational Measurement. 4th ed. Westport, CT: Praeger Publishers; 2007.
17. Kolen MJ, Brennan RL. Test Equating, Scaling, and Linking: Methods and Practices. 2nd ed. New York: Springer; 2004.
18. von Davier AA, editor. Statistical Models for Test Equating, Scaling, and Linking. New York: Springer; 2010.
19. Dorans NJ, Pommerich M, Holland PW, editors. Linking and Aligning Scores and Scales. New York: Springer; 2007.
20. Dorans NJ, Holland PW. Population invariance and the equatability of tests: basic theory and the linear case. J Educ Meas. 2000;37:281–306.
21. Angoff WH. Scales, norms, and equivalent scores. In: Thorndike RL, editor. Educational Measurement. 2nd ed. Washington, DC: American Council on Education; 1971. pp. 508–600.
22. Kolen MJ. Linking assessments: concept and history. Appl Psychol Meas. 2004;28:219–26.
23. Kolen MJ. Comparison of traditional and item response theory methods for equating tests. J Educ Meas. 1981;18:1–11.
24. Schmid J. The relationship between the coefficient of correlation and the angle included between regression lines. J Educ Res. 1947;41:311–13.
25. Lu G, Brazier JE, Ades AE. Mapping from disease-specific to generic health-related quality-of-life scales: a common factor model. Value Health. 2013;16:177–84. doi:10.1016/j.jval.2012.07.003.
26. Longworth L, Rowen D. Mapping to obtain EQ-5D utility values for use in NICE health technology assessments. Value Health. 2013;16:202–10. doi:10.1016/j.jval.2012.10.010.
27. Blome C, Beikert FC, Rustenbach SJ, Augustin M. Mapping DLQI on EQ-5D in psoriasis: transformation of skin-specific health-related quality of life into utilities. Arch Dermatol Res. 2012. doi:10.1007/s00403-012-1309-2.
28. Dorans NJ. Equating, concordance, and expectation. Appl Psychol Meas. 2004;28:227–46.
29. Dorans NJ, Walker ME. Sizing up linkages. In: Dorans NJ, Pommerich M, Holland PW, editors. Linking and Aligning Scores and Scales. New York: Springer; 2007.
30. Brazier J, Roberts J, Tsuchiya A, Busschbach J. A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ. 2004;13:873–84. doi:10.1002/hec.866.
31. Scott NW, Fayers PM, Aaronson NK, et al. EORTC QLQ-C30 Reference Values. Brussels: EORTC Quality of Life Group Publications; 2008.