Abstract
Objective
Anchoring vignettes appear with growing frequency in surveys of health and aging, but little research investigates how to optimize their wording. This study experimentally tests whether mentioning specific health conditions and/or medical procedures enhances or undermines vignette validity.
Methods
Three series of general health anchoring vignettes were fielded to 2,550 respondents in the Wisconsin Longitudinal Study: one mentioning no specific health conditions or procedures, one mentioning heart disease-related ones, and one mentioning diabetes-related ones. Variations on hierarchical ordered probit models were used to test whether vignette wording affected adherence to the key measurement assumptions of vignette equivalence (VE) and response consistency (RC).
Results
While all vignette series showed substantial violations of VE, violations were larger (especially by sex and education) when using disease-specific texts. RC violations appeared relatively minor, but somewhat larger in disease-specific texts.
Discussion
These findings suggest that more general, universal vignette texts may be preferable to ones describing highly specific conditions/procedures. The common advice to prioritize specificity and concreteness in survey texts may be misguided if sociodemographic groups differ in their familiarity or associations with the presented details. Anchoring vignettes are a potentially useful survey tool, but further efforts are needed to optimize their wording.
Keywords: Anchoring vignettes, Differential item functioning, Health measurement, Reporting heterogeneity, Self-rated health, Survey design
In the early 2000s, anchoring vignettes (described in more detail below) were introduced as a means of improving intergroup comparability of subjective survey items (King, Murray, Salomon, & Tandon, 2004). Anchoring vignettes have since grown in popularity. However, recent evidence of vignettes’ failure to adhere to measurement assumptions—in particular the assumption of vignette equivalence (VE)—has led to a pessimistic turn in anchoring vignette research (e.g., Bago d’Uva, Lindeboom, O’Donnell, & van Doorslaer, 2011; Grol-Prokopczyk et al., 2015). One reason for existing vignettes’ poor performance may be that very limited attention has been paid to vignette wording (Kapteyn, Smith, van Soest, & Vonkova, 2011). What features of vignette wording maximize vignette validity? In particular, is the tenet that textual specificity enhances vignette validity empirically supported? This study compares three sets of vignettes to test whether more general or more specific health descriptions should be preferred, and in the process clarifies how to write anchoring vignettes that function as intended.
The Problem: Reporting Heterogeneity
Many key variables in surveys of aging (e.g., well-being, stress, pain) are based on subjective self-ratings. The widely-used general self-rated health question (Idler & Benyamini, 1997), for example, asks respondents to rate their overall health with subjective categories such as “excellent, very good, good, fair, or poor.” Use of subjective response categories, however, raises concerns about incomparability across sociodemographic groups, as different groups may attribute different meanings to specific response categories. Such differences in reporting styles—termed “reporting heterogeneity” (Bago d’Uva et al., 2011) or “response-category differential item functioning” (King & Wand, 2007)—can reflect cultural differences. For instance, members of some populations appear reluctant to use highly positive terms such as “excellent” when rating their health, to avoid the appearance of boasting (e.g., Abdulrahim & Ajrouch, 2010; Shetterly, Baxter, Mason, & Hamman, 1996). In other cases, reporting heterogeneity may reflect linguistic or translational differences: For example, Latinos are more likely to self-report “fair” health when this term is translated as “regular” than when it is translated as “mas o menos”—a fact with repercussions for comparison of Latino and non-Latino health (or of Spanish-language surveys using different wording; Sanchez & Vargas, 2016).
Reporting heterogeneity poses a serious challenge for both domestic and international comparative research. Studies based on unadjusted subjective self-ratings may yield highly implausible rankings of countries or other demographic groups (e.g., Sen, 2002). A recent National Institute on Aging report describes “large agreement on the need for harmonization” of survey items, to take full advantage of the growing number of surveys of aging worldwide (National Institute on Aging, 2012:3).
The Solution, or Another Problem?: Anchoring Vignettes
Anchoring vignettes are brief texts describing fictional characters that manifest the trait of interest (e.g., mobility or general health) to a greater or lesser degree. Survey respondents are asked to rate the characters’ levels of the trait using the same response categories that they use for self-ratings. Because respondents receive identical vignettes, differences in vignette ratings can be interpreted as markers of reporting heterogeneity. Subsequent analyses can use vignette ratings to adjust for group differences in rating styles, yielding unbiased comparisons of self-ratings. For more on the method and vignette-based adjustment techniques, see King et al., 2004; King & Wand, 2007; van Soest & Vonkova, 2014.
The anchoring vignette method has a compelling logic, but its validity depends on two key measurement assumptions: vignette equivalence (VE) and response consistency (RC) (King et al., 2004). In the context of general health, VE means that different respondents perceive a given vignette to represent the same absolute level of health, that is, to occupy the same position on the latent health scale (even if respondents use different response categories to describe that level of health). VE would be violated if a vignette text is understood differently by different groups (Rice, Robone, & Smith, 2011)—for example, if an obese vignette character is seen as unhealthy in some countries but as healthy (because not suffering from undernutrition) in others. RC refers to respondents using response categories in the same manner when rating vignette characters’ health as when rating their own—that is, using the same intercategory thresholds in both cases.
In the early 2000s, researchers from the World Health Organization (WHO) developed and fielded health domain-specific anchoring vignettes in the 70-country World Health Survey (WHS). Encouraged by positive initial assessments (e.g., King et al., 2004; Murray et al., 2003), the use of anchoring vignettes quickly spread. Anchoring vignettes now appear in many surveys of health and aging in all corners of the globe, including the Health and Retirement Study (HRS); the Survey of Health, Ageing and Retirement in Europe (SHARE); the English Longitudinal Study of Ageing (ELSA); the China Health and Retirement Longitudinal Study (CHARLS); the WHO Study on Global AGEing and Adult Health (SAGE); and many others. Scholars engaging in international comparative research (e.g., Kapteyn, 2010) and those focusing on domestic comparisons (e.g., Dowd & Todd, 2011) have advocated for and used vignettes in their research.
However, initial enthusiasm for anchoring vignettes has been marred by growing evidence that many vignettes violate measurement assumptions, in particular VE. Evidence for RC is mixed but is overall stronger than for VE (Grol-Prokopczyk et al., 2015; Kapteyn et al., 2011; Paccagnella, 2013). Substantial violations of VE have been found in ELSA’s mobility and cognition vignettes (Bago d’Uva et al., 2011), SHARE’s life satisfaction vignettes (Corrado & Weeks, 2010), SAGE’s mobility and cognition vignettes (Hirve, Gómez-Olivé, & Oti, 2013), SAGE’s and HRS’s domain-specific health vignettes (Grol-Prokopczyk et al., 2015), and others. VE violations appear particularly large across countries but are also observed across sociodemographic groups within countries. In short, different respondents often seem to interpret vignette texts in fundamentally different ways, meaning that these vignettes cannot serve to “anchor” self-reports. Violations of VE have been described as the biggest current challenge to the anchoring vignette method (Paccagnella, 2013). Surprisingly, however, little research has examined how to improve vignette wording to avoid such violations.
How to Write Valid Vignettes?
How might problems with VE be overcome? Survey researchers frequently assert the importance of objectivity and concrete detail in survey questions, especially those intended for diverse populations (e.g., Pasick, Stewart, Bird, & D’Onofrio, 2001). Researchers focusing specifically on anchoring vignettes are no exception, arguing that VE violations are “more likely when dealing with abstract concepts” (Molina, 2016:300) and might result from respondents “imput[ing] missing information” from insufficiently detailed vignette texts (van Soest, Delaney, Harmon, Kapteyn, & Smith, 2011:579). The ideal vignette description, then, would be “complete” (while still short enough that respondents read it carefully; Kapteyn et al., 2011:14) and “tangible” (Molina, 2016:300), that is, invoking concrete detail.
These beliefs are at least partially reflected in the wording of many current vignettes. Mobility and vision vignettes frequently refer to specific, quantified distances (e.g., “Mary… jogs 4 kilometres twice a week”; “Hector… picks out most details in pictures from across 20 metres”), and quantification is also common in other series (e.g., “it takes around 15 minutes for him to go back to sleep”; all quoted texts come from WHS/SAGE vignettes). Other vignettes refer to specific sites, causes, and/or effects of health problems.
But are exhortations to specificity and concreteness always appropriate? The vignettes mentioned in the previous paragraph perform poorly in tests of VE (Grol-Prokopczyk et al., 2015). Moreover, Su et al.’s (2017) experimental, cognitive interview-based research on vision vignettes in China directly finds that some concrete details—for example, specific distances such as “20 meters”—undermine VE. In their sample, three-quarters of respondents were not sure how far 20 m actually was. Admittedly, the respondents were 7th and 11th graders; perhaps adult respondents would have a more accurate sense of distance—but perhaps not.
Unfortunately, fine-grained and/or experimental investigations into the effects of vignette wording, such as Su et al.’s (2017), are rare. (Some that do exist focus on how to present the age and sex of vignette characters (Grol-Prokopczyk, 2014; Jürges & Winter, 2013), rather than on substantive content). Nonetheless, one might hypothesize a priori that existing vignettes suffer from mentioning certain concrete details, such as suicide (which might be evaluated differently depending on local legal and religious prohibitions), newspaper reading (an activity with different meanings for literate vs. illiterate respondents), etc.
In short, survey designers need information—thus far largely lacking—on whether and when concrete details in anchoring vignettes enhance VE. Do specific details make vignette characters more realistic, comprehensible, and easier to locate at a specific position on the latent health scale? Or do such details invite group-specific interpretations that undermine VE instead of bolstering it? What types of details fall into each category?
While comprehensive answers to such questions are beyond the scope of a single article, this study aims to begin answering them in the context of general health anchoring vignettes (designed to adjust the commonly-used general self-rated health question). Specifically, this study compares three versions of health vignettes: one focusing on general, arguably universal features of overall health (such as pain and energy levels), and two which also include descriptions of specific health problems and/or medical procedures relating to heart disease and diabetes. I focus on testing adherence to VE—given evidence that this “is a much more fragile assumption than response consistency” (Kapteyn et al., 2011, p. 20)—but assess RC as well, since both are crucial to vignette validity. Which of the vignette series shows the best performance in terms of adherence to measurement assumptions, and what does this tell us about how to design future vignettes?
Methods
Data
Data for this project come from the Wisconsin Longitudinal Study (WLS) (Hauser, Sewell, & Herd, 2014), which began in 1957 as a survey of graduating high school seniors in Wisconsin. Multiple follow-up waves have tracked initial respondents from early adulthood through retirement age and beyond (Herd, Carr, & Roan, 2014). In the mid-2000s, the WLS conducted telephone interviews of a randomly selected sibling of the graduating senior, as well as the sibling’s spouse.
The present analyses are based on data from 2005 to 2007, when three sets of general health anchoring vignettes were fielded to 2,625 siblings and sibling-spouses. Seventy-five cases are excluded here due to missing information about age, education, and/or income, yielding an analytic sample of 2,550 (comprising 1,151 siblings and 1,399 sibling-spouses). Sociodemographic characteristics of the analytic sample are presented in Table 1. Because respondents were siblings or siblings-in-law of 1957 high school graduates, the range of ages at time of interview is relatively closely clustered around 64 and only a small percentage of respondents (5%) did not complete high school. A primary limitation of this dataset is its racial homogeneity: reflecting the demographics of Wisconsin in 1957, 99% of WLS respondents identify as White.
Table 1.
Characteristics of Sample (n = 2,550; from Wisconsin Longitudinal Study, 2005–2007)
| | % or mean | SD | N |
|---|---|---|---|
| Sex | | | |
| Female | 54.75% | | 1,396 |
| Male | 45.25% | | 1,154 |
| Age at time of interview | | | |
| Overall | 63.78 | 7.54 | |
| Under age 60 years | 27.61% | | 704 |
| Ages 60–64 years | 29.53% | | 753 |
| Ages 65–69 years | 20.39% | | 520 |
| Ages 70 years or above | 22.47% | | 573 |
| Education | | | |
| Less than high school | 5.25% | | 134 |
| High school diploma | 40.55% | | 1,034 |
| Some college | 19.80% | | 505 |
| 4-year college degree | 17.96% | | 458 |
| Graduate degree | 16.43% | | 419 |
| Household income | | | |
| Overall | $71,264.48 | $79,344.76 | |
| Quartile 1 (poorest) | $4,429.86 | $5,663.70 | |
| Quartile 2 | $30,137.69 | $7,447.15 | |
| Quartile 3 | $61,862.70 | $11,062.66 | |
| Quartile 4 (richest) | $155,266.75 | $96,454.79 | |
The three series of general health anchoring vignettes each consisted of four vignettes: Severities 1–4, with Severity 4 representing the worst health. Table 2 provides the vignette texts. The No Disease series describes characters’ health in relatively general terms, referring to four very common—arguably universal—dimensions of health: energy/fatigue, mobility, pain, and days in bed due to illness. The Heart Disease and Diabetes series include the same text as the No Disease series, but add one to two sentences (randomly assigned to appear before or after the base text) referring to specific medical diagnoses, treatments, and/or procedures. The Heart Disease texts refer to blood pressure, cholesterol, angioplasty, heart attack, and bypass surgery, while the Diabetes texts refer to blood sugar levels, diabetes, insulin injections, and “diabetes-related complications.” All series were intended to calibrate the general self-rated health question—“In general, would you say your health is excellent, very good, good, fair, or poor?”—and accordingly respondents were given the same five response categories when rating vignette characters’ health. Categories were reverse-coded, so that higher values represent better health. Respondents were randomly assigned three vignettes, which appeared in random order: one from each series, and each representing a different severity level.
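The assignment rule described above (one vignette from each series, each at a different severity, presented in random order) can be illustrated with a minimal sketch. The function and variable names below are hypothetical and are not taken from the WLS instrument; the sketch only shows one way such a randomization could be implemented.

```python
import random

SERIES = ("No Disease", "Heart Disease", "Diabetes")
SEVERITIES = (1, 2, 3, 4)  # 4 = worst health


def assign_vignettes(rng):
    """Return three (series, severity) pairs in random presentation order.

    Each series appears exactly once, and no severity level is repeated.
    This is an illustrative assumption about the mechanics, not WLS code.
    """
    severities = rng.sample(SEVERITIES, k=len(SERIES))  # three distinct levels
    vignettes = list(zip(SERIES, severities))
    rng.shuffle(vignettes)  # randomize presentation order
    return vignettes


assignment = assign_vignettes(random.Random(2005))
for series, severity in assignment:
    print(f"{series}: Severity {severity}")
```

Under this design each respondent rates only three of the twelve vignettes, so every series is rated at every severity level only across respondents, not within them.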
Table 2.
Texts of Three Series of General Health Anchoring Vignettes
| Introductory text | Earlier we asked you to rate your own health overall. We are interested in how you would use these same categories to rate the health of other people your age. Now I am going to describe the health of some people your age; then I am going to ask you to rate their health using the same categories you used to rate your own health. |
| No disease series | These also serve as base texts for the Heart Disease and Diabetes series. |
| Severity 1 | [Name/she/he] is energetic, and has little trouble with bending, lifting, and climbing stairs. [She/he] rarely experiences pain, except for minor headaches. In the past year [Name/she/he] spent one day in bed due to illness. |
| Severity 2 | [Name/she/he] is usually energetic, but occasionally feels fatigued. [S/he] has some trouble bending, lifting, and climbing stairs. [His/her] occasional pain does not affect [his/her] daily activities. In the past year, [Name/she/he] spent a few days in bed due to illness. |
| Severity 3 | About once a week, [Name/she/he] has no energy. [S/he] has some trouble bending, lifting, and climbing stairs, and each week experiences pain that limits some of [his/her] daily activities. In the past year, [Name/she/he] spent a week in bed due to illness. |
| Severity 4 | [Name/she/he] feels exhausted several days a week. [S/he] has trouble bending, lifting, and climbing stairs, and every day experiences pain that limits many of [his/her] daily activities. In the past year, [Name/she/he] spent a few nights in a hospital, and over a week in bed due to illness. |
| Heart disease series | The sentences below are added to the base text from the No Disease series. |
| Severity 1 | [Name]’s doctor says [Name] has good blood pressure, and that [his/her] heart is in good health. |
| Severity 2 | [Name]’s doctor says [Name] has borderline high blood pressure and high cholesterol but does not need medication for them. |
| Severity 3 | [Name] has high blood pressure and high cholesterol. [S/he] once underwent angioplasty to unblock an artery, and takes medication for these problems. |
| Severity 4 | [Name] has very high blood pressure and cholesterol. [S/he] once had a heart attack and subsequently had successful bypass surgery. |
| Diabetes series | The sentences below are added to the base text from the No Disease series. |
| Severity 1 | [Name]’s doctor says [Name] has healthy blood sugar levels. |
| Severity 2 | [Name]’s doctor says [Name] must lower [his/her] blood sugar levels to avoid getting diabetes. |
| Severity 3 | [Name] has diabetes, and controls it by managing [his/her] diet. |
| Severity 4 | [Name] has diabetes that requires [him/her] to take daily insulin injections, and is experiencing some diabetes-related complications. |
| Question after each vignette | In general, would you say [Name]’s health is: excellent, very good, good, fair, or poor? |
Note: First names used in vignettes were David, Tom, and William for male respondents, and Karen, Joan, and Nancy for female respondents.
The three vignette series were not originally designed to test the effects of more universal versus more particular wording on VE (rather, an effect of mentioning diseases on RC was hypothesized; Grol-Prokopczyk et al., 2011). Consequently, the vignette series do not represent a maximally sharp division between more general and more specific descriptions of health. For example, the No Disease Severity 4 vignette mentions “spen[ding] a few nights in a hospital,” a specific experience that might have different meaning for respondents depending on whether they have comprehensive health insurance, whether they live near a hospital, etc. Simultaneously, since the Heart Disease and Diabetes vignettes are based on the No Disease vignettes, they include the general descriptions of energy levels, pain, etc.; that is, they represent a combination of general and specific aspects of health. In a sense, then, the present comparison is a conservative test of the effects of vignette wording on VE: If disease-specific vignettes perform differently than the No Disease vignettes, despite sharing roughly two-thirds of their wording, then this is evidence that even small variations in vignette wording can affect vignette interpretation.
Tests of VE
Tests of VE were conducted with a method developed by Bago d’Uva et al. (2011) and since used by a growing number of researchers (including Grol-Prokopczyk et al., 2015; Hirve et al., 2013; Molina, 2016). The underlying logic of the method is as follows: Ideally, we would like to know the exact locations occupied by vignettes in a series on the latent scale, so that we could compare those locations across groups (e.g., age groups or countries). Unfortunately, one cannot directly calculate the exact location of all vignettes in a series, as such a model is unidentifiable. However, as Bago d’Uva and colleagues note, if we assume that one vignette in a series is in the same location for all respondents, then we can estimate the distance along the latent scale between that reference vignette and all other vignettes in the series. We can then compare such distances across groups. Similar intervignette distances are supportive of VE, while dissimilar ones suggest violations of VE.
Formally, the Bago d’Uva et al. (2011) test is conducted by comparing two models. Let $V_{ij}$ represent the perceived location of vignette $j$ for respondent $i$, $R_{ij}$ the respondent’s rating of that vignette, and $K$ the number of response categories (here, 5). Then $R_{ij} = k$ if $V_{ij}$ lies between thresholds $\tau_i^{k-1}$ and $\tau_i^{k}$, with thresholds assumed to increase monotonically between $\tau_i^{0} = -\infty$ and $\tau_i^{K} = \infty$.
In Model A, covariates have no effect on vignettes’ perceived locations (consistent with the assumption of VE), so that each vignette location can be represented simply as a constant $\alpha_j$ plus a random error term $\varepsilon_{ij}$:
$$V_{ij} = \alpha_j + \varepsilon_{ij},$$
where $\varepsilon_{ij}$ is assumed to be normally distributed with a mean of zero. For model identification, $\alpha_1$ is set to 0 and the variance of the random error term is set to 1.
In Model B, a single reference vignette is set to 0, as in Model A. The perceived positions of all other vignettes, however, may now be affected by a vector of covariates $X_i$, which in the present study include sex, age, education, and income:
$$V_{ij} = \alpha_j + X_i \lambda_j + \varepsilon_{ij}.$$
Here, Severity 1 is the reference vignette, so the vector $\lambda_1$ contains only zeroes. The covariate vector $X_i$ here takes a linear functional form and does not include a constant term.
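To make the rating rule and the two location equations concrete, the following minimal sketch maps a perceived location onto one of five response categories. It assumes the linear form described above, in which a vignette's location equals its constant plus a covariate effect plus an error term; Model A corresponds to setting every covariate effect to zero. All numeric values, cutpoints, and function names are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def perceived_location(alpha_j, x_i, lambda_j, eps_ij=0.0):
    """Location of vignette j for respondent i: alpha_j + X_i * lambda_j + eps_ij.

    Passing lambda_j as all zeros reproduces Model A (vignette equivalence).
    """
    return alpha_j + sum(x * lam for x, lam in zip(x_i, lambda_j)) + eps_ij


def rating(v_ij, cutpoints):
    """Return category k in 1..K, where cutpoints are the K-1 finite thresholds.

    k is one more than the number of thresholds lying strictly below v_ij.
    """
    return bisect_left(cutpoints, v_ij) + 1


# Hypothetical values: one covariate (female = 1), five response categories.
cutpoints = [-3.0, -2.0, -1.0, 0.0]
alpha_j = -1.5          # baseline location of a non-reference vignette
lambda_j = [-1.0]       # hypothetical Model B effect of being female

rating_male = rating(perceived_location(alpha_j, [0], lambda_j), cutpoints)
rating_female = rating(perceived_location(alpha_j, [1], lambda_j), cutpoints)
print(rating_male, rating_female)
```

In this illustration both respondents share the same thresholds, so the difference in ratings arises purely from a difference in perceived location, which is the pattern a violation of VE would produce.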
If VE holds, then λj = 0 for all j, so that Model B reduces to Model A. A likelihood ratio (LR) test would thus fail to reject the null hypothesis of no difference between models. If the test rejects the null hypothesis, however, this suggests that groups perceive vignettes to lie at different locations on the latent spectrum and thus that VE is violated. This test also assumes that respondents use the same thresholds in rating each vignette in a series (Bago d’Uva et al., 2011, p. 881). Therefore, non-zero values of λ could be driven either by absence of VE or by absence of threshold consistency across vignettes. Below, I interpret the LR tests primarily as tests of VE, given both theoretical and empirical reasons to suspect VE violations. Regardless, both scenarios are indicative of problematic vignettes; either would invalidate the use of vignettes to adjust self-ratings.
Both models were implemented with variations on hierarchical ordered probit, or hopit, models (Greene & Hensher, 2010). While standard ordered probit models assume that response-category thresholds are fixed across respondents, hopit models allow thresholds to vary across groups, based on ratings of anchoring vignettes. To ensure correct ordering of thresholds, an exponential function is used: $\tau_i^{1} = \gamma_1 X_i$ and $\tau_i^{k} = \tau_i^{k-1} + \exp(\gamma_k X_i)$ for $k = 2, \ldots, K-1$. Group differences in thresholds are then taken into account to estimate perceived vignette locations (relative to the reference vignette). Hopit models are often used to adjust self-ratings based on vignette ratings, and in such cases require data on self-ratings. In the model variation used here, however, no self-ratings are needed, since the goal is simply to estimate distances among vignettes on the latent spectrum.
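The threshold construction just described can be sketched in a few lines. Because each threshold after the first adds a strictly positive exponential increment to the previous one, the thresholds are ordered for any coefficient values. The coefficient values below are hypothetical, and the sketch assumes that the covariate vector used for thresholds carries a leading 1 as an intercept; this is an illustration rather than the Stata code used in the analyses.

```python
import math


def thresholds(gammas, x_i):
    """Build K-1 ordered thresholds for one respondent.

    gammas: list of K-1 coefficient vectors (gamma_1, ..., gamma_{K-1}).
    x_i:    covariate vector (here assumed to start with 1.0 for an intercept).
    """
    def linear(g):
        return sum(gk * xk for gk, xk in zip(g, x_i))

    taus = [linear(gammas[0])]                       # first threshold is linear
    for g in gammas[1:]:
        taus.append(taus[-1] + math.exp(linear(g)))  # positive increment
    return taus


# Hypothetical coefficients for K = 5 categories and one covariate (female).
gammas = [[-2.0, 0.3], [0.0, -0.1], [0.1, 0.0], [-0.2, 0.2]]
taus_female = thresholds(gammas, [1.0, 1.0])
taus_male = thresholds(gammas, [1.0, 0.0])
print(taus_female)
print(taus_male)
```

The two printed vectors differ, showing how the same latent health level could fall into different response categories for different groups even when perceived locations are identical.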
In this study, Models A and B both allow response category thresholds to vary by sex, age in years, educational category, and income quartile. In Model A, the equation for perceived vignette locations includes only dummies for vignette severity. In Model B, however, the equation for vignette locations also includes multiple interaction terms, one for each covariate interacted with each severity. Thus, the “female × Severity 2” interaction indicates whether the distance between the Severity 2 vignette and the reference (Severity 1) vignette is larger for women than for men. By examining coefficients for such interaction terms, one can identify covariates driving violations of VE.
This study presents the LR test and Model B results for each vignette series. In addition, a version of Model B was run that pooled all three series of vignettes and included three-way interactions between each covariate, severity, and series (as well as interactions between severity and series). The three-way interactions indicate whether violations of VE were significantly larger for disease-specific vignettes than for No Disease vignettes.
Severity 1 was used as the reference vignette throughout, as it mentions the fewest specific health problems/conditions; contrasts across series are likely to be greater across higher-severity vignettes. Covariates included in the interactions were the dummies shown in Table 1, except that “less than high school” and “high school diploma” were combined due to low numbers in the former category. All analyses were conducted with Stata version 14.2; code is available upon request. Supplementary Appendix A provides further details about the likelihood function and LR test used in these analyses.
Tests of RC
Two types of RC tests were conducted. Due to length limitations, they are described and presented in Supplementary Appendix D.
Results
Violations of VE
As shown in Supplementary Appendix B, likelihood ratio tests comparing Models A and B reject VE for all three series (p < .01 for the No Disease series; p < .001 for Heart Disease and Diabetes). One may note that the LR test statistic is smaller for the No Disease series (57.90) than for the disease-specific series (79.24 and 87.95), suggesting a smaller violation of VE in the former. In models with different subsets of covariates, the No Disease series also shows smaller violations.
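As a rough check on the reported significance levels, the LR statistics can be compared against a chi-square distribution. The sketch below computes the upper-tail probability using only the Python standard library. The degrees of freedom are an assumption: 30 is obtained by counting ten covariate dummies (one for sex, three each for age, education, and income) for each of three non-reference vignettes, and the actual value is given in Supplementary Appendix B rather than in the main text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                 # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))      # Q(1/2, y)
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


ASSUMED_DF = 30  # assumption, not stated in the main text
lr_stats = {"No Disease": 57.90, "Heart Disease": 79.24, "Diabetes": 87.95}
p_values = {name: chi2_sf(stat, ASSUMED_DF) for name, stat in lr_stats.items()}
for name, p in p_values.items():
    print(f"{name}: LR = {lr_stats[name]:.2f}, p = {p:.2g}")
```

Under the assumed degrees of freedom, the resulting p-values fall in the same ranges as those reported: between .001 and .01 for the No Disease series and below .001 for the two disease-specific series.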
To explore primary drivers of VE violations across series, Table 3 presents results from Model B for each series. Coefficients in the “Vignette Severity” section estimate the baseline position of each vignette on the latent health spectrum relative to the Severity 1 vignette (which is set to 0). Because Severities 2–4 represent worse health than Severity 1, they have negative coefficients. Subsequent rows of the table test for violations of VE, by estimating whether specific sociodemographic groups perceive a given vignette as being significantly farther from the reference vignette than do other groups. Because no systematic differences in perceived vignette locations were found by age group, interactions with age dummies are not shown.
Table 3.
Predictors of Perceived Vignette Location, by Vignette Series
| | No disease series (N = 2,548) | | Heart disease series (N = 2,548) | | Diabetes series (N = 2,545) | |
|---|---|---|---|---|---|---|
| | β | SE | β | SE | β | SE |
| Vig. Severity (ref: 1) | ||||||
| Severity 2 | −1.61*** | 0.21 | −1.55*** | 0.23 | −1.30*** | 0.22 |
| Severity 3 | −2.22*** | 0.23 | −3.16*** | 0.27 | −1.87*** | 0.23 |
| Severity 4 | −2.86*** | 0.24 | −3.29*** | 0.28 | −2.76*** | 0.26 |
| Severity 2 interactions | ||||||
| Sev. 2 × Female | −0.09 | 0.14 | −0.32* | 0.15 | −0.52*** | 0.15 |
| Sev. 2 × Some Coll. | −0.06 | 0.19 | −0.02 | 0.20 | −0.46* | 0.20 |
| Sev. 2 × Coll. Degr. | 0.02 | 0.20 | 0.01 | 0.22 | −0.68** | 0.21 |
| Sev. 2 × Grad. Degr. | −0.23 | 0.22 | 0.13 | 0.21 | −0.60** | 0.22 |
| Sev. 2 × Inc. Quartile 2 | −0.07 | 0.21 | −0.01 | 0.22 | −0.28 | 0.21 |
| Sev. 2 × Inc. Quartile 3 | 0.07 | 0.20 | −0.09 | 0.21 | −0.25 | 0.20 |
| Sev. 2 × Inc. Quartile 4 | −0.29 | 0.21 | −0.35 | 0.22 | −0.38† | 0.22 |
| Severity 3 interactions | ||||||
| Sev. 3 × Female | −0.28† | 0.16 | −0.64** | 0.18 | −0.48** | 0.16 |
| Sev. 3 × Some Coll. | −0.32 | 0.21 | −0.13 | 0.24 | −0.49* | 0.21 |
| Sev. 3 × Coll. Degr. | −0.16 | 0.22 | −0.87** | 0.29 | −0.59* | 0.23 |
| Sev. 3 × Grad. Degr. | −0.23 | 0.24 | −0.51† | 0.27 | −1.05*** | 0.26 |
| Sev. 3 × Inc. Quartile 2 | −0.25 | 0.23 | 0.10 | 0.26 | −0.37 | 0.23 |
| Sev. 3 × Inc. Quartile 3 | −0.05 | 0.22 | −0.03 | 0.26 | −0.31 | 0.22 |
| Sev. 3 × Inc. Quartile 4 | −0.72** | 0.23 | −0.39 | 0.28 | −0.46* | 0.23 |
| Severity 4 interactions | ||||||
| Sev. 4 × Female | −0.39* | 0.17 | −1.00*** | 0.20 | −0.82*** | 0.18 |
| Sev. 4 × Some Coll. | −0.45* | 0.23 | −0.24 | 0.26 | −0.48* | 0.24 |
| Sev. 4 × Coll. Degr. | −0.25 | 0.24 | −0.54† | 0.30 | −0.77** | 0.25 |
| Sev. 4 × Grad. Degr. | −0.67* | 0.26 | −0.40 | 0.29 | −1.27*** | 0.28 |
| Sev. 4 × Inc. Quartile 2 | −0.20 | 0.24 | −0.18 | 0.28 | −0.23 | 0.25 |
| Sev. 4 × Inc. Quartile 3 | −0.26 | 0.24 | −0.33 | 0.27 | −0.53* | 0.25 |
| Sev. 4 × Inc. Quartile 4 | −0.63* | 0.25 | −0.89** | 0.29 | −0.63* | 0.26 |
| Wald chi-square (df = 33) | 1,730.22*** | | 1,977.15*** | | 1,786.30*** | |
Note: Results from Model B regressions. Omitted categories are male, high school or less, and income quartile 1 (poorest). Models also include age dummies (categorized as in Table 1); not shown.
† p <.10. *p < .05. **p < .01. ***p < .001, two-tailed.
Examining the No Disease columns in Table 3, one sees that there are no significant violations of VE by subgroup for the Severity 2 vignette, but some significant violations by gender, education, and income for the Severity 3 and 4 vignettes. Women seem to perceive the Severity 4 vignette as representing worse health (relative to the reference vignette) than do men (β = −0.39, p < .05). Similarly, the highest-income respondents perceive vignettes 3 and 4 more negatively than do the lowest-income respondents (β = −0.72, p < .01; and β = −0.63, p < .05, respectively), and respondents with some college or graduate degrees perceive the Severity 4 vignette more negatively than do those with high school education or less (β = −0.45, p < .05; and β = −0.67, p < .05, respectively). These findings confirm that VE is violated in the No Disease series, though also show that the Severity 2 and 3 vignettes fare relatively well in this regard; most significant violations of VE manifest in the Severity 4 interactions.
Examining results for the Heart Disease and Diabetes series, one sees more frequent violations of VE. Violations of VE by sex are especially consistent, with women perceiving worse health than men for vignette characters in all three non-reference severities, in both series. These violations are often substantively quite large. For example, the distance between men’s and women’s perceived locations of the Severity 4 Heart Disease vignette (β = −1.00, p < .001) is approximately as large as the mean distance between adjacent severities of vignettes. One could thus argue that for women, the Severity 4 vignette effectively functions as if it were a Severity 5 vignette. Large differences in perceived vignette locations are also found across educational groups. In the Heart Disease series, college graduates see the Severity 3 vignette as representing significantly worse health than do respondents with no college education (β = −0.87, p < .01). In the Diabetes series, all three educational dummies predict significantly more negative perceptions of vignettes, for all severities. Respondents with graduate degrees show particularly large differences vis-à-vis those with high school diplomas or less for the Severity 3 and 4 vignettes (β = −1.05, p < .001; and β = −1.27, p < .001, respectively). Again, the magnitude of these differences is on par with that between adjacent vignette severities. In sum, members of different educational categories perceive identical vignettes in substantially different ways, with higher education predicting larger perceived distance between vignettes.
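The comparison between the sex gap and the spacing of adjacent vignettes can be reproduced from the Heart Disease coefficients in Table 3. The short calculation below is only a worked illustration; note that the baseline severity coefficients apply to the omitted reference categories, and the variable names are introduced here for convenience.

```python
# Heart Disease series, Table 3: baseline locations relative to Severity 1 (= 0).
heart_baseline = {1: 0.0, 2: -1.55, 3: -3.16, 4: -3.29}
female_shift_sev4 = -1.00  # "Sev. 4 x Female" coefficient, Heart Disease series

# Distances between adjacent severities on the latent scale.
gaps = [heart_baseline[s] - heart_baseline[s + 1] for s in (1, 2, 3)]
mean_gap = sum(gaps) / len(gaps)

# Perceived Severity 4 location for women versus men in the reference group.
men_sev4 = heart_baseline[4]
women_sev4 = heart_baseline[4] + female_shift_sev4

print("adjacent gaps:", [round(g, 2) for g in gaps])
print("mean adjacent gap:", round(mean_gap, 2))
print("men:", men_sev4, "women:", round(women_sev4, 2))
print("sex gap / mean gap:", round(abs(female_shift_sev4) / mean_gap, 2))
```

The mean adjacent gap works out to 3.29 / 3, or roughly 1.10 latent units, so the sex difference of 1.00 is about nine-tenths of one severity step.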
Differences across income quartiles are generally smaller and less consistent than for education, though they also show a general tendency for higher income to predict more negative perceptions of vignettes. Thus the Severity 4 Heart Disease vignette is perceived more negatively by members of the richest income quartile (β = −0.89, p < .01), and the same pattern is seen for Severities 3 and 4 in the Diabetes series (β = −0.46, p < .05; and β = −0.63, p < .05, respectively). As mentioned, no clear pattern was evident across age groups (not shown); this may reflect the relatively narrow age distribution of the sample.
Supplementary Appendix C presents the version of Model B pooling all three series, and including three-way interactions of severity, covariate, and series dummies. Not all cross-series differences in VE violations described above are statistically significant in Appendix C but a number are: Violations by sex are larger in the Heart Disease series than in the No Disease series for Severities 3 and 4 (p < .10 and p < .001, respectively) and are marginally significantly larger for Diabetes Severity 4 (p < .10). The Heart Disease Severity 3 vignette also shows significantly larger violations of VE for college degree or graduate degree holders than does the comparable No Disease vignette and the same holds for graduate degree holders in the Diabetes Severity 3 vignette (p < .05 in all cases). Violations of VE by income are marginally significantly larger in the Heart Disease Severity 3 vignette than in the No Disease analog (for quartiles 2 and 4).
Overall, then, the present findings suggest that violations of VE are larger in disease-specific series than in the No Disease series, and that these differences manifest particularly strongly across sex and educational groups, and to a lesser extent across income groups. The medical details introduced in the disease-specific vignettes appear to undermine rather than enhance their validity.
Findings Regarding RC
Findings regarding RC are presented in Supplementary Appendix D. Although the tests used are not definitive, they suggest reasonably good adherence to RC across series, albeit with vignettes referring to heart disease or diabetes appearing to perform slightly worse than No Disease vignettes.
Theorizing Violations of VE
In the Bago d’Uva et al. (2011) global (LR-based) test of VE, results are unaffected by choice of reference vignette. However, the specific contrasts that emerge in Model B do depend on reference vignette. For example, while current findings suggest that women perceive Severity 4 vignettes more negatively than men, a model treating Severity 4 as the reference vignette would show women to perceive Severity 1 vignettes more positively than men (and in actuality, gender differences could affect perceptions of both vignettes). Thus, one cannot straightforwardly interpret Model B as indicating precisely which vignettes drive VE violations.
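The dependence on reference vignette can be illustrated with a minimal sketch. It assumes, purely for illustration, that each vignette has some underlying female-minus-male shift in perceived location and that the model can only recover each shift relative to the reference vignette. The function name `contrasts` and both scenarios are hypothetical; only the value −0.39 echoes the Severity 4 No Disease estimate reported above, and nothing here reproduces Model B itself.

```python
def contrasts(group_shift, reference):
    """Return each vignette's group shift measured relative to the reference vignette.

    group_shift maps severity -> hypothetical (women minus men) shift in the
    perceived latent location of that vignette.
    """
    ref_shift = group_shift[reference]
    return {
        severity: shift - ref_shift
        for severity, shift in group_shift.items()
        if severity != reference
    }


# Two invented "true" states of the world (assumptions, not estimates):
# A: women perceive only the Severity 4 vignette as worse than men do.
scenario_a = {1: 0.0, 2: 0.0, 3: 0.0, 4: -0.39}
# B: women perceive Severities 1-3 as better than men do, and Severity 4 identically.
scenario_b = {1: 0.39, 2: 0.39, 3: 0.39, 4: 0.0}

# With Severity 1 as reference, both scenarios yield the same contrasts.
ref1_a = contrasts(scenario_a, reference=1)
ref1_b = contrasts(scenario_b, reference=1)

# With Severity 4 as reference, scenario A shows up as positive contrasts
# for Severities 1-3 rather than a negative contrast for Severity 4.
ref4_a = contrasts(scenario_a, reference=4)

print("Reference 1, scenario A:", ref1_a)
print("Reference 1, scenario B:", ref1_b)
print("Reference 4, scenario A:", ref4_a)
```

In this toy setup the two scenarios are indistinguishable once a reference vignette is fixed, which is why the contrasts alone cannot say which vignette is perceived differently.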
Nonetheless, if one assumes that VE violations are likely smaller for Severity 1 than for other Severities (due to the former’s less frequent mention of potentially unfamiliar medical procedures), and that Severity 1 is thus a reasonable reference vignette, one can propose some explanations of the findings in Table 3.
First, one reason Severity 4 may be the worst-performing No Disease vignette is that it is the only one in the series to mention “spen[ding] a few nights in a hospital.” Why might mention of a hospital matter? Research finds that low socioeconomic status (SES) patients often prefer hospital-based medical care to clinical care, because they consider it more affordable, more convenient, and of better quality (Kangovi et al., 2013). In contrast, higher SES patients are more likely to attend to minor health ailments via office visits, so that hospital stays represent truly grave health problems. This could explain why higher SES respondents perceive the Severity 4 vignette more negatively.
Next, why might women perceive Heart Disease and Diabetes vignettes more negatively than men? One possibility is that women are generally more risk-averse regarding their health (Courtenay, 2000), so that any potentially serious health diagnosis is perceived more negatively by women. Another possibility is that heart disease-related risk factors and procedures are less familiar and hence more daunting to women. While cardiovascular disease is the leading cause of death of both sexes in the United States, it typically “develops 7 to 10 years later in women than in men” (Maas & Appelman, 2010:598), so that in any given age stratum, men are substantially more likely to experience heart attacks or coronary heart disease (Mosca, Barrett-Connor, & Wenger, 2011). Women also have slightly lower age-adjusted rates of diabetes than men (Centers for Disease Control and Prevention, 2015).
Similarly, the consistent violations of VE by educational category in the Diabetes series—and somewhat less consistent violations in the Heart Disease series—may reflect less direct familiarity with these conditions among the better educated, due to the strong educational gradient in their prevalence (Fiscella & Tancredi, 2008; Smith, 2007). The general pattern across these findings, then, is that less direct experience with a medical condition or treatment predicts relatively negative evaluations of it, while familiarity predicts relatively positive evaluations. This hypothesis is speculative but invites further investigation.
Discussion
Kapteyn et al., acknowledging the paucity of research on how to optimize anchoring vignette wording, call for “a systematic experimental approach to the design of anchoring vignettes” (Kapteyn et al., 2011, p. 1). The current study echoes and heeds this call, by experimentally testing whether mentioning specific health conditions and/or procedures affects vignettes’ adherence to key measurement assumptions, in particular VE. Results suggest that the Heart Disease and Diabetes series provoke substantially larger violations of VE than does the No Disease series, especially violations by sex and education (and to a lesser degree, income). With greater differences in wording across series and/or with a more diverse sample, one would expect the cross-series differences to be even larger. For example, young adults might have less familiarity with terms such as “angioplasty,” and thus show greater violations of VE if presented with such terms. Regarding RC, current tests identified relatively minor violations, which appeared somewhat larger for Heart Disease and Diabetes vignettes—further supporting the preferability of the No Disease vignettes. In these analyses, no evidence for a trade-off between VE and RC was found.
Violations of VE were admittedly observed even in the No Disease series. Such violations appear largely driven by the Severity 4 vignette, however, which mentioned spending nights in a hospital. As argued above, socioeconomic differences in use of hospitals for medical care may lead to different interpretations of this vignette. If the Severity 4 vignette had been excluded, the No Disease series would appear to perform rather well.
This study has a number of important limitations. First, as noted in Supplementary Appendix D, the current tests of RC are not definitive, and thus may not be accurate enough to identify trade-offs between VE and RC. Moreover, even if the current RC findings are accurate, the sample was very homogeneous in terms of race, age, and geographic origin. In a less homogeneous group, one might observe larger and/or different violations of VE, and potentially clearer violations of RC as well (e.g., if young adults picture vignette characters who have trouble climbing stairs as older than themselves, then they may use substantially different thresholds for vignette than for self-ratings). Another limitation is that the three vignette series differ in length and in number of presented health dimensions. However, problems with disease-specific vignettes are unlikely to result primarily from their slightly greater length. Previous research has found no significant difference in ratings of vignettes with prepended versus appended disease-specific texts (Grol-Prokopczyk et al., 2011), indicating that respondents attend equally well to initial and final content. Similarly, it seems unlikely that the move from four to five health dimensions explains the poorer performance of the disease-specific vignettes, because differences in performance are observed even between Heart Disease and Diabetes vignettes (which have similar length and dimensionality). Differences in vignette content (specifically, invocations of more or less familiar health problems/procedures) appear to provide more plausible explanations of these findings. Future research with more diverse respondents and more precisely designed vignettes could clarify the generalizability and interpretation of the present results.
Additional limitations relate to the Bago d’Uva et al. (2011) test of VE. First, the average latent distances between vignettes differ somewhat across the three series. While, to some extent, this is a direct reflection of different VE violations across series, it is possible that VE violations are more likely to reach statistical significance when vignettes in a series are farther apart. On the other hand, due to response category floor or ceiling effects, it is also possible that violations of VE could be underestimated when vignette characters have particularly good or bad health. Such considerations further underscore that the current findings should be viewed as suggestive but not conclusive. Finally, as noted earlier, this test cannot distinguish between true violations of VE and use of different intercategory thresholds in rating different vignettes. However, as discussed, there are strong theoretical reasons to suspect genuine violations of VE. Evidence of RC between vignette ratings and self-ratings (here and in prior research) also suggests that differential use of thresholds is a less likely explanation of the findings. Regardless, since either scenario undermines the anchoring vignette method, the present results support the use of more universal vignette texts.
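The floor-effect concern can be made concrete with a small numerical sketch. It assumes a generic five-category ordered probit with invented thresholds and an invented group shift of 0.5 latent units; none of these values come from the study, and the helper names are hypothetical. The sketch compares how different two groups' rating distributions look when the same latent shift is applied to a mid-scale vignette versus a vignette near the floor of the scale.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(location, thresholds):
    """Ordered-probit category probabilities for a vignette at a latent location.

    Category 0 is the worst rating; higher latent locations mean better health.
    """
    cumulative = [normal_cdf(t - location) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Invented values for illustration only (not estimates from the study).
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]  # cutpoints between five rating categories
SHIFT = -0.5                         # one group perceives the vignette as worse


def observable_gap(location):
    """How different the two groups' rating distributions are at this location."""
    baseline = category_probs(location, THRESHOLDS)
    shifted = category_probs(location + SHIFT, THRESHOLDS)
    return total_variation(baseline, shifted)


gap_mid = observable_gap(0.0)       # vignette in the middle of the scale
gap_extreme = observable_gap(-3.0)  # vignette describing very poor health

print(f"gap for mid-scale vignette: {gap_mid:.3f}")
print(f"gap for extreme vignette:   {gap_extreme:.3f}")
```

Under these assumptions the same latent shift leaves a much smaller trace in the observed ratings of the extreme vignette, because most respondents in both groups choose the lowest category. This is one way a genuine VE violation could be understated for vignettes describing very poor (or very good) health.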
Assuming that the current findings are broadly correct, what can we conclude about the value of specific, concrete details in vignette texts? I do not argue that specificity and concreteness are inherently problematic (perhaps the opposite), but that they may become problematic if they evoke different associations across groups, or mention concepts that are more familiar to some groups than to others (since, as argued above, familiarity appears to breed relative positivity in perceptions of vignettes). In other words, these findings suggest that universality should trump specificity and concreteness as a priority in vignette design. Most of the health problems depicted in the No Disease series—pain, fatigue, difficulty bending—are universally experienced and appear for this reason to invite fewer violations of VE than references to diabetes, angioplasty, etc.
Su et al., after discovering that many respondents were not sure how far “20 meters” was, edited their vision vignette to refer to distances in concrete but nonnumeric terms: “In the cafeteria, [Xiao Wang] can clearly recognize students sitting at his table, but not those sitting at the next table” (Su et al., 2017). This is a nice example of how a vignette can be concrete without depending on specialized knowledge. Of course, quantification need not always undermine VE: e.g., van Soest and colleagues (2011) convincingly argue that number of alcoholic beverages is a widely understood measure of drinking behavior among Irish university students.
Experiments testing the effects of vignette wording remain rare—and are particularly needed given that surveys frequently borrow vignette texts from each other, leading to an unexpectedly small pool of health vignettes in circulation. Experiments are not the only tool for improving vignette wording, however. Many of Su et al.’s (2017) most insightful findings emerged from cognitive interviews with respondents; similarly, Au & Lorgelly’s (2014) use of interviews fruitfully clarified factors affecting RC. Admittedly, interviewing as part of pretesting a survey can be expensive and laborious, and not all respondents are comfortable with “think-aloud” or related questioning procedures (Pasick et al., 2001). Nonetheless, combining qualitative and quantitative experimental research has potential to improve vignette wording, and ideally to help generate anchoring vignettes that work.
Finally, I note that research on improving VE may be of broad interest to survey designers, even those who do not work with anchoring vignettes. After all, minimizing group differences in interpretation of survey texts is a widespread goal (Angel, 2013) and the pursuit of “vignette equivalence” could easily be generalized to the pursuit of “functional equivalence,” that is, equivalence of survey item meanings across diverse groups of respondents (Pan & Fond, 2014). This suggests that universality may be preferable to specificity in cross-national or other comparative survey items, even apart from anchoring vignettes—and that anchoring vignette researchers may find willing collaborators in other areas of survey research.
Supplementary Material
Supplementary data is available at The Journals of Gerontology, Series B: Psychological Sciences and Social Sciences online.
Funding
This research uses data from the Wisconsin Longitudinal Study (WLS) of the University of Wisconsin-Madison. Since 1991, the WLS has been supported principally by the National Institute on Aging at the National Institutes of Health (grant numbers AG-9775, AG-21079, AG-033285, and AG-041868), with additional support from the Vilas Estate Trust, the National Science Foundation, the Spencer Foundation, and the Graduate School of the University of Wisconsin-Madison. Since 1992, data have been collected by the University of Wisconsin Survey Center. A public use file of data from the Wisconsin Longitudinal Study is available from the Wisconsin Longitudinal Study, University of Wisconsin-Madison, 1180 Observatory Drive, Madison, Wisconsin 53706 and at http://www.ssc.wisc.edu/wlsresearch/data/. The opinions expressed herein are those of the author.
Conflict of Interest
The authors declare no conflict of interest.
Acknowledgments
I gratefully acknowledge Dr Márton Ispány’s assistance and contributions to the statistical code used herein.
References
- Abdulrahim S., & Ajrouch K. (2010). Social and cultural meanings of self-rated health: Arab immigrants in the United States. Qualitative Health Research, 20, 1229–1240. doi:10.1177/1049732310371104
- Angel R. J. (2013). After Babel: Language and the fundamental challenges of comparative aging research. Journal of Cross-Cultural Gerontology, 28, 223–238. doi:10.1007/s10823-013-9197-2
- Au N., & Lorgelly P. K. (2014). Anchoring vignettes for health comparisons: An analysis of response consistency. Quality of Life Research, 23, 1721–1731. doi:10.1007/s11136-013-0615-2
- Bago d’Uva T., Lindeboom M., O’Donnell O., & van Doorslaer E. (2011). Slipping anchor? Testing the vignettes approach to identification and correction of reporting heterogeneity. Journal of Human Resources, 46, 875–906. doi:10.3368/jhr.46.4.875
- Centers for Disease Control and Prevention (2015). Age-adjusted rates of diagnosed diabetes per 100 civilian, non-institutionalized population, by sex, United States, 1980–2014. Retrieved from http://www.cdc.gov/diabetes/statistics/prev/national/figbysex.htm.
- Corrado L., & Weeks M. (2010). Identification strategies in survey response using vignettes. Cambridge Working Papers in Economics. Faculty of Economics, University of Cambridge. Retrieved from https://ideas.repec.org/p/cam/camdae/1031.html.
- Courtenay W. H. (2000). Constructions of masculinity and their influence on men’s well-being: A theory of gender and health. Social Science & Medicine, 50, 1385–1401. doi:10.1016/S0277-9536(99)00390-1
- Dowd J. B., & Todd M. (2011). Does self-reported health bias the measurement of health inequalities in U.S. adults? Evidence using anchoring vignettes from the health and retirement study. Journals of Gerontology, Series B: Psychological Sciences and Social Sciences, 66B, 478–489. doi:10.1093/geronb/gbr050
- Fiscella K., & Tancredi D. (2008). Socioeconomic status and coronary heart disease risk prediction. JAMA, 300, 2666–2668. doi:10.1001/jama.2008.792
- Greene W. H., & Hensher D. A. (2010). Modeling ordered choices: a primer. Cambridge: Cambridge University Press.
- Grol-Prokopczyk H., Freese J., & Hauser R. M. (2011). Using anchoring vignettes to assess group differences in general self-rated health. Journal of Health and Social Behavior, 52, 246–261. doi:10.1177/0022146510396713
- Grol-Prokopczyk H. (2014). Age and sex effects in anchoring vignette studies: Methodological and empirical contributions. Survey Research Methods, 8, 1–17. doi:10.1007/s13524-015-0422-1
- Grol-Prokopczyk H., Verdes-Tennant E., McEniry M., & Ispány M. (2015). Promises and pitfalls of anchoring vignettes in health survey research. Demography, 52, 1703–1728. doi:10.1007/s13524-015-0422-1
- Hauser R. M., Sewell W. H., & Herd P. (2014). Wisconsin Longitudinal Study (WLS) [graduates, siblings, and spouses]: 1957–2012 Version 13.03. Machine-readable data file. Retrieved from http://www.ssc.wisc.edu/wlsresearch/documentation/.
- Herd P., Carr D., & Roan C. (2014). Cohort profile: Wisconsin longitudinal study (WLS). International Journal of Epidemiology, 43, 34–41. doi:10.1093/ije/dys194
- Hirve S., Gómez-Olivé X., Oti S., et al. (2013). Use of anchoring vignettes to evaluate health reporting behavior amongst adults aged 50 years and above in Africa and Asia—testing assumptions. Global Health Action, 6, Article 21064. doi:10.3402/gha.v6i0.21064
- Idler E. L., & Benyamini Y. (1997). Self-rated health and mortality: A review of twenty-seven community studies. Journal of Health and Social Behavior, 38, 21–37. doi:10.2307/2955359
- Jürges H., & Winter J. (2013). Are anchoring vignettes ratings sensitive to vignette age and sex? Health Economics, 22, 1–13. doi:10.1002/hec.1806
- Kangovi S., Barg F. K., Carter T., Long J. A., Shannon R., & Grande D. (2013). Understanding why patients of low socioeconomic status prefer hospitals over ambulatory care. Health Affairs, 32, 1196–1203. doi:10.1377/hlthaff.2012.0825
- Kapteyn A. (2010). What can we learn from (and about) global aging? Demography, 47, S191–S209. doi:10.1353/dem.2010.0006
- Kapteyn A., Smith J. P., van Soest A., & Vonkova H. (2011). Anchoring vignettes and response consistency. Santa Monica, CA: RAND Corporation. Retrieved from http://www.rand.org/pubs/working_papers/WR840.html.
- King G., Murray C. J. L., Salomon J. A., & Tandon A. (2004). Enhancing the validity and cross-cultural comparability of measurement in survey research. American Political Science Review, 98, 191–207. doi:10.1017/S000305540400108X
- King G., & Wand J. (2007). Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis, 15, 46–66. doi:10.1093/pan/mpl011
- Maas A. H. E. M., & Appelman Y. E. A. (2010). Gender differences in coronary heart disease. Netherlands Heart Journal, 18, 598–602. doi:10.1007/s12471-010-0841-y
- Molina T. (2016). Reporting heterogeneity and health disparities across gender and education levels: Evidence from four countries. Demography, 53, 295–323. doi:10.1007/s13524-016-0456-z
- Mosca L., Barrett-Connor E., & Wenger N. K. (2011). Sex/gender differences in cardiovascular disease prevention: What a difference a decade makes. Circulation, 124, 2145–2154. doi:10.1161/circulationaha.110.968792
- Murray C. J. L., Özaltin E., Tandon A., Salomon J. A., Sadana R., & Chatterji S. (2003). Empirical evaluation of the anchoring vignette approach in health surveys. In Murray C. J. L., & Evans D. B. (Eds.), Health systems performance assessment: debates, methods and empiricism (pp. 369–399). Geneva: World Health Organization.
- National Institute on Aging (2012). Harmonization strategies for behavioral, social science, and genetic research. Workshop Summary Report. Retrieved from https://www.nia.nih.gov/sites/default/files/nia_bssg_harmonization_summary_version_2-5-20122.pdf.
- Paccagnella O. (2013). Modelling individual heterogeneity in ordered choice models: Anchoring vignettes and the Chopit Model. QdS - Journal of Methodological and Applied Statistics, 15, 69–94.
- Pan Y., & Fond M. (2014). Evaluating multilingual questionnaires: A sociolinguistic perspective. Survey Research Methods, 8, 181–194. doi:10.18148/srm/2014.v8i3.5483
- Pasick R. J., Stewart S. L., Bird J. A., & D’Onofrio C. N. (2001). Quality of data in multiethnic health surveys. Public Health Reports, 116, 223–243. doi:10.1093/phr/116.S1.223
- Rice N., Robone S., & Smith P. (2011). Analysis of the validity of the vignette approach to correct for heterogeneity in reporting health system responsiveness. European Journal of Health Economics, 12, 141–162. doi:10.1007/s10198-010-0235-5
- Sanchez G. R., & Vargas E. D. (2016). Language bias and self-rated health status among the Latino population: Evidence of the influence of translation in a wording experiment. Quality of Life Research, 25, 1131–1136. doi:10.1007/s11136-015-1147-8
- Sen A. (2002). Health: Perception versus observation. BMJ, 324, 860–861. doi:10.1136/bmj.324.7342.860
- Shetterly S. M., Baxter J., Mason L. D., & Hamman R. F. (1996). Self-rated health among Hispanic vs non-Hispanic white adults: The San Luis Valley Health and Aging Study. American Journal of Public Health, 86, 1798–1801. doi:10.2105/AJPH.86.12.1798
- Smith J. P. (2007). Diabetes and the rise of the SES health gradient. National Bureau of Economic Research, Working Paper 12905. Cambridge, MA. Retrieved from http://www.nber.org/papers/w12905.pdf.
- Su Y., Willis G., & Salomon J. A. (2017). Improving vignette descriptions and question formats to measure distance vision: Evidence from cognitive interviews among students in China. Field Methods, prepublished January 1, 2017. doi:10.1177/1525822X16680810
- van Soest A., Delaney L., Harmon C., Kapteyn A., & Smith J. P. (2011). Validating the use of anchoring vignettes for the correction of response scale differences in subjective questions. Journal of the Royal Statistical Society: Series A (Statistics in Society), 174(3), 575–595. doi:10.1111/j.1467-985X.2011.00694.x
- van Soest A., & Vonkova H. (2014). Testing the specification of parametric models by using anchoring vignettes. Journal of the Royal Statistical Society: Series A (Statistics in Society), 177, 115–133. doi:10.1111/j.1467-985X.2012.12000.x
