Summary
Reliance on self-rated health to proxy medical need can bias estimation of education-related inequity in healthcare utilization. We correct this bias both by instrumenting self-rated health with objective health indicators and by purging self-rated health of reporting heterogeneity that is identified from health vignettes. Using data on elderly Europeans, we find that instrumenting self-rated health shifts the distribution of visits to a doctor in the direction of inequality favouring the better educated. There is a further, and typically larger, shift in the same direction when correction is made for the tendency of the better educated to rate their health more negatively.
Keywords: Equity, Health, Healthcare, Reporting heterogeneity, Vignettes
1. Introduction
Throughout most of Europe there is a stated policy aspiration to equitable provision of healthcare. This is often interpreted as the horizontal equity principle of equal treatment for equal need. In empirical work, the prevailing approach is to measure horizontal inequity by the degree to which utilization is related to socio-economic status (SES) after controlling for differences in needs (Wagstaff and van Doorslaer, 2000a, b). Existing European evidence reveals a bias favouring those of low SES in the distribution of primary care but inequity to the advantage of higher socio-economic groups in specialist care (van Doorslaer et al., 2000, 2004).
The validity of this evidence is contingent on the adequacy of the measures of need employed. As comprehensive, objective, health measures are seldom available in the general population surveys that provide data on healthcare utilization and SES, researchers must usually rely on self-reported categorical health indicators. It is, however, possible that two equally healthy individuals report different levels of health because their conceptions of good health and their health expectations are contingent on their knowledge of disease and available medical treatments, as well as the observed health of their peers. To use a much-cited example from India, self-reported morbidity rates are highest in Kerala, the state with the highest life expectancy, the highest rate of literacy and the most extensive public health services (Drèze and Sen, 2002; Sen, 2002). They are higher still in the USA (Murray and Chen, 1992; Sen, 2002). If health perceptions also vary with SES, then self-reported health will not provide an unbiased benchmark of needs against which to measure inequity in healthcare utilization. Then, as argued by van Doorslaer et al. (2004), evidence of higher rates of utilization of general practitioner (GP) services in Europe among those of lower SES may not be indicative of inequity favouring the poor but, rather, a tendency of more disadvantaged groups to under-report morbidity. This has long been recognized (e.g. O’Donnell and Propper (1991)) but, hitherto, it has been difficult to do much about it. It has been shown that inclusion of a larger battery of health indicators (e.g. van Doorslaer et al. (2000)) or allowance for unobserved individual heterogeneity (e.g. Bago d’Uva et al. (2009)) ameliorates the problem. But, with data on self-reports only, the fundamental problem remains that differences in reporting behaviour cannot be disentangled from differences in true health.
Anchoring vignettes offer a potential solution to the problem. In addition to rating their own health on a categorical scale, survey respondents rate the health of a hypothetical case of described health problems (Tandon et al., 2003; King et al., 2004). Since the description is fixed, systematic differences in ratings of the vignette provide evidence of heterogeneity in reporting thresholds. Under assumptions, these differences in reporting styles can then be purged from the ratings of own health.
This study uses vignettes to identify differences in the reporting of health by education, as well as age, gender and country, and determines the effect of correcting for such differences on the measurement of education-related inequity in the utilization of primary and specialist medical care. The focus on disparities by education is partly driven by the data that are used, which come from the Survey on Health, Ageing and Retirement in Europe (SHARE) that randomly samples from populations aged 50 years and over in 12 countries (Börsch-Supan and Jürges, 2005a). For older populations, education is a better indicator of long-term SES than income, which has been used in most previous European studies of inequity in healthcare utilization (van Doorslaer et al., 2000, 2004; Bago d’Uva et al., 2009). Besides being a proxy for socio-economic inequality, disparities in education are of direct interest because of the strong association that exists between education and health (Smith and Kington, 1997; Mackenbach et al., 2008; Cutler and Lleras-Muney, 2008). There is growing evidence from high income countries that this relationship has a causal basis (Lleras-Muney, 2005; Oreopoulos, 2006; van Kippersluis et al., 2011), unlike that between financial resources and health (Frijters et al., 2005; Smith, 2007). A potential causal mechanism is differential utilization of medical care by education, possibly operating through intervening effects on income and health insurance coverage (Glied and Lleras-Muney, 2008). One US study found an education advantage in adoption of the most recent medicines that is not affected by control for income and health insurance coverage, which suggests that the acquisition and utilization of information on medical conditions and treatments is a more plausible mechanism (Lleras-Muney and Lichtenberg, 2002). 
In the European context of universal and mostly publicly funded health insurance, financial barriers are likely to be even less important, relative to information, in explaining any education gradient in the use of medical care. Our aim in this paper is to establish the extent of education-related disparity in utilization, irrespective of the mechanisms through which it may operate, and so we purposely do not control for income or supplementary health insurance coverage.
There are reasons to expect styles of reporting health to vary by education. The superior information acquisition skills of more highly educated individuals increase the likelihood that they will recognize and report symptoms of disease. The strong correlation between health and education may generate peer effects, with low and high educated individuals reporting health relative to the average level observed among their respective associates. Both arguments imply that the higher educated will report their health more negatively. One strategy that has been adopted to test this hypothesis is to check whether self-reported health is a stronger predictor of mortality for more highly educated individuals. The evidence from both Europe and the USA is mixed (Beam Dowd and Zajacova, 2007; Huisman et al., 2007; Jürges, 2008; Singh-Manoux et al., 2007). Adopting a different strategy, Lindeboom and van Doorslaer (2004) found no education-related variation in reported health in Canada after conditioning on a presumed sufficiently comprehensive and objective indicator of health, the McMaster health utility index. Vignettes offer a more direct test. Using the same SHARE data that are employed in the present paper, Bago d’Uva et al. (2008) found that the more highly educated rate the health of vignette descriptions more severely in six of the eight countries that were examined. On the basis of this evidence, we can predict that an observed education gradient in the response of healthcare demand to a given reported level of health will understate the true extent to which the better educated make greater use of healthcare for a given health condition. In this paper, we estimate the magnitude of this bias.
With one exception (Kapteyn et al., 2009), previous applications of the vignettes approach have been concerned with identification of and correction for reporting heterogeneity in the context of a descriptive analysis of a variable measured with error (e.g. King et al. (2004), Kapteyn et al. (2007) and Bago d’Uva et al. (2008)). To our knowledge, this is only the second study to use vignettes to correct for measurement error in an independent variable and it is the first to do so in modelling health and healthcare. This involves development of a joint model of healthcare utilization and health reporting.
Identification of reporting heterogeneity by using vignettes rests on two assumptions—vignette equivalence and response consistency (see Section 3.3). In a previous paper, we presented evidence against both assumptions in the health domains of mobility and cognition for a sample of the older population of England (Bago d’Uva et al., 2011). Notwithstanding the doubt that is cast by these results on the validity of the approach, it is still possible that the method shifts the distribution of health in the correct direction. A priori we expect that education-related inequity is underestimated when no account is taken of differences in reporting styles. We establish the extent to which this is confirmed when vignettes are used to adjust for reporting heterogeneity. A partial, but by no means complete, alternative to vignettes is to instrument self-reported health by using more objective health indicators (Bound, 1991). We examine the marginal effect of using vignettes to estimate systematic variation in reporting scales explicitly, in addition to exploiting objective health indicators. This allows us to gauge the marginal return to inclusion of a vignettes module in a health survey, in addition to measures of physical functioning, such as grip strength, and data on chronic conditions.
The next section describes the SHARE data. Following that, we propose a model of visits to a doctor as a function of medical need and explain how both objective health indicators and vignettes can be used to reduce bias resulting from the use of self-rated health as a proxy for need. Section 4 presents our findings on the extent to which the two corrections for measurement error in need impact on the estimation of education-related inequity in primary and specialist care. The final section concludes.
2. Data
2.1. Sample
We use data from the first wave of the SHARE collected in 2004–2005—release version 2.0. Vignettes data were collected from supplementary probability samples, which also completed the full SHARE questionnaire, in all except four countries (Börsch-Supan et al., 2005). The overall response rate in the vignette samples was 57.7%—highest in France (77%) and lowest in Belgium (42%). The countries where the vignettes were fielded and the respective sample sizes are as follows: Belgium (n = 531), France (n = 773), Germany (n = 468), Greece (n = 646), Italy (n = 397), the Netherlands (n = 489), Spain (n = 414) and Sweden (n = 380). We pooled these samples to give a total sample of 4098 individuals. Despite the fact that the sample size relative to population differs across countries, we do not apply weights to make the sample representative of the cross-country population. For modelling purposes, this is not necessary provided that we condition on country indicators.
2.2. Healthcare utilization
We examine variation both in the number of visits to a GP and visits to a specialist. Respondents are asked to report the number of times that they saw or spoke with a medical doctor about their health in the previous 12 months. A reminder is given of the month in the previous year from which they should count. They are told to exclude dental care and hospital stays but to include consultations in the emergency room and outpatient clinics. Of the total number of visits recorded, they are asked how many were with a GP or a doctor at a health centre. We use this as our measure of GP visits. The remainder is used as the measure of specialist visits.
Although there is likely to be measurement error in the recall of the number of visits to a doctor over a period of 1 year, there is no obvious reason why this should be correlated with education, particularly after conditioning on cognitive tests of memory, as we do. Nonetheless, it should be kept in mind that, as with all previous studies, any correlated measurement error would bias the estimated education-related inequity in visits to a doctor.
Descriptive statistics on GP and specialist visits in each country are given in Table 1. The frequency of visiting a GP is highest in Italy, at just under seven visits on average per year, and lowest in Sweden, where, on average, older individuals visit less than twice per year. The standard deviation is generally large relative to the mean, particularly in Greece, Italy and Spain. In all countries except Greece, the Netherlands and Sweden, 80–90% of individuals consult a GP at least once a year. It is the low probability of any contact that is responsible for the relatively low mean number of visits in Greece, whereas both the probability and the intensity of visiting are low in the Netherlands and Sweden. Mean visits are high in Italy because of their intensity among those who make any consultation. Some substitution is suggested by the fact that Italy also has the lowest mean number of visits to a specialist, except for Sweden. Spain, another country in which the GP acts as a gate-keeper to specialist care, also has a relatively low ratio of mean specialist to GP visits. But the Netherlands also operates a GP gate-keeper system and has the highest such ratio. Sweden is striking in having the lowest mean number of consultations with both GPs and specialists. Around 40% of the sample consult a specialist each year in all the other countries except Belgium and Germany, where more than half do so. Both GP and specialist visits are highly skewed, with a few individuals consulting the doctor almost every week in most countries, and twice per week in some cases.
Table 1.
Country | GP visits: mean | GP visits: standard deviation | GP visits: P(visits > 0)† | GP visits: mean given visits > 0‡ | GP visits: maximum | Specialist visits: mean | Specialist visits: standard deviation | Specialist visits: P(visits > 0)† | Specialist visits: mean given visits > 0‡ | Specialist visits: maximum |
---|---|---|---|---|---|---|---|---|---|---|
Belgium | 5.011 | 5.613 | 0.887 | 5.650 | 53 | 2.215 | 4.833 | 0.544 | 4.069 | 54 |
France | 4.886 | 4.573 | 0.891 | 5.482 | 50 | 1.551 | 2.865 | 0.468 | 3.312 | 34 |
Germany | 4.154 | 4.820 | 0.870 | 4.776 | 49 | 2.667 | 5.013 | 0.575 | 4.639 | 60 |
Greece | 3.043 | 4.571 | 0.605 | 5.028 | 36 | 1.944 | 5.407 | 0.390 | 4.984 | 73 |
Italy | 6.662 | 10.756 | 0.811 | 8.214 | 98 | 1.446 | 3.070 | 0.406 | 3.565 | 30 |
Netherlands | 2.145 | 2.431 | 0.767 | 2.797 | 24 | 1.573 | 4.844 | 0.405 | 3.884 | 89 |
Spain | 6.246 | 9.484 | 0.845 | 7.389 | 90 | 1.942 | 6.055 | 0.440 | 4.418 | 98 |
Sweden | 1.655 | 2.274 | 0.650 | 2.547 | 26 | 0.971 | 2.901 | 0.316 | 3.075 | 40 |
Total | 4.211 | 6.155 | 0.794 | 5.307 | 98 | 1.805 | 4.519 | 0.447 | 4.034 | 98 |
†Proportion with any visit.
‡Mean visits among those with any visit.
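The discussion of Table 1 separates the probability of any visit from the intensity of visiting among users. The two are linked by the identity mean visits = P(visits > 0) × mean visits among those with any visit. The short Python sketch below checks that identity against the GP figures reported in Table 1 for three countries; the variable and function names are illustrative only, and small gaps are expected because the published figures are rounded.

```python
# GP-visit figures copied from Table 1:
# (mean, P(visits > 0), mean among those with any visit)
gp_visits = {
    "Greece": (3.043, 0.605, 5.028),
    "Italy": (6.662, 0.811, 8.214),
    "Sweden": (1.655, 0.650, 2.547),
}


def implied_mean(p_any, mean_if_any):
    """Unconditional mean implied by participation and intensity."""
    return p_any * mean_if_any


# Absolute gap between the reported mean and the implied mean, per country.
gaps = {}
for country, (mean, p_any, mean_if_any) in gp_visits.items():
    implied = implied_mean(p_any, mean_if_any)
    gaps[country] = abs(implied - mean)
    print(f"{country}: reported {mean:.3f}, implied {implied:.3f}")
```

Read this way, Greece has a low mean mainly because of the participation term, whereas Italy has a high mean because of the intensity term.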
2.3. Education and demographics
Educational attainment is measured according to the international standard classification of education (ISCED 97) (United Nations Educational, Scientific and Cultural Organization, 1997) as
(a) finished at most primary education or first stage of basic education (ISCED 0–1: primary),
(b) lower secondary or second stage of basic education (ISCED 2: lower secondary),
(c) upper secondary education (ISCED 3–4: upper secondary) and
(d) recognized third-level education, which includes higher vocational education and university degree (ISCED 5–6: tertiary).
For Germany the data do not distinguish between levels (a) and (b). Education enters models as an explanatory variable represented by a set of dummy indicators with the lowest education level used as the reference category. As explained in Section 1, the purpose of the analysis is to measure education-related disparities in healthcare use conditional only on medical need. We therefore control only for indicators of health, age and gender. Exploratory analysis found no evidence of significant urban–rural differences in either health reporting or healthcare utilization. Given the very small size of the rural samples in two countries (Greece and Spain), and the possibility that location is determined by education, we do not control for this characteristic.
2.4. Self-reported health and vignettes
A self-completion drop-off questionnaire covered self-assessments of health and vignette ratings. Respondents were asked to rate problems or difficulties that had been experienced in the preceding 30 days in each of six health domains—mobility, cognition, pain, sleep, breathing and emotional health—on a five-point scale running from none to extreme. The questions referred to problems or difficulties with moving around (mobility), concentrating or remembering things (cognition), bodily aches or pains (pain), falling asleep, interrupted sleep or waking early (sleep), shortness of breath (breathing) and feeling sad, low or depressed (emotional health). For more detail on these questions and the choice of health domains the reader is referred to Bago d’Uva et al. (2008) and references therein.
After rating their own health, respondents were presented with three vignettes within each domain, each of which is a brief description of a case corresponding to a given level of difficulty in that domain (see http://ije.oxfordjournals.org/content/37/6/1375/suppl/DC1). The respondent was asked to rate the level of difficulty, or problem, represented by each vignette by using the same five response categories as were used to rate own health. One of two versions of the vignettes module—differing in the implied genders (names) of the vignette and their ordering—was randomly assigned to respondents.
Ratings of the vignettes indicate that reporting styles vary by education. To illustrate, in Table 2 we present the proportion of respondents across all countries who classify the vignette that was intended to correspond to the middle level of difficulty within each domain as experiencing none or mild difficulty. The proportions are standardized for age, gender and country and so do not reflect education-related differences in these characteristics. With the exception of cognition, there is a clear tendency for the lower education groups to be more likely to consider the health condition that is described in the vignette as being one that presents no, or only a mild degree of, difficulty. For example, whereas 24% of those with only primary level education consider pain in the arm that is relieved in the evening after work to represent no more than a mild (pain) problem, only 16% with tertiary education are so sanguine.
Table 2.
Proportions for the following domains:

ISCED education category | Mobility | Cognition | Pain | Sleep | Breathing | Emotional health |
---|---|---|---|---|---|---|
Primary | 0.1669 | 0.3172 | 0.2398 | 0.1372 | 0.1032 | 0.1821 |
Lower secondary | 0.1503 | 0.3355 | 0.2030 | 0.1248 | 0.0626 | 0.1554 |
Upper secondary | 0.1226 | 0.3235 | 0.1950 | 0.0932 | 0.0668 | 0.1404 |
Tertiary | 0.1009 | 0.3118 | 0.1600 | 0.1008 | 0.0465 | 0.1236 |
Cross-country pooled sample. Rates are standardized for age, gender and country.
In a related paper using the SHARE data (Bago d’Uva et al., 2008), some of us showed the implications of these education-related differences in health reporting for measurement of health inequality. This analysis revealed significant reporting differences by education in 29 out of 48 domain–country cases examined, with the higher educated more likely to rate a given health state negatively. Correcting for these differences generally increased measured inequality and resulted in the emergence of significant health disparities to the advantage of the better educated in 18 cases. Spain and Sweden displayed different patterns of health reporting by education. In the present study, we take account of such cross-country variation not only by allowing health reporting to vary by country but also by permitting the education-related differences in reporting to vary by country.
2.5. Health indicators
Education-related differences in the propensity to report health problems are further evident from variation in objective health indicators within given categories of self-rated health. This is illustrated in Table 3 for the domains of mobility, cognition and depression. In each case, lower educated individuals score worse on corresponding objective measures of health than do higher educated individuals. This is true after standardizing for differences in age, gender and the proportion of respondents from each country across the education categories. For example, among individuals reporting no problems with mobility, measured mean grip strength, which is a predictor of limitations of mobility (Rantanen et al., 1999) as well as mortality (Rantanen et al. (2003) and references therein), is 34.8 for those with no more than primary education but 35.6 for those with tertiary level education. Even larger differences are observed for those reporting moderate problems with mobility. Among individuals reporting severe or extreme difficulty in concentrating or remembering things, 36% of those with no more than primary education, but only 15% of those with tertiary education, are found to have some difficulty recalling the date, month, year or day of the week. Of the sample respondents reporting severe or extreme problems with feelings of sadness or depression, 66% of those with primary education versus 57% of those with tertiary education are identified as depression cases by using the EURO-D depression scale (Prince et al., 1999).
Table 3.
Means for the following ISCED education categories:

Level of difficulty | Primary | Lower secondary | Upper secondary | Tertiary | Observations |
---|---|---|---|---|---|
Mean grip strength | | | | | |
Self-rated mobility problem | | | | | |
None | 34.7882 | 35.1748 | 35.6109 | 35.5989 | 2440 |
Mild | 33.3737 | 33.3920 | 35.0635 | 34.3488 | 913 |
Moderate | 31.4700 | 31.7039 | 32.5558 | 34.4061 | 515 |
Severe or extreme | 29.7960 | 32.7974 | 29.1409 | 30.2602 | 230 |
Proportion with less than very good orientation in time (score on date recall test) | | | | | |
Self-rated difficulty concentrating or remembering | | | | | |
None | 0.1066 | 0.0791 | 0.0588 | 0.0936 | 1834 |
Mild | 0.1430 | 0.1203 | 0.0798 | 0.0884 | 1445 |
Moderate | 0.2021 | 0.2072 | 0.1644 | 0.0914 | 653 |
Severe or extreme | 0.3646 | 0.2765 | 0.1599 | 0.1514 | 166 |
Proportion identified as depressed on EURO-D scale (Prince et al., 1999) | | | | | |
Self-rated problem of sadness or depression | | | | | |
None | 0.1094 | 0.0946 | 0.0783 | 0.0747 | 2068 |
Mild | 0.2683 | 0.2682 | 0.2674 | 0.2034 | 1172 |
Moderate | 0.5445 | 0.5935 | 0.4744 | 0.4736 | 590 |
Severe or extreme | 0.6662 | 0.6756 | 0.6028 | 0.5775 | 275 |
Observations | 1402 | 780 | 1154 | 762 | 4098 |
Cross-country pooled sample. Standardized for age, gender and country.
We exploit the health indicators for two main purposes. First, the additional health variables allow us to predict health within each domain from a model of reported health as a function of objective indicators and then to substitute these predictions in a model of visits to a doctor. This instrumental variable approach removes errors-in-variables bias from the estimated effect of need on visits to a doctor (Bound, 1991). Second, the six health domains are unlikely to be fully comprehensive in capturing needs for primary and specialist medical care. Some dimensions of health—notably sight and hearing problems—are not covered by the six domains. Chronic conditions may be under medical management that requires regular consultation with a doctor for a check-up and prescription of medication and yet be sufficiently well contained such that related health problems or difficulties are not reported.
The structural framework that we adopt is one in which an individual consults a doctor either in response to a symptom, which may provoke reporting of a health problem within one of the six health domains, or for continued management of a chronic condition. Consistent with this, we specify visits to a doctor as a function of the need for curative care and for disease management, which are proxied by the six health domains and a list of diagnosed chronic conditions respectively. For each domain, self-rated health is specified as a function of only indicators that are symptoms of a health condition or disease within that domain, or that identify such a condition or disease. In the presence of comorbidities, an indicator may be correlated with reported health in a domain without being a symptom of any recognized medical condition within that domain. For example, in the SHARE data, grip strength is correlated with cognitive problems. We do not, however, use grip strength to explain reported cognitive functioning since it is not a symptom of any cognitive problem for which medical advice may be sought. Following this logic, the choice of indicators per domain, which is listed in Table 4, was made following expert medical advice. Table 4 also lists the diagnosed conditions that are permitted to impact on doctor visits directly, and not only through one or more of the health domains.
Table 4.
Health indicators within each domain | |
---|---|
Mobility | (a) Whether obese; (b) maximum grip strength; (c) number of limitations in activities of daily living; (d) number of limitations in mobility, arm function and fine motor function; (e) whether has been diagnosed with stroke, arthritis or rheumatism, hip or femoral fracture or Parkinson’s disease; (f) whether has any of the symptoms swollen legs, falling down, fear of falling down, dizziness, faints or blackouts |
Cognition | All five measured tests of cognitive functioning performed by SHARE respondents: (a) orientation in time (score of date recall test); (b), (c) immediate and delayed word recall†; (d) word finding and verbal fluency (number of animals named within a time interval); (e) numeracy‡ |
Pain | (a) Whether bothered by pain in back, knees, hips or other joints; (b) whether takes medication for joint pain or inflammation; (c) whether takes medication for other pain or for stomach burn; (d) whether has been diagnosed with cancer (excluding cancer that does not cause pain—skin, testicle and thyroid—and also cases diagnosed more than 5 years previously), stomach, duodenal or peptic ulcer, arthritis or rheumatism, or hip or femoral fracture |
Sleep | (a) Whether EURO-D depression score (Prince et al., 1999) indicates trouble with sleeping; (b) whether is bothered by sleeping problems; (c) whether takes medication for sleeping problems; (d) whether obese; (e) whether has been diagnosed with asthma, bronchitis, etc., cancer of oral cavity, larynx, pharynx and lung (dropping cases diagnosed longer than 5 years previously) |
Breathing | (a) Whether is bothered by breathlessness; (b) whether bothered by persistent cough; (c) whether has been diagnosed with chronic lung disease or asthma; (d) whether takes medication for asthma and/or chronic bronchitis |
Emotional health | (a) Score on EURO-D depression scale (Prince et al., 1999); (b) whether takes medication for depression |
Diagnosed conditions permitted to explain doctor visits directly | |
Whether has | (a) Heart attack or other heart problems; (b) high blood pressure; (c) high blood cholesterol; (d) stroke or cerebral vascular disease; (e) diabetes; (f) chronic lung disease; (g) asthma; (h) arthritis or rheumatism; (i) osteoporosis; (j) cancer or malignant tumour (except if diagnosed more than 5 years previously); (k) stomach, duodenal or peptic ulcer; (l) Parkinson’s disease; (m) cataracts; (n) hip or femoral fracture; (o) any other diagnosed condition |
†Respondents are presented orally with 10 common words and asked to remember them. Word recall is tested immediately and after a short delay during which other cognitive tests are performed.
‡Respondents are asked to solve up to three problems requiring simple mental calculations based on real life situations. Those who fail the first question are asked an easier one (and given a total score of 1 if they answer the second incorrectly also or 2 if they answer it correctly). Those who answer the first question correctly are asked two progressively more difficult questions (and given a total score of 3 if they answer the first incorrectly, of 4 if they answer only the first correctly and of 5 if they answer them both correctly).
3. Joint model of visits to a doctor and health reporting
3.1. Identification problem
Our aim is to test for education-related variation in visits to a doctor conditional on a defined level of medical need. The problem is that need is fundamentally unobservable and its mismeasurement is likely to induce bias in the estimated education effect. This is analogous to the identification problem that was examined by Bound (1991) in the context of estimating (unobserved) disability and wage effects on labour supply. Let doctor visits y be determined as y = y(η1, η2, E), where η1 represents need for treatment of current health problems and symptoms, η2 is need for the medical management of diagnosed chronic conditions and E is education. Obviously, other determinants could be added. We shall assume that there is no problem with measurement of the second type of need. The set of diagnosed conditions that is listed in Table 4 is presumed to represent comprehensively all those for which an individual may consult for medication or check-up. Self-reported health problems (h) are potential proxies for the first type of need. There are two problems with this approach. First, substituting h for η1 in a model of doctor visits will introduce errors in variables that will result in an underestimation of the need effect and will spill over to biased estimation of the education effect unless, conditional on self-rated health, there is no correlation between education and need. Second, as illustrated in Tables 2 and 3, it is likely that reporting styles vary with education, such that h = h(η1, E). This introduces an endogeneity bias to the estimated education effect on visits to a doctor that will be present even in the absence of any errors-in-variables bias.
Using objective health indicators, such as those listed in Table 4, to instrument self-rated health in a model of visits to a doctor does not solve the problem if there is education-related reporting heterogeneity. The need effect would be consistently estimated but the education effect would not. As Bound (1991) demonstrated, such a model is only identified in the presence of outside information about reporting behaviour. We use vignettes to identify that information and then impose it in a model of visits to a doctor as a function of education and a measure of need purged of reporting heterogeneity. We compare the estimates from this model with those from two others. The first uses the raw self-rated health variables to proxy need and is vulnerable to bias both from errors in variables and from education-related reporting errors. The second instruments self-rated health with objective indicators and so deals with the first bias but not the second.
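The two biases described in this section can be illustrated with a small simulation. The Python sketch below is not the model estimated in this paper: it assumes a linear visits equation, a single binary education indicator and normally distributed errors, and every parameter value (a true education effect of 0.3, a reporting shift of 0.4 and so on) is hypothetical. Under these assumptions, regressing visits on reported health understates both the need effect and the education effect, whereas instrumenting reported health with an objective indicator recovers the need effect but leaves the education effect biased by the reporting shift.

```python
import numpy as np


def ols(y, *regressors):
    """Least-squares coefficients: [intercept, regressor 1, regressor 2, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=200_000, report_shift=0.4, noise_sd=1.0, seed=0):
    """Hypothetical data-generating process (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    educ = rng.integers(0, 2, n).astype(float)            # 1 = better educated
    need = -0.5 * educ + rng.normal(size=n)               # need falls with education
    visits = 1.0 * need + 0.3 * educ + rng.normal(size=n)
    # Reported problems: true need, plus an education-specific reporting
    # shift, plus classical measurement error.
    reported = need + report_shift * educ + noise_sd * rng.normal(size=n)
    # Objective indicator: noisy, but free of the reporting shift.
    objective = need + rng.normal(size=n)
    return educ, need, visits, reported, objective


educ, need, visits, reported, objective = simulate()

# Infeasible benchmark: true need is observed.
b_true = ols(visits, need, educ)

# Naive model: reported health taken at face value.
b_naive = ols(visits, reported, educ)

# Two-stage least squares: instrument reported health with the objective indicator.
first = ols(reported, objective, educ)
reported_hat = first[0] + first[1] * objective + first[2] * educ
b_iv = ols(visits, reported_hat, educ)

for label, b in [("true need", b_true), ("naive", b_naive), ("instrumented", b_iv)]:
    print(f"{label:>12}: need effect {b[1]:.3f}, education effect {b[2]:.3f}")
```

With these assumed values the large-sample education coefficients are 0.3 (benchmark), -0.15 (naive) and -0.1 (instrumented). Instrumenting therefore moves the estimate in the right direction, but the remaining gap can only be closed with outside information on reporting behaviour.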
3.2. Hurdle model of visits to a doctor
The descriptive statistics that were presented in Table 1 reveal a substantial proportion of individuals who report no visits to a doctor, particularly specialist consultations, in the previous year. To allow for the likelihood that the stochastic process that is responsible for observing zero visits differs from that generating a positive number of visits, we estimate a hurdle model. In particular, we specify a logit model for the probability of any visits and a truncated-at-zero negative binomial II model for the count of positive visits (Grootendorst, 1995; Gurmu, 1998; Winkelman, 2004). Let yi denote the number of visits to the doctor (GP or specialist) made by individual i in the previous year and Ii = 1(yi > 0), where 1(·) is the indicator function. The probability of observing a given number of visits to a doctor is (Gurmu, 1998; Winkelman, 2004)
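$$P(y_i \mid X_i) = \left(\frac{1}{1+\lambda_{1i}}\right)^{1-I_i}\left[\frac{\lambda_{1i}}{1+\lambda_{1i}}\, g(y_i \mid y_i>0, X_i)\right]^{I_i} \tag{1}$$
where
$$g(y_i \mid y_i>0, X_i) = \frac{\Gamma(y_i+\alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y_i+1)} \cdot \frac{(\alpha\lambda_{2i})^{y_i}\,(1+\alpha\lambda_{2i})^{-(y_i+\alpha^{-1})}}{1-(1+\alpha\lambda_{2i})^{-\alpha^{-1}}},$$
$\lambda_{si} = \exp(X_i\beta_s)$, $s = 1, 2$, $\Gamma(\cdot)$ is the gamma function and $\alpha > 0$ denotes an overdispersion parameter that is estimated. It is assumed that the two parts of the model, i.e. the participation decision and the positive number of visits, are stochastically independent and so the log-likelihood factorizes into two components that can be estimated separately.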
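As an illustration of how the two parts combine, the following sketch evaluates the hurdle log-probability for a single observation. It assumes the standard parameterization in which the logit part has odds λ1 and the count part is a negative binomial II with mean parameter λ2 and overdispersion α, truncated at zero. Function and argument names are hypothetical.

```python
import math

def hurdle_log_prob(y, lam1, lam2, alpha):
    """Log-probability of y visits under a logit hurdle with a
    truncated-at-zero negative binomial II count part."""
    if y == 0:
        return -math.log1p(lam1)                      # log[1 / (1 + lam1)]
    log_participation = math.log(lam1) - math.log1p(lam1)
    inv_alpha = 1.0 / alpha
    log_negbin = (math.lgamma(y + inv_alpha) - math.lgamma(inv_alpha)
                  - math.lgamma(y + 1)
                  + y * math.log(alpha * lam2)
                  - (y + inv_alpha) * math.log1p(alpha * lam2))
    # Probability of a positive count under the untruncated negative binomial
    log_positive = math.log1p(-(1.0 + alpha * lam2) ** (-inv_alpha))
    return log_participation + log_negbin - log_positive

def hurdle_log_likelihood(visits, x_rows, beta1, beta2, alpha):
    """Sum of log-probabilities with lam_s = exp(x . beta_s), s = 1, 2."""
    total = 0.0
    for y, x in zip(visits, x_rows):
        lam1 = math.exp(sum(b * v for b, v in zip(beta1, x)))
        lam2 = math.exp(sum(b * v for b, v in zip(beta2, x)))
        total += hurdle_log_prob(y, lam1, lam2, alpha)
    return total

# Sanity check with arbitrary values: the probabilities should sum to one
total_probability = sum(math.exp(hurdle_log_prob(y, 1.5, 3.0, 0.8)) for y in range(2000))
```

Because the zero and positive parts share no parameters, the logit coefficients and the count coefficients can be maximized separately, which is the factorization referred to above.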
The vector Xi includes four types of variable:
education,
demographics and country indicators,
health specific to the six reported domains (Hi) and
diagnosed health conditions or diseases.
Education is interacted with the country indicators in the most flexible model estimated. We estimate models using alternative sets of domain-specific health variables. First, we simply use the reported category indicating the degree of difficulty that was experienced within each health domain, Hi = (H1i, H2i, …, HDi). In this case, self-reports are taken at face value and the estimates are vulnerable to both types of bias referred to above. Presuming that visits to a doctor increase with true medical need and, as is likely, need is negatively correlated with education, then the errors-in-variables bias will result in an underestimation of the effect of education on visits to a doctor (Bound (1991), page 111). If the better educated understate their health, then this will further bias the education effect downwards. The degree of the latter bias will increase both with the degree of reporting heterogeneity and with the responsiveness of healthcare use to need.
Our second approach is to proxy need for curative care with domain-specific health scores derived from ordered probit models of self-rated health as a function of more objective health indicators, as well as education, demographics and country indicators. The category reported in domain d, $H_{di}$, is assumed to be generated by the position of a latent health index $H^*_{di}$, which is specified as
$$H^*_{di} = Z_{di}\gamma_d + W_i\delta_d + \varepsilon_{di}, \qquad \varepsilon_{di} \sim N(0,1), \tag{2}$$
relative to a set of fixed thresholds $\tau_d^{1} < \tau_d^{2} < \dots < \tau_d^{K-1}$, such that
$$H_{di} = k \iff \tau_d^{k-1} \le H^*_{di} < \tau_d^{k}, \qquad k = 1, \dots, K, \quad \tau_d^{0} = -\infty, \ \tau_d^{K} = \infty, \tag{3}$$
where $Z_{di}$ is the corresponding vector of health indicators listed in Table 4 and $W_i$ includes education, age, sex and country, interacted with education. The predicted latent health scores from these models, $\hat{H}^*_{di}$, are entered into the hurdle model for visits to a doctor.
Besides non-linearity, identification relies on the exclusion of health indicators from the model of visits to a doctor. The exclusion restrictions follow from the presumption that individuals consult either for the treatment of current symptoms in the six health domains or for management of chronic conditions. All health indicators other than diagnosed chronic conditions are permitted to impact on utilization only through health problems in the six health domains. For example, conditional on mobility problems, grip strength is assumed to have no direct influence on visits to a doctor. Cognitive test scores are presumed to influence consultations only through problems of concentration and memory. The excluded instruments for each health domain are given by the indicators that are listed in the respective row of Table 4, with the exceptions of those also appearing in the bottom row that enter the doctor visits model directly.
Constancy of the thresholds across individuals implies an assumption of reporting homogeneity. If this does not hold, in particular, if the thresholds vary with some of the covariates, then the predicted health scores will reflect not only information on true health but also variation in the reporting of that health. So, although this instrumental variables approach corrects errors-in-variables bias deriving from the mismeasurement of need, it does nothing to deal with the bias due to education-related reporting errors. Under the assumptions that were stated above, the education effect will continue to be underestimated, although not to the same extent as when raw self-reported health is used to proxy need.
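The reason why fixed thresholds cannot separate health from reporting can be seen in a small numerical sketch. It assumes an ordered probit with a unit-variance normal error and hypothetical cut points; the latent index here measures health problems, so lower thresholds mean that difficulties are reported more readily.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def category_probs(index, thresholds):
    """Ordered probit probabilities of each category, given a latent index
    and increasing interior thresholds."""
    cuts = [0.0] + [PHI(t - index) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cuts, cuts[1:])]

thresholds = [-1.0, 0.0, 0.8, 1.6]   # hypothetical cut points, five categories

# Respondent A: latent index 0.5, but all thresholds are 0.3 lower
respondent_a = category_probs(0.5, [t - 0.3 for t in thresholds])
# Respondent B: latent index 0.8, baseline thresholds
respondent_b = category_probs(0.8, thresholds)
```

The two respondents have identical reporting probabilities, so a model with common thresholds would assign respondent A the worse latent health of respondent B. Outside information on the thresholds is needed to tell them apart.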
3.3. Heterogeneous reporting behaviour
Our third specification of the domain-specific health variables uses predictions of latent health scores from an extended ordered probit model in which the reporting thresholds are made functions of individual characteristics and so the parameters of the latent index represent true health effects, and not a mixture of health and reporting effects. This hierarchical ordered probit (HOPIT) model (King et al., 2004) is identified by using the information from the vignette ratings. The validity of the approach rests on two assumptions: vignette equivalence and response consistency. The first corresponds to the assumption that, up to random measurement error, all respondents understand the vignette description as corresponding to the same level of functioning on a unidimensional scale. This is required so that responses to a given vignette can be interpreted as reflecting heterogeneity in the reporting of a given level of functioning. In the present context, the assumption is made more plausible by the separation of health into six domains with vignettes being rated within each of these. It does require that language translation does not distort vignette descriptions, resulting in differences in their interpretation across countries. Given that improvement of cross-country comparability was the main motivation for the inclusion of vignettes in the SHARE study, we presume that sufficient care was taken with their translation. The second assumption is that respondents rate the vignettes in the same way as they do their own health. If this did not hold then it would not be valid to impose the thresholds that were identified from the vignette ratings on the reporting of own health, and so the true health effects would not be identified. To date, there has been little formal testing of these identifying assumptions. On vignette equivalence, see Murray et al. (2003), Kristensen and Johansson (2008), Bago d’Uva et al. (2011) and Rice et al. (2011). On response consistency, see van Soest et al. (2011), Bago d’Uva et al. (2011) and Datta Gupta et al. (2010).
The first component of the HOPIT model captures respondents’ ratings of the vignettes. The perceived latent health level of vignette j in domain d, $V^*_{jdi}$, is specified to depend solely on a dummy indicator identifying the vignette being rated and a random, normally distributed error:
$$V^*_{jdi} = \theta_{jd} + \upsilon_{jdi}. \tag{4}$$
The absence of observable characteristics of the respondent from model (4) follows from the assumption of vignette equivalence. The observed categorical vignette rating $V_{jdi}$ relates to $V^*_{jdi}$ through the reporting thresholds:
$$V_{jdi} = k \iff \tau_{di}^{k-1} \le V^*_{jdi} < \tau_{di}^{k}, \tag{5}$$
where the thresholds $\tau_{di}^{k}$ are now defined as functions of the same covariates that enter the latent index of own health in equation (2),
$$\tau_{di}^{k} = C_{di}\,\psi_d^{k}, \tag{6}$$
with $C_{di}$ collecting those covariates. Inclusion of the individual’s characteristics in the thresholds is possible because the assumption of vignette equivalence ensures that all the systematic variation in the vignette ratings can be attributed to reporting behaviour. In principle, it would be possible to include an error term in equation (6) representing unobservable heterogeneity in reporting styles. We do not do so since there are only three vignette ratings within each domain from which to identify the individual effects. With relatively small samples identification is likely to be weak.
The second component of the HOPIT model concerns the individual’s categorical rating of his own health. This is assumed to be determined by the position of a latent health index in relation to thresholds as in expressions (2)–(3) with the important difference that the thresholds are no longer assumed constant but are constrained to be equal to those in equation (6) identified from the vignettes component of the model. This follows from the response consistency assumption that any systematic biases in the reporting of own health correspond to those observed in the reporting of the vignettes. The HOPIT model therefore consists of ordered probit models for the reporting of own health and health of the vignettes with the cross-equation restriction that the threshold parameters are equal. It is assumed that the error terms in the vignette and own latent health equations, υjdi and εdi respectively, are independent for all i, j and d. While maintaining the basic assumptions of vignette equivalence and response consistency, it is possible to relax the distributional assumptions of the HOPIT model (King and Wand, 2007).
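The cross-equation restriction can be made concrete with a sketch of one respondent's contribution to the HOPIT log-likelihood in a single domain. This is an illustration under simplifying assumptions, not the estimation code used here: both error terms are given unit variance, thresholds are linear in the covariates without any ordering constraint, and all names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF (unit error variance is assumed)

def category_prob(rating, latent_mean, thresholds):
    """P(rating = k), k = 1..K, given K-1 increasing thresholds."""
    n_categories = len(thresholds) + 1
    upper = 1.0 if rating == n_categories else PHI(thresholds[rating - 1] - latent_mean)
    lower = 0.0 if rating == 1 else PHI(thresholds[rating - 2] - latent_mean)
    return upper - lower

def thresholds_for(covariates, threshold_coefs):
    """Respondent-specific thresholds: one linear index per cut point."""
    return [sum(c * x for c, x in zip(coefs, covariates)) for coefs in threshold_coefs]

def hopit_log_lik(own_rating, own_index, vignette_ratings, vignette_levels,
                  covariates, threshold_coefs):
    """Own-health and vignette ratings share the same thresholds."""
    taus = thresholds_for(covariates, threshold_coefs)
    log_lik = math.log(category_prob(own_rating, own_index, taus))
    for rating, level in zip(vignette_ratings, vignette_levels):
        log_lik += math.log(category_prob(rating, level, taus))
    return log_lik

# Hypothetical values: covariates are (constant, education dummy)
threshold_coefs = [[-1.0, -0.3], [0.0, -0.3], [0.8, -0.3], [1.6, -0.3]]
vignette_levels = [-0.5, 0.5, 1.5]   # three vignettes in the domain
low = hopit_log_lik(2, 0.4, [1, 3, 4], vignette_levels, [1.0, 0.0], threshold_coefs)
high = hopit_log_lik(3, 0.4, [2, 3, 5], vignette_levels, [1.0, 1.0], threshold_coefs)
```

Since the vignette levels are common to all respondents, systematic differences in vignette ratings can only be absorbed by the threshold coefficients, and those same coefficients then discipline the own-health component.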
The predicted latent own-health scores from the HOPIT model are used to proxy need for curative care in the hurdle model of visits to a doctor. By both instrumenting reported health with objective indicators and purging reporting heterogeneity by using the vignettes, this procedure deals with both sources of bias that are present when the raw self-reports are used to proxy need. Since both biases are expected to be downwards, we expect the education effects that are estimated by using this need proxy to be larger than those obtained by using the two other proxies. In addition to the exclusion restrictions on the health indicators and the vignette equivalence and response consistency assumptions, identification also requires that the vignette ratings do not directly explain consultations with a doctor. The latter implies that perceptions of health in general, as opposed to perceptions of own health, do not impact on healthcare seeking behaviour. Although it by no means amounts to a formal test, we found no evidence of any effect of the vignette responses when these were entered into a model of visits to a doctor.
To take account of the sampling variability of the predicted health scores that are used in the hurdle models of doctor visits, we bootstrap the whole procedure, using 50 replications, to obtain standard errors.
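The logic of bootstrapping the whole procedure is that each replication repeats both the first-step prediction and the second-step model on the same resampled respondents. A minimal sketch follows, with a toy linear two-step estimator standing in for the ordered probit or HOPIT step and the hurdle model. The data-generating values and function names are hypothetical.

```python
import random
import statistics

def slope(x, y):
    """Least squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def toy_two_step(rows):
    """Step 1: predict self-rated health from an objective indicator.
    Step 2: regress visits on the predicted score."""
    indicator = [r[0] for r in rows]
    self_rated = [r[1] for r in rows]
    visits = [r[2] for r in rows]
    first_step = slope(indicator, self_rated)
    predicted = [first_step * z for z in indicator]
    return slope(predicted, visits)

def bootstrap_se(rows, estimator, n_reps=50, seed=0):
    """Standard deviation of the estimator across resamples of respondents."""
    rng = random.Random(seed)
    n = len(rows)
    draws = []
    for _ in range(n_reps):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(sample))   # both steps are re-estimated
    return statistics.stdev(draws)

# Simulated data: need drives the indicator, the self-report and visits
rng = random.Random(1)
rows = []
for _ in range(2000):
    need = rng.gauss(0, 1)
    rows.append((need + rng.gauss(0, 1), need + rng.gauss(0, 1), 2.0 * need + rng.gauss(0, 1)))

point_estimate = toy_two_step(rows)
standard_error = bootstrap_se(rows, toy_two_step, n_reps=50)
```

Resampling only the second step while holding the predicted scores fixed would ignore the sampling variability of the first step, which is what the full bootstrap is designed to capture.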
4. Results
Estimates of the ordered probit and HOPIT models are not presented. Using the same SHARE data, estimates of education-related disparities in each of the six self-reported health domains with and without adjustment for reporting heterogeneity by using the HOPIT model are presented in Bago d’Uva et al. (2008). Hurdle models are estimated separately for GP and specialist visits. In addition to need for curative care proxied in three alternative ways from self-rated health, all models include the full set of diagnosed conditions listed in Table 4, as well as age, gender, country dummy variables and education. Only the education coefficients are presented in Tables 5, 6, 7, and 8, both from restricted models in which the education effects are assumed homogeneous across countries and from more flexible models in which they are allowed to vary through the introduction of education–country interaction terms.
Table 5. Education coefficients from logit models of the probability of any visit to a GP†
Results for models differentiated by means of controlling for need in 6 health domains
| Country | Education | Reported health category: coefficient | Reported health category: SE‡ | Reported health category: joint significance | Latent health index from ordered probit: coefficient | Latent health index from ordered probit: SE | Latent health index from ordered probit: joint significance | Latent health index from HOPIT model: coefficient | Latent health index from HOPIT model: SE | Latent health index from HOPIT model: joint significance |
|---|---|---|---|---|---|---|---|---|---|---|
| Restricted model with homogeneous education effects (1) | | | | | | | | | | |
| | Lower secondary | 0.004 | 0.132 | 0.049 | 0.027 | 0.139 | 0.103 | 0.091 | 0.135 | 0.319 |
| | Upper secondary | 0.015 | 0.121 | | 0.100 | 0.143 | | 0.201 | 0.140 | |
| | Tertiary | −0.287§ | 0.129 | | −0.186 | 0.128 | | −0.044 | 0.132 | |
| Unrestricted model with country-specific education effects (2) | | | | | | | | | | |
| Belgium | Lower secondary | −0.718§§ | 0.436 | 0.401 | −0.754 | 0.525 | 0.783 | −0.627 | 0.550 | 0.647 |
| | Upper secondary | −0.337 | 0.454 | | −0.404 | 0.484 | | −0.234 | 0.495 | |
| | Tertiary | −0.508 | 0.450 | | −0.437 | 0.498 | | −0.221 | 0.530 | |
| France | Lower secondary | −0.480 | 0.448 | 0.076 | −0.289 | 0.455 | 0.076 | −0.114 | 0.447 | 0.620 |
| | Upper secondary | −0.214 | 0.315 | | −0.137 | 0.350 | | 0.060 | 0.365 | |
| | Tertiary | −0.774§§ | 0.308 | | −0.663§§ | 0.350 | | −0.466 | 0.407 | |
| Italy | Lower secondary | −0.120 | 0.340 | 0.599 | −0.087 | 0.374 | 0.237 | 0.033 | 0.394 | 0.488 |
| | Upper secondary or tertiary | 0.261 | 0.332 | | 0.494 | 0.438 | | 0.583 | 0.510 | |
| Germany | Upper secondary | 0.085 | 0.431 | 0.201 | 0.174 | 0.500 | 0.842 | 0.313 | 0.506 | 0.622 |
| | Tertiary | −0.485 | 0.464 | | −0.315 | 0.539 | | −0.109 | 0.578 | |
| Greece | Lower secondary | 0.617§§ | 0.339 | 0.017 | 0.503 | 0.427 | 0.002 | 0.722 | 0.623 | 0.464 |
| | Upper secondary | −0.413§§ | 0.213 | | −0.317 | 0.233 | | −0.263 | 0.244 | |
| | Tertiary | −0.215 | 0.250 | | −0.063 | 0.299 | | 0.012 | 0.346 | |
| Netherlands | Lower secondary | 0.435 | 0.348 | 0.245 | 0.436 | 0.396 | 0.940 | 0.700 | 0.432 | 0.049 |
| | Upper secondary | 0.726§§ | 0.389 | | 0.838§ | 0.401 | | 1.293§ | 0.477 | |
| | Tertiary | 0.221 | 0.376 | | 0.265 | 0.335 | | 1.018§ | 0.482 | |
| Spain | Lower secondary | 0.536 | 0.413 | 0.018 | 0.624 | 0.483 | 0.735 | 0.581 | 0.489 | 0.094 |
| | Upper secondary or tertiary | −0.691§ | 0.336 | | −0.631 | 0.400 | | −0.567 | 0.407 | |
| Sweden | Lower secondary | 0.020 | 0.334 | 0.257 | −0.025 | 0.407 | 0.218 | 0.077 | 0.412 | 0.871 |
| | Upper secondary | 0.579§§ | 0.318 | | 0.410 | 0.408 | | 0.298 | 0.418 | |
| | Tertiary | 0.314 | 0.307 | | 0.149 | 0.313 | | 0.240 | 0.330 | |
| Likelihood ratio tests, LR, of restricted (1) against unrestricted (2) model | | | | | | | | | | |
| | | LR | Degrees of freedom | p-value | LR | Degrees of freedom | p-value | LR | Degrees of freedom | p-value |
| | | 36.91 | 20 | 0.0120 | 33.85 | 20 | 0.0272 | 36.33 | 20 | 0.0140 |
†All models include controls for health in six domains, diagnosed chronic conditions, age, gender and country. The reference education category is primary school or less except for Germany. The first panel contains estimated education coefficients from a restricted model with no education–country interactions. The second panel contains country-specific education effects from a model with interactions. The ordered probit and HOPIT models which were used to predict the health indices include the same age, gender, education, country and education–country interactions, plus the health indicators listed in Table 4.
‡Bootstrap standard error.
§Significance relative to the reference education category, which is the lowest, at 5%.
§§Significance relative to the reference education category, which is the lowest, at 10%.
Table 6. Education coefficients from truncated negative binomial models of the number of visits to a GP†
Results for models differentiated by means of controlling for need in 6 health domains
| Country | Education | Reported health category: coefficient | Reported health category: SE‡ | Reported health category: joint significance | Latent health index from ordered probit: coefficient | Latent health index from ordered probit: SE | Latent health index from ordered probit: joint significance | Latent health index from HOPIT model: coefficient | Latent health index from HOPIT model: SE | Latent health index from HOPIT model: joint significance |
|---|---|---|---|---|---|---|---|---|---|---|
| Restricted model with homogeneous education effects (1) | | | | | | | | | | |
| | Lower secondary | 0.058 | 0.055 | 0.154 | 0.102 | 0.065 | 0.133 | 0.149§ | 0.067 | 0.019 |
| | Upper secondary | 0.044 | 0.051 | | 0.119§ | 0.057 | | 0.206§§ | 0.067 | |
| | Tertiary | −0.068 | 0.059 | | 0.029 | 0.070 | | 0.141* | 0.081 | |
| Unrestricted model with country-specific education effects (2) | | | | | | | | | | |
| Belgium | Lower secondary | 0.144 | 0.130 | 0.569 | 0.144 | 0.153 | 0.783 | 0.213 | 0.164 | 0.316 |
| | Upper secondary | 0.162 | 0.129 | | 0.164 | 0.179 | | 0.329* | 0.190 | |
| | Tertiary | 0.064 | 0.131 | | 0.085 | 0.158 | | 0.228 | 0.178 | |
| France | Lower secondary | −0.187 | 0.150 | 0.011 | −0.091 | 0.166 | 0.076 | 0.090 | 0.200 | 0.300 |
| | Upper secondary | −0.038 | 0.089 | | 0.019 | 0.118 | | 0.179 | 0.124 | |
| | Tertiary | −0.353§§ | 0.111 | | −0.274§ | 0.112 | | −0.080 | 0.146 | |
| Italy | Lower secondary | 0.065 | 0.137 | 0.097 | 0.251 | 0.185 | 0.237 | 0.281 | 0.200 | 0.222 |
| | Upper secondary or tertiary | −0.253* | 0.133 | | −0.059 | 0.195 | | −0.057 | 0.210 | |
| Germany | Upper secondary | −0.107 | 0.131 | 0.538 | −0.051 | 0.159 | 0.842 | 0.036 | 0.162 | 0.877 |
| | Tertiary | −0.179 | 0.163 | | −0.101 | 0.173 | | −0.047 | 0.205 | |
| Greece | Lower secondary | 0.301* | 0.163 | 0.019 | 0.205 | 0.164 | 0.002 | 0.246 | 0.160 | 0.003 |
| | Upper secondary | 0.304§ | 0.126 | | 0.409§§ | 0.138 | | 0.420§§ | 0.143 | |
| | Tertiary | 0.367§ | 0.151 | | 0.499§§ | 0.156 | | 0.562§§ | 0.174 | |
| Netherlands | Lower secondary | 0.033 | 0.185 | 0.981 | −0.006 | 0.171 | 0.940 | 0.104 | 0.193 | 0.412 |
| | Upper secondary | 0.061 | 0.203 | | 0.095 | 0.192 | | 0.261 | 0.226 | |
| | Tertiary | −0.009 | 0.210 | | 0.028 | 0.184 | | 0.373 | 0.238 | |
| Spain | Lower secondary | −0.074 | 0.129 | 0.783 | −0.076 | 0.141 | 0.735 | −0.077 | 0.140 | 0.750 |
| | Upper secondary or tertiary | −0.079 | 0.155 | | 0.073 | 0.221 | | 0.070 | 0.231 | |
| Sweden | Lower secondary | 0.471§ | 0.226 | 0.205 | 0.653* | 0.343 | 0.218 | 0.645* | 0.355 | 0.337 |
| | Upper secondary | 0.239 | 0.197 | | 0.287* | 0.165 | | 0.155 | 0.183 | |
| | Tertiary | 0.224 | 0.216 | | 0.278 | 0.220 | | 0.149 | 0.232 | |
| Likelihood ratio tests, LR, of restricted (1) against unrestricted (2) model | | | | | | | | | | |
| | | LR | Degrees of freedom | p-value | LR | Degrees of freedom | p-value | LR | Degrees of freedom | p-value |
| | | 31.67 | 20 | 0.0470 | 35.24 | 20 | 0.0189 | 35.11 | 20 | 0.0195 |
†See the first footnote to Table 5.
‡Bootstrap standard error.
§Significance relative to the reference education category, which is the lowest, at 5%.
§§Significance relative to the reference education category, which is the lowest, at 1%.
*Significance relative to the reference education category, which is the lowest, at 10%.
Table 7. Education coefficients from logit models of the probability of any visit to a specialist†
Results for models differentiated by means of controlling for need in 6 health domains
| Country | Education | Reported health category: coefficient | Reported health category: SE‡ | Reported health category: joint significance | Latent health index from ordered probit: coefficient | Latent health index from ordered probit: SE | Latent health index from ordered probit: joint significance | Latent health index from HOPIT model: coefficient | Latent health index from HOPIT model: SE | Latent health index from HOPIT model: joint significance |
|---|---|---|---|---|---|---|---|---|---|---|
| Restricted model with homogeneous education effects (1) | | | | | | | | | | |
| | Lower secondary | 0.387§ | 0.107 | < 0.0005 | 0.368§ | 0.115 | < 0.0005 | 0.414§ | 0.118 | < 0.0005 |
| | Upper secondary | 0.521§ | 0.099 | | 0.497§ | 0.103 | | 0.554§ | 0.106 | |
| | Tertiary | 0.779§ | 0.109 | | 0.782§ | 0.131 | | 0.910§ | 0.145 | |
| Unrestricted model with country-specific education effects (2) | | | | | | | | | | |
| Belgium | Lower secondary | 0.327 | 0.268 | 0.103 | 0.315 | 0.287 | 0.081 | 0.405 | 0.293 | 0.033 |
| | Upper secondary | 0.477§§ | 0.268 | | 0.481 | 0.318 | | 0.579§§ | 0.328 | |
| | Tertiary | 0.656* | 0.274 | | 0.708* | 0.290 | | 0.829§ | 0.295 | |
| France | Lower secondary | 0.421 | 0.296 | 0.010 | 0.429 | 0.295 | 0.037 | 0.513§§ | 0.306 | 0.033 |
| | Upper secondary | 0.198 | 0.186 | | 0.172 | 0.218 | | 0.261 | 0.231 | |
| | Tertiary | 0.697§ | 0.212 | | 0.690§ | 0.239 | | 0.755§ | 0.257 | |
| Italy | Lower secondary | 0.564* | 0.288 | < 0.0005 | 0.556* | 0.270 | 0.001 | 0.609* | 0.284 | 0.002 |
| | Upper secondary or tertiary | 1.072§ | 0.269 | | 1.147§ | 0.302 | | 1.192§ | 0.347 | |
| Germany | Upper secondary | 0.359 | 0.270 | 0.060 | 0.334 | 0.272 | 0.044 | 0.387 | 0.275 | 0.049 |
| | Tertiary | 0.758* | 0.322 | | 0.784* | 0.314 | | 0.872* | 0.358 | |
| Greece | Lower secondary | −0.318 | 0.349 | < 0.0005 | −0.383 | 0.305 | < 0.0005 | −0.207 | 0.335 | < 0.0005 |
| | Upper secondary | 0.881§ | 0.209 | | 0.866§ | 0.195 | | 0.888§ | 0.205 | |
| | Tertiary | 1.050§ | 0.245 | | 1.127§ | 0.235 | | 1.125§ | 0.261 | |
| Netherlands | Lower secondary | 0.011 | 0.310 | 0.993 | 0.030 | 0.359 | 0.994 | 0.139 | 0.391 | 0.848 |
| | Upper secondary | −0.065 | 0.343 | | −0.043 | 0.373 | | 0.164 | 0.392 | |
| | Tertiary | −0.016 | 0.343 | | 0.008 | 0.382 | | 0.417 | 0.520 | |
| Spain | Lower secondary | 0.723§ | 0.274 | 0.017 | 0.753§ | 0.237 | 0.006 | 0.706§ | 0.247 | 0.016 |
| | Upper secondary or tertiary | 0.518§§ | 0.297 | | 0.529 | 0.391 | | 0.498 | 0.429 | |
| Sweden | Lower secondary | 0.469 | 0.354 | 0.057 | 0.409 | 0.404 | 0.063 | 0.455 | 0.418 | 0.033 |
| | Upper secondary | 0.509 | 0.321 | | 0.440 | 0.318 | | 0.361 | 0.333 | |
| | Tertiary | 0.856§ | 0.316 | | 0.811§ | 0.301 | | 0.938§ | 0.319 | |
| Likelihood ratio tests, LR, of restricted (1) against unrestricted (2) model | | | | | | | | | | |
| | | LR | Degrees of freedom | p-value | LR | Degrees of freedom | p-value | LR | Degrees of freedom | p-value |
| | | 27.97 | 20 | 0.1102 | 30.84 | 20 | 0.0573 | 21.88 | 20 | 0.3468 |
†See the first footnote to Table 5.
‡Bootstrap standard error.
§Significance relative to the reference education category, which is the lowest, at 1%.
§§Significance relative to the reference education category, which is the lowest, at 10%.
*Significance relative to the reference education category, which is the lowest, at 5%.
Table 8. Education coefficients from truncated negative binomial models of the number of visits to a specialist†
Results for models differentiated by means of controlling for need in 6 health domains
| Country | Education | Reported health category: coefficient | Reported health category: SE‡ | Reported health category: joint significance | Latent health index from ordered probit: coefficient | Latent health index from ordered probit: SE | Latent health index from ordered probit: joint significance | Latent health index from HOPIT model: coefficient | Latent health index from HOPIT model: SE | Latent health index from HOPIT model: joint significance |
|---|---|---|---|---|---|---|---|---|---|---|
| Restricted model with homogeneous education effects (1) | | | | | | | | | | |
| | Lower secondary | 0.177§ | 0.107 | 0.121 | 0.149 | 0.128 | 0.484 | 0.180 | 0.127 | 0.541 |
| | Upper secondary | −0.067 | 0.097 | | −0.026 | 0.110 | | 0.050 | 0.114 | |
| | Tertiary | −0.024 | 0.105 | | −0.010 | 0.119 | | 0.103 | 0.135 | |
| Unrestricted model with country-specific education effects (2) | | | | | | | | | | |
| Belgium | Lower secondary | 0.312 | 0.250 | 0.428 | 0.361 | 0.280 | 0.531 | 0.551§ | 0.284 | 0.270 |
| | Upper secondary | −0.044 | 0.248 | | 0.121 | 0.267 | | 0.305 | 0.239 | |
| | Tertiary | 0.074 | 0.247 | | 0.126 | 0.252 | | 0.378 | 0.264 | |
| France | Lower secondary | 0.295 | 0.285 | 0.273 | 0.380 | 0.421 | 0.266 | 0.554 | 0.430 | 0.114 |
| | Upper secondary | 0.138 | 0.184 | | 0.160 | 0.221 | | 0.379§ | 0.216 | |
| | Tertiary | −0.208 | 0.201 | | −0.194 | 0.208 | | 0.030 | 0.244 | |
| Italy | Upper secondary | 0.286 | 0.296 | 0.624 | 0.383 | 0.238 | 0.269 | 0.482§ | 0.264 | 0.169 |
| | Tertiary | 0.056 | 0.264 | | 0.312 | 0.378 | | 0.356 | 0.401 | |
| Germany | Upper secondary | −0.212 | 0.235 | 0.500 | −0.275 | 0.236 | 0.504 | −0.152 | 0.243 | 0.719 |
| | Tertiary | −0.017 | 0.270 | | −0.123 | 0.293 | | 0.063 | 0.318 | |
| Greece | Lower secondary | −0.163 | 0.400 | 0.354 | −0.460 | 0.374 | 0.103 | −0.295 | 0.439 | 0.179 |
| | Upper secondary | −0.028 | 0.206 | | −0.178 | 0.256 | | −0.202 | 0.269 | |
| | Tertiary | 0.348 | 0.237 | | 0.287 | 0.249 | | 0.261 | 0.249 | |
| Netherlands | Lower secondary | 0.410 | 0.312 | 0.105 | 0.479 | 0.371 | 0.451 | 0.782§§ | 0.400 | 0.234 |
| | Upper secondary | −0.159 | 0.358 | | 0.163 | 0.411 | | 0.596 | 0.454 | |
| | Tertiary | −0.068 | 0.351 | | 0.076 | 0.375 | | 0.878§ | 0.532 | |
| Spain | Upper secondary | −0.256 | 0.251 | 0.174 | −0.546 | 0.402 | 0.339 | −0.605 | 0.392 | 0.296 |
| | Tertiary | −0.523§ | 0.295 | | −0.732 | 0.540 | | −0.654 | 0.567 | |
| Sweden | Lower secondary | −0.310 | 0.417 | 0.439 | −0.252 | 0.438 | 0.801 | −0.215 | 0.427 | 0.889 |
| | Upper secondary | 0.041 | 0.354 | | 0.119 | 0.324 | | −0.002 | 0.334 | |
| | Tertiary | −0.501 | 0.366 | | −0.232 | 0.314 | | −0.246 | 0.334 | |
| Likelihood ratio tests, LR, of restricted (1) against unrestricted (2) model | | | | | | | | | | |
| | | LR | Degrees of freedom | p-value | LR | Degrees of freedom | p-value | LR | Degrees of freedom | p-value |
| | | 20.46 | 20 | 0.4295 | 27.05 | 20 | 0.1339 | 32.05 | 20 | 0.0428 |
†See the first footnote to Table 5.
‡Bootstrap standard error.
§Significance relative to the reference education category, which is the lowest, at 10%.
§§Significance relative to the reference education category, which is the lowest, at 5%.
4.1. Visits to a general practitioner
The top panel of Table 5 gives the education coefficients, constrained to be equal across countries, from logit models of the binary decision to visit a GP. Using the reported category within each health domain, without instrumenting by using health indicators or purging reporting heterogeneity by using vignettes, to proxy need for curative care (results in the third, fourth and fifth columns), individuals with tertiary level education appear significantly less likely than those with no more than primary education to visit a GP. If we were to presume that self-rated health and diagnosed conditions are adequate proxies for need, then this would indicate inequity in access to GP care to the disadvantage of the better educated. However, when self-rated health is instrumented by using the objective health indicators (sixth, seventh and eighth columns), the magnitude of the coefficient on tertiary education falls and it loses significance. Correcting, in addition, for reporting heterogeneity by using predictions of health from HOPIT models identified from the vignettes results in a further large fall in the magnitude and significance of the coefficient on tertiary education. In this specification, the education categories are clearly not jointly significant. The loss of significance of education is not due to increases in the standard errors, which are relatively constant across the specifications, but rather to the fall in the magnitude of the tertiary education effect. The positive coefficients on the secondary education categories increase as more adjustments are made to the reported health variables but they never reach significance. These results are entirely consistent with our predictions that using self-rated health to proxy need will bias estimates away from showing an educational advantage in healthcare use because of both errors in variables and education-correlated reporting errors and that instrumenting reported health by using objective health indicators will only partially correct the bias.
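Since the participation part of the hurdle model is a logit, the coefficients can be read as log odds ratios. As a rough reading aid, the following converts the three tertiary education coefficients from the restricted model in Table 5 into odds ratios relative to the primary education reference group. The dictionary keys are labels chosen here for illustration.

```python
import math

# Tertiary education coefficients, restricted model, Table 5
tertiary_coefs = {
    "reported_category": -0.287,
    "ordered_probit_index": -0.186,
    "hopit_index": -0.044,
}

# exp(coefficient) is the odds ratio of any GP visit relative to the reference group
odds_ratios = {name: math.exp(coef) for name, coef in tertiary_coefs.items()}
```

The implied odds of any GP visit for the tertiary educated are about 25% lower than for the reference group with the raw proxy, about 17% lower after instrumenting and about 4% lower after the HOPIT correction.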
The tests at the foot of Table 5 indicate that the restricted model is rejected relative to one that allows education effects to vary by country. For most countries, the results are broadly consistent with those from the restricted model. There is an apparent lower propensity for the higher education groups to consult a GP when raw self-rated health is used to proxy need, which is eroded as reported health is instrumented and then corrected for reporting heterogeneity. This pattern is most apparent for Belgium, France, Greece and Spain. The Netherlands is clearly an exception. Even using raw self-rated health there is some evidence that higher education groups are more likely than those with primary education to visit a GP. When adjustment is made for reporting heterogeneity, but not so much when self-rated health is instrumented, the pro-higher-education advantage becomes much more marked both in magnitude and in significance. Sweden also starts with some indication of an advantage to the better educated. Note that the average probability of visiting a GP is lower in Sweden and the Netherlands than in all other countries except Greece (Table 1). It could be that the higher educated are less constrained by the more restricted access to GP services. Unlike the Netherlands, the education disparity does not rise in Sweden when adjustment is made for reporting heterogeneity. This is because the reporting bias is in the opposite direction in Sweden—the higher educated are less likely to report a given condition as representing a health problem (Bago d’Uva et al., 2008).
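The p-values of the likelihood ratio tests follow from the chi-squared distribution with degrees of freedom equal to the number of education–country interaction terms. A minimal sketch, using the closed-form upper tail that holds for even degrees of freedom, is given below. The function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variate.
    Closed form valid only for positive even degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

# LR statistic and degrees of freedom for the reported health category model, Table 5
p_value = chi2_sf_even_df(36.91, 20)
```

For the statistic of 36.91 with 20 degrees of freedom this gives a p-value close to the reported 0.0120.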
Results from the truncated negative binomial model of the (positive) number of visits to a GP are even more consistent with our expectations about the effect of correcting measurement error (Table 6). In the results from the model with no education–country interactions there are no significant education effects when self-rated health is used to proxy need. The coefficients of all three education categories increase when reported health is instrumented by using objective health indicators and it emerges that those with upper secondary education visit a GP significantly more than those with primary schooling. Adjusting for reporting heterogeneity leads to large increases in the coefficients and all three become significant. On average across countries, GP utilization is significantly greater for all education groups in comparison with the least educated. There is clear evidence of inequity in the intensity of GP care that is not evident when reported health categories are used to proxy need.
The tendency for a higher education bias to become more evident as cumulative adjustments are made to self-rated health is apparent for most countries, particularly Belgium, Greece and the Netherlands, although significance is much lower owing to the small cell sizes in the models with country–education interactions. In France and Italy the results are similar to those from the logit model—an apparent disparity to the disadvantage of the highest education group is removed once account has been taken of the tendency of the highly educated to report their health more negatively. Greece is the country with the greatest education-related inequality in visits to a GP. It also has one of the lowest mean rates of GP consultation (Table 1), possibly reflecting both the lack of a family doctor service and the very heavy reliance on out-of-pocket payments for healthcare (Organisation for Economic Co-operation and Development, 2010). Although primary care is free to those with public health insurance, long waiting times for consultations may lead patients, at least those who are willing to pay, to turn to providers who do not accept public insurance patients.
4.2. Specialist visits
Consistent with the findings of previous European studies (van Doorslaer et al., 2000, 2004), access to specialist care displays greater bias in favour of higher education groups than does GP care. On average across countries, the probability of seeing a specialist rises significantly and monotonically with education even when self-rated health is used to proxy need (the third, fourth and fifth columns in the top panel of Table 7). Instrumenting with health indicators using an ordered probit model of reported health has little effect on the education effects, which remain highly significant (the sixth, seventh and eighth columns). Adjusting for reporting heterogeneity by using the HOPIT model increases the coefficients, particularly for the top education category. So, again, there appears to be a marginal effect of using the vignettes to correct for reporting heterogeneity over that achieved by exploiting information on objective health indicators.
Unlike for GP care, the likelihood ratio tests do not indicate substantial cross-country variation in the education effects on specialist care (Table 7, bottom row). Indeed, the consistent educational advantage in access to specialist care is striking. Even using raw self-rated health to proxy need, there is a significant education gradient in the probability of accessing specialist care in every country except the Netherlands, which stands out by having a gradient in access to GP, but not specialist, care. This is consistent with GP gate-keepers in the Netherlands being effective in ensuring equitable access to specialist care, possibly withstanding pressure from higher social groups seeking privileged access. The reader is referred to Bago d’Uva and Jones (2009) for further discussion of the role of gate-keepers, and other European health system characteristics, in explaining inequality in the distribution of GP and specialist visits. In general, the country-specific education effects on the probability of using specialist care increase when adjustment is made for reporting heterogeneity.
The effect of education on the number of specialist visits is very different from that on the probability of having at least one visit. Conditional on coming into contact with a specialist, there is little or no evidence that the intensity of consultations varies systematically with education (Table 8). In the restricted model with homogeneous education effects, the education coefficients move in the direction of an education advantage when adjustment is made for measurement error in reported health, but they do not come close to reaching significance. As with the binary specialist decision, homogeneity of the education effects across countries is not clearly and consistently rejected. After adjustment for reporting heterogeneity, but not before, at least one of the education groups use specialist care more intensively than those with primary education only in Belgium, France, Italy and the Netherlands. The lack of significance of the effects in this model could be due to the smaller sample size, given that less than half of respondents use specialist care. But analysis of data from the European Community Household Panel Survey also reveals larger income elasticities in the probability of contacting a specialist than in the conditional number of contacts (Bago d’Uva and Jones, 2009). It is plausible that inequity is at the extensive rather than the intensive margin. Consistent with the hypothesis that was advanced in Section 1 that information disparities may be responsible for the education gradient in health and healthcare, the lower education groups may lack information on the type of specialist care from which they could benefit and consequently fail to make contact with an appropriate specialist. But, once contact has been made, the doctor will be largely responsible for the course of treatment and the number of consultations. 
On this interpretation, the problem is not one of unequal treatment of equal presented need but unequal presentation for treatment, i.e. the problem is on the demand side rather than the supply side.
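The distinction between the extensive and intensive margins can be made concrete with a purely descriptive two-part decomposition. The sketch below is not the regression model estimated in this paper: the function name `two_part_summary` and the visit counts are hypothetical and invented for illustration. It shows only that the mean number of visits factorises into the probability of any visit and the mean number of visits among users, so that two groups can differ at the first margin while being identical at the second.

```python
from statistics import mean

def two_part_summary(visits):
    """Split a list of visit counts into the extensive and intensive margins."""
    users = [v for v in visits if v > 0]
    p_any = len(users) / len(visits)             # extensive margin: P(y > 0)
    mean_if_any = mean(users) if users else 0.0  # intensive margin: E[y | y > 0]
    return p_any, mean_if_any

# Hypothetical visit counts, invented purely for illustration (not SHARE data).
low_edu = [0, 0, 0, 2, 4, 0, 3, 0]
high_edu = [0, 2, 4, 3, 0, 2, 4, 3]

for label, visits in [("low", low_edu), ("high", high_edu)]:
    p_any, mean_if_any = two_part_summary(visits)
    # The unconditional mean factorises as P(y > 0) * E[y | y > 0].
    assert abs(p_any * mean_if_any - mean(visits)) < 1e-12
    print(f"{label}: P(any visit)={p_any:.3f}, mean visits if any={mean_if_any:.2f}")
```

In this invented example the two groups differ only in the probability of making contact, which is the pattern that the specialist care results suggest.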
5. Conclusion
Equal treatment for equal need is a founding principle of many European healthcare systems. Monitoring the performance of systems with respect to this goal is a major challenge. A weakness of previous research is that the self-reported health measures typically available in large household surveys do not reliably identify individuals in equal need. In this paper, we addressed this weakness by using two methods to correct bias in the estimated education gradient in visits to a doctor conditional on need. The first approach instruments self-rated health with objective health indicators. This deals with bias arising from the fact that self-rated health is an imperfect proxy for medical care need. The second method goes a step further by purging self-rated health of education-related reporting errors by using information on reporting styles from the rating of health vignettes.
Analysis of SHARE data on elderly Europeans confirms it is likely that bias is present in studies that use self-rated health as a proxy for need with the goal of estimating inequity in the distribution of visits to a doctor. Instrumenting self-reported health with objective health indicators generally shifts the distribution of healthcare conditional on measured need in the direction of inequality favouring more highly educated groups. There is a further, and typically larger, shift in the same direction when correction is made for the observed tendency of the more highly educated to rate their health more negatively. On average across countries, whereas the probability of contacting a GP conditionally on self-rated health is lower for the higher educated, this perverse inequity is no longer apparent when self-rated health is purged of both sources of bias. Using self-rated health to proxy need, there is no apparent inequality in the number of visits to a GP. But inequity to the advantage of the better educated emerges when corrections are made to reported health. The probability of accessing specialist care displays inequity to the advantage of the better educated even without any correction for measurement error, but the disparity becomes even larger once differential reporting thresholds have been taken into account.
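The direction of the bias from reporting heterogeneity can be illustrated with a deliberately simple, deterministic sketch. Everything in it is an assumption made for illustration: the ten-point latent health scale, the reporting thresholds, the two-category self-report and the function names are hypothetical and are not taken from the model or data of this paper. Utilization is constructed to depend on true health only, so there is no inequity by construction, and the stricter threshold of the better educated stands in for their tendency to rate a given level of health more negatively.

```python
from statistics import mean

# All numbers are hypothetical and chosen only to make the mechanism visible.
TRUE_HEALTH = range(10)                      # latent health, 0 (worst) to 9 (best)
THRESHOLD = {"low_edu": 5, "high_edu": 7}    # minimum health reported as "good"

def visits(health):
    """Use depends on true health only, so there is no inequity by construction."""
    return 10 - health

def mean_visits_by_report(group):
    """Mean visits within each self-reported health category for one group."""
    cut = THRESHOLD[group]
    return {
        "poor": mean(visits(h) for h in TRUE_HEALTH if h < cut),
        "good": mean(visits(h) for h in TRUE_HEALTH if h >= cut),
    }

low = mean_visits_by_report("low_edu")
high = mean_visits_by_report("high_edu")
for report in ("poor", "good"):
    print(report, low[report], high[report])
```

Within each reported category the better educated group is in better true health and so makes fewer visits, even though use is identical at every level of true health. Conditioning on reported rather than true health would therefore suggest, wrongly, a distribution favouring the less educated, which is the direction of bias described above.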
These results are a warning against complacency in the equity performance of European health systems. The distribution of primary care is perhaps not as equitable as is often believed and specialist care is even less equitably distributed than has hitherto been realized. But the implications of the results potentially stretch beyond the monitoring of inequity to the understanding of its causes. With universal coverage ensured and relatively low financial barriers to access in most European countries, we may rightly wonder how it can possibly be that equity remains an elusive goal. Our hypothesis is that variation in health expectations, as reflected in the rating of vignettes, at least in part, drives inequality in the utilization of healthcare. Given that the better educated report a given level of health more negatively, it seems likely that they will also be quicker to seek healthcare for any given condition. Viewed in this way, reporting heterogeneity is not simply a nuisance to be purged from health measures to evaluate equity in healthcare use better; it is potentially an important factor in improving understanding of health seeking behaviour.
Acknowledgments
We thank the Associate Editor and two referees for extremely useful comments. We are grateful to Isabelle Soerjomataram for her guidance in selecting relevant indicators of each health domain. This paper derives from the Netspar-funded project ‘Health and income, work and care across the life cycle II’. Teresa Bago d’Uva was also funded by a VENI grant from the Netherlands Organisation for Scientific Research and Owen O’Donnell and Eddy van Doorslaer by National Institute on Aging grant 1R01AG037398. The research reported uses data from release 2 of the SHARE 2004 project. The SHARE data collection has been primarily funded by the European Commission through the fifth framework programme (project QLK6-CT-2001-00360 in the thematic programme ‘Quality of life’). Additional funding came from the US National Institute on Aging (U01 AG09740-13S2, P01 AG005842, P01 AG08291, P30 AG12815, Y1-AG-4553-01 and OGHA 04-064). The Belgian Science Policy Office funded data collection in Belgium. Further support by the European Commission through the sixth framework programme (projects SHARE-I3, RII-CT-2006-062193, and COMPARE, CIT5-CT-2005-028857) is gratefully acknowledged.
Contributor Information
Teresa Bago d’Uva, Erasmus University Rotterdam and Tinbergen Institute, Rotterdam, The Netherlands.
Maarten Lindeboom, Free University Amsterdam and Tinbergen Institute, Amsterdam, The Netherlands.
Owen O’Donnell, Erasmus University Rotterdam, Tinbergen Institute, Rotterdam, The Netherlands, and University of Macedonia, Thessaloniki, Greece.
Eddy van Doorslaer, Erasmus University Rotterdam and Tinbergen Institute, Rotterdam, The Netherlands.
References
- Bago d’Uva T, Jones AM. Health care utilisation in Europe: new evidence from the ECHP. J Hlth Econ. 2009;28:265–279. doi: 10.1016/j.jhealeco.2008.11.002.
- Bago d’Uva T, Jones AM, van Doorslaer E. Measurement of horizontal inequity in health care utilisation using European panel data. J Hlth Econ. 2009;28:280–289. doi: 10.1016/j.jhealeco.2008.09.008.
- Bago d’Uva T, Lindeboom M, O’Donnell O, van Doorslaer E. Slipping anchor?: testing the vignettes approach to identification and correction of reporting heterogeneity. J Hum Resour. 2011;46 doi: 10.1353/jhr.2011.0005. in the press.
- Bago d’Uva T, O’Donnell O, van Doorslaer E. Health reporting by educational level and its impact on the measurement of health inequalities in Europe. Int J Epidem. 2008;37:1375–1383. doi: 10.1093/ije/dyn146.
- Beam Dowd J, Zajacova A. Does the predictive power of self-rated health for subsequent mortality risk vary by socioeconomic status in the US? Int J Epidem. 2007;36:1214–1221. doi: 10.1093/ije/dym214.
- Börsch-Supan A, Hank K, Jürges H. A new comprehensive and international view on ageing: introducing the survey of health, ageing and retirement in Europe. Eur J Agng. 2005;2:245–253. doi: 10.1007/s10433-005-0014-9.
- Börsch-Supan A, Jürges H. The Survey of Health, Ageing and Retirement in Europe—Methodology. Mannheim: Mannheim Research Institute for the Economics of Aging; 2005.
- Bound J. Self reported versus objective measures of health in retirement models. J Hum Resour. 1991;26:107–137.
- Cutler DM, Lleras-Muney A. Education and health: evaluating theories and evidence. In: Schoeni RF, House JS, Kaplan GA, Pollack H, editors. Making Americans Healthier: Social and Economic Policy as Health Policy. New York: Russell Sage Foundation; 2008.
- Datta Gupta N, Kristensen N, Pozzoli D. External validation of the use of vignettes in cross-country health studies. Econ Modllng. 2010;27:854–865.
- van Doorslaer E, Koolman X, Jones A. Explaining income-related inequalities in doctor utilisation in Europe. Hlth Econ. 2004;13:629–647. doi: 10.1002/hec.919.
- van Doorslaer E, Wagstaff A, van der Burg H, Christiansen T, De Graeve D, Duchesne I, Gerdtham UG, Gerfin M, Geurts J, Gross L, Häkkinen U, John J, Klavus J, Leu RE, Nolan B, O’Donnell O, Propper C, Puffer F, Schellhorn M, Sundberg G, Winkelhake O. Equity in the delivery of health care in Europe and the US. J Hlth Econ. 2000;19:553–583. doi: 10.1016/s0167-6296(00)00050-3.
- Drèze J, Sen AK. India: Development and Participation. Oxford: Oxford University Press; 2002.
- Frijters P, Haisken-DeNew JP, Shields M. The causal effect of income on health: evidence from German reunification. J Hlth Econ. 2005;24:997–1017. doi: 10.1016/j.jhealeco.2005.01.004.
- Glied S, Lleras-Muney A. Technological innovation and inequality in health. Demography. 2008;45:741–761. doi: 10.1353/dem.0.0017.
- Grootendorst PV. A comparison of alternative models of prescription drug utilization. Hlth Econ. 1995;4:183–198. doi: 10.1002/hec.4730040304.
- Gurmu S. Generalized hurdle count data regression models. Econ Lett. 1998;58:263–268.
- Huisman M, van Lenthe F, Mackenbach J. The predictive ability of self-assessed health for mortality in different educational groups. Int J Epidem. 2007;36:1207–1213. doi: 10.1093/ije/dym095.
- Jürges H. Self-assessed health, reference levels and mortality. Appl Econ. 2008;40:569–582.
- Kapteyn A, Smith J, van Soest A. Vignettes and self-reports of work disability in the US and the Netherlands. Am Econ Rev. 2007;97:461–473.
- Kapteyn A, Smith J, van Soest A. Discussion Paper 4388. Institute for the Study of Labor; Bonn: 2009. Work disability, work, and justification bias in Europe and the U.S.
- King G, Murray CJL, Salomon J, Tandon A. Enhancing the validity and cross-cultural comparability of measurement in survey research. Am Polit Sci Rev. 2004;98:184–191.
- King G, Wand J. Comparing incomparable survey responses: evaluating and selecting anchoring vignettes. Polit Anal. 2007;15:46–66.
- van Kippersluis H, O’Donnell O, van Doorslaer E. Long run returns to education: does schooling lead to an extended old age? J Hum Resour. 2011 to be published.
- Kristensen N, Johansson E. New evidence on cross-country differences in job satisfaction using anchoring vignettes. Lab Econ. 2008;15:96–117.
- Lindeboom M, van Doorslaer E. Threshold shift and index shift in self-reported health. J Hlth Econ. 2004;23:1083–1099. doi: 10.1016/j.jhealeco.2004.01.002.
- Lleras-Muney A. The relationship between education and adult mortality in the United States. Rev Econ Stud. 2005;72:189–221.
- Lleras-Muney A, Lichtenberg F. Working Paper 9185. National Bureau of Economic Research; Cambridge: 2002. The effect of education on medical technology adoption: are the more educated more likely to use new drugs?
- Mackenbach JP, Stirbu I, Roskam AJR, Schaap MS, Menvielle G, Leinsalu M, Kunst AE. Socioeconomic inequalities in health in 22 European countries. New Engl J Med. 2008;358:2468–2481. doi: 10.1056/NEJMsa0707519.
- Murray CJL, Chen LC. Understanding morbidity change. Popln Develpmnt Rev. 1992;18:481–503.
- Murray CJL, Ozaltin E, Tandon A, Salomon J. Empirical evaluation of the anchoring vignettes approach in health surveys. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
- O’Donnell O, Propper C. Equity and the distribution of UK National Health Service resources. J Hlth Econ. 1991;10:1–19. doi: 10.1016/0167-6296(91)90014-e.
- Oreopoulos P. Estimating average and local average treatment effects of education when compulsory school laws really matter. Am Econ Rev. 2006;96:152–175.
- Organisation for Economic Co-operation and Development. OECD Health Data 2010. Paris: Organisation for Economic Co-operation and Development; 2010.
- Prince MJ, Reischies F, Beekman AT, Fuhrer R, Jonker C, Kivela SL, Lawlor BA, Lobo A, Magnusson H, Fichter M, van Oyen H, Roelands M, Skoog I, Turrina C, Copeland JR. Development of the EURO-D scale—a European Union initiative to compare symptoms of depression in 14 European centres. Br J Psychiatr. 1999;174:330–338. doi: 10.1192/bjp.174.4.330.
- Rantanen T, Guralnik JM, Foley D, Masaki K, Leveille S, Curb JD, White L. Midlife hand grip strength as a predictor of old age disability. J Am Med Ass. 1999;281:558–560. doi: 10.1001/jama.281.6.558.
- Rantanen T, Volpato S, Ferrucci L, Heikkinen E, Fried LP, Guralnik JM. Handgrip strength and cause-specific and total mortality in older disabled women: exploring the mechanism. J Am Geriatr Soc. 2003;51:636–641. doi: 10.1034/j.1600-0579.2003.00207.x.
- Rice N, Robone S, Smith PC. Analysis of the validity of the vignette approach to correct for heterogeneity in reporting health system responsiveness. Eur J Hlth Econ. 2011;12:141–162. doi: 10.1007/s10198-010-0235-5.
- Sen A. Health: perception versus observation. Br Med J. 2002;324:860–861. doi: 10.1136/bmj.324.7342.860.
- Singh-Manoux A, Dugravot A, Shipley MJ, Ferrie JE, Martikainen P, Goldberg M, Zins M. The association between self-rated health and mortality in different educational groups. Int J Epidem. 2007;36:1222–1228. doi: 10.1093/ije/dym170.
- Smith JP. The impact of socioeconomic status on health over the life-course. J Hum Resour. 2007;42:739–764.
- Smith JP, Kington R. Demographic and economic correlates of health in old age. Demography. 1997;34:159–170.
- van Soest A, Delaney L, Harmon C, Kapteyn A, Smith JP. Validating the use of anchoring vignettes for the correction of response scale differences in subjective questions. J R Statist Soc A. 2011;174:575–595. doi: 10.1111/j.1467-985X.2011.00694.x.
- Tandon A, Murray CJL, Salomon JA, King G. Statistical models for enhancing cross-population comparability. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health Organization; 2003. pp. 727–746.
- United Nations Educational, Scientific and Cultural Organization. International Standard Classification of Education 1997. Paris: United Nations Educational, Scientific and Cultural Organization; 1997.
- Wagstaff A, van Doorslaer E. Measuring and testing for inequity in the delivery of health care. J Hum Resour. 2000a;35:716–733.
- Wagstaff A, van Doorslaer E. Equity in health care finance and delivery. In: Culyer AJ, Newhouse JP, editors. Handbook of Health Economics. Amsterdam: North-Holland; 2000b. pp. 1803–1862.
- Winkelmann R. Health care reform and the number of doctor visits—an econometric analysis. J Appl Econmetr. 2004;19:455–472.