Author manuscript; available in PMC: 2009 Sep 15.
Published in final edited form as: Soc Sci Med. 2008 Jan 14;66(8):1659–1669. doi: 10.1016/j.socscimed.2007.11.046

Details for Manuscript Number SSM-D-06-02193R1 “Epidemiologic Analysis of Racial/Ethnic Disparities: Some Fundamental Issues and A Cautionary Example”

Jay S Kaufman 1
PMCID: PMC2744386  NIHMSID: NIHMS44628  PMID: 18248866

Abstract

Racial/ethnic health disparities are a primary focus of epidemiologic research, encompassing both sociological hypotheses about differential treatment as well as biomedical hypotheses about distinctive etiologic processes that might underlie observed disparities. These two main currents in disparities research are often pitted against one another as opposing paradigms. Despite contentious debate about the balance between these hypotheses in the etiology of existing disparities, one consideration that has been largely ignored is that there are important distinctions in the statistical justifications for these two types of inferences. In this article I review the foundations of causal inference in etiologic epidemiology as applied to studies of racial/ethnic health disparities. I describe normative applications of quantitative techniques for causal inference as they are practiced in research on discrimination in health care and also for research on innate predisposition. I then show why the latter is an injudicious application of this statistical methodology, and illustrate this point with the example of an influential study in the biomedical literature that purported to demonstrate a lesser response to angiotensin-converting-enzyme inhibitor therapy in black as compared with white patients with left ventricular dysfunction.

Keywords: health disparities, statistical inference, causality, etiology, race/ethnicity

Introduction

Racial/ethnic health disparities are a primary focus of epidemiologic research (Kaufman & Cooper 2001). This research encompasses both sociological hypotheses, for example about differential treatment (Smedley et al 2003), as well as biomedical hypotheses about distinctive etiologic processes that might underlie observed disparities, such as genetic predispositions (Cooper 2004). These two main currents in disparities research are often pitted against one another as opposing paradigms, and they have indisputable political implications that affect the way research is conducted, interpreted and debated. For example, some political conservatives have decried research emphasis on differential treatment, arguing that it detracts attention from the individual health behaviors and innate predispositions that contribute importantly to observed disparities (Satel 2001, Satel and Klick 2005). In contrast, other researchers have argued that the observed social disparities in health are a necessary consequence of political and economic policies and ideologies at the societal level (Navarro 2001, Levins 2000), and have asserted that excessive attention to individual factors such as genetics and behaviors is a distraction motivated by ideological rather than scientific justifications (Goodman 2000).

Despite the voluminous and contentious debate on this question of the balance between individual and social factors in the etiology of racial/ethnic health disparities, one important consideration that has been largely ignored is that the statistical justification for these two types of inferences is fundamentally distinct. Although data are collected, analyzed and interpreted in relation to both of these general hypotheses, the basis for statistical inference and causal attribution differs in significant ways. This difference can be most readily demonstrated by noting that the hypothesis of social discrimination is one in which the causal process under study is in the mind of the physician (or other decision-maker), in response to some acute stimulus, such as the presentation of a patient whose race is observed. The hypothesis of individual predilection or predisposition, on the other hand, places the causal process within the physiology or psychology of the affected individual. In this case, the individual experiences a behavior or physiologic state because of who they are, a condition that is chronic and fundamentally inalterable.

There are numerous ramifications of this conceptual distinction between the two causal hypotheses, but one notable example is that it is possible to design experimental trials to study discrimination directly, whereas it is not possible to design an experimental study that directly assesses innate predilection or predisposition. Scientists can certainly learn about mechanisms at the level of individual physiology and how these contribute to population-level health disparities, but the process is necessarily inductive. For example, an understanding of the role of hemoglobin S in the etiology of malaria helps to explain population-level disparities in the incidence of sickle-cell anemia between people of West African ancestry in contrast to people of Northern European ancestry. This knowledge did not (and could not) arise from statistical inference in population studies of risk factors and disease, however, but rather from physiological studies and inferences from evolutionary theory (Fix 2003).

In this article I review the foundations of causal inference in etiologic epidemiology as it is applied to studies of racial/ethnic health disparities. I will describe normative applications of quantitative techniques for causal inference as they are practiced in research on discrimination in health care and also for research on innate predisposition. I will then show why the latter is an injudicious application of this statistical methodology, and I will illustrate this point with the example of an influential study in the biomedical literature that purported to demonstrate a lesser response to angiotensin-converting-enzyme inhibitor therapy in black as compared with white patients with left ventricular dysfunction (Exner et al. 2001).

What Epidemiologists Do

Causal inference is fundamentally related to experimentation, which is why the randomized controlled trial (RCT) is widely considered to be the “gold standard” for establishment of causality in biomedicine. In an experimental design such as an RCT, the treatment of interest is assigned or withheld without regard to any measured or unmeasured characteristic of the individual. Therefore, the cumulative experiences of the two groups at the end of the study cannot logically result from any systematic differences in covariate distributions (Rubin 1974). It is the absence of the alternative explanation of an unmeasured common cause that makes a properly conducted RCT such a compelling argument for a causal effect of treatment expressed in counterfactual terms. That is, the treated and untreated groups can each serve as valid substitute populations for the other’s unobserved (counterfactual) experiences, and an effect measure formed by the contrast between the groups is an unbiased estimate of the true unobservable causal contrast between treated and untreated summary values for the target population of interest (Maldonado and Greenland 2002).

In observational studies, in contrast with RCTs, there may exist any number of measured or unmeasured quantities (Z) that are associated with (but not affected by) a point-exposure of interest (X) and causally precede the outcome (Y). In general, such variables will act to confound the observed relation, meaning that the observed association in the data would not converge to the true causal effect as the sample size becomes infinite (Greenland et al. 1999). The true causal effect is the one that would be achieved from an experimental manipulation of X. Formally, if actually forcing X to some specific value x1 would result in a different probability distribution of Y than if we had forced X to some alternate value x2, we say that X has a causal effect on Y, and furthermore that the magnitude of this causal effect can be described as some contrast of these values, such as a difference or a ratio. Pearl developed succinct notation that expresses such an effect without specifying the form of contrast or the levels to which X is fixed (Pearl 2000):

Pr(Y=y|SET[X=x]) (eq. 1)

This notation then allows us to express confounding as the divergence between any such contrast in experimentally manipulated data and the same contrast in passively observed data:

Pr(Y=y|SET[X=x]) ≠ Pr(Y=y|X=x) (eq. 2)

If confounding can be attributed entirely to covariate(s) Z, then adjustment, for example via standardization, allows for the unbiased estimation of the true causal effect:

Pr(Y=y|SET[X=x]) = Σz [Pr(Y=y|X=x,Z=z) Pr(Z=z)] (eq. 3)

where summation is over observed values Z=z, and Z is unaffected by X (Rosenbaum 1984). This is the basic theoretical foundation of statistical adjustment for observational data, and motivates the approach to all etiologic (as opposed to purely descriptive) analyses. The appeal to experimental manipulation for a definition of confounding is essential in order to tie the results back to the real world in some way. That is, the observed data are already a description of the world as it exists factually. Statistical adjustment creates a new set of numbers which do not pertain to the factual world, and therefore one must ask: to what world do they pertain? The answer is that they describe distributions of outcome Y under hypothetical manipulations of the exposure, letting all other quantities run their natural course (Greenland 2005).
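The contrast between equations 2 and 3 can be made concrete with a small numerical sketch. The counts below are hypothetical, invented only to show the arithmetic, and the function names are illustrative rather than drawn from any cited source. The crude contrast uses Pr(Y=y|X=x) directly, while the standardized contrast weights each stratum-specific risk by Pr(Z=z), as in equation 3.

```python
# Hypothetical counts, invented purely for illustration (not from any study).
# Each record: (stratum of Z, exposure X, group size, number with outcome Y=1).
strata = [
    ("z0", 1, 100, 20),
    ("z0", 0, 400, 40),
    ("z1", 1, 400, 200),
    ("z1", 0, 100, 40),
]

def crude_risk(x):
    """Pr(Y=1 | X=x) in the passively observed data."""
    n = sum(r[2] for r in strata if r[1] == x)
    events = sum(r[3] for r in strata if r[1] == x)
    return events / n

def standardized_risk(x):
    """Sum over z of Pr(Y=1 | X=x, Z=z) * Pr(Z=z), following eq. 3."""
    total = sum(r[2] for r in strata)
    risk = 0.0
    for z in sorted({r[0] for r in strata}):
        n_z = sum(r[2] for r in strata if r[0] == z)          # size of stratum z
        n_xz, e_xz = next(
            (r[2], r[3]) for r in strata if r[0] == z and r[1] == x
        )
        risk += (e_xz / n_xz) * (n_z / total)
    return risk

crude_diff = crude_risk(1) - crude_risk(0)
std_diff = standardized_risk(1) - standardized_risk(0)
print(f"crude risk difference:        {crude_diff:.2f}")
print(f"standardized risk difference: {std_diff:.2f}")
```

In these invented data the exposed are concentrated in the higher-risk stratum, so the crude difference overstates the standardized one. The standardized figure has a causal reading only if Z captures all confounding and is itself unaffected by X, which is the assumption stated in the text.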

The Experimental Model Applied to Disparities Arising from Discrimination

Applying the analytic epidemiologic model described above to health disparities arising from differential treatment is straightforward because the experimental intervention is well-defined. Consider a patient’s socially recognized racial identity in relation to some medical diagnosis, evaluation or procedure, such as referral for right-heart catheterization (Schulman et al. 1999). What we want to know is whether the causal process to refer or not refer, which takes place in the physician’s mind, is affected (consciously or unconsciously) by the patient’s race. In the conduct of an experiment, we can hold all other factors constant. That is, we can present a decision-maker with two patients who are identical in all relevant respects except for the one characteristic of interest. If the physician’s decision is consistently associated with patient’s race over a large number of such matched pairs in this experimental design, then the only plausible interpretation is that the decision-maker is using race in order to decide how to act. If there is no good evidence base for considering race in this way, then the decision-making process is discriminatory in the pejorative sense of the word.

A large number of experimental trials with this basic design have been conducted over the years in order to document the role that patient demographics play in clinical decision-making. For example, Loring and Powell (1988) constructed two psychiatric case presentations that were intended to represent undifferentiated schizophrenia, and these were randomly assigned to be from one of five categories, consisting of each of four race/sex combinations (black/white and male/female) and a fifth group lacking any demographic information. These profiles were then assigned a diagnosis by 290 psychiatrists who returned questionnaires through the mail. Even though the case vignettes were identical except for the demographic information, the proportions of various diagnoses varied substantially by randomly assigned sex and race of the hypothetical patients. For example, black patients, especially black men, were much more likely to be assigned the diagnosis of paranoid schizophrenia, indicating that clinicians perceived in these descriptions greater degrees of violence, suspiciousness and dangerousness than for the identical white patients. Furthermore, of 19 respondents who said that insufficient information was available to form a judgment, 12 had vignettes for which race and gender had not been provided.

Many other experimental trials of this basic design have been conducted over the years to demonstrate the causal role of race in decision-making in a variety of settings, usually in situations in which there is general agreement that conditioning the decision on race has no rational justification. For example, Bertrand and Mullainathan sent fictitious resumes to help-wanted ads in Boston and Chicago newspapers, randomly assigning African-American- or White-sounding names to the otherwise identical applications. In this study, white names received fifty percent more callbacks for interviews. What these experimental studies have in common is the assurance that there are no variables, measured or unmeasured, that are correlated with the exposure (race) and independently predictive of the outcome. The patients or job applicants can have no such characteristics because, in fact, they don’t exist. They are represented only by a written case presentation or job application that is known to be invariant. Any perceived differences in the outcome distribution, therefore, are attributable to the decision-makers’ imposition of racial stereotypes or prejudices onto the imaginary study subjects, which is exactly the effect that one wishes to isolate and quantify.

The Analytic Epidemiologic Model Applied to Disparities Arising from Discrimination

In the experimental designs described above, the causal effect of interest is readily defined:

Pr(Y=y|SET[Race=r]) (eq. 4)

and the “setting” action is literal, because the case vignettes or job applications are physically manipulated. With this model in mind, however, it is easy to extend the same logic to observational studies in which no actual manipulation is achieved, but for which statistical manipulation of the observed data is relied upon as a method for estimating what would happen in an experimental scenario. If there are covariates Z that are associated with (but not affected by) race and which are independently predictive of the outcome, these must be measured and adjusted for in order to have confidence that the association measure in the study has a causal interpretation. Specifically, this causal interpretation is the outcome distribution contrast that would be observed under a randomization of race to the case presentations, rather than contrasting individuals based on their observed races.

For example, Todd and colleagues (2000) sought to determine the causal effect of patient race on receipt of analgesics for extremity fractures in hospital emergency rooms. The authors reviewed records from an urban emergency department in Atlanta, selecting for study all black and white patients presenting with new, isolated long-bone fractures over a multi-year study period. They collected all available medical and demographic information on these patients along with their recorded level of analgesic administration, and used regression modeling to adjust simultaneously for these multiple covariates that are potentially associated with race and are predictive of treatment. For example, if black patients were more likely to be female, and if females generally receive more analgesia, then sex would be a potential confounder of the causal effect of interest. In this case, the authors found that after adjustment for all relevant measured covariates, white patients were significantly more likely than black patients to receive analgesics, despite similar records of pain complaints in the medical record. The risk of receiving no analgesic at all was 66% greater for black patients than for white patients, an effect that could not be explained by any confounding factor known to the investigators.

A less secure basis for causal inference in the observational study of discrimination, in contrast to the experimental designs described above, arises because it is always possible that there is some variable which is known to the decision-maker but not to the analyst, and which creates a spurious association between race and outcome. For example, if black patients in the Todd et al. study were more likely to have some legitimate contraindication for analgesia that was not recorded in the medical record, then it would appear to the analyst that emergency room physicians were acting irrationally, when in fact their actions were justifiably motivated by this hidden variable. This is the “fundamental problem of causal inference” that plagues all observational research (Holland 1986), and the only general solution is to have sufficient subject matter knowledge and sufficiently good data that one can have confidence that no confounder of any importance remains unmeasured.

Heckman (1998) notes that even if all relevant covariates are measured, other modeling improprieties, such as incorrect specification of the model form, can lead to bias in causal estimation. Furthermore, if the decision is based in part on an unmeasured characteristic, and if it is obtained by exceeding some critical threshold value, then changes in the distribution of the unmeasured characteristic can also lead to observed inequality in the outcome even in the absence of discrimination. This argument was in fact made recently by former Harvard University President Lawrence Summers in his famously controversial speech about gender inequality in science and engineering (Fogg 2005). For example, consider a hypothetical study of gender discrimination in tenure decisions for women scientists at Harvard. Suppose that some unmeasured aspect of mathematical ability has the same mean value in men and in women, but a larger variance in men. If the tenure decision rests on some absolute threshold of ability, then the longer right-tail of the distribution of this unmeasured trait in men will give men a higher probability of exceeding this value, even when there is no discrimination.
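The arithmetic of this threshold argument can be sketched as follows. The distributions, standard deviations and cutoff are hypothetical values chosen only for illustration; nothing here bears on whether any real trait is distributed this way. The sketch shows only that two normal distributions with the same mean but different variances place different shares of their mass above a high cutoff.

```python
from statistics import NormalDist

# Hypothetical parameters: identical means, different standard deviations.
narrow = NormalDist(mu=0.0, sigma=1.0)
wide = NormalDist(mu=0.0, sigma=1.2)
threshold = 3.0  # hypothetical absolute cutoff for a favorable decision

def share_above(dist, cutoff):
    """Proportion of the distribution lying above the cutoff."""
    return 1.0 - dist.cdf(cutoff)

p_narrow = share_above(narrow, threshold)
p_wide = share_above(wide, threshold)
ratio = p_wide / p_narrow

print(f"share above cutoff, smaller variance: {p_narrow:.5f}")
print(f"share above cutoff, larger variance:  {p_wide:.5f}")
print(f"ratio of shares: {ratio:.1f}")
```

Under these assumed values a modest difference in spread yields a several-fold difference in the share exceeding the cutoff, which is why an analyst who cannot measure the trait cannot distinguish this mechanism from discrimination using the outcome data alone.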

The Analytic Epidemiologic Model Applied to Disparities Arising from Innate Factors

Many investigators wish to evaluate hypotheses of racial disparity that arise from intrinsic factors, including genetic differences (Fuller 2003) as well as differences in innate abilities such as intelligence (Rowe DC 2005) or athletic prowess (Marks 2000). The dilemma that arises is that application of the usual epidemiologic analysis rests, as it did in the previous examples, on the direct analogy of the RCT. What we hope to achieve in an analysis of observational data is the effect estimate that we would have been able to observe in the (hypothetical) RCT that we did not conduct. But the obvious problem is that this (hypothetical) RCT is no longer so readily definable for intrinsic factors (Kaufman and Cooper 1999). The problem is definitional because the no-confounding criterion in equation 2 above is expressed in terms of the physical randomization of the study subjects, even if this manipulation is hypothetical. For intrinsic factors, however, the model loses any semblance of interpretability.

This limitation of observational epidemiology has been noted previously (Holland 1986, Holland 2001, Kaufman and Cooper 1999, Zuberi 2001), but even if one grants such a definition, there is still the practical problem of adjusting for a potentially impossible number of covariates. What finite set of measurable quantities can we adjust for in equation 3 to make the treated and untreated groups (e.g. blacks and whites) each serve as valid substitute populations for the other’s unobserved (counterfactual) experiences? (Maldonado and Greenland 2002). Given the pervasive social distinction made between racial/ethnic groups in racially stratified societies, it seems implausible to suggest that within some definable cross-classifications of covariates, blacks and whites might be considered to be exchangeable in all respects except for the exposure (Kaufman et al. 1997).

If the basis for causal inference is subverted for non-manipulable (i.e., intrinsic) factors, then how can we speak colloquially about health disparities arising from differential distributions of biologic traits? For example, the age-adjusted 2001 incidence of melanoma in black men in the US was 1.1 per 100,000, whereas for white men the incidence was 21.5 per 100,000, or 20-fold higher (U.S. Cancer Statistics Working Group 2004). Most people would be comfortable attributing this disparity almost entirely to differences in average skin color between these populations, even though this is a trait that is immutable, and for which no randomized intervention is readily conceivable. Furthermore, we might even speak of a counterfactual in which we imagine what would be the incidence in populations of African origin if they were to have light skin, all other factors being equal, and most people would take a number close to 21.5/100,000 as a reasonable estimate for this counterfactual. Or to take a somewhat less obvious example, consider the ΔF508 mutation in the CFTR gene, a trait that in homozygotes leads to cystic fibrosis, a disease marked by an inability to transport salt in cells of the lungs (Rowe SM et al. 2005). US whites have a population prevalence of this mutation of about 1/2800 individuals, whereas for US blacks the mutation occurs in only about 1/35,000 individuals (Phillips et al. 1995). This variant appears to be sufficient (although not necessary) for the development of the cystic fibrosis phenotype, and therefore it seems entirely reasonable to attribute much of the observed racial disparity to differential distribution of this trait at the population level.

What the skin cancer and cystic fibrosis examples demonstrate is that we can make inductive arguments about intrinsic traits as causes of disparities so long as we can assume plausibly that environmental factors are distributed more or less equivalently between the two groups (e.g. ultraviolet radiation and skin cancer in the US) or that environmental factors are essentially irrelevant (e.g. ΔF508 mutation and cystic fibrosis). In this situation, no covariate adjustment (except perhaps age) is necessary, and the causal inference model represented by equation 3 can be expressed simply as:

Pr(Y=y|Race=r) (eq. 5)

This is because subject-matter knowledge assures that there are no factors that are both strongly correlated with race and highly independently predictive of the outcome, and therefore no “setting” (i.e., physical manipulation) of the exposure is necessary. If we were to consider cystic fibrosis mortality as opposed to incidence, however, then this prima facie no-confounding assumption would no longer be tenable, because socioeconomic and other factors that are associated with race can clearly contribute to disease severity and medical care, which in turn affect mortality (Britton 1989, O’Connor et al. 2002).

For racial disparities in common multi-factorial disorders such as cardiovascular disease and common cancers, the inductive inferential process exemplified by the cystic fibrosis incidence example does not readily apply. If there is any single sufficient cause for diseases such as diabetes or hypertension, it remains unknown, and most researchers believe that a large number of genetic and environmental factors contribute interactively to risk, each with a relatively small effect. Differential prevalence of some candidate gene would therefore provide little information about differential prevalence of disease phenotype, since multiple pathways can lead to the same observed disease endpoint. Moreover, we know that contrasting social and economic environments lead to wide variations in phenotype even for groups with common ancestry, such as the comparison of diabetes and hypertension risk across the diverse social environments of the West African Diaspora (Cooper et al. 1997). When aspects of the physical or social environment affect disease risk, and when these are correlated with race, then the effect of race will be confounded, and this confounding cannot be eliminated for both definitional and practical reasons. The definitional reason is that the criterion for no-confounding (equation 2) involves a model of physical manipulation of the exposure that has no interpretability when race is the exposure. The practical reason is that in a racially stratified society, in which race is correlated with virtually all cultural traits, social interactions and economic options, the dimensionality of covariate vector Z in equation 3 is impossibly large. That is, one could never hope to measure sufficiently many characteristics of study subjects to make the two groups exchangeable (i.e., valid substitute populations for the others’ average counterfactual outcome experiences).

The conclusion regarding the epidemiologic approach to observational data on racial health disparities seems unambiguous. When the investigator’s causal hypothesis involves unjustified discrimination on the part of a decision-maker, then the statistical approach to causal inference is generally defensible. The exposure effect can be theoretically randomized, which provides a meaningful standard of what it would mean for the effect to be unconfounded. Furthermore, when the data are observed rather than experimental, so that adjustment for covariates is necessary, the universe of potentially confounding variables is often tractable. This is because the causal process involves the deliberations of a decision-maker, and so the factors that impinge upon this process can themselves be studied and enumerated. On the other hand, when the investigator’s causal hypothesis involves an innate characteristic of the study subject, such as genetics, physiology or psychology, then the statistical approach to causal inference on the basis of multivariate adjustment is difficult to justify. There is no meaning to the randomization model in equation 1, and therefore the no-confounding criterion in equation 2 remains undefined. Furthermore, since there is no hope in the foreseeable future of knowing all of the social and environmental mechanisms that affect risk of chronic diseases such as hypertension and diabetes, and since racism creates imbalances in an essentially infinite set of social and material factors, no adjustment is plausibly adequate. In common practice, adjustment sets are generally so limited that the assertion of causal inference in this context cannot be considered credible.

An Example of the Analytic Epidemiologic Model Applied to Innate Factors

In 2001, the New England Journal of Medicine published an article by Canadian cardiologist Derek Exner, who had collaborated with Jay Cohn of the University of Minnesota to re-analyze the combined arms of the “Studies of Left Ventricular Dysfunction” (SOLVD) trials (Exner et al. 2001). This secondary analysis compared the efficacy of the angiotensin converting enzyme (ACE) inhibitor enalapril with placebo in black and white heart failure patients. Exner et al. concluded that current therapeutic recommendations apply to white patients but not necessarily to black patients, and therefore that future therapeutic recommendations should be tailored according to racial background.

This article has proven quite influential, having been cited nearly 200 times in the peer-reviewed literature as of the end of 2006, as documented by Journal Citation Reports (JCR), a database published by the Institute for Scientific Information (ISI). Specifically, the article has been cited heavily by proponents of race-specific therapies, as evidence of differential response due to innate differences in physiology of disease between blacks and whites (e.g., Cohn 2002, Cohn 2003). Moreover, this article has been cited in treatment guidelines as justification for discouraging use of ACE inhibitors in black patients (e.g., Khan et al. 2004). It was also cited by the US Food and Drug Administration (FDA) to justify the new policy that clinical trial data must be reported by participants’ race (Haga and Venter 2003).

Given this substantial impact, it is reasonable to question whether the logic of the analytic exercise conducted in the Exner et al. article is valid in relation to the critique discussed above. The authors stress the randomized design of the SOLVD trials, as though this would confer to them some advantage in obtaining the desired inference. But this appeal to randomization is either misguided or disingenuous, since the focus of the analyses in this article is not on the average treatment effect of enalapril (i.e., the focus is not on the factor that was randomized), but rather on the effect measure modification of this average treatment effect by another variable, race, which was not randomly assigned. Furthermore, there is much prior information available which suggests that environmental determinants of disease are not well balanced between the racial groups.

The authors began their report by noting that a previously published analysis of the same SOLVD data showed a smaller estimated treatment effect in black than in white patients for two outcomes (death from any cause or hospitalization for heart failure), but that this differential treatment effect did not attain statistical significance after adjustment for measured covariates (p = 0.08) (Dries et al 1999). The authors therefore set out to match each of the black subjects with up to four white subjects on several measured factors: enrollment in either the prevention or the treatment trial, ejection fraction, assigned therapy (enalapril or placebo), sex, and age in three broad groupings. The authors claimed that this matching strategy would increase their statistical power for the comparison of treatment effects by race, but this assertion is in fact questionable because race was not randomized (Greenland and Morgenstern 1990). In cohort studies, matching is primarily indicated in order to reduce costs in the collection of data. Once the data are already collected, however, one can’t generally do better by throwing away a large proportion of these data, as the SOLVD analysts did in this paper. This matching strategy could be justified if the authors suspected effect measure heterogeneity, because in this case the additional white subjects would be “off the support” for the causal comparison of interest (Manski 1993). In this case, however, the results would be generalizable only to the range of values represented by the matching regime (e.g., ejection fractions at which both black and white subjects were observed). The authors made no such restriction in their interpretation, however.

A total of 6797 participants met the inclusion criteria for the study, 4228 from the prevention trial and 2569 from the treatment trial. Of these, 800 categorized themselves as black. Another 5719 participants categorized themselves as white and thus were eligible to be matched with one of the black participants. Of the 5719 white participants, 1196 (21%) were matched with the black patients; the remaining 79% of white patients were discarded. For 579 (72%) of the black participants there was only a single white match, and for 14 black participants (2%), no suitable white match was found.

Patients were randomly assigned to receive up to 20 mg of enalapril or placebo daily, a dosage that could be modified by individual physicians who naturally were not blinded to the patient’s race. The analysts collected a number of covariates, including medical care factors such as other drug therapy prescriptions, and social factors such as the presence of financial distress during the 12 months before enrollment and the highest level of education attained. The primary outcome measures were deaths from any cause and hospitalization for heart failure.

Compared to matched white patients, black patients were in general younger, had higher mean blood pressure, more exposure to recent financial distress, and lower average attained educational level. They were also much less likely to be using aspirin, beta-blockers, or anti-arrhythmics. In light of substantial differences in other dimensions of health and well-being that exist in the wider society, there is no doubt that many other unmeasured variables also differed substantially between the two groups. The matching procedure used by the authors reduced, but did not eliminate, differences in means for observed variables. For example, the relative probability of taking aspirin comparing white to black patients was reduced from 1.88 overall to 1.73 in the matched participants. It is therefore certain that substantial differences remained between the matched participants in a host of other unmeasured variables, not to mention residual confounding due to categorization of the measured variables (Kaufman et al 1997). For example, 37% of black participants and 24% of matched white participants reported experiencing financial distress before entry into the study, but there is no basis to believe that blacks and whites experienced similar levels of deprivation, stress or hardship within the broadly defined category of financial distress.
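Residual confounding from categorization can be illustrated with a small constructed example. In the sketch below, hardship has four levels, the risk of the outcome depends only on the hardship level, and group membership has no effect at any level. All level-specific risks and group distributions are hypothetical; only the 24% and 37% totals in the "distress" category are taken from the text.

```python
# Hypothetical outcome risk at each hardship level (0 = none, 3 = severe).
# The same risks apply to both groups, so group has no effect by construction.
RISK_BY_LEVEL = [0.10, 0.20, 0.30, 0.40]

# Hypothetical share of each group at each hardship level. Levels 2 and 3 are
# coded as "financial distress" and sum to 24% and 37% respectively.
SHARES = {
    "white": [0.50, 0.26, 0.18, 0.06],
    "black": [0.33, 0.30, 0.20, 0.17],
}

NO_DISTRESS, DISTRESS = (0, 1), (2, 3)

def stratum_risk(shares, levels):
    """Average risk among group members whose hardship level falls in `levels`."""
    weight = sum(shares[i] for i in levels)
    return sum(shares[i] * RISK_BY_LEVEL[i] for i in levels) / weight

results = {}
for label, levels in (("no distress", NO_DISTRESS), ("distress", DISTRESS)):
    risk_w = stratum_risk(SHARES["white"], levels)
    risk_b = stratum_risk(SHARES["black"], levels)
    results[label] = risk_b / risk_w
    print(f"{label}: white {risk_w:.3f}, black {risk_b:.3f}, ratio {risk_b / risk_w:.2f}")
```

Within each stratum of the binary variable the black group still shows a higher risk (ratios of roughly 1.10 and 1.06 in this constructed example), even though no group effect exists at any hardship level. Adjustment for the binary indicator therefore leaves an apparent race effect that is entirely an artifact of coarse measurement.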

An important observation is that the absolute risk of the outcomes was higher for black than for matched white participants. That is, even after matching on ejection fraction and other factors, the black group was sicker on average than the white group. For example, 256 (33%) of the matched black participants died, compared to 311 (26%) of the matched white participants (relative risk = 1.25, 95% CI 1.09, 1.44). The discrepancy in baseline risk was even more extreme in the case of the hospitalization, which affected 238 matched black participants (30%) and 226 (19%) matched white participants (relative risk = 1.60, 95% CI 1.37, 1.88). Nonetheless, the authors ignored this discrepancy in baseline risk and focused on the observation that there was an adjusted 49% reduction (95% CI: 30%, 63%) in the risk of heart failure hospitalization for the treated group of matched white patients, whereas among the black patients the adjusted risk reduction was only 14%, and not significantly greater than 0 at the p < 0.05 criterion.
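The crude relative risks quoted above can be approximately reproduced from the reported counts. The sketch below uses a standard Wald interval on the log scale, which is an assumption, since the method used for the intervals is not stated. The denominator of 786 matched black participants is likewise inferred (800 minus the 14 without a match).

```python
import math

def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group a versus group b with a Wald interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

N_BLACK = 800 - 14  # assumption: all matched black participants were analyzed
N_WHITE = 1196      # matched white participants, as reported

death = relative_risk(256, N_BLACK, 311, N_WHITE)
hospitalization = relative_risk(238, N_BLACK, 226, N_WHITE)

for name, (rr, lower, upper) in (("death", death), ("hospitalization", hospitalization)):
    print(f"{name}: RR = {rr:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

Under these assumptions the calculation agrees with the reported values of 1.25 (1.09, 1.44) for death and 1.60 (1.37, 1.88) for hospitalization to two decimal places.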

It is well appreciated in the theoretical epidemiologic literature that groups with higher baseline risk will in general have more modest response to treatment when ratio measures of effect are employed, as they were in this analysis (Maldonado & Greenland 2002). This is because the counterfactual ratio that defines the causal effect includes in both the numerator and the denominator the proportion of the population that would experience the outcome under either treatment assignment. In this application, for instance, there is some proportion of the participants p1 that will be hospitalized for heart failure regardless of whether they receive enalapril or placebo, there is a proportion of the participants p2 that will be hospitalized for heart failure only if they receive placebo, and there is a remaining proportion of the participants p3 that will not be hospitalized for heart failure regardless of which treatment they receive. A treatment effect is the contrast between the outcome proportion if the entire study cohort were treated with placebo and the outcome proportion if the entire study cohort were treated with enalapril. This contrast may be constructed as the difference measure [(p1 + p2) − (p1)] = p2, which reflects the fact that those participants susceptible to a treatment effect are those in the p2 group only. A ratio measure of effect, such as that used by these authors, however, takes the contrast as [(p1 + p2)/p1], so that those doomed to be hospitalized under either treatment are no longer cancelled out of the treatment effect measure. For the ratio contrast, an increase in the cohort proportion p1 necessarily forces the measure closer to the null, even if the susceptible population in the p2 group is held constant.
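The behavior of the two contrasts can be seen numerically. The sketch below holds the susceptible proportion p2 fixed and increases p1. The specific proportions are hypothetical, and the relative risk reduction p2/(p1 + p2) is included only as a convenient restatement of the ratio measure.

```python
def effect_measures(p1, p2):
    """p1: hospitalized under either assignment; p2: hospitalized only under placebo."""
    risk_placebo = p1 + p2
    risk_enalapril = p1
    return {
        "difference": risk_placebo - risk_enalapril,           # equals p2
        "ratio": risk_placebo / risk_enalapril,                 # (p1 + p2) / p1
        "relative_reduction": 1 - risk_enalapril / risk_placebo,  # p2 / (p1 + p2)
    }

P2 = 0.10                             # hypothetical susceptible proportion, held fixed
P1_VALUES = (0.05, 0.10, 0.20, 0.40)  # hypothetical "doomed" proportions

ratios = []
for p1 in P1_VALUES:
    m = effect_measures(p1, P2)
    ratios.append(m["ratio"])
    print(f"p1={p1:.2f}  difference={m['difference']:.2f}  "
          f"ratio={m['ratio']:.2f}  relative reduction={m['relative_reduction']:.0%}")
```

With p2 fixed at 0.10, the difference measure stays at 0.10 throughout, while the ratio falls from 3.0 to 1.25 as p1 rises from 0.05 to 0.40. The same number of participants benefit in every row; only the ratio measure suggests a weaker effect.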

Returning to the Exner et al. analysis, it is clear that a population group with higher baseline risk because they are sicker can be characterized as having a larger proportion p1. Knowing that the sicker black study population has a higher baseline risk of hospitalization than the white study population, therefore, it would be possible to predict a priori that the ratio measure of treatment effect for this group will be constrained to be closer to the null (White and Elbourne 2005). Membership in the p1 sub-population simply represents hospitalization through some causal mechanism that is not affected by enalapril. For example, over half of all heart failure hospitalizations result from excessive sodium retention that precipitates volume overload (Bennett et al. 1998). Patients with good educational and social support resources may successfully avoid hospitalization by regular self-weighing and by then adjusting diuretic dosages in response to fluctuations (Smith et al. 1997). If the social support and patient-physician communication factors that facilitate this successful self-management are unmeasured, as they are in the SOLVD data, then they manifest as a reduced treatment effect simply by inflating the proportion p1 in the cohort sub-population.

Exner et al. do remark in their discussion that “[i]t is also possible that the findings may have resulted from differences between the groups in compliance, diet, medical follow-up, or access to care.” (p. 1356) Nonetheless, they then go on to ignore these caveats and conclude that “the overall population of black patients with heart failure may be underserved by current therapeutic recommendations….[I]t seems appropriate to consider current therapeutic recommendations as applying to white patients but not necessarily to black patients.” The report ends with the suggestion that “therapeutic recommendations may need to be tailored according to racial background.” (p. 1357)

A Critique of the Exner et al Article as an Example of the Analytic Epidemiologic Model Applied to Innate Factors

The Exner et al. article highlights the inherent problems of attempting statistical inferences about innate factors in observational data. The analysis corresponds to no randomized trial that could be described, even hypothetically, because the causal inference desired is the contrast between a treatment effect for blacks and a treatment effect for whites under the premise that groups are balanced on all important causes of the outcome. This premise motivated the crude matching strategy, but the small number of factors matched and the large number of strong unmeasured causes make this strategy dubious at best. The authors list a few of the important unmeasured variables that presumably remained unbalanced between the two groups, but the stated conclusions appear untempered by this concern, and the article is widely cited as having demonstrated a universally reduced capacity for therapeutic response among blacks as if the assumption were satisfied. Conditioning on measured variables in the analysis, in addition to matching, is a further attempt to create exchangeability between the race groups. For example, the authors adjusted for the binary variable representing financial hardship, but the adjustment would successfully remove the confounding bias due to this variable only if blacks and whites who reported having experienced financial hardship were equivalent with respect to economic and social factors that might relate to risk of hospitalization. Given the clear violation of this assumption demonstrated in extant demographic data, the equality represented in expression 3 above cannot be considered credible, even as an approximation.

The discussion above noted that in practice we often allow inductive arguments about intrinsic traits as causes of disparities as long as we can assume plausibly that environmental factors are distributed more or less equivalently between the two groups. In the Exner et al. example, however, this is not the case, since psycho-social and economic factors are known to be unbalanced and highly relevant for a sociological outcome such as hospitalization. The larger baseline difference for this outcome variable makes it even clearer that unmeasured factors that differ between blacks and whites in general are important in determining the attained value. Nonetheless, even though there were two primary outcomes defined, and hospitalization was the weaker of the two (in the sense that baseline risk was substantially higher for blacks and assignment was made without the decision-maker being blinded to patient race), the null finding for mortality is largely ignored when the paper is cited as evidence of differential treatment efficacy. Furthermore, the null finding for the harder endpoint of mortality has been replicated elsewhere (Shekelle et al 2003), and re-analysis of the SOLVD data has shown that hospitalization is unique among possible end-points in demonstrating a treatment effect differential (Dries et al 2002).

In summary, the analytic strategy pursued in the Exner et al paper provides very little insight into the nature of the observed disparities. The authors’ etiologic hypothesis is clearly stated as a physiologic difference between blacks and whites, and yet the analytic epidemiologic model applied to innate factors is dubious for the reasons described above. In this specific application, however, the situation is made even more tenuous by the selection of an outcome for which risk in the unexposed is the most discrepant of all possible outcomes, and for which assignment is made subjectively and without being blinded to patient race. These problems are compounded by an unjustified matching strategy that tosses out about 70% of the available data. The result is a finding that has little inferential value for either supporting or refuting the physiologic hypothesis of interest. Moreover, the observational analysis is designed to mimic a randomized trial that cannot be defined in practical terms. Even allowing that the motivating hypothetical trial remains indefinable, the alternate hypothesis that unmeasured determinants of the outcome are imbalanced between the groups is so likely, and so plausibly consequential, that there is little credibility associated with the assertion that the results of this analysis approximate what would be obtained in the imaginary trial.

The weak basis for inference in the Exner et al. paper must be contrasted not only with its appearance as the lead article in the most prestigious American medical journal, but also with its considerable influence on the field. It is over-interpreted not just as evidence that “black patients with chronic heart failure … derive less benefit from angiotensin-converting enzyme inhibitors, on average, then do whites” (Satel and Klick 2005), but also to support the general notion that “… there is now growing evidence for genetic factors being responsible for individual response to therapy.” (Hovind et al 2004). Indeed, Bond et al (2004) cite the article to support the assertion “[R]acial differences in responses to angiotensin converting enzyme inhibitors are thought to result from both genetic and environmental factors.” Recall that the Exner et al article in fact involved no analysis of genetic variants whatsoever, and yet this rather surprising extrapolation is characteristic of many of the citing articles. For example, in an article on racial variation in the prevalence of atrial fibrillation among heart failure patients, Ruo et al. (2004) cite Exner et al. to support the statement “Another possible explanation for the lower prevalence of atrial fibrillation in African Americans than in Caucasians with heart failure may be intrinsic racial differences in atrial membrane stability, atrial conduction pathways, or genetic polymorphisms leading to different susceptibility to the development of atrial fibrillation. For example, polymorphisms have been found to be associated with racial differences in … response to treatment for heart failure.” Moreover, the Exner et al. article has also been interpreted as extending this logic to other completely unrelated conditions. Hughes and colleagues (2002), for example, review genetic influences on rheumatoid arthritis in blacks, noting that racial variation in therapeutic response for this condition has never been observed. Nonetheless, they cite the Exner et al. analysis as evidence that such racial variation in treatment response might reasonably be expected.

The application of statistical reasoning in epidemiology has a clear foundation, rooted in the notion of randomization that underlies all models, tests and quantitative inferences (Greenland 1990). Observational epidemiology makes use of this paradigm by analogy, arguing that we may at times be able to mimic the process of physical randomization that occurs in trials through statistical adjustment. That is, conditional on some measured factors, we can assert that no important unmeasured determinants of the outcome remain substantially unbalanced between treatment groups. This analogy has often proven quite useful, and has led to insights of enormous public health importance over the previous 50 years, such as the discovery of the causal association between cigarette smoking and lung cancer. There are settings in which this analogy does not hold, however, and these settings include the attempt to discover innate physiological differences between racial/ethnic groups. The discussion above explains why analytic epidemiology can be used sensibly and successfully to identify discrimination in treatment as a cause of racial disparities in health, but not for the identification of innate predispositions as a cause of racial disparities in health. The article by Exner and colleagues is an example of how such an analysis can be largely unhelpful for evaluating the hypothesis of interest, and in fact even potentially harmful. Despite its questionable inferential validity, however, this article has proven quite influential, and has been cited to support a large number of claims about racial predisposition to disease, many of which have very little to do with the actual content of the article. This apparent eagerness to embrace the message of racial essentialism therefore seems to represent a very strong prior belief on the part of many researchers. Until this strong predilection for racial essentialism in biological thinking abates, there would seem to be little hope that a more sensible and honest approach to statistical inference in observational data will take hold more widely.

Acknowledgments

This work was supported in part by a Robert Wood Johnson Foundation Investigator Award in Health Policy Research. The views expressed imply no endorsement by the Robert Wood Johnson Foundation. Additional support was received from the National Center on Minority Health and Health Disparities (P60 MD000244).

References

  1. Bennett SJ, Huster GA, Baker SL, Milgrom LB, Kirchgassner A, Birt J, Pressler ML. Characterization of the precipitants of hospitalization for heart failure decompensation. Am J Crit Care. 1998;7(3):168–74.
  2. Bertrand M, Mullainathan S. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review. 2004;94(4):991–1013.
  3. Bond V, Millis RM, Adams RG, Williams D, Obisesan TO, Oke LM, Blakely R, Vaccaro P, Franks BD, Neita M, Davis GC, Lewis-Jack O, Dotson CO. Normal exercise blood pressure response in African-American women with parental history of hypertension. Am J Med Sci. 2004;328(2):78–83. doi: 10.1097/00000441-200408000-00002.
  4. Britton JR. Effects of social class, sex, and region of residence on age at death from cystic fibrosis. BMJ. 1989 Feb 25;298(6672):483–7. doi: 10.1136/bmj.298.6672.483.
  5. Cohn JN. Contemporary treatment of heart failure: Is there adequate evidence to support a unique strategy for African-Americans? Pro position. Current Hypertension Reports. 2002;4(4):307–10. doi: 10.1007/s11906-996-0009-8.
  6. Cohn JN, Loscalzo J, Franciosa JA. Nitric oxide’s role in heart failure: Pathophysiology and treatment – Introduction. Journal of Cardiac Failure. 2003;9(5):S197–8. doi: 10.1054/s1071-9164(03)00587-6.
  7. Cooper RS, Rotimi C, Ataman S, McGee D, Osotimehin B, Kadiri S, Muna W, Kingue S, Fraser H, Forrester T, Bennett F, Wilks R. Hypertension prevalence in seven populations of African origin. American Journal of Public Health. 1997;87:160–8. doi: 10.2105/ajph.87.2.160.
  8. Cooper RS. The role of genetics in ethnic disparities in health. In: Anderson NB, Bulatao RA, Cohen B, editors. Critical Perspectives on Racial and Ethnic Differences in Health in Late Life. Washington, DC: National Academy of Sciences/National Research Council Press; 2004. pp. 269–309.
  9. Dries DL, Exner DV, Gersh BJ, Cooper HA, Carson PE, Domanski MJ. Racial differences in the outcome of left ventricular dysfunction. N Engl J Med. 1999 Feb 25;340(8):609–16. doi: 10.1056/NEJM199902253400804.
  10. Dries DL, Strong MH, Cooper RS, Drazner MH. Efficacy of angiotensin-converting enzyme inhibition in reducing progression from asymptomatic left ventricular dysfunction to symptomatic heart failure in black and white patients. J Am Coll Cardiol. 2002;40(2):311–7. doi: 10.1016/s0735-1097(02)01943-5.
  11. Exner DV, Dries DL, Domanski MJ, Cohn JN. Lesser response to angiotensin-converting-enzyme inhibitor therapy in black as compared with white patients with left ventricular dysfunction. N Engl J Med. 2001 May 3;344(18):1351–7. doi: 10.1056/NEJM200105033441802.
  12. Fix AG. Simulating hemoglobin history. Hum Biol. 2003 Aug;75(4):607–18. doi: 10.1353/hub.2003.0053.
  13. Fogg P. Harvard’s President Wonders Aloud About Women in Science and Math. The Chronicle of Higher Education. 2005;51(21):A12. See full text of Summers’ remarks at: http://www.president.harvard.edu/speeches/2005/nber.html (accessed 4/1/07).
  14. Fuller KE. Health disparities: reframing the problem. Med Sci Monit. 2003 Mar;9(3):SR9–15.
  15. Goodman AH. Why genes don’t count (for racial differences in health). Am J Public Health. 2000;90(11):1699–702. doi: 10.2105/ajph.90.11.1699.
  16. Greenland S. Randomization, statistics, and causal inference. Epidemiology. 1990;1(6):421–9. doi: 10.1097/00001648-199011000-00003.
  17. Greenland S, Morgenstern H. Matching and efficiency in cohort studies. Am J Epidemiol. 1990;131(1):151–9. doi: 10.1093/oxfordjournals.aje.a115469.
  18. Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Stat Sci. 1999;14:29–46.
  19. Greenland S. Epidemiologic measures and policy formulation: lessons from potential outcomes. Emerging Themes in Epidemiology. 2005;2:5. doi: 10.1186/1742-7622-2-5. Available at: http://www.ete-online.com/content/2/1/5 (accessed 4/1/07).
  20. Haga SB, Venter JC. Genetics. FDA races in wrong direction. Science. 2003 Jul 25;301(5632):466. doi: 10.1126/science.1087004.
  21. Heckman JJ. Detecting discrimination. J Econ Perspect. 1998;12(2):101–16.
  22. Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945–70.
  23. Holland PW. The false linking of race and causality: lessons from standardized testing. Race and Society. 2001;4(2):219–33.
  24. Hovind P, Tarnow L, Rossing P, Carstensen B, Parving HH. Improved survival in patients obtaining remission of nephrotic range albuminuria in diabetic nephropathy. Kidney International. 2004;66(3):1180–6. doi: 10.1111/j.1523-1755.2004.00870.x.
  25. Hughes LB, Moreland LW, Bridges SL Jr. Genetic influences on rheumatoid arthritis in African Americans. Immunol Res. 2002;26(1–3):15–26. doi: 10.1385/IR:26:1-3:015.
  26. Kaufman JS, Cooper RS, McGee D. Socioeconomic status and health in Blacks and Whites: The problem of residual confounding and the resiliency of race. Epidemiology. 1997;6:621–8.
  27. Kaufman JS, Cooper RS. Seeking causal explanations in social epidemiology. Am J Epidemiol. 1999;150(2):113–20. doi: 10.1093/oxfordjournals.aje.a009969.
  28. Kaufman JS, Cooper RS. Commentary: considerations for use of racial/ethnic classification in etiologic research. Am J Epidemiol. 2001;154(4):291–8. doi: 10.1093/aje/154.4.291.
  29. Khan NA, McAlister FA, Campbell NR, Feldman RD, Rabkin S, Mahon J, Lewanczuk R, Zarnke KB, Hemmelgarn B, Lebel M, Levine M, Herbert C; Canadian Hypertension Education Program. The 2004 Canadian recommendations for the management of hypertension: Part II--Therapy. Can J Cardiol. 2004;20(1):41–54.
  30. Levins R. Is capitalism a disease? The crisis in US public health. Monthly Review. 2000;51(4):8–33.
  31. Loring M, Powell B. Gender, race, and DSM-III: a study of the objectivity of psychiatric diagnostic behavior. J Health Soc Behav. 1988;29(1):1–22.
  32. Maldonado G, Greenland S. Estimating causal effects. Int J Epidemiol. 2002;31(2):422–9.
  33. Manski CF. Identification problems in the social sciences. Sociological Methodology. 1993;23:1–56.
  34. Marks J. Review of Taboo, by Jon Entine. Human Biology. 2000;72:1074–8.
  35. Navarro V, Shi L. The political context of social inequalities and health. Soc Sci Med. 2001;52(3):481–91. doi: 10.1016/s0277-9536(00)00197-0.
  36. O’Connor GT, Quinton HB, Kahn R, Robichaud P, Maddock J, Lever T, Detzer M, Brooks JG; Northern New England Cystic Fibrosis Consortium. Case-mix adjustment for evaluation of mortality in cystic fibrosis. Pediatr Pulmonol. 2002;33(2):99–105. doi: 10.1002/ppul.10042.
  37. Pearl J. Causality: Models, Reasoning, and Inference. Cambridge, UK: Cambridge University Press; 2000.
  38. Phillips OP, Bishop C, Woods D, Elias S. Cystic fibrosis mutations among African Americans in the southeastern United States. J Natl Med Assoc. 1995;87(6):433–5.
  39. Rosenbaum PR. The consequences of adjustment for a concomitant variable that has been affected by the treatment. J Royal Stat Soc, Series A. 1984;147(5):656–66.
  40. Rowe DC. Under the skin: On the impartial treatment of genetic and environmental hypotheses of racial differences. American Psychologist. 2005;60:163–73. doi: 10.1037/0003-066X.60.1.60.
  41. Rowe SM, Miller S, Sorscher EJ. Cystic fibrosis. N Engl J Med. 2005 May 12;352(19):1992–2001. doi: 10.1056/NEJMra043184.
  42. Rubin DB. Estimating causal effects of treatment in randomized and nonrandomized studies. J Educational Psych. 1974;66(5):688–701.
  43. Ruo B, Capra AM, Jensvold NG, Go AS. Racial variation in the prevalence of atrial fibrillation among patients with heart failure: the Epidemiology, Practice, Outcomes, and Costs of Heart Failure (EPOCH) study. J Am Coll Cardiol. 2004;43(3):429–35. doi: 10.1016/j.jacc.2003.09.035.
  44. Satel S. Medicine’s race problem. Policy Review. 2001;110:49–58.
  45. Satel S, Klick J. The Institute of Medicine report - too quick to diagnose bias. Perspectives In Biology and Medicine. 2005;48(1):S15–25.
  46. Shekelle PG, Rich MW, Morton SC, Atkinson CS, Tu W, Maglione M, Rhodes S, Barrett M, Fonarow GC, Greenberg B, Heidenreich PA, Knabel T, Konstam MA, Steimle A, Warner Stevenson L. Efficacy of angiotensin-converting enzyme inhibitors and beta-blockers in the management of left ventricular systolic dysfunction according to race, gender, and diabetic status: a meta-analysis of major clinical trials. J Am Coll Cardiol. 2003;41(9):1529–38. doi: 10.1016/s0735-1097(03)00262-6.
  47. Schulman KA, Berlin JA, Harless W, Kerner JF, Sistrunk S, Gersh BJ, Dube R, Taleghani CK, Burke JE, Williams S, Eisenberg JM, Escarce JJ. The effect of race and sex on physicians’ recommendations for cardiac catheterization. N Engl J Med. 1999 Feb 25;340(8):618–26. doi: 10.1056/NEJM199902253400806.
  48. Smedley BD, Stith AY, Nelson AR, editors. Unequal treatment: Confronting racial and ethnic disparities in health care. Washington, DC: National Academies Press; 2003.
  49. Smith LE, Fabbri SA, Pai R, Ferry D, Heywood JT. Symptomatic improvement and reduced hospitalization for patients attending a cardiomyopathy clinic. Clin Cardiol. 1997;20(11):949–54. doi: 10.1002/clc.4960201109.
  50. Stone R. The assumptions on which causal inferences rest. J Royal Stat Soc, Series B. 1993;55(2):455–66.
  51. Todd KH, Deaton C, D’Adamo AP, Goe L. Ethnicity and analgesic practice. Annals of Emergency Medicine. 2000;35(1):11–16. doi: 10.1016/s0196-0644(00)70099-0.
  52. U.S. Cancer Statistics Working Group. United States Cancer Statistics: 1999–2001 Incidence and Mortality Web-based Report Version. Atlanta (GA): Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute; 2004. Available at: http://www.cdc.gov/cancer/npcr/uscs/ (accessed 4/1/07).
  53. White IR, Elbourne D. Assessing subgroup effects with binary data: can the use of different effect measures lead to different conclusions? BMC Med Res Methodol. 2005;5(1):15. doi: 10.1186/1471-2288-5-15. Available at: http://www.biomedcentral.com/1471-2288/5/15 (accessed 4/1/07).
  54. Zuberi T. Thicker than blood: An essay on how racial statistics lie. Minneapolis: University of Minnesota Press; 2001.
