International Journal of Epidemiology. 2013 Aug 30;42(4):1026–1028. doi: 10.1093/ije/dyt124

Rebuttal: When it comes to scientific inference, sometimes a cigar is just a cigar

Kenneth J Rothman 1,2,*, John EJ Gallacher 3, Elizabeth E Hatch 2
PMCID: PMC3781006  PMID: 24062292

We are grateful to the editors for suggesting that our submission become a debate piece, as we value critical discussion. We are gratified that the three invited counterpoints not only agree with our position but add useful insights. Elwood summed up our view when (referring to the White Paper on the U.S. National Children’s Study1) he commented that ‘the concept of external validity given confuses statistical inference with scientific inference’.2 Richiardi et al. echoed our point that representativeness is not desirable even if the goal is to study effect-measure modification: ‘Similarly using non-representative samples may enhance our ability to assess heterogeneity with regards to potential effect modifiers, e.g. by ensuring that there are adequate numbers in each of the ethnic groups to be considered if we suspect or are interested in potential modification by ethnicity’.3 And we especially liked Nohr and Olsen’s quotable remark that ‘Representativeness is time and place specific and will therefore always be a historical concept … .’4

Richiardi et al. suggested that ‘Perhaps Rothman and colleagues go too far in arguing that representativeness should be avoided as a matter of principle, and we consider that there are some situations where representativeness is the most sensible approach. For example, it would be rare for researchers to only study one age-group, and to then attempt to extrapolate their findings to other age-groups, if sufficient numbers and funding were available to also sample adequate numbers from these other age-groups’. But we in fact acknowledged that there is a role for representativeness in certain circumstances, as when ‘public-health professionals may rely on representative samples to describe the health status of specific populations’.5 Nevertheless, when studying effects across a range of a variable such as age, representativeness is not the most effective approach, as Richiardi et al. themselves stated.3 We also note that representativeness can mitigate the problem that historically some groups, such as women, children and minorities, have been underrepresented or omitted from research studies. Sampling representativeness, however, is not necessary to fix that problem. Deliberate oversampling of the understudied groups would do so and would be scientifically more efficient.
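The efficiency argument can be illustrated with an ordinary binomial calculation. The Python sketch below compares two ways of spending a fixed budget of 2000 participants when the aim is to compare an outcome proportion between a majority group and an understudied minority group. The 90/10 population split, the study size and the common outcome proportion of 0.2 are illustrative assumptions, not figures from any study discussed here.

```python
import math

TOTAL_N = 2000  # assumed fixed study size (illustrative)
P = 0.2         # assumed outcome proportion in each group (illustrative)


def se_difference(p: float, n_a: int, n_b: int) -> float:
    """Standard error of the difference between two independent binomial proportions."""
    return math.sqrt(p * (1 - p) / n_a + p * (1 - p) / n_b)


def allocate(share_a: float, total: int) -> tuple[int, int]:
    """Split a fixed total between group A and group B according to A's share."""
    n_a = round(share_a * total)
    return n_a, total - n_a


rep = allocate(0.9, TOTAL_N)   # representative: mirrors an assumed 90/10 population split
over = allocate(0.5, TOTAL_N)  # deliberate oversampling: equal numbers in each group

print(rep, round(se_difference(P, *rep), 4))    # (1800, 200) 0.0298
print(over, round(se_difference(P, *over), 4))  # (1000, 1000) 0.0179
```

Under these assumptions, equal allocation gives a standard error for the between-group comparison of about 0.018, against about 0.030 for proportional allocation, with no increase in total sample size.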

Thus, despite some quibbles, all three of the counterpoints are well aligned with our contention that representativeness, although it may have a place in health surveys, is not a proper goal for scientific studies. The only real disagreement comes from Ebrahim and Davey Smith,6 who expressed concern regarding this emerging consensus. We believe, however, that their defence of the classical view of representativeness does not clarify our understanding of the topic.

Ebrahim and Davey Smith portray us as ‘elevating’ causal hypothesis testing to the level of science and ‘denigrating’ descriptive epidemiology as ‘not science’.6 Our point was simply that not everything that an epidemiologist does is science, in the sense that not every activity of epidemiologists adds to the inventory of statements describing how nature works. We did acknowledge that some studies should use representative sampling. We also said that these studies are ‘informative, but they are not science in the same way that causal studies about how nature operates are science’.5 Yes, we were making a distinction between theoretical and applied work, but references to ‘elevation’ of causal studies and ‘denigration’ of population health surveys come from Ebrahim and Davey Smith, not from us. Distinguishing one set of goals from another does not necessarily imply a hierarchy. When Karl Popper used falsifiability as a demarcation criterion to distinguish scientific statements from metaphysical statements, the distinction was not an attempt to denigrate metaphysics. We consider landing the rover on Mars to be a stupendous engineering achievement, but describing it as engineering rather than science is not a slur and does not detract from the achievement. Sometimes a cigar is just a cigar.

Ebrahim and Davey Smith suggested that non-representative study groups may produce biased associations. Their argument is based on a shaky premise, namely that bias is judged by the difference between the expected study results and the population average, as if the population average represents a unique truth. If the association being measured varies by subgroup, the overall association in a study that is weighted differently from the source population is simply a summary that is weighted to a different standard. If replicating the association in the source population is desired, the investigators can standardize to that distribution. They can do so without sampling subjects according to that weighting, and may have good reason to weight differently from the source population, for example, to maintain study efficiency. The point is that non-representativeness does not produce biased results simply because the study population differs by age or some other factor from a designated target population, unless we circularly define the results to be biased because they are nonrepresentative. It is fine to pursue representativeness when it is needed, as in a health survey, but lack of representativeness does not automatically amount to bias.
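Standardization of this kind can be sketched in a few lines. In the hypothetical example below, the stratum-specific risks and both age distributions are assumptions chosen for illustration. The study deliberately oversamples older people; the summary weighted to the study's own age mix and the summary directly standardized to the source population's age mix differ only because they apply different weights to the same stratum-specific risks.

```python
# Hypothetical stratum-specific risks (illustrative numbers only).
# The risk ratio differs by age, i.e. age modifies the effect measure.
RISKS = {
    "young": {"exposed": 0.02, "unexposed": 0.01},  # risk ratio 2.0
    "old": {"exposed": 0.10, "unexposed": 0.08},    # risk ratio 1.25
}
SOURCE_WEIGHTS = {"young": 0.8, "old": 0.2}  # assumed age mix of the source population
STUDY_WEIGHTS = {"young": 0.5, "old": 0.5}   # assumed age mix of a study that oversamples older people


def standardized_risk(group: str, weights: dict[str, float]) -> float:
    """Direct standardization: weight each stratum-specific risk by the standard's age share."""
    return sum(weights[stratum] * RISKS[stratum][group] for stratum in RISKS)


def standardized_risk_ratio(weights: dict[str, float]) -> float:
    """Ratio of the standardized risks in the exposed and unexposed groups."""
    return standardized_risk("exposed", weights) / standardized_risk("unexposed", weights)


# Same stratum-specific risks, two different standards.
print(round(standardized_risk_ratio(STUDY_WEIGHTS), 3))   # 1.333
print(round(standardized_risk_ratio(SOURCE_WEIGHTS), 3))  # 1.5
```

Neither number is the single ‘true’ effect: each is a weighted average of the same stratum-specific risks, and the source-population summary is obtained from the study data without having sampled subjects in the source-population proportions.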

We of course agree that selection factors can lead to selection bias (which we would define as systematic errors stemming from procedures or factors involved in subject participation). For bias in the relation between alcohol consumption and stroke in the American Cancer Society study7 to materialize, Ebrahim and Davey Smith theorized that the study participants may have comprised heavy-drinking epidemiologists with a low risk of stroke, because they smoke less, exercise heavily and exhibit health consciousness, whereas the participants who were lighter drinkers smoked more, exercised less and showed less health awareness, and consequently had a comparatively high risk of stroke. It seems unlikely that these assumptions could account for the magnitude of bias that Ebrahim and Davey Smith attribute to them. But even if this dubious situation did occur, representative sampling would tend to obscure rather than solve the problem, as the study would still comprise volunteers and health awareness would still be a possible selection factor distinguishing study participants from those who choose not to participate. The bias could be controlled, with or without representative sampling, by measuring and controlling for health awareness, using information about health-seeking behaviour such as medical screening visits, influenza vaccinations and other indicators of the selection factor underlying their concern. Behaviours such as smoking and exercise, which Ebrahim and Davey Smith name as sources of the supposed bias, were actually measured and controlled in the American Cancer Society study.7
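One conventional way to control for a measured selection factor is to stratify on it and compute a Mantel-Haenszel summary risk ratio. The counts below are invented to mirror the hypothetical scenario just described, with ‘exposed’ meaning heavier drinkers and health awareness assumed to be measured by some indicator such as screening visits; they are not data from the American Cancer Society study. Within each stratum drinking has no association with stroke, yet the crude risk ratio suggests a protective effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum2x2:
    exposed_cases: int
    exposed_total: int
    unexposed_cases: int
    unexposed_total: int


def crude_risk_ratio(strata: list[Stratum2x2]) -> float:
    """Risk ratio after collapsing over strata (ignores the selection factor)."""
    a = sum(s.exposed_cases for s in strata)
    n1 = sum(s.exposed_total for s in strata)
    c = sum(s.unexposed_cases for s in strata)
    n0 = sum(s.unexposed_total for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata: list[Stratum2x2]) -> float:
    """Mantel-Haenszel summary risk ratio across strata of the measured factor."""
    num = sum(s.exposed_cases * s.unexposed_total / (s.exposed_total + s.unexposed_total) for s in strata)
    den = sum(s.unexposed_cases * s.exposed_total / (s.exposed_total + s.unexposed_total) for s in strata)
    return num / den


# Hypothetical strata of health awareness (stroke risk identical for both drinking groups within each).
STRATA = [
    Stratum2x2(8, 800, 2, 200),    # high awareness: mostly heavier drinkers, risk 0.01
    Stratum2x2(10, 200, 40, 800),  # low awareness: mostly lighter drinkers, risk 0.05
]

print(round(crude_risk_ratio(STRATA), 3))            # 0.429
print(round(mantel_haenszel_risk_ratio(STRATA), 3))  # 1.0
```

In this constructed example, stratifying on the measured selection factor removes the distortion, and nothing in the calculation depends on whether the sample was drawn representatively.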

We think it unwise to assume, as Ebrahim and Davey Smith have done in their discussion of studies of vitamin C and antioxidants, that any finding from a randomized trial must be valid and would overrule contradictory findings from non-experimental studies. For example, no one seems to think that a randomized trial that showed that smoking cessation was associated with an increase in lung cancer8 should trump evidence on smoking and lung cancer from other sources, which include the cohort study of 35 000 highly selected, male, UK doctors who were followed for decades.9 Trials also harbour sources of uncertainty and error, and to argue in favour of representativeness because some cohort studies based on volunteers obtain results that differ from those of selected trials (also based on volunteers) is simplistic. Reconciling conflicting results is challenging, but one should be wary of simply ascribing the differences to study design, as Hernán illustrated in his reconciliation of the conflicting results from trials and cohort studies reporting the relation between hormone replacement therapy and cardiovascular mortality.10

Ebrahim and Davey Smith argue that lack of representativeness leads to greater confounding in epidemiological studies (‘many variables of interest will be associated with participation and essentially volunteer samples may suffer from greater degrees of confounding than less selected samples’6). One of our points was that traditional design of experiments involves holding potential confounding factors constant, to prevent confounding. In contrast, Ebrahim and Davey Smith appear to believe that confounding is reduced by replicating the population associations within the study population, rather than by holding variables constant. But that would merely reproduce in the study population whatever confounding exists in the parent population. Consider age, for example: perfect representative sampling will not reduce age confounding in a study if age is a confounder in the source population. Restriction by age, however, would do so. True, restriction would prevent studying the extent to which age is an effect-measure modifier, but as stated earlier, if that is a study goal, representative sampling is an inefficient way to achieve it.
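The contrast between representative sampling and restriction can be checked with expected counts. In the hypothetical source population below, exposure has no effect on the outcome, but older people are both more often exposed and at higher risk, so age confounds the crude association. A perfectly representative 10% sample, which scales every age stratum equally, reproduces the confounded crude risk ratio, whereas restricting to a single age stratum recovers the null. All numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    n: float          # people in this age stratum
    p_exposed: float  # exposure prevalence within the stratum
    risk: float       # outcome risk, identical for exposed and unexposed (no causal effect)


# Hypothetical source population: age raises risk and is associated with exposure.
SOURCE = {
    "young": Stratum(n=10_000, p_exposed=0.2, risk=0.01),
    "old": Stratum(n=10_000, p_exposed=0.6, risk=0.05),
}


def representative_sample(pop: dict[str, Stratum], fraction: float) -> dict[str, Stratum]:
    """Expected composition of a perfectly representative sample: every stratum scaled equally."""
    return {k: Stratum(s.n * fraction, s.p_exposed, s.risk) for k, s in pop.items()}


def restrict(pop: dict[str, Stratum], keep: str) -> dict[str, Stratum]:
    """Restriction: keep a single age stratum, holding age (approximately) constant."""
    return {keep: pop[keep]}


def crude_risk_ratio(pop: dict[str, Stratum]) -> float:
    """Expected crude risk ratio, collapsing over whatever strata are present."""
    exp_n = sum(s.n * s.p_exposed for s in pop.values())
    exp_cases = sum(s.n * s.p_exposed * s.risk for s in pop.values())
    unexp_n = sum(s.n * (1 - s.p_exposed) for s in pop.values())
    unexp_cases = sum(s.n * (1 - s.p_exposed) * s.risk for s in pop.values())
    return (exp_cases / exp_n) / (unexp_cases / unexp_n)


print(round(crude_risk_ratio(SOURCE), 3))                              # 1.714
print(round(crude_risk_ratio(representative_sample(SOURCE, 0.1)), 3))  # 1.714
print(round(crude_risk_ratio(restrict(SOURCE, "young")), 3))           # 1.0
```

Representativeness here faithfully copies the population's confounding into the study; only holding age constant removes it.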

These arguments are not merely academic. Epidemiological studies focused on causal explanations are often large and expensive and can do without the burden of attempting to achieve representativeness. As the U.S. National Children’s Study illustrates, the futile pursuit of representativeness where it is not needed can be extraordinarily costly. We acknowledge that descriptive studies that are intended to be applicable to specific populations at a specific time may need to pursue representativeness. Such descriptive studies are specialized, and their design should not carry over into aetiological research. A clear separation of these two aims will reduce unnecessary costs and lead to better studies. Fortunately, all the writers contributing to this discussion seem to agree on some basics. Even Ebrahim and Davey Smith, despite considerable areas of disagreement, offer a concluding sentence that could easily have characterized our position: ‘We feel that representativeness should neither be avoided nor uncritically universally adopted, but its value evaluated in each particular setting’.6

Acknowledgement

We are grateful to Lauren Wise for helpful suggestions.

Funding

K.J.R. and E.E.H. were supported by grant # R01 HD-060680 from the National Institute of Child Health and Human Development. J.E.J.G. was supported by funding from the UK Biobank.

Conflict of interest: None declared.

References

