The research design of the randomised controlled trial is primarily associated today with medicine. It tends to be either ignored or regarded with suspicion by many in such disciplines as health promotion, public policy, social welfare, criminal justice, and education. However, all professional interventions in people’s lives are subject to essentially the same questions about acceptability and effectiveness. As the social reformers Sidney and Beatrice Webb pointed out in 1932, there is far more experimentation going on in “the world sociological laboratory in which we all live” than in any other kind of laboratory, but most of this social experimentation is “wrapped in secrecy” and thus yields “nothing to science.”1
Summary points
- Many social scientists argue that randomised controlled trials are inappropriate for evaluating social interventions, but they ignore a considerable history, mainly in the United States, of the use of randomised controlled trials to assess different approaches to public policy and health promotion
- A tradition of experimental sociology was well established by the 1930s, built on the early use of controlled experiments in psychology and education
- From the early 1960s to early 1980s randomised experiments were considered the optimal design for evaluating public policy interventions in the United States, and major evaluations using this design were carried out
- This approach became less popular as policy makers reacted negatively to evidence of “near zero” effects
- Lessons to be learnt about implementing randomised controlled trials in real life settings include the difficulty of assessing complex multi-level interventions and the challenge of integrating qualitative data
The Webbs argued for a more “scientific” social policy, with social scientists being trained in experimental methods and evaluations of social interventions being carried out by independent investigators. They were apparently unaware that a strong tradition in experimental sociology had already been established, mainly in the United States. This was a precursor to a period between the early 1960s and the early 1980s when randomised controlled trials became the ideal for American evaluators assessing a wide range of public policy interventions. This history is conveniently overlooked by those who contend that randomised controlled trials have no place in evaluating social interventions. It shows clearly that prospective experimental studies with random allocation to generate one or more control groups are perfectly possible in social settings. Notably, too, the history of experimentation in social science predates that in medicine in certain key respects.
A short history of control groups
The original meaning of “control” is “check”—the word comes from “counter-roll,” a duplicate register or account made to verify an official account.2 The term “control” entered scientific language in the 1870s in the sense of a standard of comparison used to check inferences deduced from an experiment. The main use of the term was in experimental psychology.3
In 1901 the American educationalists Thorndike and Woodworth identified the need for a control group in their experiments on the use of training to improve mental function.4 A series of experiments with schoolchildren that addressed questions about the transferability of memory skills from one subject to another, reported by Winch in 1908,5 were among the first to use the design of pretest, intervention, post-test in the experimental group and pretest, nothing, post-test in the control group. These educational and psychology researchers invented randomised assignment to experimental treatments and Latin square designs independently of, and considerably earlier than, R A Fisher’s work at the Rothamsted Agricultural Research Station.6 The psychologist C S Peirce introduced both the idea of randomisation and that of “blindness” into psychology experiments in the 1880s.7
Selection of experimental and control subjects by means of the principle of chance is described in McCall’s How to Experiment in Education, published in 1923: “Representativeness [of research subjects] can be secured by making a chance selection from the total group, or a chance selection from a chance portion of the total group .... Just as representativeness can be secured by the method of chance, so equivalence may be secured by chance .... One method of equating by chance is to mix the names of the subjects to be used. Half may be drawn at random. This half will constitute one group while the other half will constitute the other group.”8 McCall’s book also describes the Latin square design under the name of the “rotation experiment”; this had been used in educational experiments as early as 1916.9
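McCall’s procedure of mixing the names and drawing half at random can be illustrated with a minimal modern sketch. It is an illustration only, not a reconstruction of how any of the historical experiments were run; the function name `assign_by_chance`, the invented pupil labels, and the fixed seed are all assumptions introduced here.

```python
import random


def assign_by_chance(names, seed=None):
    """Split subjects into two groups by chance.

    Hypothetical helper: shuffle ("mix") the names, then take the
    first half as one group and the remainder as the other.
    """
    rng = random.Random(seed)   # seeded only so the draw can be repeated
    mixed = list(names)         # copy, so the caller's list is not altered
    rng.shuffle(mixed)          # "mix the names of the subjects"
    half = len(mixed) // 2
    return mixed[:half], mixed[half:]


# Invented subject labels, purely for illustration.
subjects = [f"pupil_{i:02d}" for i in range(1, 21)]
experimental, control = assign_by_chance(subjects, seed=1923)
```

Because every subject has the same chance of landing in either half, differences between the groups before the intervention arise only by chance, which is the sense in which McCall says equivalence “may be secured by chance.”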
The major impetus driving these new approaches to assessing effectiveness was not the desire to imitate natural science but, rather, the need to respond to an uneasiness within the educational psychology research community about the inability of existing evaluation methods to rule out plausible rival hypotheses. Similar methodological developments were occurring in other spheres. For example, in 1924-5 an experiment using a mail campaign to increase electoral turnout was carried out in Chicago, in which housing precincts were assigned either to receive individual mail appeals or not.10 This experiment followed earlier research which had suggested that the strength of local party organisation was the main factor distinguishing voters from non-voters, but the research design used in the first study had made it impossible to have confidence in this finding. Thus, in the social field as well as later in medicine, prospective experimental studies with randomly chosen controls were seen to offer an important solution to the problem of linking intervention with outcomes.
Experimental sociology
Two other American social scientists, Ernest Greenwood at Columbia University and F Stuart Chapin at the University of Minnesota, pioneered the application of experimental methods to the study of social problems in the early decades of the 20th century. Chapin first wrote on this theme in 1917; his Experimental Designs in Sociological Research, published in 1947, details nine experimental studies carried out by his research team and a number undertaken elsewhere covering such topics as rural health education, the social effects of public housing, recreation programmes for “delinquent” boys, and the effects of student participation in extracurricular activities.11 Chapin was particularly interested in reviewing the use of experimental research designs in “the normal community situation” because of the objection, voiced at the time, that experimental studies could only be done in “laboratory” settings.
Ernest Greenwood’s Experimental Sociology, published in 1945, outlined the theoretical rationale for applying experimental methods to social issues.12 He defined an experiment as “the proof of a causal hypothesis through the study of two controlled contrasting situations,” recommended the use of case studies as a prelude to experimental research, and supported Fisher’s strategy of randomisation as the best way of securing equivalent study groups. Chapin’s and Greenwood’s interest in experimental research designs was stimulated by the social reform concerns of the Depression, and informed by a desire to establish the most effective methods of improving people’s lives. Their work was part of a general move in the United States to make social science more experimental; by 1931 at least 26 universities there were offering courses in experimental sociology.13
A golden age of evaluation
Donald Campbell and Julian Stanley’s Experimental and Quasi-experimental Designs for Research, published in 1966,14 is to social research what Fisher’s Design of Experiments (1935) is to medical research. Campbell’s paper “Reforms as experiments” established an explicit link between social reform and the use of rigorous experimental design.15 His complaint that the randomised control group design had not often been used in the social arena prompted another American experimentalist, Robert Boruch, to publish a bibliography of such experiments in 1974.16 This listed 83 “randomised field experiments” in such areas as criminal justice, legal policy, social welfare, education, mass communications, and mental health. A revised version of the bibliography produced four years later updated the total in these areas to 245.17
This period in the United States has been nicknamed the “golden age of evaluation.”18 It was one in which there was an enormous burst of activity in applying the randomised controlled trial design to the evaluation of public policy. The table shows nine of the major evaluations of broadly based social programmes initiated between the 1960s and early 1980s. Four of the studies were of income maintenance experiments,19–23 one focused on an experimental housing allowance scheme,24,25 two examined programmes for supporting disadvantaged workers,19,26 and two examined interventions for former prison inmates.27 All the studies included one or more prospectively generated control groups, either by some method of random allocation or by matching. Supporting all this effort was a government mandate specifying that 1% of budgets for social programmes had to be spent on evaluation. There was widespread recognition that social services were in a mess while expenditure on them was rising exponentially; and, for a time at least, there was a consensus in policy circles that randomised controlled experiments provided the best way of assessing effectiveness.
Other evaluations (not shown in the table) carried out during this period included the Manhattan bail bond experiment with pre-trial release for prisoners,28 the Rand Corporation’s well known study of health insurance (several components of which used a randomised controlled trial design),29 and studies of educational performance contracting.30
The reasons why the use of randomised controlled trials in evaluating policy interventions has declined in attractiveness in the United States over the past 20 years are as interesting as those explaining its acceptance in the first place. A primary one was disenchantment with the apparent ineffectiveness (sometimes seemingly damaging effects) of the interventions in some of the evaluations. Secondly, policy makers were often impatient with the length of time it took for evaluations of their favoured approaches to provide answers: this was particularly marked in the case of the income experiments. As Senator Moynihan appositely said, “The bringing of systematic inquiry to bear on social issues is not an easy thing. There is no guarantee of pleasant and simple answers, but if you make a commitment to an experimental mode it seems to me ... something larger is at stake when you begin to have to deal with the results.”31
Conclusions
All claims to successful expertise need to tackle the issue of causal inference—how do people know that what they do works, and how can they reasonably demonstrate this to others? As Stanley noted in 1957, “Expert opinions, pooled judgements, brilliant intuitions, and shrewd hunches are frequently misleading.”32 Among the reasons why randomised controlled trials gained legitimacy in medicine was the realisation that the decisions of the medical profession need to be regulated.33 The history of social experimentation indicates clearly that all the same issues have attended attempts to evaluate the impact of social interventions.
Experts in the social domain, like those in medicine, have resisted the notion that rigorous evaluation of their work is more likely to give reliable answers than their own individual preferences. When randomised controlled trials find that new “treatments” are no better than old ones, a retreat to other methods of evaluation is particularly likely, as though the prime task is not to identify whether anything works but to prove that something does.
The forgotten history of social experimentation also shows that, as in clinical research, implementing randomised controlled trials in real life settings commonly carries a number of hazards: low participation rates or high attrition, problems with “informed consent,” unanticipated side effects of the intervention, and a problematic relation between research and policy.
There are many lessons to be learnt from this experience about the challenges of randomised controlled trials, including the difficulty of establishing the effectiveness of complex multi-level interventions and the problem of integrating ethnographic or qualitative data. But, as Chapin wrote in 1931, “Experimental method in sociology does not mean interference with individual movement or freedom. It does not endanger life or limb or moral character.”34 On the contrary, what randomised controlled trials offer in the social domain is exactly what they promise to medicine: protection of the public from potentially damaging uncontrolled experimentation and a more rational knowledge about the benefits to be derived from professional intervention.
Table.
Trial | Years | Aim | Design | Outcomes assessed |
---|---|---|---|---|
Income maintenance experiments | ||||
New Jersey-Pennsylvania negative income tax experiment (see Ferber and Hirsch,19 Rossi and Lyall20) | 1968-72 | To study effects on work incentives of negative income tax | Random allocation of 1216 low income families to 8 intervention and 1 control groups | Participation in labour force; consumption expenditure; health and family behaviour; school attendance |
Rural negative income tax experiment (see Ferber and Hirsch,19 Maynard21) | 1970-2 | To replicate above experiment in poor rural areas with non-intact families with female or male heads | Stratified random allocation of 809 low income families to 5 intervention and 1 control groups | Participation in labour force; consumption expenditure; health and family behaviour; school attendance |
Gary income maintenance experiment (see Ferber and Hirsch,19 Kehrer,22 Kehrer and Wolin23) | 1971-4 | To study effects on participation in labour force and other family behaviours of different levels and forms of income maintenance, day care subsidies, and information and referral services | Stratified random allocation of 1799 low income, single parent families | Participation in labour force; consumption expenditure; health and family behaviour; school attendance; social and psychological attitudes |
Denver-Seattle income maintenance experiments (see Ferber and Hirsch,19 Rossi and Lyall20) | 1970-91 | To study effects on participation in labour force and other household behaviours of different levels and forms of income maintenance, job counselling, and training subsidies | Stratified random allocation of 2042 families allocated to 84 experimental “cells” with different combinations of support levels, tax rates, etc, and 1 control group | Participation in labour force; consumption expenditure; health and family behaviour; school attendance |
Housing allowances | ||||
Experimental housing allowance program (demand experiment) (see Bradbury and Downs,24 Friedman and Weinberg25) | 1978-80 | To study effects on households’ housing behaviour of different forms of housing allowances and estimate of cost effectiveness | Stratified random allocation of 2241 low income households to 17 intervention groups with different housing allowance formulae and 2 control groups | Quality of housing; housing consumption behaviour; mobility |
Supporting workers programmes | ||||
Supported work program (see Ferber and Hirsch19) | 1975-8 | To study effects and costs of supported work environment for disadvantaged workers | Random allocation of 6616 disadvantaged workers to 1 intervention and 1 control group | Participation in labour force; hours worked; total earnings |
Texas worker adjustment program (see Bloom26) | 1984-5 | To study effects and costs of combination of job search assistance and occupational skills training for displaced workers | Random allocation of 2259 hard to employ individuals by random numbers table to 2 intervention and 1 control groups on 1 site, and 1 intervention and 1 control groups on 2 sites | Earnings; unemployment; unemployment benefits |
Penal experiments | ||||
Living insurance for ex-prisoners (LIFE) (see Rossi et al27) | 1971-4 | To study effects on re-arrests and participation in labour force of different levels of post-release payment and job assistance schemes | Stratified random allocation of 432 released prisoners to 3 intervention groups (payments only, counselling and placement only, both combined) and 1 control group | Arrests and convictions by type of offence; participation in labour force; health and living arrangements |
Transitional aid research project (TARP) (see Rossi et al27) | 1975-7 | To study effects on re-arrests and participation in labour force of different levels of post-release payment and job assistance schemes | Stratified random allocation of 3982 released prisoners to 4 intervention groups with combinations of different payment periods and tax rates (3 groups) and job placement services (1 group) plus 2 control groups | Arrests and convictions by type of offence; participation in labour force; health and living arrangements |
References
- 1. Webb S, Webb B. Methods of social study. London: Longmans Green; 1932. pp. 224–225.
- 2. Boring EG. The nature and history of experimental control. Am J Psychol. 1954;67:573–589.
- 3. Murphy G, Murphy LB, Newcomb TM. Experimental social psychology. New York: Harper and Bros; 1937.
- 4. Thorndike EL, Woodworth RS. The influence of improvement in one mental function upon the efficiency of other functions. Psychol Rev. 1901;8:247–261, 384–395, 553–564.
- 5. Winch WH. The transfer of improvement in memory in school-children. Br J Psychol. 1908;2:284–293.
- 6. Campbell DT. Methodology and epistemology for social science: selected papers. Chicago: University of Chicago Press; 1988. p. 321.
- 7. Stigler SM. The history of statistics. Cambridge, MA: Belknap Press; 1986. p. 253.
- 8. McCall WA. How to experiment in education. New York: Macmillan; 1923. pp. 38–41.
- 9. Campbell DT, Stanley JC. Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin; 1966. p. 2.
- 10. Gosnell HF. Getting out the vote: an experiment in the stimulation of voting. Westport, CT: Greenwood Press; 1977. (Originally published 1927.)
- 11. Chapin FS. Experimental designs in sociological research. New York: Harper and Row; 1947.
- 12. Greenwood E. Experimental sociology. New York: Octagon Books; 1976. p. 72. (Originally published 1945.)
- 13. Brearley HC. Experimental sociology in the United States. Social Forces. 1931;Dec:196–199.
- 14. Campbell DT, Stanley JC. Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin; 1966.
- 15. Campbell DT. Reforms as experiments. Am Psychol. 1969;24:409–429.
- 16. Boruch RF. Bibliography: illustrative randomized field experiments for program planning and evaluation. Evaluation. 1974;2:83–87.
- 17. Boruch RF. Randomized field experiments for program planning, development, and evaluation. Eval Q. 1978;2(4):655–695. doi: 10.1177/0193841x7800200411.
- 18. Rossi PH, Wright JD. Evaluation research: an assessment. Ann Rev Sociol. 1984;10:331–352.
- 19. Ferber R, Hirsch WZ. Social experimentation and economic policy. Cambridge: Cambridge University Press; 1982.
- 20. Rossi PH, Lyall KC. Reforming public welfare. New York: Russell Sage Foundation; 1976.
- 21. Maynard R. The effects of the rural income maintenance experiment on the school performance of children. Am Econ Rev. 1977;67:370–375.
- 22. Kehrer KC. The Gary income maintenance experiment: summary of initial findings. In: Cook TD, DelRosario ML, Hennigan KM, Mark MM, Trochim WMK, editors. Evaluation studies review annual. No 3. Beverly Hills, CA: Sage; 1978. pp. 437–444.
- 23. Kehrer KC, Wolin CM. Impact of income maintenance on low birthweight: evidence from the Gary experiment. J Hum Resource. 1979;14(4):434–462.
- 24. Bradbury KL, Downs A, editors. Do housing allowances work? Washington, DC: Brookings Institution; 1981.
- 25. Friedman J, Weinberg DH, editors. The great housing experiment. Beverly Hills, CA: Sage; 1983.
- 26. Bloom HS. Back to work: testing re-employment services for displaced workers. Kalamazoo, MI: WE Upjohn Institute for Employment Research; 1990.
- 27. Rossi PH, Berk RA, Lenihan KJ. Money, work and crime: experimental evidence. New York: Academic Press; 1980.
- 28. Botein B. The Manhattan bail project: its impact on criminology and the criminal law process. Texas Law Rev. 1965;43:319–331.
- 29. Manning WG, Leibowitz A, Goldberg GA, Rogers WH, Newhouse JP. A controlled trial of the effect of a prepaid group practice on use of services. N Engl J Med. 1984;310:1505–1510. doi: 10.1056/NEJM198406073102305.
- 30. Gramlich EM, Koshel PP. Educational performance contracting. Washington, DC: Brookings Institution; 1975.
- 31. Nathan RP. Social science and government. New York: Basic Books; 1988. p. 61.
- 32. Stanley JC. Controlled experimentation in the classroom. J Exp Educ. 1957;25:195–201.
- 33. Matthews JR. Quantification and the quest for medical certainty. Princeton, NJ: Princeton University Press; 1995.
- 34. Chapin FS. The problem of controls in experimental sociology. J Educ Sociol. 1931;4:541–551.