Indian Journal of Thoracic and Cardiovascular Surgery. 2022 Aug 8;38(5):562–565. doi: 10.1007/s12055-022-01401-7

The importance of randomization in clinical research

Varun Sundaram 1,2,3, Padmini Selvaganesan 1,3, Salil Deo 1,3, Mohamad Karnib 1,2,3
PMCID: PMC9424468  PMID: 36050978

Abstract

Studies evaluating the average treatment effect (ATE) of an intervention can broadly be classified into those with observational and those with randomized designs. Observational studies are limited by confounding, in addition to selection and information bias, making their evaluation of ATE hypothesis generating rather than hypothesis testing. Randomization attempts to reduce the systematic error inherent in observational studies by ensuring equal distribution of prognostic factors between the treatment and control groups, so that any difference in outcomes observed between the two groups can be attributed to the treatment. While randomized controlled trials (RCTs) remain the gold standard in estimating the ATE of therapeutic interventions, they have inherent limitations due to uncertain external validity. Observational studies can play a complementary role in enhancing the ability of RCTs to inform routine clinical practice. In this review, we focus on the limitations of observational studies, the need for randomization, the interpretation of RCTs, and their limitations.

Keywords: Randomization, RCT, Average treatment effects

Introduction

The fields of cardiology and cardiac surgery have evolved significantly in the past 3 decades. For instance, the annual mortality of patients with heart failure and reduced ejection fraction (HFrEF) in the placebo group of the Cooperative North Scandinavian Enalapril Survival Study (CONSENSUS) trial (effects of enalapril on mortality in severe congestive heart failure) in 1987 was nearly 50% [1]. In contrast, the annual mortality in the placebo group of the more recent Empagliflozin in Patients with Heart Failure, Reduced Ejection Fraction (EMPEROR-REDUCED) trial was only 7.5% [2]. This striking improvement in survival in HFrEF over the last 3 decades is largely attributable to robust randomized trials testing innovations in pharmacotherapy and device therapy [3–5]. In this review, we focus on the limitations of observational studies, the need for randomization, the interpretation of randomized controlled trials (RCTs), and their limitations.

Limitations of observational studies

Studies evaluating the average treatment effect (ATE) of an intervention can broadly be classified into those with observational and those with randomized designs. Observational studies evaluating treatment effects are often considered to be hypothesis generating rather than hypothesis testing. This is because patients are not randomized to treatment and control groups, which results in significant bias. Mauri et al. compared long-term outcomes of bare-metal vs. drug-eluting stents in patients with acute myocardial infarction (AMI). The investigators conducted this retrospective observational study on an unselected, population-based cohort of patients presenting with AMI to acute care, nonfederal hospitals in Massachusetts (from the Massachusetts state registry). They observed a significant reduction in the hazard for 2-year mortality in favor of drug-eluting stents [6]. However, subsequent RCTs evaluating long-term outcomes between bare-metal and drug-eluting stents failed to demonstrate this observed difference in mortality [7, 8]. Further investigation revealed that the differences in outcomes between the bare-metal and drug-eluting stents in the observational study were plausibly related to selection bias due to systematic pretreatment differences between the two groups, as opposed to the intervention (i.e., drug-eluting stents being systematically avoided in patients who had a poor prognosis, such as those who were terminally ill or had an increased co-morbidity burden) [9]. In addition to selection bias, other important methodological limitations of observational research in estimating ATE include confounding and information bias. Confounding is the situation where the apparent association between the treatment and outcome can be entirely or partially explained by association with another risk factor.
While advances and innovations in statistical methodologies have helped minimize confounding, residual confounding (from both measured and unmeasured covariates) is still a major issue in observational epidemiological research. Information bias, also known as observation, classification, or measurement bias, is related to the incorrect measurement of exposures and outcomes. The information on exposure and outcome should be gathered in the same way for both the treatment and control groups. However, in observational studies, the information is often gathered differently for one group than for the other [10, 11].
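
The stent example can be illustrated with a small simulation. The sketch below is not taken from the cited studies: the risk levels, the allocation probabilities, and the function name `simulate_selection_bias` are all assumptions chosen for illustration. The treatment is given no true effect on mortality, yet it appears protective when patients with a poor prognosis are systematically steered away from it, and the spurious benefit disappears when allocation is random.

```python
import random

def simulate_selection_bias(n=20000, seed=0):
    """Compare observational and randomized allocation of a treatment
    that has NO true effect on mortality (all numbers are hypothetical)."""
    rng = random.Random(seed)
    # counts[design][treated] = [deaths, patients]
    counts = {
        "observational": {True: [0, 0], False: [0, 0]},
        "randomized": {True: [0, 0], False: [0, 0]},
    }
    for _ in range(n):
        high_risk = rng.random() < 0.30  # assumed share of poor-prognosis patients
        # Mortality depends on baseline prognosis only, never on treatment.
        died = rng.random() < (0.30 if high_risk else 0.05)
        allocation = {
            # Clinicians tend to avoid the treatment in high-risk patients.
            "observational": rng.random() < (0.20 if high_risk else 0.70),
            # A fair coin decides, regardless of prognosis.
            "randomized": rng.random() < 0.50,
        }
        for design, treated in allocation.items():
            counts[design][treated][0] += int(died)
            counts[design][treated][1] += 1

    def risk_ratio(arms):
        risk_treated = arms[True][0] / arms[True][1]
        risk_control = arms[False][0] / arms[False][1]
        return risk_treated / risk_control

    return {design: risk_ratio(arms) for design, arms in counts.items()}

result = simulate_selection_bias()
print(result)
```

Under these assumed inputs, the observational risk ratio should fall well below 1 even though the treatment does nothing, while the randomized risk ratio should stay close to 1.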

Need for randomization

Most treatments in modern medicine demonstrate a modest effect size, i.e., a 10–30% relative risk reduction, especially when it comes to hard clinical outcomes. Despite extensive adjustment and matching techniques, modest treatment effects can plausibly be missed because of the systematic error inherent in observational studies. Large, randomized studies attempt to reduce systematic error, enabling investigators to detect modest treatment effects [10]. The process of randomization minimizes selection bias by ensuring equal distribution of prognostic factors between the treatment and control groups, so that any difference in outcomes observed between the two groups can be attributed to the treatment. Furthermore, randomization renders the treatment and control groups comparable with regard to unknown or unmeasured prognostic factors, e.g., genetic factors, which might influence the outcome of interest [12, 13]. Common protocols of data gathering for both treatment and control groups, coupled with blinding in a prospective RCT, mitigate observer bias. However, information bias can still be an issue in surgical or procedural RCTs, as they are often unblinded. Randomized trials are often done in a multicenter setting at a national or international level. Multicenter trials can lead to faster recruitment and increased sample size, and usually cover a broader population sample. Efforts should be made to maintain a similar quality of care across all centers regarding diagnostic criteria, treatments, interventions, and follow-up.

Methods of randomization

Randomization can occur using different methods. The main schemes include unrestricted, restricted, and stratified randomization.

“Unrestricted randomization” is also known as “simple randomization,” where the allocation of interventions to the study subjects occurs by generating a list derived from random numbers (e.g., flipping a coin). This method might result in an imbalance between groups if the number of participants is not large.
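
As a minimal sketch of simple randomization and of the imbalance it can produce in small samples, the following uses Python's standard `random` module. The function names and the sample sizes are illustrative assumptions, not part of any trial protocol.

```python
import random

def simple_randomization(n, seed=None):
    """Allocate n participants by an independent fair 'coin flip' each."""
    rng = random.Random(seed)
    return [rng.choice(("treatment", "control")) for _ in range(n)]

def relative_imbalance(allocations):
    """Absolute difference in arm sizes as a fraction of the total."""
    treated = allocations.count("treatment")
    control = len(allocations) - treated
    return abs(treated - control) / len(allocations)

def mean_imbalance(n, replicates=200, seed=0):
    """Average relative imbalance over many simulated trials of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        trial_seed = rng.randrange(2**32)
        total += relative_imbalance(simple_randomization(n, seed=trial_seed))
    return total / replicates

for n in (20, 200, 2000):
    print(n, round(mean_imbalance(n), 3))
```

The average relative imbalance shrinks as the number of participants grows, which is why simple randomization is usually considered adequate only for large trials.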

“Restricted” or “block randomization,” also referred to as “balanced randomization,” is a method by which the allocation sequence is split into blocks. There are equal numbers of each allocation (treatment and control) within each block. This method ensures that the number of study participants randomized to each arm of the intervention is balanced. When using this method, it is crucial for the randomization to be double blinded and for the size of the blocks to be concealed from the investigators to avoid selection bias.
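
A permuted-block allocation sequence can be sketched as follows. The block size of 4, the arm labels, and the function name `block_randomization` are assumptions made for illustration.

```python
import random

def block_randomization(n_blocks, block_size=4,
                        arms=("treatment", "control"), seed=None):
    """Build an allocation sequence from shuffled, internally balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        # Each block holds the same number of allocations per arm...
        block = list(arms) * (block_size // len(arms))
        # ...in a random order.
        rng.shuffle(block)
        sequence.extend(block)
    return sequence

sequence = block_randomization(n_blocks=5, block_size=4, seed=7)
print(sequence.count("treatment"), sequence.count("control"))
```

Because every block is balanced, the arms are equal in size after each completed block. A fixed, known block size makes the last allocation in each block predictable, which is why the block size needs to be concealed; varying the block size at random is one commonly used safeguard.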

In “stratified randomization,” all study participants are stratified equally into sub-groups according to a selected risk factor, which is believed to be essential for the outcome (e.g., gender or age). Then, random allocation of interventions is carried out within the sub-groups.
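
One way to sketch stratified randomization is to run a separate sequence of permuted blocks inside each stratum. Using blocks within strata is a design choice assumed here, not something the text prescribes, and the age strata, participant list, and function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_randomization(participants, block_size=4, seed=None):
    """participants: iterable of (participant_id, stratum) in enrollment order.
    Returns {participant_id: arm}, balanced within each stratum by blocks."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two arms")
    rng = random.Random(seed)
    open_blocks = defaultdict(list)  # stratum -> allocations left in current block
    assignments = {}
    for participant_id, stratum in participants:
        if not open_blocks[stratum]:
            # Start a new balanced block for this stratum.
            block = ["treatment", "control"] * (block_size // 2)
            rng.shuffle(block)
            open_blocks[stratum] = block
        assignments[participant_id] = open_blocks[stratum].pop()
    return assignments

# Hypothetical enrollment: 8 participants aged <65 and 8 aged >=65.
participants = [(i, "<65" if i % 2 == 0 else ">=65") for i in range(16)]
assignments = stratified_randomization(participants, seed=42)
for stratum in ("<65", ">=65"):
    arms = [assignments[pid] for pid, s in participants if s == stratum]
    print(stratum, arms.count("treatment"), arms.count("control"))
```

In this example each stratum ends with four participants per arm, so the chosen risk factor is balanced between the treatment and control groups.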

Traditionally, RCTs are designed to demonstrate that one intervention is better than the other, labeled as superiority trials. Non-inferiority trials are RCTs where the new treatment is compared to the standard of care to show that it is not an unacceptably worse option.
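
The distinction can be made concrete with a simplified decision rule based on the 95% confidence interval of a hazard ratio. This sketch assumes that a hazard ratio below 1 favors the new treatment and that a non-inferiority margin has been pre-specified; the margin of 1.25 and the function name `classify_trial` are hypothetical.

```python
def classify_trial(ci_lower, ci_upper, ni_margin=1.25):
    """Classify a trial result from the 95% CI of the hazard ratio
    (new treatment vs standard of care; HR < 1 favors the new treatment)."""
    if not ci_lower <= ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_upper < 1.0:
        # The whole interval lies below the null value of 1.
        return "superior"
    if ci_upper < ni_margin:
        # The interval excludes an unacceptably worse effect.
        return "non-inferior"
    return "non-inferiority not shown"

print(classify_trial(0.77, 0.92))
print(classify_trial(0.90, 1.15))
print(classify_trial(0.95, 1.40))
```

The three calls correspond to an interval entirely below 1, an interval that crosses 1 but stays below the margin, and an interval that crosses the margin.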

Multiple RCTs can be combined in a meta-analysis, which can provide a summary of ATE. However, this comes with a significant risk of selection bias, mainly when selecting studies and extracting the data for the meta-analysis.
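
As an illustration of how a summary estimate can be formed, the sketch below pools hazard ratios with fixed-effect inverse-variance weighting on the log scale. This is one standard approach among several (random-effects models are another), and the two input trials are made-up numbers, not results from any study cited here.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_hr_and_se(hr, ci_lower, ci_upper):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z95)
    return math.log(hr), se

def pool_fixed_effect(trials):
    """trials: list of (hr, ci_lower, ci_upper).
    Returns the pooled (hr, ci_lower, ci_upper)."""
    sum_weights = 0.0
    sum_weighted = 0.0
    for hr, lower, upper in trials:
        log_hr, se = log_hr_and_se(hr, lower, upper)
        weight = 1.0 / se ** 2  # more precise trials count more
        sum_weights += weight
        sum_weighted += weight * log_hr
    pooled_log_hr = sum_weighted / sum_weights
    pooled_se = math.sqrt(1.0 / sum_weights)
    return (math.exp(pooled_log_hr),
            math.exp(pooled_log_hr - Z95 * pooled_se),
            math.exp(pooled_log_hr + Z95 * pooled_se))

# Two hypothetical trials with the same estimate and precision.
hypothetical_trials = [(0.80, 0.64, 1.00), (0.80, 0.64, 1.00)]
print(pool_fixed_effect(hypothetical_trials))
```

Pooling leaves the point estimate of these two identical trials unchanged but narrows the confidence interval, which is the gain in precision a meta-analysis aims for. It does nothing to correct bias introduced by how the studies were selected.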

Interpretation of the strength of evidence of an RCT, using p values and confidence intervals

Two main types of error are encountered in studies investigating ATE. The first is systematic error (bias), which can be mitigated by randomization but cannot be quantified. The second is random error, which can be quantified by estimating the p value. A well-conducted, large RCT should have little systematic error and minimal random error (p < 0.05).

(i) The first step in the interpretation of an RCT is to assess whether the p value is strong enough to reject the null hypothesis. While the scientific community conventionally uses a significance level of 5% for the primary outcome (p < 0.05), a dichotomous reading of the p value as < or > 0.05 is superficial and gives no information on the strength of evidence. This is akin to interpreting laboratory values as normal or abnormal (e.g., leukocytosis or no leukocytosis). A p value of < 0.001 indicates that, if the null hypothesis were true, findings at least as extreme as those observed would arise by chance less than 1 time in 1000, signifying evidence beyond reasonable doubt. In short, p values should be interpreted as a continuum, with a decreasing p value indicating increasing evidence against the null hypothesis.

(ii) The next step in the interpretation of the strength of evidence involves assessment of the 95% confidence interval (CI). A hazard ratio of 1 indicates a null effect; for a treatment that reduces risk, the closer the upper limit of the 95% CI is to 1, the weaker the evidence against the null hypothesis. Table 1 demonstrates various scenarios using p values and confidence intervals for the strength of evidence [14–17]. Scenario A is an example of an RCT where the level of evidence is beyond reasonable doubt (p < 0.001 and the upper limit of the 95% CI is 0.92, reasonably far from 1). Scenario B indicates evidence of a moderately significant difference in favor of the treatment group (p = 0.02 and the upper limit of the 95% CI is 0.98, closer to 1). Scenarios C and D are examples of decreasing evidence against the null hypothesis.

Table 1.

Summary of scenarios based on p values and confidence intervals

Scenario | Trial | p value | HR (95% CI) | Interpretation
A | Ticagrelor vs clopidogrel in acute coronary syndrome (PLATO trial) | p < 0.001 | 0.84 (0.77–0.92) | Very strong level of evidence, proof beyond reasonable doubt
B | Losartan vs atenolol in hypertension (LIFE trial) | p = 0.02 | 0.87 (0.77–0.98) | Moderately significant difference
C | Spironolactone vs placebo in HFpEF (TOPCAT trial) | p = 0.04 (for secondary outcome) | 0.83 (0.69–0.99) | Weak evidence, with substantial uncertainty
D | Prandial vs fasting glycemic control in DM (HEART 2D trial) | p = 0.3 | 0.98 (0.80–1.21) | No difference in risk of primary outcome

HR hazard ratio, CI confidence interval, DM diabetes mellitus
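
The link between a confidence interval and a p value can be sketched numerically. The code below back-calculates an approximate two-sided p value from each hazard ratio and 95% CI in Table 1, assuming that the log hazard ratio is approximately normally distributed. Because the published intervals are rounded and the original trials may have used different tests, the results are approximations and will not necessarily reproduce the p values reported in the table; the function name is an assumption.

```python
import math

def p_value_from_ci(hr, ci_lower, ci_upper, z_crit=1.96):
    """Approximate two-sided p value for H0: HR = 1, from a 95% CI."""
    # Standard error of log(HR), recovered from the width of the CI.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

table_1 = {
    "A (PLATO)": (0.84, 0.77, 0.92),
    "B (LIFE)": (0.87, 0.77, 0.98),
    "C (TOPCAT)": (0.83, 0.69, 0.99),
    "D (HEART 2D)": (0.98, 0.80, 1.21),
}
for scenario, (hr, lower, upper) in table_1.items():
    print(scenario, f"approximate p = {p_value_from_ci(hr, lower, upper):.4f}")
```

An interval whose upper limit sits far below 1 yields a very small p value, an upper limit just under 1 yields a p value just under 0.05, and an interval that spans 1 yields a p value above 0.05. The two measures therefore carry the same information about random error, viewed in different ways.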

Limitations of randomized trials; clinical effectiveness vs clinical efficacy

RCTs are unquestionably the gold standard for determining therapeutic efficacy (i.e., performance of an intervention under ideal circumstances) due to their strong internal validity [12]. However, RCTs do have inherent limitations. The role of traditional RCTs in evaluating therapeutic effectiveness (i.e., performance of a therapy under real-world conditions) may be restricted by their uncertain external validity. Even when treatments are allocated by careful randomization, the ATE should be generalized with caution, as the studied population may be very different from the population treated in routine practice. Multiple studies have demonstrated highly efficacious therapies to be less effective in clinical practice. In traditional RCTs, the strict selection criteria used to enroll a defined, homogeneous patient population are considered the most common reason for the gap between therapeutic efficacy and effectiveness. The patient characteristics and the clinical outcomes of those enrolled in RCTs may differ from those seen in routine clinical practice. For instance, in the Optimal Medical Therapy with or without Percutaneous Coronary Intervention for Stable Coronary Disease (COURAGE) trial, only 10% of the patients initially screened were randomized, thereby excluding a significant proportion of patients seen in clinical practice, i.e., decreased external validity [18]. Investigators from the Scandinavian countries have attempted to overcome the limitation of external validity by performing studies using randomized registries. The Thrombus Aspiration during ST-Segment Elevation Myocardial Infarction (TASTE) trial and the Norwegian Coronary Stent Trial (NORSTENT) randomized more than 90% of the initially screened patients, thereby increasing external validity without compromising internal validity [19, 20].

Conclusion

RCTs are indispensable to clinical research and remain the gold standard in estimating the ATE of therapeutic interventions. Randomization continues to be the only reliable mechanism to eliminate or minimize systematic error and ensure uniform distribution of measured and unmeasured confounders between the treatment and control groups. While observational studies cannot replace RCTs, they can be performed to establish clinical effectiveness, which may considerably enhance the ability of RCTs to inform routine clinical practice, guidelines, and health policy.

Funding

None.

Declarations

Ethics approval

Not applicable as this is a review.

Informed consent

Not applicable.

Human and animal rights

Not applicable.

Conflict of interest

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. CONSENSUS Trial Study Group. Effects of enalapril on mortality in severe congestive heart failure. Results of the Cooperative North Scandinavian Enalapril Survival Study (CONSENSUS). N Engl J Med. 1987;316:1429–1435. doi: 10.1056/NEJM198706043162301.
  • 2. Packer M, Anker SD, Butler J, et al. Cardiovascular and renal outcomes with empagliflozin in heart failure. N Engl J Med. 2020;383:1413–1424. doi: 10.1056/NEJMoa2022190.
  • 3. Cleland JGF, Daubert J-C, Erdmann E, et al. The effect of cardiac resynchronization on morbidity and mortality in heart failure. N Engl J Med. 2005;352:1539–1549. doi: 10.1056/NEJMoa050496.
  • 4. McMurray JJV, Packer M, Desai AS, et al. Angiotensin-neprilysin inhibition versus enalapril in heart failure. N Engl J Med. 2014;371:993–1004. doi: 10.1056/NEJMoa1409077.
  • 5. Packer M, Bristow MR, Cohn JN, et al. The effect of carvedilol on morbidity and mortality in patients with chronic heart failure. U.S. Carvedilol Heart Failure Study Group. N Engl J Med. 1996;334:1349–1355. doi: 10.1056/NEJM199605233342101.
  • 6. Mauri L, Silbaugh TS, Garg P, et al. Drug-eluting or bare-metal stents for acute myocardial infarction. N Engl J Med. 2008;359:1330–1342. doi: 10.1056/NEJMoa0801485.
  • 7. Kastrati A, Dibra A, Spaulding C, et al. Meta-analysis of randomized trials on drug-eluting stents vs. bare-metal stents in patients with acute myocardial infarction. Eur Heart J. 2007;28:2706–2713. doi: 10.1093/eurheartj/ehm402.
  • 8. Feinberg J, Nielsen EE, Greenhalgh J, et al. Drug-eluting stents versus bare-metal stents for acute coronary syndrome. Cochrane Database Syst Rev. 2017;8:CD012481. doi: 10.1002/14651858.CD012481.pub2.
  • 9. Wong BYL. Drug-eluting versus bare-metal stents in acute myocardial infarction. N Engl J Med. 2009;360:300. doi: 10.1056/NEJMc082174.
  • 10. Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet. 2002;359:248–252. doi: 10.1016/S0140-6736(02)07451-2.
  • 11. Sackett DL. Bias in analytic research. J Chronic Dis. 1979;32:51–63. doi: 10.1016/0021-9681(79)90012-2.
  • 12. Rosenberger WF, Uschner D, Wang Y. Randomization: the forgotten component of the randomized clinical trial. Stat Med. 2019;38:1–12. doi: 10.1002/sim.7901.
  • 13. Baggerly K. Experimental design, randomization, and validation. Clin Chem. 2018;64:1534–1535. doi: 10.1373/clinchem.2017.273334.
  • 14. Wallentin L, Becker RC, Budaj A, et al. Ticagrelor versus clopidogrel in patients with acute coronary syndromes. N Engl J Med. 2009;361:1045–1057. doi: 10.1056/NEJMoa0904327.
  • 15. Ruwald ACH, Westergaard B, Sehestedt T, et al. Losartan versus atenolol-based antihypertensive treatment reduces cardiovascular events especially well in elderly patients: the Losartan Intervention For Endpoint reduction in hypertension (LIFE) study. J Hypertens. 2012;30:1252–1259. doi: 10.1097/HJH.0b013e328352f7f6.
  • 16. Pitt B, Pfeffer MA, Assmann SF, et al. Spironolactone for heart failure with preserved ejection fraction. N Engl J Med. 2014;370:1383–1392. doi: 10.1056/NEJMoa1313731.
  • 17. Raz I, Wilson PWF, Strojek K, et al. Effects of prandial versus fasting glycemia on cardiovascular outcomes in type 2 diabetes: the HEART2D trial. Diabetes Care. 2009;32:381–386. doi: 10.2337/dc08-1671.
  • 18. Boden WE, O'Rourke RA, Teo KK, et al. Optimal medical therapy with or without PCI for stable coronary disease. N Engl J Med. 2007;356:1503–1516. doi: 10.1056/NEJMoa070829.
  • 19. Fröbert O, Lagerqvist B, Olivecrona GK, et al. Thrombus aspiration during ST-segment elevation myocardial infarction. N Engl J Med. 2013;369:1587–1597. doi: 10.1056/NEJMoa1308789.
  • 20. Bønaa KH, Mannsverk J, Wiseth R, et al. Drug-eluting or bare-metal stents for coronary artery disease. N Engl J Med. 2016;375:1242–1252. doi: 10.1056/NEJMoa1607991.

Articles from Indian Journal of Thoracic and Cardiovascular Surgery are provided here courtesy of Springer
