JAMA Netw Open. 2020 Apr 17;3(4):e203082. doi: 10.1001/jamanetworkopen.2020.3082

Prevalence of Multiplicity and Appropriate Adjustments Among Cardiovascular Randomized Clinical Trials Published in Major Medical Journals

Muhammad Shahzeb Khan 1, Maaz Shah Khan 2, Zunaira Navid Ansari 2, Tariq Jamal Siddiqi 2, Safi U Khan 3, Irbaz Bin Riaz 4, Zain Ul Abideen Asad 5, John Mandrola 6, James Wason 7,8, Haider J Warraich 9,10, Gregg W Stone 11, Deepak L Bhatt 12, Samir R Kapadia 13, Ankur Kalra 13
PMCID: PMC7165301  PMID: 32301992

This cross-sectional study assesses the prevalence of multiplicity and whether appropriate multiplicity adjustments were performed among cardiovascular randomized clinical trials published in 6 medical journals with a high impact factor.

Key Points

Question

What is the prevalence of multiplicity among cardiovascular randomized clinical trials published in 6 medical journals with a high impact factor, and how frequently are multiplicity adjustments made in these trials?

Findings

In this cross-sectional study, data were collected from the past issues of 6 journals with high impact factors published between August 1, 2015, and July 31, 2018. Of 511 cardiovascular randomized clinical trials included in this analysis, 300 had some form of multiplicity; of these 300, only 85 adjusted for multiplicity.

Meaning

Among contemporary cardiovascular randomized clinical trials, it appears that multiplicity adjustments are infrequently reported.

Abstract

Importance

Multiple analyses in a clinical trial can increase the probability of inaccurately concluding that there is a statistically significant treatment effect. However, to date, it is unknown how many randomized clinical trials (RCTs) perform adjustments for multiple comparisons, the lack of which could lead to erroneous findings.

Objectives

To assess the prevalence of multiplicity and whether appropriate multiplicity adjustments were performed among cardiovascular RCTs published in 6 medical journals with a high impact factor.

Design, Setting, and Participants

In this cross-sectional study, cardiovascular RCTs were selected from all over the world, characterized as North America, Western Europe, multiregional, and rest of the world. Data were collected from past issues of 3 cardiovascular journals (Circulation, European Heart Journal, and Journal of the American College of Cardiology) and 3 general medicine journals (JAMA, The Lancet, and The New England Journal of Medicine) with high impact factors published between August 1, 2015, and July 31, 2018. Supplements and trial protocols of each of the included RCTs were also searched for multiplicity. Data were analyzed December 20 to 27, 2018.

Exposures

Data from the selected RCTs were extracted and verified independently by 2 researchers using a structured data instrument. In case of disagreement, a third reviewer helped to achieve consensus. An RCT was considered to have multiple treatment groups if it had more than 2 arms; multiple outcomes were defined as having more than 1 primary outcome, and multiple analyses were defined as analysis of the same outcome variable in multiple ways. Multiplicity was examined only for the analysis of the primary end point.

Main Outcomes and Measures

Outcomes of interest were percentages of primary analyses that performed multiplicity adjustment of primary end points.

Results

Of 511 cardiovascular RCTs included in this analysis, 300 (58.7%) had some form of multiplicity; of these 300, only 85 (28.3%) adjusted for multiplicity. Intervention type and funding source had no statistically significant association with the reporting of multiplicity risk adjustment. Trials that assessed mortality vs nonmortality outcomes were more likely to contain a multiplicity risk in their primary analysis (66.3% [177 of 267] vs 50.4% [123 of 244]; P < .001), and larger trials vs smaller trials were less likely to make any adjustments for multiplicity (21.4% [33 of 154] vs 35.6% [52 of 146]; P = .001).

Conclusions and Relevance

Findings from this study suggest that cardiovascular RCTs published in medical journals with high impact factors demonstrate infrequent adjustments to correct for multiple comparisons in the primary end point. These parameters may be improved by more standardized reporting.

Introduction

Previous studies1,2,3,4 have raised concerns about selective reporting of outcomes in randomized clinical trials (RCTs). However, few reports have focused on multiplicity, which (along with incomplete reporting) is a major factor contributing to nonreproducibility of published claims.5 Multiplicity refers to the “potential inflation of type I error rate as a result of multiple testing, for example because of multiple subgroup comparisons, comparisons across multiple treatment arms, analysis of multiple outcomes, and multiple analyses of the same outcome at different times.”6
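The inflation of the type I error rate described above can be illustrated with a short calculation. This is a minimal sketch, assuming that all tests are independent and that each is run at the same per-test level; neither assumption comes from the article, and the function name is hypothetical. Under those assumptions, the probability of at least 1 false-positive result among k tests is 1 − (1 − α)^k, which is roughly 34% for 8 tests at α = .05.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive among k independent
    tests, each performed at significance level alpha.

    Assumption (not from the article): the k tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    # Show how quickly the familywise error rate grows with the number of tests.
    for k in (1, 2, 3, 5, 8):
        print(f"k = {k}: familywise error rate = {familywise_error_rate(0.05, k):.3f}")
```

Correlated tests, such as analyses of the same outcome in several ways, would generally inflate the error rate less than this independent-test bound suggests, so the figure is best read as an upper illustration rather than an estimate for any particular trial.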

Negative consequences associated with multiplicity could be prevented by complete and accurate reporting of analyses outlined in the registered trial protocols. Multiplicity could also be mitigated by statistical adjustment when multiple analyses are specified a priori. Several statistical methods, such as defining coprimary outcome variables, performing various stepwise procedures,7,8,9,10 applying methods for multiple-group comparisons11,12 and including gatekeeping or hierarchical testing, have been proposed for multiplicity adjustment.13,14

To our knowledge, no study has reported on the prevalence of multiplicity among cardiovascular RCTs and, when applicable, whether appropriate multiplicity adjustments were implemented. To fill this knowledge gap, we conducted a cross-sectional study of cardiovascular RCTs published in medical journals with high impact factors to assess the reporting quality of statistical analyses, including the frequency with which multiplicity adjustments were reported.

Methods

This cross-sectional study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.15 It also followed methods from the American Heart Association on standards for cardiac prevention and treatment studies.16

Data Sources and Search Strategy

Three cardiovascular journals (Circulation, European Heart Journal, and Journal of the American College of Cardiology) and 3 general medicine journals (JAMA, The Lancet, and The New England Journal of Medicine) published between August 1, 2015, and July 31, 2018, were searched for general trial characteristics, multiplicity error, and multiplicity correction to assess the pool of recent and contemporary cardiovascular clinical trials. Data were analyzed December 20 to 27, 2018. These journals were chosen based on their high impact factor, broad readership, and reputation of publishing important clinical trials used in the development of guidelines. Supplements and trial protocols of each of the included RCTs were also searched for general trial characteristics, multiplicity error, and multiplicity correction.

Study Selection

Articles were selected if they reported results of cardiovascular RCTs and compared at least 2 treatment groups. Excluded were brief communications, research letters, and animal studies. Data from the selected RCTs were extracted and verified by 2 of us (M.S.K. and Z.N.A.) independently using a structured data instrument and then cross-checked by another of us (T.J.S).

Data Extraction

Data were extracted from both the primary and secondary articles. Primary articles were defined as reports on an empirical research study conducted by the authors analyzing data collected for the first time, while secondary articles were studies derived from data collected and analyzed from primary articles. We analyzed and extracted data only from the analysis of the primary end point of each RCT because multiplicity in a secondary analysis is generally exploratory or hypothesis generating. A multiplicity coding manual was developed to investigate the reporting of primary statistical analyses, multiple analyses, and adjustments for multiplicity issues (eAppendix in the Supplement). The multiplicity coding manual was pretested and modified by coding 15 articles initially. Two of us (M.S.K. and Z.N.A.) coded each article separately and discussed any inconsistencies in the data and modified the multiplicity coding manual accordingly. The rest of the articles were then coded according to this multiplicity coding manual. The complete published articles were searched for general trial characteristics, multiplicity error, and multiplicity correction by the coders, along with additional supplementary material (eg, trial protocols and appendixes if they were referred to in the article). The order of the articles was randomized for each coder.

To measure the extent of agreement between the 2 independent coders, the κ statistic was used and calculated according to the methods by Landis and Koch.17 The frequency of discrepancies between the coders was computed using the Kappa Calculator (Statistics Solutions),18 and the κ statistics were assessed for several outcomes. Agreement was substantial both for the presence of multiplicity in the main text (κ = 0.76; 95% CI, 0.51-0.90) and for adjustment for these multiplicity errors (κ = 0.78; 95% CI, 0.55-0.89). Overall interobserver agreement in extracting data was good, and any discrepancies were resolved after discussion. When consensus could not be reached, another of us (T.J.S.) arbitrated. Finally, a post hoc search was done to assess if the authors of articles stated that their trial was exploratory or hypothesis generating in any section of the article.
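For readers unfamiliar with the κ statistic, the following sketch shows how Cohen κ is computed for 2 coders making a yes or no judgment (eg, whether an article has multiplicity). The counts are hypothetical and chosen only for illustration; they are not the study data, and the authors used the Kappa Calculator rather than this code.

```python
def cohens_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen kappa for two coders and a binary (yes/no) rating.

    both_yes / both_no: articles on which the coders agree.
    a_only / b_only: articles rated "yes" by only coder A or only coder B.
    """
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n            # observed agreement
    a_yes = (both_yes + a_only) / n                # coder A's "yes" rate
    b_yes = (both_yes + b_only) / n                # coder B's "yes" rate
    # Agreement expected by chance if the coders rated independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical counts for 15 articles (illustration only, not study data).
    kappa = cohens_kappa(both_yes=8, a_only=1, b_only=1, both_no=5)
    print(f"kappa = {kappa:.3f}")
```

On the Landis and Koch scale, values from 0.61 to 0.80 are conventionally labeled substantial agreement, which is the range in which the reported κ values fall.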

General RCT Characteristics

The following information was extracted from the RCTs: (1) the number of randomized participants; (2) region of the world where the trial was conducted (North America, Western Europe, multiregional, or rest of the world [multiregional was defined as any trial that had multiple sites across the world, and rest of the world was defined as any trial having sites in a region that was not located in either North America or Western Europe]); (3) intervention type (drugs, procedures [eg, a different approach or method of implementing treatment], medical devices, surgery, testing or imaging, or other [eg, diet]); and (4) funding source. Trial size was extracted (as a proxy for trial phase considering its inconsistent definition) and categorized as small (≤500 participants per group) or large (>500 participants per group). Trial type (prespecification of a primary end point) was categorized and extracted as either a mortality trial (defined as any trial where the primary outcome was mortality during treatment) or a nonmortality trial.

Outcome Data

Data were also extracted for whether the article had risk of multiplicity (ie, contained multiple analyses, a term that encompasses any of the following: multiple treatment groups, multiple outcome variables, and multiple analyses of the same outcome variable) and whether the authors defined the methods used for multiplicity correction. An RCT was considered to have multiple treatment groups if it had more than 2 arms, multiple outcome variables were defined as having more than 1 primary outcome, and multiple analyses were defined as analysis of the same outcome variable in multiple ways. All 3 of these scenarios were weighted equally. We considered multiplicity adjustment sufficient when an article outlined that it attempted to adjust for multiple comparisons.

Statistical Analysis

Descriptive statistics were used to assess the proportion of RCTs with (1) multiple primary analyses and (2) a multiplicity adjustment for the analysis of the primary end point. Also recorded were the class of multiplicity in the primary analysis (multiple treatment groups, multiple outcome variables, or multiple analyses of the same outcome variable) and frequencies of each of the methods used to adjust for multiplicity in the primary analysis. Multiplicity was examined only for the analysis of the primary end point. Adjustment was deemed unnecessary for secondary analyses, which are generally exploratory or hypothesis generating. Outcomes of interest were percentages of primary analyses that performed multiplicity adjustment of primary end points. Two-sided χ2 tests were used to examine the association of (1) intervention type, (2) funding source, (3) trial size, and (4) trial type with whether risk for a familywise error because of multiple comparisons was present and whether the RCT adjusted for multiple comparisons. The method described by Holm8 was used to adjust for multiple comparisons across intervention type, funding source, trial size, and trial type. According to this method, the smallest P value from all planned comparisons is compared with a significance level of .05 divided by K, where K represents the number of comparisons to be made. If the null hypothesis is rejected, the next smallest P value is compared with a significance level of .05 divided by K − 1, and so on until the null hypothesis can no longer be rejected. In this scenario, a total of 8 comparisons were made; therefore, the significance level was set to α = .006 (.05 ÷ K, where K = 8 in this case) in the initial step and to α = .05 in the last step (α being the Holm-corrected significance level). A statistical software package (SPSS, version 23; IBM) was used for all analyses.
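The Holm step-down procedure described above can be sketched in a few lines. This is an illustrative implementation, not the SPSS routine the authors used; the function names are hypothetical. The example P values are the 8 values reported in Table 4, with the entry reported as "<.001" replaced by the placeholder 0.0005, which is an assumption made only so that the code has a number to work with.

```python
def holm_thresholds(k: int, alpha: float = 0.05) -> list[float]:
    """Step-down significance levels: alpha/k, alpha/(k-1), ..., alpha/1."""
    return [alpha / (k - step) for step in range(k)]


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return, in input order, whether each null hypothesis is rejected."""
    k = len(p_values)
    # Visit the hypotheses from the smallest P value to the largest.
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for step, idx in enumerate(order):
        if p_values[idx] <= alpha / (k - step):
            reject[idx] = True
        else:
            break  # once one comparison fails, all larger P values fail too
    return reject


if __name__ == "__main__":
    # Table 4 P values: "has multiple analyses" column, then "adjusted" column,
    # each in the order intervention type, funding source, trial size, trial type.
    # 0.0005 is a placeholder for the value reported only as "<.001".
    p_values = [0.27, 0.41, 0.04, 0.0005, 0.22, 0.74, 0.001, 0.09]
    print([round(t, 4) for t in holm_thresholds(len(p_values))])
    print(holm_reject(p_values))
```

With these inputs, only the 2 smallest P values fall below their thresholds; the trial-size comparison with P = .04 is compared against .05 ÷ 6 and does not reach significance, at which point the procedure stops.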

Results

Literature Search

The initial search identified 2166 trials, which were transferred to a reference management software program (EndNote, Clarivate Analytics). The titles and abstracts of the identified studies were then screened to exclude irrelevant studies. Full-text studies were subsequently obtained and evaluated for the remaining 1273 reports. After assessing for relevance, 511 articles were included in the final analysis.

Study Characteristics

Of 511 cardiovascular RCTs included in this analysis, 123 (24.1%) were published in Journal of the American College of Cardiology, 112 (21.9%) in Circulation, 107 (20.9%) in European Heart Journal, 71 (13.9%) in The New England Journal of Medicine, 55 (10.8%) in The Lancet, and 43 (8.4%) in JAMA (Table 1). Approximately half (248 [48.5%]) of the trials were industry funded, and approximately half (243 [47.6%]) were large trials. Approximately half (251 [49.1%]) of the trials used a drug intervention. A total of 229 trials (44.8%) made use of composite outcomes as their primary outcome variable.

Table 1. Characteristics of 511 Included Randomized Clinical Trials.

| Variable | 2015 (n = 106), No. (%) | 2016 (n = 162), No. (%) | 2017 (n = 159), No. (%) | 2018 (n = 84), No. (%) | Total (N = 511), No. (%) |
|---|---|---|---|---|---|
| Total participants, No. | 377 591 | 596 297 | 687 921 | 371 819 | NA |
| Participants per trial, No. | 3562 | 3681 | 4327 | 4426 | NA |
| Journals | | | | | |
| Cardiovascular | 74/342 (21.6) | 105/342 (30.7) | 105/342 (30.7) | 58/342 (17.0) | 342/511 (66.9) |
| General medicine | 32/169 (18.9) | 57/169 (33.7) | 54/169 (32.0) | 26/169 (15.4) | 169/511 (33.1) |
| Region | | | | | |
| North America | 9 (8.5) | 26 (16.0) | 31 (19.5) | 14 (16.7) | 80 (15.7) |
| Western Europe | 25 (23.6) | 39 (24.1) | 34 (21.4) | 12 (14.3) | 110 (21.5) |
| Multiregional | 35 (33.0) | 61 (37.7) | 57 (35.8) | 29 (34.5) | 182 (35.6) |
| Rest of the world | 15 (14.2) | 11 (6.8) | 11 (6.9) | 18 (21.4) | 55 (10.8) |
| Not mentioned | 22 (20.8) | 25 (15.4) | 26 (16.4) | 11 (13.1) | 84 (16.4) |
| Intervention type | | | | | |
| Drugs | 60 (56.6) | 79 (48.8) | 73 (45.9) | 39 (46.4) | 251 (49.1) |
| Procedures | 10 (9.4) | 23 (14.2) | 23 (14.5) | 18 (21.4) | 74 (14.5) |
| Medical devices | 7 (6.6) | 8 (4.9) | 8 (5.0) | 5 (6.0) | 28 (5.5) |
| Surgery | 5 (4.7) | 11 (6.8) | 7 (4.4) | 2 (2.4) | 25 (4.9) |
| Testing or imaging | 0 | 4 (2.5) | 7 (4.4) | 4 (4.8) | 15 (2.9) |
| Other | 24 (22.6) | 37 (22.8) | 41 (25.8) | 16 (19.0) | 118 (23.1) |
| Funding source | | | | | |
| No source | 0 | 1 (0.6) | 0 | 1 (1.2) | 2 (0.4) |
| Government funding | 27 (25.5) | 52 (32.1) | 45 (28.3) | 24 (28.6) | 148 (29.0) |
| University or organization | 17 (16.0) | 25 (15.4) | 30 (18.9) | 23 (27.4) | 95 (18.6) |
| Industry | 58 (54.7) | 76 (46.9) | 80 (50.3) | 34 (40.5) | 248 (48.5) |
| Not mentioned | 2 (1.9) | 6 (3.7) | 1 (0.6) | 1 (1.2) | 10 (2.0) |
| Other | 2 (1.9) | 2 (1.2) | 3 (1.9) | 1 (1.2) | 8 (1.6) |
| Trial size | | | | | |
| ≤500 Participants per group | 59 (55.7) | 86 (53.1) | 80 (50.3) | 43 (51.2) | 268 (52.4) |
| >500 Participants per group | 47 (44.3) | 76 (46.9) | 79 (49.7) | 41 (48.8) | 243 (47.6) |

Abbreviation: NA, not applicable.

Risk of Multiplicity

Of 511 cardiovascular RCTs included in this analysis, 300 (58.7%) had some form of multiplicity (282 of 511 [55.2%] did not mention whether they did or did not adjust for multiplicity). Of these 300 trials, 81 (27.0%) had multiple treatment groups, 45 (15.0%) identified multiple outcome variables as primary, 170 (56.7%) had multiple analyses of the same outcome variable, 3 (1.0%) had multiple treatment groups and multiple outcome variables, and 1 (0.3%) had multiple treatment groups and multiple analyses (Table 2).

Table 2. Multiplicity Adjustment.

| Variable | Frequency, No. (%) |
|---|---|
| Primary analysis contained multiple analyses, of those that identified a primary analysis (n = 511) | 300 (58.7) |
| Types of multiple analyses included, of those with multiple primary analyses (n = 300) | |
| Multiple treatment groups | 81 (27.0) |
| Multiple outcome variables | 45 (15.0) |
| Multiple analyses of the same outcome variable | 170 (56.7) |
| Multiple treatment groups and multiple outcome variables | 3 (1.0) |
| Multiple treatment groups and multiple analyses | 1 (0.3) |
| Adjusted for all multiple comparisons, of those with multiple primary analyses (n = 300) | 85 (28.3) |

Multiplicity Adjustment

Among 300 RCTs, only 85 (28.3%) adjusted for multiplicity for all primary analyses (Table 2). Of 511 trials, 289 (56.6%) did not mention whether they did or did not attempt to adjust for multiple comparisons. Forty-one trials (48.2%) had multiple analyses of the same outcome variable that adjusted for multiplicity, 22 (25.9%) had multiple treatment groups that adjusted for multiplicity, and 19 (22.4%) had multiple outcome variables that adjusted for multiplicity. The individual multiplicity correction tests are also listed in Table 3.

Table 3. Methods Used to Adjust for Multiplicity.

| Variable | For all primary analyses (N = 85), No. (%) | With multiple treatment groups (n = 22), No. (%) | With multiple outcome variables (n = 19), No. (%) | With multiple analyses of the same outcome variable (n = 41), No. (%) | With multiple treatment groups and multiple outcome variables (n = 3), No. (%) | P value |
|---|---|---|---|---|---|---|
| ≥2 Coprimary outcome variables with statistically significant treatment associations | 1 (1.2) | 0 | 1 (5.3) | 0 | 0 | <.001 |
| Bonferroni adjustment | 15 (17.6) | 4 (18.2) | 2 (10.5) | 8 (19.5) | 1 (33.3) | <.001 |
| Hochberg test | 5 (5.9) | 2 (9.1) | 1 (5.3) | 2 (4.9) | 0 | <.001 |
| Dunn test | 3 (3.5) | 2 (9.1) | 0 | 1 (2.4) | 0 | <.001 |
| Gatekeeping or hierarchical testing | 19 (22.4) | 5 (22.7) | 2 (10.5) | 11 (26.8) | 1 (33.3) | <.001 |
| Holm test | 3 (3.5) | 1 (4.5) | 1 (5.3) | 1 (2.4) | 0 | .001 |
| Adjusted P value to account for multiplicity | 26 (30.6) | 2 (9.1) | 11 (57.9) | 13 (31.7) | 0 | <.001 |
| Coprimary outcome variables and gatekeeping | 3 (3.5) | 3 (13.6) | 0 | 0 | 0 | <.001 |
| Fixed sequence test | 2 (2.4) | 0 | 1 (5.3) | 0 | 1 (33.3) | <.001 |
| Hommel test | 2 (2.4) | 0 | 0 | 2 (4.9) | 0 | <.001 |
| Dunnett test | 3 (3.5) | 3 (13.6) | 0 | 0 | 0 | <.001 |
| Tukey test | 3 (3.5) | 0 | 0 | 3 (7.3) | 0 | <.001 |

Limitation Specified of Performing Multiplicity Adjustment

Of 300 trials with multiplicity error risk, 19 (6.3%) were exploratory or hypothesis generating. Twelve of these trials mentioned this exploratory nature in the Discussion section of the article, 5 mentioned it in the Methods section, and 2 mentioned it in more than 1 section of the article. Of the 85 trials that adjusted for multiplicity, 68 (80.0%) mentioned that they adjusted for multiplicity in the main text of the article, and 17 (20.0%) only mentioned it in the supplement or trial protocol.

Determinants of Performing Multiplicity

Intervention type and funding source had no statistically significant association with the reporting of multiplicity risk adjustment (Table 4). Trials that assessed mortality vs nonmortality outcomes were more likely to contain a multiplicity risk in their primary analysis (66.3% [177 of 267] vs 50.4% [123 of 244]; P < .001). Although larger trials had no association with specifying an analysis of the primary end point or containing a multiplicity error risk within their analysis, they were less likely than smaller trials to make any adjustments to correct for multiplicity issues (21.4% [33 of 154] vs 35.6% [52 of 146]; P = .001). All of these results were statistically significant after application of the Holm test.

Table 4. Comparisons of Reporting for Intervention Type, Funding Source, Trial Size, and Trial Type.

| Variable | Primary analysis has multiple analyses, No./total No. (%) | Adjusted for multiplicity, No./total No. (%) |
|---|---|---|
| Intervention type | | |
| Drugs | 157/251 (62.5) | 48/157 (30.6) |
| Procedures | 42/74 (56.8) | 12/42 (28.6) |
| Medical devices | 17/28 (60.7) | 4/17 (23.5) |
| Surgery | 17/25 (68.0) | 2/17 (11.8) |
| Testing or imaging | 8/15 (53.3) | 1/8 (12.5) |
| Other | 59/118 (50.0) | 18/59 (30.5) |
| P value [a] | .27 | .22 |
| Funding source | | |
| None | 0/2 | 0/0 |
| Government | 88/148 (59.5) | 21/88 (23.9) |
| University or organization | 54/95 (56.8) | 16/54 (29.6) |
| Industry | 150/248 (60.5) | 46/150 (30.7) |
| Not mentioned | 4/10 (40.0) | 1/4 (25.0) |
| Other | 4/8 (50.0) | 1/4 (25.0) |
| P value [a] | .41 | .74 |
| Trial size | | |
| ≤500 Participants per group | 146/268 (54.5) | 52/146 (35.6) |
| >500 Participants per group | 154/243 (63.4) | 33/154 (21.4) |
| P value [a] | .04 | .001 |
| Trial type | | |
| Mortality | 177/267 (66.3) | 47/177 (26.6) |
| Nonmortality | 123/244 (50.4) | 38/123 (30.9) |
| P value [a] | <.001 | .09 |

[a] P values from χ2 tests. For 8 comparisons, the ordered P values are compared with the Holm-corrected significance levels set at the following values until a comparison fails to reach significance: .0062, .007, .008, .01, .0125, .016, .025, and .05.
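As an illustration of how a χ2 test applies to the 2 × 2 comparisons in Table 4, the sketch below computes an uncorrected Pearson χ2 statistic and its 2-sided P value (1 degree of freedom) from the reported counts. It assumes no continuity correction, which the article does not specify, and it is not the SPSS procedure the authors ran; the function name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, P value) with 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Trial type: multiplicity present in 177 of 267 mortality trials
    # and 123 of 244 nonmortality trials (counts from Table 4).
    chi2, p = chi_square_2x2(177, 267 - 177, 123, 244 - 123)
    print(f"Trial type: chi2 = {chi2:.2f}, P = {p:.5f}")

    # Trial size: multiplicity present in 146 of 268 smaller trials
    # and 154 of 243 larger trials (counts from Table 4).
    chi2, p = chi_square_2x2(146, 268 - 146, 154, 243 - 154)
    print(f"Trial size: chi2 = {chi2:.2f}, P = {p:.3f}")
```

Under these assumptions, the trial-type comparison gives a P value below .001 and the trial-size comparison gives a P value near .04, in line with the corresponding entries in the first column of the table.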

Discussion

Our report demonstrates that 58.7% of 511 cardiovascular RCTs included in this analysis contained multiple analyses within their methods and that 55.2% of the total RCTs did not report whether they adjusted for multiple comparisons. Trials that assessed mortality were more likely than nonmortality trials to have some form of multiplicity, which is not unexpected because mortality is usually not the sole end point. However, because of the exigent nature of the mortality component and because some researchers consider mortality a safety end point as well, authors might be inclined to claim an association even if the overall end point fails to show effectiveness. These results have important implications for the performance and interpretation of cardiovascular RCTs.

Articles mentioning that they did not adjust for multiplicity often provided some justification, such as stating that their study was exploratory or hypothesis generating. Some justifications were unique to the trial; for example, 1 article mentioned that a chance finding could not be ruled out because of multiple testing and the sample size of the subgroups.19 It is possible that results might change before and after multiplicity adjustment; for example, in a trial where P = .046 for the primary outcome, the adjusted P value may have been different after multiplicity correction.20
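To make the P = .046 example concrete, the sketch below applies a Bonferroni adjustment, in which each P value is multiplied by the number of comparisons and capped at 1. The number of comparisons used here (2) is hypothetical; the cited trial's actual number of comparisons is not given in this article, and the function name is an illustration only.

```python
def bonferroni_adjust(p_value: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted P value: p multiplied by the number of
    comparisons, capped at 1."""
    return min(1.0, p_value * n_comparisons)


if __name__ == "__main__":
    # Hypothetical: a primary-outcome P value of .046 with 2 comparisons.
    adjusted = bonferroni_adjust(0.046, 2)
    print(f"Unadjusted P = 0.046, Bonferroni-adjusted P = {adjusted:.3f}")
    print("Significant at .05 after adjustment:", adjusted < 0.05)
```

Even with only 2 comparisons, a nominally significant result at .046 would no longer meet the .05 threshold after this adjustment, which is the kind of change in interpretation the paragraph above refers to.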

Among 85 of 511 included articles that adjusted for multiplicity for all primary analyses (Table 3), half of the trials used a composite outcome as their primary outcome. Composite outcomes allow increased statistical precision and efficiency with fewer participants to detect a statistically significant difference among comparators,21 especially in the case of total mortality, which is a rare event requiring more power and an extended follow-up to show a difference between interventions.22 Although the use of composite end points is acceptable and in some instances beneficial, it may also increase the risk of introducing a multiplicity error if the observed treatment effect was associated with a softer clinical end point.21,23 For example, in a trial where all-cause mortality, myocardial infarction, and recurrent angina are components of a composite end point, recurrent angina might be considered the softest of the 3 components. The present study did not examine the details of each composite end point of every trial or consider composite end points to be a source of multiplicity; however, it suggests the need for future investigators and researchers to apply methods to avoid the possibility of a multiplicity error, as described by Sankoh et al.21

Uncertainty in interpreting research results is common and may be attributable to a lack of statistical power or the use of questionable research practices, or it may reflect decisions a researcher makes to conduct a trial.24 These uncertainties might explain gaps in the reporting of multiplicity and adjustments made. Conversely, one could also argue that such gaps are less a reflection of multiplicity issues than of the unavailability of the trial protocol and statistical analysis plan. We suggest that all RCTs in medical journals should describe the trial protocol–specific analytic plan, including the methods used to adjust for multiple comparisons or acknowledgment of the lack of correction for multiplicity. We believe that this inclusion is especially relevant because most clinicians do not have statistical expertise.

Because the criteria used to classify trial phase (ie, phase 1, 2, or 3) were inconsistent among the RCTs in this study, trial size was used as a proxy for trial phase. For drug interventions, smaller trials (≤500 participants per group) may more likely reflect early to middle stages of development, and larger trials (>500 participants per group) may more likely reflect confirmatory stages.25,26 Among 161 RCTs, Gewandter et al27 found no association of trial size or funding source with multiplicity adjustment, most likely because of the limited power of that study to perform such an analysis. Our analysis included a larger sample of both industry-sponsored RCTs (n = 248) and large RCTs with multiplicity issues (n = 154) (Table 4). We found that smaller trials were more likely to be adjusted for multiplicity. Funding source had no association with adjusting for multiplicity. This observation suggests that RCTs of drugs in early to middle stages of development may be more likely to adjust for multiplicity.

The appropriateness of testing procedures is guided by information on statistical features of a study design or analytic strategy and differs depending on whether there is a single source of multiplicity or several sources and whether there are multiple treatment groups, multiple outcome variables, or multiple analyses of the same outcome variable. Dmitrienko and D’Agostino23 provide some guidance on how to choose the most appropriate test for multiplicity corrections; they state that nonparametric tests, such as the Holm test, can be applied to most multiplicity problems involving a single source of multiplicity. In cases where the association between statistical tests is known, such as in clinical trials with several dose-control comparisons and patient populations, more specific parametric tests, such as the Dunnett test, may be applied. In an effort to better explain types of multiple analyses and multiple outcome variables, detailed examples are listed in eTable 1 in the Supplement.

Limitations

This study has several limitations. First, we assessed the reporting quality of the methods used for multiplicity adjustments but not necessarily the quality of statistical practices used. Because the methods may have been prespecified but not stated in the articles,28,29 this report is subject to reporting bias. Second, studies were recorded as having adjusted for multiplicity in the primary analysis only if the authors adjusted for all instances of multiplicity. However, this approach does not consider the trials that tried to adjust for some but not all sources of multiple comparisons. Also, whether a study had multiple treatment groups, multiple outcome variables, or multiple analyses of the same outcome variable, all had the same weight in terms of adjusting for multiplicity and thus were considered equally. A third limitation is that we only evaluated the primary outcomes. It is important to note that the secondary end point in a sequence may often influence a conclusion. For instance, if a trial finds a statistically significant difference in major adverse cardiovascular events (myocardial infarction, stroke, and admission for heart failure) and the next end point in the sequence is the myocardial infarction rate, which is not statistically significant, it would be incorrect to conclude a nominally statistically significant stroke association. In addition, we were unable to differentiate our analysis by trial type. Although the objective of this study was to evaluate the overall prevalence of multiplicity among cardiovascular RCTs, it must be remembered that phase 3 trials hold the most importance from a public health perspective, and multiplicity is of lesser concern in phase 2 trials.

Conclusions

This cross-sectional study found frequent inconsistencies associated with multiplicity in primary analysis reporting among cardiovascular RCTs published in medical journals with a high impact factor. These findings adversely reflect on the robustness of data published in journals that carry global reach and generate evidence that can transform clinical guidelines and practice. Our findings suggest that investigators should be encouraged to adjust for multiplicity when warranted. Practical guidelines for multiplicity adjustment in clinical trials (eg, recommendations by Proschan and Waclawiw30) can be consulted. We think that this information should ideally be prespecified in the Methods section of clinical trials before unblinding of the study data (eTable 2 in the Supplement). We believe that it should be the collective responsibility of journal editors, peer reviewers, and readers to pay close attention to the Methods and Statistical Analysis sections of articles reporting clinical trial results to ensure that multiplicity issues have been addressed.

Supplement.

eAppendix. Multiplicity Coding Manual

eTable 1. Examples of Multiple Analyses and Multiple Outcome Variables

eTable 2. Recommendations on Multiplicity Error in Clinical Trials

eReferences.

Articles from JAMA Network Open are provided here courtesy of American Medical Association
