Abstract
The field of obstetrics and gynecology is constantly replenished with new research findings. In an era of rapidly available study publications, there are a number of challenges to interpreting the obstetrics and gynecology literature. Common pitfalls include overreliance on the dichotomized p-value, lack of transparency, bias in study reporting, limitations of resources, absence of standardized practices and outcomes in study design, and rare but serious concerns about data integrity. We review these predominant challenges to interpreting the OBGYN literature and their potential solutions.
Keywords: p-value threshold, scientific rigor, transparency, research bias, statistical power
Introduction
Every month, obstetrics and gynecology (OBGYN) journals publish a new round of studies expanding our knowledge and best practices for patient care. Further, these publications direct our field to the next questions to be answered and guide funding. The rapid growth in OBGYN biomedical research is overdue and necessary. But with the brisk rate of new research comes the difficult task of maintaining scientific rigor and transparency in reporting. The consumers of OBGYN literature have a broad range of qualifications, from multi-degreed clinician-scientists to healthcare students to the public. Even the “best” science can be difficult to interpret if not presented with clarity, and less rigorous science may be presented as more practice changing than is warranted. Here we outline some of the predominant pitfalls, and their potential solutions, in interpreting the OBGYN literature.
The p-value problem
In theory, the p-value is simple, straightforward, and easy to use. The value, ranging from 0 to 1, is used as a statistical measure of whether to reject the null hypothesis, which holds that there is no difference between the two comparison groups. The p-value represents the probability that a difference at least as large as the one observed would occur due to chance if the null hypothesis were true. Therefore, a small p-value supports rejecting the null because it suggests that the observed difference is not likely to be attributable to chance.1 A p-value of 0.05 is the conventional threshold for significance, whereby the risk of mistakenly rejecting a true null hypothesis (the type 1 (α) error, or “false positive”) is set to 5%. Said another way, a 5% chance of falsely interpreting an association to be true has been considered acceptable. In the 1920s and 1930s, Fisher supported the use of “significance” or “non-significance” using the 0.05 threshold.2,3 While Fisher and others subsequently retracted support for this dichotomization as an oversimplification, the damage had been done.4,5 In modern biomedical research, including obstetrics and gynecology, the dichotomous “significant” or “non-significant” mindset persists.
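As a concrete illustration, the following minimal sketch computes a p-value for a hypothetical two-arm comparison using Python’s scipy library; the counts are invented for illustration and do not come from any real trial.

```python
# A minimal sketch of how a p-value arises from a two-group comparison,
# using hypothetical counts (not data from any real trial).
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: rows = treatment vs control,
# columns = outcome present vs absent.
table = [[12, 88],   # treatment: 12/100 with the outcome
         [25, 75]]   # control:   25/100 with the outcome

odds_ratio, p_value = fisher_exact(table)
print(f"Odds ratio: {odds_ratio:.2f}, p-value: {p_value:.3f}")
# The p-value is the probability of a difference at least this extreme
# arising by chance alone if the null hypothesis (no true difference) holds.
```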
Such a mindset is problematic for interpretation of the OBGYN literature. It fosters an approach whereby a statistically significant result (p<0.05) is interpreted to mean that the findings are “true.” This approach is flawed for a number of reasons. First, results with a statistically significant p-value may still be false, especially if a study is not rigorously designed or executed (e.g., a trial inappropriately powered for the selected outcome, multiplicity not addressed).6 Second, a statistically significant result (p<0.05) is not equivalent to a clinically significant effect size, nor does it guarantee reproducibility. Third, under this traditional approach, p-values of 0.04 and 0.06 result in disparate interpretations of trial results even though the clinical significance of such results may be similar.6,7 In a well-designed trial with clinically relevant findings, and for which the cost of a type I error is minimal, it may be important not to disregard evidence with a p-value in the range of 0.05 to 0.1. After all, the p-value is simply a probability and should be interpreted as such, in context with other factors. In clinical practice we frequently make decisions under uncertainty, guided by cost, efficiency, and the risks and benefits of one choice over another; just as we weigh these other factors, results with a non-significant p-value should not automatically be eliminated from clinical consideration. The p-value dichotomization, therefore, is an easy albeit flawed approach to our consumption of study results.7
Consideration of a lower p-value threshold
The argument has been made for a reduction in the p-value threshold for statistical significance to 0.005.8,9 Such a move has been proposed as a stop-gap solution, whereby a more stringent significance threshold would limit the advancement of trial results into clinical practice, and the funding of larger trials, to findings with the highest likelihood of being reproducible. Such a change may also diminish the undue influence of underpowered trials, multiplicity, and p-hacking.1,8
Wayant et al evaluated how the interpretation of phase 3 randomized controlled trials (RCTs) published in three journals (The Journal of the American Medical Association, Lancet, and New England Journal of Medicine) in 2017 would change if the p-value threshold for significance were lowered from 0.05 to 0.005.10 Of 174 endpoints with a p-value <0.05 from 203 trials, 70.7% (123/174) would maintain statistical significance (p-value <0.005), while 29.3% (51/174) would no longer be considered significant (p-value 0.005 to 0.05).10 In obstetrics and gynecology, we carried out a similar evaluation of 202 RCTs in six journals (three OBGYN and three non-OBGYN) and found that only 54.4% (49/90) of trials with statistically significant results (p<0.05) maintained significance on re-interpretation (p<0.005).11 When limited to the same three high-impact non-OBGYN journals studied by Wayant et al, 66.7% of trials maintained significance (p<0.005).10,11 These studies suggest that lowering the p-value threshold would require reinterpretation of a substantial proportion of published trials in obstetrics and gynecology. While this may be a temporizing measure, a p-value threshold reduction neither eliminates the underlying problem of dichotomized result interpretation nor addresses underlying study quality.6,12
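For illustration only, a reclassification of this kind can be expressed in a few lines of Python; the p-values below are made up and are not the endpoints from these studies.

```python
# Illustrative reclassification of trial endpoints under a lowered
# significance threshold (0.05 -> 0.005); these p-values are invented.
p_values = [0.001, 0.003, 0.012, 0.030, 0.048, 0.004, 0.020]

maintained = [p for p in p_values if p < 0.005]
reclassified = [p for p in p_values if 0.005 <= p < 0.05]

print(f"Maintain significance at p<0.005: {len(maintained)}/{len(p_values)}")
print(f"Significant at p<0.05 but not at p<0.005: {len(reclassified)}/{len(p_values)}")
```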
Elimination of the p-value
An alternative approach to minimize over-reliance on dichotomized interpretation is the elimination of p-value reporting altogether. Advocates argue that removal of the stand-alone p-value would eliminate concerns surrounding arbitrary dichotomization of results and encourage a more holistic consideration of study findings.7,13 Measures of effect size have been put forth as a more robust replacement: the presentation of odds ratios, relative risks, mean differences, and absolute differences should be encouraged. Presenting the confidence interval conveys the precision around point estimates, which gives more meaning to result interpretation.7,13,14 Many journals now explicitly require presentation of results using effect sizes, with either secondary presentation of a p-value or complete elimination of p-values.15–17
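As a hedged sketch of effect size reporting, the following Python snippet computes a relative risk with an approximate 95% confidence interval (the common log-scale large-sample method) from hypothetical counts.

```python
import math

# Hypothetical counts: events/total in each arm (not from any real trial).
events_tx, n_tx = 12, 100
events_ctrl, n_ctrl = 25, 100

rr = (events_tx / n_tx) / (events_ctrl / n_ctrl)

# Approximate 95% CI for the relative risk on the log scale.
se_log_rr = math.sqrt(1/events_tx - 1/n_tx + 1/events_ctrl - 1/n_ctrl)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# -> RR = 0.48 (95% CI 0.26-0.90): the interval conveys both the
#    magnitude and the precision of the estimated effect.
```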
The use of effect size-based interpretation requires a more sophisticated understanding of the statistical tests used and the results presented. Critical appraisal of the literature is a skill set that may require additional training, as has previously been explored among OBGYN medical students and residents.18,19 Without better training of clinicians, less easily interpretable results will mean less direct clinician engagement with research and more reliance on professional societies for interpretation and production of clinical guidelines.
Lack of transparency
The debate surrounding the p-value is adjacent to a larger conversation on transparency in biomedical research. In order for readers to critically appraise study findings, the methodology and results must be clearly presented without bias.
The value of reporting guidelines & pre-trial registration
In an effort to systematize methodology reporting, the Consolidated Standards of Reporting Trials (CONSORT) guidelines were introduced in 1996 and updated in 2001 and 2010.20 The 2010 CONSORT guidelines state: “Critical appraisal of the quality of clinical trials is possible only if the design, conduct, and analysis of RCTs are thoroughly and accurately described in the report.”20 While the CONSORT guidelines do not themselves produce methodologic quality, they are meant to ensure transparency and clarity for those consuming trial results. The International Committee of Medical Journal Editors (ICMJE), as well as the CONSORT guidelines, require trial pre-registration,21 for example in the United States-based National Institutes of Health (NIH) ClinicalTrials.gov registry. Requiring investigators to make study protocols available prior to trial execution is another way to produce accountability in biomedical research, as such registries “encourage full transparency with respect to performance and reporting of clinical trials.”22
These guidelines are not unique to randomized controlled trials, as similar iterations exist for observational studies, meta-analyses, and other designs.23,24 Adaptations have also been suggested to improve the specificity of reporting guidelines to obstetrics. The proposed CONSORT-OB (OBstetrics) version suggests adding to the guidelines specifics relevant to pregnancy (e.g., gestational age, plurality, clearly defined obstetric variables and outcomes).25
Inconsistent use of reporting guidelines poses a significant threat to adequate interpretation of biomedical research. Moher et al compared the quality of reporting of RCTs in three high-impact journals before and after CONSORT implementation, finding an increase in the number of checklist items reported after implementation.14 However, various studies of adherence to CONSORT guidelines have found suboptimal reporting in trial publications.26–28 Adams et al evaluated adherence to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist for observational studies in OBGYN.29 In a cross-sectional analysis of 198 observational study manuscripts submitted to Obstetrics & Gynecology between 2008 and 2016, 82% were adherent to STROBE guidelines (defined as completing 75% or more checklist items), with higher adherence among manuscripts accepted for publication.29 Our group assessed CONSORT guideline compliance, and agreement between protocols in pre-trial registries and final publications, in a cross-sectional analysis of 170 OBGYN RCTs published in six journals between 2017 and 2019. We found that 80% of trials were compliant with CONSORT (defined as completing 80% or more checklist items).30 Further, while 98% of trials completed pre-trial registration, there were discrepancies between pre-trial registry protocols and the final publications in the pre-specified primary outcomes (23% discordance), secondary outcomes (68% discordance), and planned sample sizes (40% discordance).30
Pre-trial registration and reporting guidelines are only beneficial if used. These studies highlight that while reporting guidelines and pre-trial registration are generally requested by journals, they are not optimally followed.31 There is an opportunity to improve the transparency of our reporting, which is imperative to hold researchers accountable and provide clarity to readers.
Publication bias
Not all studies performed in OBGYN go on to be published. Whether a study makes it to publication is often determined by whether statistically significant findings were reported. This publication bias has long been identified as a concern for the scientific community.32–34 If studies supporting the null hypothesis are lost to publication, and therefore also lost to reader consumption, the body of literature within a field becomes distorted, ultimately adversely affecting clinical practice. The loss of publication of “negative” or nonsignificant results impacts clinical care, affects research funding and allocation of limited research infrastructure, and skews available data for evaluation by meta-analyses.22,35–37
Pre-trial registration of RCTs has been proposed as a way to reduce publication bias and to provide a repository of trials, published or unpublished, accessible for meta-analyses.36 However, in an analysis of systematic reviews and meta-analyses published in six OBGYN journals, only 18.4% included clinical trial registries in the trial search process, and of these, only 23.4% incorporated unpublished data in their analyses.38 Despite clinical trial registry requirements, these findings suggest under-acknowledgement of unpublished data in meta-analyses, which may further contribute to skewed interpretation of published findings. That said, the inclusion of unpublished results in meta-analyses has limitations: the absence of publication may signal flawed design or other study qualities that appropriately precluded publication, and including such data in a meta-analysis risks introducing bias.
A funnel plot is a useful tool to visually assess for publication bias within meta-analyses. A plot of treatment effects (horizontal axis) against a measure of study size or precision, such as the standard error (vertical axis), produces a graphical scatter of included studies.39 In the absence of publication bias or significant heterogeneity, the points form a funnel, with effect estimates scattered widely at the bottom (small studies) and narrowly at the top (large studies). The funnel plot should appear symmetric, with the scatter points converging around the “true” effect size. In the presence of publication bias, the funnel plot appears asymmetric, visually alerting the reader to potential concerns. When an adequate number of studies (10 or more) are included in a meta-analysis, asymmetry can be further evaluated statistically using Egger’s regression test.40 Consideration of these assessments when interpreting a meta-analysis is valuable for understanding the risk of bias within the presented results.
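Below is a minimal sketch of a funnel plot and Egger’s regression test, using simulated effect sizes and standard errors rather than data from any real meta-analysis.

```python
# A sketch of a funnel plot and Egger's regression test for funnel plot
# asymmetry; the study effects and standard errors are simulated.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_studies = 15
se = rng.uniform(0.05, 0.5, n_studies)   # per-study standard errors
effects = rng.normal(0.2, se)            # effects (e.g., log RR) around a true value of 0.2

# Funnel plot: effect estimate vs standard error (inverted vertical axis,
# so the largest/most precise studies sit at the top).
plt.scatter(effects, se)
plt.gca().invert_yaxis()
plt.xlabel("Effect estimate (log RR)")
plt.ylabel("Standard error")
plt.title("Funnel plot (simulated studies)")
plt.show()

# Egger's test: regress the standardized effect on precision; an intercept
# far from zero suggests small-study effects / funnel plot asymmetry.
precision = 1 / se
standardized = effects / se
fit = sm.OLS(standardized, sm.add_constant(precision)).fit()
print(f"Egger intercept = {fit.params[0]:.2f} (p = {fit.pvalues[0]:.3f})")
```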
Reporting bias
Reporting bias occurs when outcomes that demonstrate statistical significance are preferentially reported, when secondary outcomes are emphasized or even replace non-significant primary outcomes, or when focus is placed on secondary or unplanned analyses, deviating from the pre-registered plan.41,42 This has also been called “spin.” Irrespective of the motives, such reporting is misleading to readers and can alter interpretation of results.43,44
In an evaluation of 194 RCTs with nonsignificant primary outcomes published in Obstetrics & Gynecology and the American Journal of Obstetrics & Gynecology between 2006 and 2015, “spin” was present in the text of 50% of RCT manuscripts, with 40% emphasizing statistically significant secondary outcomes.44 Similar concerns about outcome reporting bias have been raised in other literature reviews in OBGYN.44,45 Although it is important to acknowledge unexpected findings as hypothesis-generating for future investigations, a cornerstone of research interpretation is that conclusions should be limited to the question that a study was designed to answer. Studies are rarely designed to adequately address more than one question, and findings from secondary outcomes and especially unplanned analyses should be considered hypothesis-generating only: with 20 statistical comparisons each tested at the 0.05 threshold, the chance of at least one “statistically significant” result is roughly 64% (1 − 0.95^20) even when the null hypothesis is true for every comparison.
Multiplicity & data mining
Combining the concerns surrounding reporting bias and p-value thresholds is the issue of multiple comparisons, or multiplicity, and “data mining” in the interpretation of OBGYN literature. When more than two groups are compared with a traditional statistical approach, the problem of multiple comparisons arises.46 This can be simplified by thinking about rolls of the dice. If you roll two dice once, the probability of two sixes is 1/36. If you continue to roll the dice indefinitely, the probability of having rolled two sixes at least once approaches 1. Likewise, when the alpha for between-group testing is set to 0.05 and more than one comparison (more than one roll) is performed, the type 1 error is inflated above 5%; with just two independent comparisons it is already 1 − 0.95² ≈ 9.8%. In such cases, results may be inappropriately reported as statistically significant despite the fact that they may have occurred simply because of repeated attempts.
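This inflation of the type 1 error can be demonstrated with a small simulation under entirely null data; the sample sizes and number of comparisons below are arbitrary choices for illustration.

```python
# Simulation of familywise type 1 error inflation under the null:
# every comparison is between two samples from the SAME distribution,
# so any "significant" result is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_comparisons, alpha = 2000, 10, 0.05
false_positive_runs = 0

for _ in range(n_sims):
    # n_comparisons independent two-sample t-tests on identical populations
    p_vals = [stats.ttest_ind(rng.normal(0, 1, 50), rng.normal(0, 1, 50)).pvalue
              for _ in range(n_comparisons)]
    if min(p_vals) < alpha:
        false_positive_runs += 1

print(f"Familywise error rate: {false_positive_runs / n_sims:.2f}")
print(f"Analytic approximation 1 - 0.95**{n_comparisons} = {1 - 0.95**n_comparisons:.2f}")
# Both values land near 0.40: ten looks at null data yield at least one
# "significant" result about 40% of the time.
```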
In biomedical research this arises in the context of multiple endpoints (e.g., co-primary outcomes), repeated measurements over time, subgroup analyses, multiple treatments or interventions (e.g., comparing more than two groups), and interim analyses.47 There are statistical procedures to adjust p-values for multiple comparisons (e.g., Bonferroni, Holm), which should be used, or at least discussed, whenever multiple comparisons are carried out in an analysis.48 As the preferred approach, researchers should select a single primary endpoint to eliminate the issue of multiplicity, and explicitly describe secondary outcomes as exploratory (hypothesis-generating) or descriptive.20 Similarly, in interim analyses, looking at the results early and then repeating statistical testing at trial conclusion inflates the likelihood of producing a statistically significant p-value.49 Therefore, the threshold for statistical significance at the final analysis should be adjusted accordingly.
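A brief sketch of Bonferroni and Holm adjustment using the statsmodels multipletests function, with illustrative p-values that are not from any particular trial:

```python
# P-value adjustment for multiplicity; the raw p-values are illustrative.
from statsmodels.stats.multitest import multipletests

raw_p = [0.010, 0.020, 0.030, 0.045]  # all "significant" at the naive 0.05 cutoff

for method in ("bonferroni", "holm"):
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(method, [f"{p:.3f}" for p in adj_p], "reject:", list(reject))
# After adjustment, only the smallest p-value remains below 0.05,
# illustrating how adjustment guards against inflated type 1 error.
```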
At its most extreme, the problem of multiplicity can manifest as multiple statistical comparisons performed without a predefined comparison or hypothesis, with the study then presented as having been designed to assess the statistically significant comparisons. This poses an ethical concern with a high likelihood of presenting false positive results. It has been termed “data fishing,” “p-hacking,” and “data mining,”50 and is another form of reporting bias.
Other study considerations
Transparency in reporting is valuable for the critical appraisal of study results. However, the findings can only be as valuable as the robustness of the original study design. Irrespective of the quality of reporting, if the study rigor is limited by problems such as flawed methodology (e.g., selection bias, poor masking procedures), an underpowered sample size, or biased analyses (e.g., failure to account for confounding variables, altering the sample size), the study is of limited utility (Table 1). Both rigorous design and execution of studies and their transparent reporting are necessary to improve the quality and interpretation of study results. Here we outline several additional considerations for the execution of OBGYN studies and their interpretation.
Table 1.
Summary of some common forms of research bias
| Type of bias | Definition | Examples |
|---|---|---|
| Selection bias | ■ Selected study participants are not representative of the population | ■ Non-random sampling |
| Misclassification | ■ Incorrect group assignment ■ May occur for the exposure or outcome | ■ Categorization of an individual into the ‘exposed’ arm when the individual was actually ‘unexposed’ |
| Inadequate masking | ■ Unblinding of participants, clinicians, study personnel, or outcome assessors ■ May result from non-rigorous blinding procedures | ■ Participants or clinicians become unblinded to group assignment based on study procedures |
| Recall bias | ■ Differences in accuracy of recollections among study participants | ■ Often specific to self-report in observational studies ■ Individuals with the outcome of interest may recall differently than those without the outcome |
| Loss to follow-up | ■ Individuals lost to continued follow-up in a study, resulting in incomplete data | ■ Produces bias when those lost to follow-up differ from those continuing in the study |
| Unmeasured confounding | ■ Biased effect measures resulting from unidentified confounding | ■ An association between exposure and outcome is demonstrated incorrectly because of unmeasured confounding |
| Confirmation bias | ■ Interpreting results to confirm preexisting beliefs or hypotheses | ■ Introduction of bias (whether consciously or subconsciously) in interpretation of results |
| Reporting bias | ■ Selective presentation of results | ■ Preferentially reporting statistically significant findings ■ Emphasizing secondary outcomes or analyses ■ Deviating from the pre-registered research plan |
| Publication bias | ■ Study outcome (statistical significance) influences publication | ■ Under-publication of non-significant trials |
The need for power
A power calculation informs how many participants are needed in a study to address a specific question. Such an assessment considers the acceptable risk of finding a non-significant result when a difference is really present. This is the type II error (β), the probability of a false negative result, and power is derived as 1 − β.51 For example, a study large enough to have 80% power to identify a difference between groups (supposing that difference is really present) has a type II (β) error of 20%. Calculation of sample size requires consideration of the type I error, statistical power, and the effect size.52 In simple terms, studies with a larger sample size have greater power, thereby reducing the likelihood of a false negative result.6,53 Interpreting results requires knowledge of the underlying power calculations; their absence in study design and reporting is problematic.
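As an illustrative sketch, the following Python snippet solves for the per-group sample size needed to compare two hypothetical outcome rates at α = 0.05 and 80% power, using statsmodels; the assumed rates are our own invention.

```python
# A minimal sample-size sketch for comparing two proportions
# (alpha = 0.05, power = 0.80); the hypothesized rates are illustrative.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_control, p_treatment = 0.25, 0.15               # hypothesized outcome rates
effect = proportion_effectsize(p_control, p_treatment)  # Cohen's h

n_per_group = NormalIndPower().solve_power(effect_size=effect,
                                           alpha=0.05, power=0.80,
                                           alternative="two-sided")
print(f"~{n_per_group:.0f} participants per group")  # ~124 per group
# Detecting a smaller difference (e.g., 0.25 vs 0.20) would require a
# substantially larger sample, which is why underpowered trials are common.
```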
In an assessment of 519 published RCTs indexed in PubMed in December 2000, Chan & Altman found that only 27% stated a power calculation.54 A priori, or pre-specified, power calculations have more recently been reported at higher rates in OBGYN RCTs (86.6%), but they are still reported less often than would be expected given their importance and their requirement by CONSORT reporting guidelines.11,14
Studies with small sample sizes have reduced power and can be problematic for interpretation of clinically meaningful outcomes in the OBGYN literature. Of 85 induction of labor RCTs published from 2009 to 2019, two-thirds had 100 or fewer participants per group. Ayala & Rouse argue that such small trials reduce our ability to identify the “best” method for induction because they lack adequate power to answer clinically meaningful questions about perinatal morbidity and mortality.55 The ARRIVE trial (A Randomized Trial of Induction versus Expectant Management) is an example of a widely hailed large, well-powered study of meaningful outcomes.56 Developing trials with sufficiently large sample sizes to answer consequential clinical questions requires massive resources (e.g., research networks, funding).
Core outcomes & standardized interventions
Ayala & Rouse also outline the different outcome measures assessed by small RCTs addressing induction of labor.55 Induction of labor is one example, but wide variation in outcome measures is present across all of OBGYN. Duffy et al proposed the development and validation of “core outcome sets” to move towards standardization of trial outcomes.45 One such example is the COSGROVE (Core Outcome Set for GROwth restriction: deVeloping Endpoints) core outcome set for the study of fetal growth restriction, in which a team of stakeholders used iterative consensus criteria to finalize 22 outcomes in four domains recommended for measurement and reporting in future trials on growth restriction.57
As large, multicenter RCTs may not be feasible due to the necessary resources, an alternative approach may be smaller RCTs using these standardized outcome measures to facilitate later meta-analysis. This concept is currently being used by the STRIDER (Sildenafil Therapy In Dismal prognosis Early-onset intrauterine growth Restriction) working group. This team is carrying out a series of smaller RCTs addressing sildenafil therapy for fetal growth restriction with collection of a core data set to facilitate future patient-level meta-analysis of these results.58 Such measures may allow for greater yield from systematic reviews and meta-analyses by including multiple small RCTs that have comparable outcome measures.
Beyond variation in outcomes are the difficulties in interpreting results of trials testing the effect of combined interventions. Perioperative management for cervical cerclage and cesarean delivery are two examples. When we interpret study results on approaches to cerclage for cervical insufficiency (beyond cerclage versus no cerclage), wide variation exists across studies in the management interventions, including surgical approach (e.g., McDonald vs Shirodkar cerclage, suture type) and adjunct interventions (e.g., antibiotics, tocolytics, amniocentesis, bed rest).59 Roman et al performed a multicenter RCT and found that use of a physical examination-indicated cerclage reduced the rate of spontaneous preterm birth in twin pregnancies with asymptomatic cervical dilation before 24 weeks when compared to no cerclage (RR 0.7, 95% CI 0.46-0.96).60 However, 82% of those in the cerclage arm also received tocolytics and antibiotics peri-procedure. Because tocolytics and antibiotics were not also used in the control group, it is difficult to isolate the effect of the cerclage on the reported outcomes.
Studies of cesarean delivery pose a similar conundrum. To study a singular aspect of cesarean section (e.g., wound dressing), the remainder of the procedure should be standardized to eliminate bias. This is difficult to achieve within institutions, let alone across multiple institutions, because of variation in surgical approach.61 In a recent editorial, Dahlke et al make the case for a standardized cesarean delivery technique, with a principal benefit being the value to future trials on cesarean delivery. Such an approach would reduce bias introduced by variation in surgical technique found across surgeons and institutions.62
reVITALize data definitions
Standardizing definitions for care in obstetrics and gynecology has also been credited as a way to improve clinical care, data capture, and research. As an illustrative example, consider early postpartum hemorrhage. Existing guidelines and institutions use varying definitions (e.g., >500 or >1000 mL loss, variation by mode of delivery, and associated signs such as tachycardia), making communication about postpartum hemorrhage, let alone data collection for research, difficult. Formed by the American College of Obstetricians and Gynecologists (ACOG), the Women’s Health Registry Alliance brought together stakeholders from across OBGYN to develop initiatives to improve data interoperability. One resultant initiative, reVITALize, produced standardized obstetrics and gynecology data definitions,63,64 intended both for clinical use and for inter-database and inter-institutional comparison of variables in research. For our example, early postpartum hemorrhage is defined as: “Cumulative blood loss of ≥ 1000ml or blood loss accompanied by signs or symptoms of hypovolemia within 24 hours following the birth process.”65 The use of standardized definitions, when available, should be encouraged.
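As a purely illustrative sketch, the quoted definition could be encoded as a simple screening helper for data capture; the function name and inputs are our own invention, not part of the reVITALize standard.

```python
# Hypothetical helper encoding the reVITALize definition of early
# postpartum hemorrhage quoted above; the name and fields are illustrative.
def is_early_postpartum_hemorrhage(blood_loss_ml: float,
                                   hours_since_birth: float,
                                   signs_of_hypovolemia: bool) -> bool:
    """Cumulative blood loss >= 1000 mL, or blood loss accompanied by signs
    or symptoms of hypovolemia, within 24 hours following the birth process."""
    if hours_since_birth > 24:
        return False
    return blood_loss_ml >= 1000 or (blood_loss_ml > 0 and signs_of_hypovolemia)

print(is_early_postpartum_hemorrhage(1200, 6, False))   # True: >= 1000 mL
print(is_early_postpartum_hemorrhage(600, 12, True))    # True: loss + hypovolemia
print(is_early_postpartum_hemorrhage(600, 12, False))   # False
```

Encoding a shared definition this way is one route to the inter-database comparability the initiative aims for: every site classifies the same case the same way.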
Research ethics
We have outlined a number of hazards in OBGYN research practices and their resultant effects on interpretation of study results. These discussions of opportunities for improvement assume ethical practices and good intentions of researchers. The final challenge to interpreting OBGYN literature is research fraud. As the ‘gold standard’ of biomedical research, RCTs inform clinical practice, direct funding resources, and can have life-or-death consequences for our patients. Therefore, trial integrity is of the utmost value. Liu et al recently evaluated 22 RCTs published by the same researcher between 2015 and 2019, highlighting discrepancies in results between studies using the same study population and raising concerns about data integrity.66
Based on a review of 36 RCTs that were retracted or under ongoing investigation for research misconduct, Li et al put forth a checklist to improve assessment of data integrity at the time of trial publication.67 Adapted and expanded from other sources, the checklist includes comparisons between the pre-trial registry and the manuscript, as well as assessments of plausibility (e.g., study timeframe, intervention, time from study completion to manuscript, reported participants lost to follow-up).67,68 The authors emphasize the ongoing need for systematic advances in these processes. Given these recent examples, improved mechanisms for assessing data integrity and identifying research fraud prior to publication are needed in OBGYN moving forward.
Conclusions
Enhancing the rigor and interpretation of results in OBGYN is an initiative that will require considerable coordination and dedication of resources. Widespread adoption of effect size reporting and abandonment of the dichotomized p-value require researcher buy-in, adherence to reporting guidelines, enforcement by journals, and education of clinicians and readers. Improving author compliance with reporting guidelines and pre-trial registration requires accountability, but who maintains and enforces these guidelines (e.g., journal editors, peer reviewers, readers), and how this can be done in the current research environment, remains unclear. Designing and executing trials that are adequately powered to answer clinically meaningful questions requires significant funding, collaboration, and research infrastructure. While this is possible through efforts such as the NIH-funded Maternal-Fetal Medicine Units Network, further investment is required, since many such networks are needed to address the many outstanding questions in the field of reproductive science.
There are no obvious or simple solutions to address the myriad difficulties in interpreting OBGYN literature. We have outlined multiple pitfalls and solutions to address deficiencies in study design, transparency, data integrity, and interpretation of data in our field, though many of these are not unique to obstetrics and gynecology. A multi-pronged and concerted effort is required to improve the quality and interpretation of studies in OBGYN.
Funding:
There was no funding for this work.
Footnotes
Disclosure: The authors report no conflict of interest.
References:
- 1. Wasserstein RL, Lazar NA. The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician 2016;70(2):129–133.
- 2. Fisher R. Statistical Methods for Research Workers. Edinburgh, UK: Oliver & Boyd, 1925.
- 3. Fisher R. The Design of Experiments. Edinburgh, UK: Oliver & Boyd, 1935.
- 4. Gigerenzer G, Krauss S, Vitouch O. The Null Ritual: What You Always Wanted to Know About Significance Testing but Were Afraid to Ask. In: Kaplan D, ed. The Sage Handbook of Quantitative Methodology for the Social Sciences. Thousand Oaks, California: Sage; 2004:391–408.
- 5. Fisher R. Statistical Methods and Scientific Inference. Edinburgh, UK: Oliver & Boyd, 1956.
- 6. Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2(8):e124.
- 7. Grimes DA, Schulz KF. An overview of clinical research: the lay of the land. Lancet 2002;359(9300):57–61.
- 8. Benjamin DJ, Berger JO, Johannesson M, et al. Redefine statistical significance. Nat Hum Behav 2018;2(1):6–10.
- 9. Ioannidis JPA. The Proposal to Lower P Value Thresholds to .005. JAMA 2018;319(14):1429–1430.
- 10. Wayant C, Scott J, Vassar M. Evaluation of Lowering the P Value Threshold for Statistical Significance From .05 to .005 in Previously Published Randomized Clinical Trials in Major Medical Journals. JAMA 2018;320(17):1813–1815.
- 11. Bruno A, Shea A, Einerson B, et al. Impact of the p-Value Threshold on Interpretation of Trial Outcomes in Obstetrics and Gynecology. Am J Perinatol 2021;38.
- 12. Chavalarias D, Wallach JD, Li AH, Ioannidis JP. Evolution of Reporting P Values in the Biomedical Literature, 1990-2015. JAMA 2016;315(11):1141–8.
- 13. Greenland S, Senn SJ, Rothman KJ, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol 2016;31(4):337–50.
- 14. Moher D, Jones A, Lepage L. Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 2001;285(15):1992–5.
- 15. Lancet. Preparing your manuscript. (https://www.thelancet.com/preparing-your-manuscript). Accessed October 13, 2021.
- 16. New England Journal of Medicine (NEJM). New Manuscripts. (https://www.nejm.org/author-center/new-manuscripts). Accessed October 13, 2021.
- 17. Instructions for Authors—July 2021. Obstet Gynecol 2021;138(1):138–155.
- 18. Grimes DA, Bachicha JA, Learman LA. Teaching critical appraisal to medical students in obstetrics and gynecology. Obstet Gynecol 1998;92(5):877–82.
- 19. Bougie O, Posner G, Black AY. Critical Appraisal Skills Among Canadian Obstetrics and Gynaecology Residents: How Do They Fare? J Obstet Gynaecol Can 2015;37(7):639–647.
- 20. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c332.
- 21. Zarin DA, Keselman A. Registering a clinical trial in ClinicalTrials.gov. Chest 2007;131(3):909–912.
- 22. De Angelis C, Drazen JM, Frizelle FA, et al. Clinical trial registration: a statement from the International Committee of Medical Journal Editors. N Engl J Med 2004;351(12):1250–1.
- 23. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 2007;370(9596):1453–7.
- 24. Moher D, Shamseer L, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev 2015;4(1):1.
- 25. Chauhan SP, Blackwell SC, Saade GR. A suggested approach for implementing CONSORT guidelines specific to obstetric research. Obstet Gynecol 2013;122(5):952–6.
- 26. Glujovsky D, Boggino C, Riestra B, Coscia A, Sueldo CE, Ciapponi A. Quality of reporting in infertility journals. Fertil Steril 2015;103(1):236–41.
- 27. Mills E, Loke YK, Wu P, et al. Determining the reporting quality of RCTs in clinical pharmacology. Br J Clin Pharmacol 2004;58(1):61–5.
- 28. Uetani K, Nakayama T, Ikai H, Yonemoto N, Moher D. Quality of reports on randomized controlled trials conducted in Japan: evaluation of adherence to the CONSORT statement. Intern Med 2009;48(5):307–13.
- 29. Adams AD, Benner RS, Riggs TW, Chescheir NC. Use of the STROBE Checklist to Evaluate the Reporting Quality of Observational Research in Obstetrics. Obstet Gynecol 2018;132(2):507–512.
- 30. Bruno A, Olmsted M, Martin M, et al. Rigor, reproducibility and transparency of randomized controlled trials in obstetrics and gynecology. Am J Obstet Gynecol MFM 2021;3:100450.
- 31. Grimes DA. The CONSORT 2010 guidelines: sound advice, spotty compliance. Obstet Gynecol 2010;115(5):892–3.
- 32. Guyatt GH, Oxman AD, Montori V, et al. GRADE guidelines: 5. Rating the quality of evidence--publication bias. J Clin Epidemiol 2011;64(12):1277–82.
- 33. DeVito NJ, Goldacre B. Catalogue of bias: publication bias. BMJ Evid Based Med 2019;24(2):53–54.
- 34. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991;337(8746):867–72.
- 35. Korn D, Ehringhaus S. Principles for strengthening the integrity of clinical research. PLoS Clin Trials 2006;1(1):e1.
- 36. Abaid LN, Grimes DA, Schulz KF. Reducing publication bias through trial registration. Obstet Gynecol 2007;109(6):1434–7.
- 37. Steinberg JR, Weeks BT, Reyes GA, et al. The obstetrical research landscape: a cross-sectional analysis of clinical trials from 2007-2020. Am J Obstet Gynecol MFM 2021;3(1):100253.
- 38. Bibens ME, Chong AB, Vassar M. Utilization of Clinical Trials Registries in Obstetrics and Gynecology Systematic Reviews. Obstet Gynecol 2016;127(2):248–253.
- 39. Sterne JAC, Sutton AJ, Ioannidis JPA, et al. Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ 2011;343:d4002.
- 40. Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997;315(7109):629–34.
- 41. Milette K, Roseman M, Thombs BD. Transparency of outcome reporting and trial registration of randomized controlled trials in top psychosomatic and behavioral health journals: A systematic review. J Psychosom Res 2011;70(3):205–17.
- 42. Dwan K, Altman DG, Arnaiz JA, et al. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS One 2008;3(8):e3081.
- 43. Boutron I, Dutton S, Ravaud P, Altman DG. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA 2010;303(20):2058–64.
- 44. Turrentine M. It’s All How You “Spin” It: Interpretive Bias in Research Findings in the Obstetrics and Gynecology Literature. Obstet Gynecol 2017;129(2):239–242.
- 45. Duffy JMN, Ziebland S, von Dadelszen P, McManus RJ. Tackling poorly selected, collected, and reported outcomes in obstetrics and gynecology research. Am J Obstet Gynecol 2019;220(1):71.e1–71.e4.
- 46. Speirs AL, Asch RH, Silber SJ. When predictions don’t predict. Aust N Z J Obstet Gynaecol 1991;31(4):346–7.
- 47. Pocock SJ. Clinical Trials: A Practical Approach. New York: John Wiley & Sons, 1983.
- 48. Bender R, Lange S. Adjusting for multiple testing--when and how? J Clin Epidemiol 2001;54(4):343–9.
- 49. Kumar A, Chakraborty BS. Interim analysis: A rational approach of decision making in clinical trial. J Adv Pharm Technol Res 2016;7(4):118–122.
- 50. Liao C, Speirs AL, Goldsmith S, Silber SJ. When “facts” are not facts: what does p value really mean, and how does it deceive us? J Assist Reprod Genet 2020;37(6):1303–1310.
- 51. Jones SR, Carley S, Harrison M. An introduction to power and sample size estimation. Emerg Med J 2003;20(5):453.
- 52. Whitley E, Ball J. Statistics review 4: sample size calculations. Crit Care 2002;6(4):335–41.
- 53. Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med 1984;3(4):409–22.
- 54. Chan AW, Altman DG. Epidemiology and reporting of randomised trials published in PubMed journals. Lancet 2005;365(9465):1159–62.
- 55. Ayala NK, Rouse DJ. Nondefinitive Studies of Labor Induction Methods: Enough Already! Obstet Gynecol 2019;134(1):7–9.
- 56. Grobman WA, Rice MM, Reddy UM, et al. Labor Induction versus Expectant Management in Low-Risk Nulliparous Women. N Engl J Med 2018;379(6):513–523.
- 57. Healy P, Gordijn SJ, Ganzevoort W, et al. A Core Outcome Set for the prevention and treatment of fetal GROwth restriction: deVeloping Endpoints: the COSGROVE study. Am J Obstet Gynecol 2019;221(4):339.e1–339.e10.
- 58. Ganzevoort W, Alfirevic Z, von Dadelszen P, et al. STRIDER: Sildenafil therapy in dismal prognosis early-onset intrauterine growth restriction: a protocol for a systematic review with individual participant data and aggregate data meta-analysis and trial sequential analysis. Systematic Reviews 2014;3(1):23.
- 59. Berghella V, Seibel-Seamon J. Contemporary use of cervical cerclage. Clin Obstet Gynecol 2007;50(2):468–477.
- 60. Roman A, Zork N, Haeri S, et al. Physical examination–indicated cerclage in twin pregnancy: a randomized controlled trial. Am J Obstet Gynecol 2020;223(6):902.e1–902.e11.
- 61. Encarnacion B, Zlatnik MG. Cesarean delivery technique: evidence or tradition? A review of the evidence-based cesarean delivery. Obstet Gynecol Surv 2012;67(8):483–94.
- 62. Dahlke JD, Mendez-Figueroa H, Maggio L, Sperling JD, Chauhan SP, Rouse DJ. The Case for Standardizing Cesarean Delivery Technique: Seeing the Forest for the Trees. Obstet Gynecol 2020;136(5):972–980.
- 63. Menard MK, Main EK, Currigan SM. Executive summary of the reVITALize initiative: standardizing obstetric data definitions. Obstet Gynecol 2014;124(1):150–153.
- 64. Sharp HT, Johnson JV, Lemieux LA, Currigan SM. Executive Summary of the reVITALize Initiative: Standardizing Gynecologic Data Definitions. Obstet Gynecol 2017;129(4):603–607.
- 65. American College of Obstetricians and Gynecologists (ACOG). reVITALize: Obstetric Data Definitions. (https://www.acog.org/practice-management/health-it-and-clinical-informatics/revitalize-obstetrics-data-definitions). Accessed October 13, 2021.
- 66. Liu Y, Thornton JG, Li W, van Wely M, Mol BW. Concerns about Data Integrity of 22 Randomized Controlled Trials in Women’s Health. Am J Perinatol 2021; ePub ahead of print.
- 67. Li W, Bordewijk EM, Mol BW. Assessing Research Misconduct in Randomized Controlled Trials. Obstet Gynecol 2021;138(3):338–347.
- 68. Grey A, Bolland MJ, Avenell A, Klein AA, Gunsalus CK. Check for publication integrity before misconduct. Nature 2020;577(7789):167–169.
