Abstract
Objectives
Since its emergence in December 2019, COVID-19 has had a global impact on research. Over 360 000 COVID-19-related manuscripts have been published in PubMed or posted on preprint servers such as medRxiv and bioRxiv, with preprints comprising about 15% of all manuscripts. Yet the role and impact of preprints on COVID-19 research and evidence synthesis remain uncertain.
Materials and Methods
We propose a novel data-driven method for assigning weights to individual preprints in systematic reviews and meta-analyses. This weight, termed the “confidence score,” is obtained using the survival cure model, also known as the survival mixture model, which takes into account the time elapsed between the posting and publication of a preprint, as well as metadata such as the number of citations within the first 2 weeks, sample size, and study type.
Results
Using 146 preprints on COVID-19 therapeutics posted from the beginning of the pandemic through April 30, 2021, we validated the confidence scores, showing an area under the receiver operating characteristic curve of 0.95 (95% CI, 0.92-0.98). Through a use case on the effectiveness of hydroxychloroquine, we demonstrated how these scores can be incorporated practically into meta-analyses to appropriately weight preprints.
Discussion
It is important to note that our method does not aim to replace existing measures of study quality but rather serves as a supplementary measure that overcomes some limitations of current approaches.
Conclusion
Our proposed confidence score has the potential to improve systematic reviews of evidence related to COVID-19 and other clinical conditions by providing a data-driven approach to including unpublished manuscripts.
Keywords: evidence synthesis, data-driven modeling, preprint, systematic review
Introduction
The global pandemic of COVID-19, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread rapidly across the world since December 2019.1,2 As of June 14, 2023, over 360 000 manuscripts related to COVID-19 have been published or posted on PubMed Central (PMC) and preprint servers such as bioRxiv and medRxiv by researchers around the world.3,4 These manuscripts cover a wide range of topics that can help us understand the clinical and public health impacts of COVID-19, including disease mechanisms, diagnosis, treatment, and prevention, as well as viral infection, replication, pathogenesis, transmission, host range, and virulence. However, the vast amount of information from multiple sources can be overwhelming for policymaking and implementation in terms of the volume, variety, and velocity of data. The World Health Organization’s director-general, Tedros Adhanom Ghebreyesus, stated, “We’re not just fighting a pandemic; we’re fighting an infodemic.” It is, therefore, essential to conduct timely and reliable evidence synthesis based on existing findings to benefit clinical decision-making and inform future research agendas for clinical trials.
To address the COVID-19 pandemic, significant efforts have been made to produce current, evidence-based systematic reviews. Currently, more than 10 000 COVID-19-specific systematic review protocols have been registered at PROSPERO,5 a registry of systematic reviews. These protocols include over 100 living systematic reviews regularly updated with new evidence. In addition, several meta-analyses have also been conducted to investigate the effectiveness of COVID-19 treatments.6–8 These studies provide researchers and clinicians with valuable information about the effectiveness of interventions for patients infected with COVID-19.
Besides peer-reviewed papers, preprint articles should also be considered for inclusion in the systematic review process. Here, we define preprints as manuscripts that are publicly available on preprint servers and presented in the format of peer-reviewed journal articles.9 It is important to note that the actual review status of these preprints is unobserved and publicly unknown. Compared to peer-reviewed articles, preprints can speed up dissemination by eliminating the months-long interval between submission of a manuscript to a publisher and its first public release. Preprints can also collect public comments to improve the rigor of the work, provide opportunities to form new scientific collaborations, and avoid publication bias, since preprints are issued at the discretion of the author.
In recent years, large international sites such as bioRxiv and medRxiv have played a significant role in distributing unpublished research in the health sciences. As of June 2022, there were over 43 000 preprints on medRxiv and 190 000 on bioRxiv. Since the COVID-19 pandemic began, medRxiv and bioRxiv have been major sources for sharing research findings on COVID-19.2,10 The National Library of Medicine (NLM) at the National Institutes of Health (NIH) also made preprints resulting from NIH-funded research available via PMC and, by extension, PubMed. From June 2020 through June 2022, NLM made more than 3300 preprints reporting NIH-supported COVID-19 research discoverable in PMC and PubMed. As of May 24, 2023, employing the search filter “preprint[filter],” the number of COVID-19-related preprints had risen to 9081.
These preprint servers demonstrate that preprint records could provide an avenue for the discovery of research prior to journal publication during the ongoing public health emergency, accelerating the point at which this research would otherwise be discoverable in the literature search. See Figure 1 for 2 examples, including the results of a randomized controlled trial (RCT) on the efficacy of dexamethasone in hospitalized COVID-19 patients conducted by the RECOVERY Collaborative Group (lower panel in Figure 1). This article was published 248 days later in the New England Journal of Medicine.11
Figure 1.
Preprints (eg, medRxiv) contain timely information on treatments for COVID-19.
In the recent Cochrane Handbook for Systematic Reviews of Interventions (Version 6.3, 2022),12 it says (in Chapter 4.4.5), “As with comments and letters, preprints should also be considered a potentially relevant source of study evidence. Recent and widespread availability of preprints has resulted from an urgent demand for emerging evidence during the COVID-19 pandemic.13–16 As study data are often reported in multiple publications and may be reported differently in each,17 efforts to identify all reports for eligible studies, regardless of publication format, are necessary to support subsequent stages of the review process to select, assess and analyze complete study data.”12
While the Cochrane Handbook and other sources recommend considering preprints as a source of evidence in systematic reviews, there is no clear consensus on how to properly incorporate preprints into these reviews. The quality of preprints, which have yet to undergo peer review, can be highly variable. The most recent PRISMA statement mentions the possibility of including preprints in systematic reviews but does not address how the quality of this type of evidence should be taken into account in the synthesis of evidence. Currently, there is a lack of guidance on how to appropriately use results from preprints in systematic reviews.18
Efforts have been made to assess the quality of studies within the systematic review process. Consolidated Standards of Reporting Trials (CONSORT) is a set of internationally recognized guidelines designed to improve the transparency and completeness of reporting RCTs.19 Typically presented in a checklist format,20 CONSORT was developed to ensure that researchers, clinicians, and readers have access to accurate and comprehensive information about clinical trial methodologies and findings. Another commonly used tool is the Cochrane Risk of Bias (ROB), which is also a qualitative assessment tool. This tool involves making judgments about the risk of bias across various domains, including randomization, allocation concealment, blinding of participants and personnel, blinding of outcome assessors, incomplete outcome data, selective reporting of outcomes, and other potential biases. These judgments are expressed using descriptive terms such as “low risk,” “high risk,” or “unclear risk” to qualitatively assess the potential for bias in each domain. However, it does not assign quantitative scores or weights to these judgments.
For observational studies, the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines were developed.21 Similar to the CONSORT guidelines, STROBE also uses a checklist format to qualitatively assess observational studies such as cohort, case-control, and cross-sectional studies. Additionally, the Newcastle-Ottawa Scale (NOS) is another tool designed for observational studies. However, it shares the limitation of not providing quantitative scores for assessing the quality of these studies.21
In this article, we introduce, for the first time, a data-driven approach for assigning a “confidence score” to preprints. The confidence score is intended to reflect the likelihood of a preprint surviving the peer review process. One benefit of the confidence score is that it is naturally defined on a scale from 0 to 1, so it can be used as a weight in a subsequent meta-analysis. To determine the confidence score, we use the “timestamps” of each preprint (ie, the date it was posted and the date it was published, if applicable). By studying the “life cycle” of preprints in this way, we can examine factors such as the length of time a preprint is posted before being published and the impact of study-level and preprint-specific features (eg, the number of patients in the study, citations of the preprint) on the likelihood of publication. We hypothesize that by appropriately modeling the life cycle of preprints using timestamps and relevant features, we can accurately predict the likelihood of a preprint being published.
In Figure 2, we present a framework for literature review and meta-analysis by using confidence scores. Following the literature review and study selection process (Step A), the confidence scores of the preprints can be calculated along with the assessment of risk of bias and CONSORT/STROBE19,21 criteria (Step B). In the meta-analysis (Step C), the proposed confidence score can serve as a supplemental measure for weighing each individual study. It provides additional quantitative information beyond the qualitative measures of each study, eg, assessed by the Cochrane ROB 2.0 Tool for RCTs,22 the NOS for non-randomized studies, and the compliance to the CONSORT or STROBE guidelines.19,21
Figure 2.
The overall framework for literature review and meta-analysis with a data-driven confidence score.
Methods
Data
Databases and search strategy
We conducted a systematic search of preprints and peer-reviewed papers to identify studies on treatments for COVID-19. We searched the medRxiv, bioRxiv, arXiv, SSRN Electronic Journal, ChemRxiv, JMIR Preprints, and Research Square databases for preprints, and PubMed for peer-reviewed papers, using the following search terms: (“COVID-19” or “SARS-CoV-2”) and “clinical” and “treatment” and the name of the treatment. The treatments included chloroquine, hydroxychloroquine, azithromycin, carrimycin, convalescent plasma, ASC09, atazanavir, darunavir, danoprevir, lopinavir, ritonavir, stem cells, clazakizumab, olokizumab, sarilumab, siltuximab, sirukumab, tocilizumab, and remdesivir. Infectious disease experts, including clinicians and pharmacists, reviewed and summarized the emerging research on these treatments weekly to biweekly. For peer-reviewed papers, we identified manuscripts posted on journal websites that had not been previously posted on any preprint server.
Eligibility criteria
Eligibility was limited to human experimental studies of treatments or drugs related to COVID-19. In vitro studies and studies involving animal testing were excluded, as were literature reviews and meta-analyses of COVID-19 treatments. We also excluded articles with missing data in the variables of interest and articles that were published within the first 2 weeks of being posted on a preprint server. The rationale for omitting articles from this early post-preprint phase is to reduce bias in our confidence score calculations: articles with a brief posting duration provide non-informative data points for our model, because the time available for assessing publication status is not adequately represented. This precaution helps maintain the accuracy and reliability of our predictive outcomes. By excluding articles published within this short timeframe after being posted as a preprint, we also ensure that the confidence score reflects the manuscript’s original state at the time of preprint submission. The study selection flow diagram is shown in Figure 4.
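To make this exclusion step concrete, the screening logic can be expressed as a simple filter. The sketch below is a minimal R example under assumed column names (posted_date, published_date, and a set of covariates of interest); it is illustrative only and not the authors' actual screening code.

```r
library(dplyr)
library(tidyr)

# screened_preprints (assumed): one row per candidate preprint, with Date columns
# posted_date and published_date (NA if unpublished) plus the extracted covariates.
covariates_of_interest <- c("rct", "median_age", "sample_size", "single_center")

eligible <- screened_preprints %>%
  mutate(days_to_publication = as.numeric(published_date - posted_date)) %>%
  # keep preprints that are unpublished, or were published more than 2 weeks after posting
  filter(is.na(published_date) | days_to_publication > 14) %>%
  # drop records with missing values in any covariate of interest
  drop_na(all_of(covariates_of_interest))
```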
Figure 4.
Literature review and study selection flow diagram.
Selection process
Three reviewers (A.B., A.P., O.W., trained by M.E.S. and J.T., who had previous experience in systematic review) independently screened preprint titles and abstracts using prespecified selection criteria. Any discrepancies were resolved through discussion, with any remaining conflicts being resolved by a fourth reviewer (J.T.). The reviewers also designed data collection forms to extract relevant information from eligible studies and manually checked for duplicates by comparing authors’ names, publication or posting dates, bioRxiv or medRxiv links, and population sizes.
Data extraction
Researchers with expertise in epidemiology and biostatistics extracted and organized data from 146 preprints and 19 peer-reviewed papers (not posted on any preprint server) in a systematic format. We collected various features from the eligible studies; the full list is available in the Appendix Section SA2. Considering data completeness, the 11 covariates retained for analysis were: whether or not the study was an RCT, median age of participants, sample size of the study cohort, single- or multi-center study design, whether or not the results were preliminary, whether or not the analysis was adjusted for confounding variables, country of study (China, the United States, or Europe, with all other countries as the reference group), whether or not the study was observational, h-index of the last author, citation counts within the first 2 weeks after posting on the preprint server, and PDF download counts within the first 2 weeks after posting.
The feature “whether or not the results were preliminary” refers to preprints where authors explicitly state in their manuscript that the study and its associated findings were preliminary. During the data extraction phase, we conducted a manual search for the keyword “preliminary” within the preprints’ titles, abstracts, and main text to identify this feature.
Regarding the “country” feature, we categorized the study’s country of origin based on the source of the dataset utilized within the preprints. If a study drew data from multiple countries, we assigned it to the “other” category within the “country” feature.
Statistical analysis
Figure 3 illustrates the pathways a manuscript may follow. First, a manuscript may be submitted directly to a journal without posting to a preprint server, undergo review, and eventually be published or not. In this case, we only have access to published studies. Second, a manuscript may be posted to a preprint server either before or during review. In this case, a preprint may be “published,” “not yet published” (eg, still under review or not yet submitted to a journal), or “never published” (eg, not accepted by any peer-reviewed journal). Unlike direct submission to a journal, we can observe these preprints regardless of their publication status. Preprints that have not been published in a journal by the date of administrative censoring, which is the date on which the preprint was identified and relevant data were collected, are either “not yet published” or “never published.” While we cannot directly observe which of these categories unpublished preprints fall into, we can model the probability that a manuscript will be published by time t from the date the preprint was posted. Third, there are instances where a manuscript is posted to a preprint server following acceptance. For such preprints, the interval between their posting and publication dates can be brief, rendering their publication status less relevant to our study. To preserve the integrity of our analysis, we excluded articles that were published within the first 2 weeks of appearing on any preprint server.
Figure 3.
The proposed framework to derive the “confidence score” via the life cycle of a manuscript via its “timestamps.” Events in grey are unobserved.
It is important to acknowledge that the peer-review process can introduce modifications to a manuscript’s content and structure. In the context of our method, we considered the fact that the fundamental findings and major conclusions presented in a preprint should remain consistent throughout the subsequent revisions for publication.23 While minor edits and refinements are expected during the peer-review process, our methodology is designed to focus on the core research outcomes and conclusions.
We propose to use a survival mixture model24,25 to calculate the probability that a preprint will eventually be published. Survival mixture models are typically used when the survival distribution is believed to arise from a mixture of subjects who will never experience the outcome of interest (ie, are “cured”) and those who will. In our context, we consider publication to be the outcome of interest, so preprints that will never be published are considered “cured” (ie, they never experience the outcome of interest, which is publication). For each preprint, we determined whether it was published and recorded the time from posting to publication, or to administrative censoring at a given date (eg, April 30, 2021). Appendix Figure SA1 illustrates a survival mixture model,24–26 which assumes that all preprints can be divided into 2 groups: those that will eventually be published and those that will never be published (ie, the publication time is infinite, or the likelihood of publication is 0).
The survival mixture model characterizes the distribution of the time it takes for a preprint to be published, among those preprints that will eventually be published, as well as the probability of a preprint never being published. Denote by T the time from posting to publication. For a preprint with features z, π(z) represents its probability of being published, or in other words, the probability of T being finite given features z. This probability may be related to various features, including study-level features (eg, number of patients, median age, RCT or observational study, single- or multi-center study) and preprint-specific features (eg, citations of the preprint during the first 2 weeks on a preprint server, number of downloads during the first 2 weeks on a preprint server).

We let S_u(t | x) be the probability of T > t for those preprints that will be published, which depends on features x. The mixture model assumes that the probability that a preprint has not been published by time t is:

S(t | x, z) = 1 − π(z) + π(z) S_u(t | x).  (1)

We use logistic regression with a logit link to model π(z) and proportional hazards (PHs) regression to model S_u(t | x), ie, logit{π(z)} = γ0 + γ′z and S_u(t | x) = S_0(t)^exp(β′x), where β is the coefficient of the effects of x, which is the parameter of interest to estimate, and S_0(t) is the baseline survival function. The corresponding baseline hazard describes the instantaneous rate of publication for those preprints that have not been published up to time t, and exp(γ0)/{1 + exp(γ0)} represents the probability of being published when all covariates are set to zero. Using model (1), we can predict the likelihood that a preprint will be published. To predict the publication probability, which we term the confidence score, it is important to account for the publication status observed at the time of analysis. For published preprints, the confidence score is simply 1. For preprints not yet published, the confidence score should reflect the probability of eventual publication, given both the study features and the fact that the study has not yet been published. Specifically, we define c as the time between posting and analysis. Instead of using π(z), which only accounts for study features, we calculate the probability P(T < ∞ | T > c, x, z). Under model (1), this probability (ie, the confidence score) can be expressed as π(z) S_u(c | x) / {1 − π(z) + π(z) S_u(c | x)}. The intuition is that for preprints that have not yet been published, the time already spent on the server needs to be incorporated into the weights used in the meta-analysis, rather than directly using π(z) as the weight. The features z in the logistic regression model can differ from, or be a subset of, the variables x in the PH regression model.
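To make the weighting concrete, the confidence score of an unpublished preprint can be computed directly from the two fitted components, π(z) and S_u(c | x). Below is a minimal sketch in R of this calculation (the function name and inputs are illustrative rather than taken from the authors' code):

```r
# Confidence score under model (1):
#   published preprint:   score = 1
#   unpublished preprint: P(eventually published | still unpublished at time c)
#                         = pi_z * s_u_c / (1 - pi_z + pi_z * s_u_c)
confidence_score <- function(pi_z, s_u_c, published = FALSE) {
  if (published) return(1)
  pi_z * s_u_c / (1 - pi_z + pi_z * s_u_c)
}

# Example: two unpublished preprints with the same covariates (pi(z) = 0.8);
# the one that has waited longer without publication (smaller S_u(c | x))
# receives the lower score.
confidence_score(pi_z = 0.8, s_u_c = 0.7)  # recently posted: ~0.74
confidence_score(pi_z = 0.8, s_u_c = 0.3)  # posted long ago: ~0.55
```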
In our real data analysis, we used the R package “smcure”27 to calculate the confidence scores for the preprints based on the extracted features, including study type (ie, RCT, observational study, other), median age, sample size, single- or multi-center study design, whether or not the results were preliminary, whether or not the analysis was adjusted for confounding variables, country of study, citation counts in the first 2 weeks, number of PDF downloads from the preprint server in the first 2 weeks, and h-index of the last author from Google Scholar at the posting date. If 2 manuscripts share identical features, such as sample size and median age, a difference in their posting times, and hence in the time they have remained unpublished, can still lead to different confidence scores.
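For reference, a hedged sketch of how such a model can be fit with the smcure package is given below; the data frame and variable names are illustrative, and the authors' actual call, covariate coding, and options may differ.

```r
library(survival)
library(smcure)

# preprint_dat (assumed): one row per preprint, with
#   time   = days from posting to publication, or to administrative censoring
#   status = 1 if published by the censoring date (April 30, 2021), 0 otherwise
fit <- smcure(
  Surv(time, status) ~ rct + median_age + log_sample_size + single_center +
    preliminary + adjusted_analysis + observational + log_pdf_downloads +
    country_china + country_us + country_europe + h_index_high + citations_2wk,
  cureform = ~ rct + median_age + log_sample_size + single_center +
    preliminary + adjusted_analysis + observational + log_pdf_downloads +
    country_china + country_us + country_europe + h_index_high + citations_2wk,
  data  = preprint_dat,
  model = "ph",   # proportional hazards latency model
  Var   = TRUE    # bootstrap standard errors for the reported CIs
)
printsmcure(fit)  # incidence (cure) and latency coefficient estimates
```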
Enabling data-driven incorporation of preprints in meta-analysis using a confidence score
In the meta-analysis, the confidence scores are utilized as weights for the preprints. Suppose there are K studies in total. We use w_i to denote the weight for the ith study in the meta-analysis, i = 1, …, K. If the ith study is already published at the time of analysis, we set w_i to be 1. If the ith study is not yet published, its probability of never being published is estimated from the survival mixture model, and its weight in the meta-analysis is set to the confidence score, w_i = P(T_i < ∞ | T_i > c_i, x_i, z_i). It is worth noting that when 2 studies with the same features remain unpublished at the time of the meta-analysis, the earlier-posted study is assigned a lower weight. Based on w_i, which characterizes the chance of publication for the ith study, we propose the following multiple imputation procedure for incorporating preprints into meta-analysis studies, taking into account their probability of publication. Preprints with higher confidence scores (ie, those more likely to be published) are assigned more weight in the meta-analysis.
For each preprint, we impute its publication status D_i using a Bernoulli distribution with probability w_i, where D_i = 1 indicates a study that will be published and D_i = 0 indicates a study that will never be published. We then conduct a random-effects meta-analysis using all the studies with D_i = 1. This allows us to estimate an effect size and its standard error, as well as the heterogeneity variance τ². By repeating this imputation-estimation process multiple times, we obtain a final estimate of the overall effect size by averaging the estimated effect sizes across imputations, as well as a final estimate of the heterogeneity variance. In our case study, this imputation process was repeated 500 times.
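A compact sketch of this imputation-estimation loop, using the metafor package for the random-effects step, is shown below. Here yi and vi denote the study-level log risk ratios and their variances, and w the confidence scores (equal to 1 for published studies); the function and variable names are illustrative and may differ from the authors' implementation.

```r
library(metafor)
set.seed(2021)

impute_meta <- function(yi, vi, w, n_imp = 500) {
  est  <- rep(NA_real_, n_imp)
  tau2 <- rep(NA_real_, n_imp)
  for (m in seq_len(n_imp)) {
    d <- rbinom(length(w), size = 1, prob = w) == 1          # impute publication status D_i
    if (sum(d) < 2) next                                     # need at least 2 studies per imputation
    fit     <- rma(yi = yi[d], vi = vi[d], method = "REML")  # random-effects meta-analysis
    est[m]  <- as.numeric(fit$b)                             # pooled log effect size
    tau2[m] <- fit$tau2                                      # heterogeneity variance
  }
  list(log_effect = mean(est, na.rm = TRUE),
       tau2       = mean(tau2, na.rm = TRUE))
}

# Pooled risk ratio across imputations: exp(impute_meta(yi, vi, w)$log_effect)
```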
Results
Study selection
We identified 146 preprints and 19 peer-reviewed papers (not posted on any preprint server) related to COVID-19 treatments, collected up to April 30, 2021. The literature review flow diagram is presented in Figure 4. Of the 146 preprints, 84 had been published by April 30, 2021. A summary of the variables is presented in Table 1.
Table 1.
Summary characteristics of 146 preprints.
Characteristic | Overall (n = 146) | Not yet published by April 30, 2021 (n = 62) | Published by April 30, 2021 (n = 84) | P-value
---|---|---|---|---
RCT | | | | .31
No = 0 | 127 (87.0%) | 56 (90.3%) | 71 (84.5%) |
Yes = 1 | 19 (13.0%) | 6 (9.7%) | 13 (15.5%) |
Age | | | | .56
Mean (SD) | 57.4 (13.1) | 56.7 (16.3) | 58.0 (10.1) |
Sample size (log) | | | | .01^a
Mean (SD) | 4.97 (1.72) | 4.55 (1.73) | 5.27 (1.66) |
Single center | | | | .14
No = 0 | 50 (34.2%) | 17 (27.4%) | 33 (39.3%) |
Yes = 1 | 96 (65.8%) | 45 (72.6%) | 51 (60.7%) |
Preliminary result | | | | .23
No = 0 | 113 (77.4%) | 51 (82.3%) | 62 (73.8%) |
Yes = 1 | 33 (22.6%) | 11 (17.7%) | 22 (26.2%) |
Adjusted analysis | | | | .34
No = 0 | 78 (53.4%) | 36 (58.1%) | 42 (50.0%) |
Yes = 1 | 68 (46.6%) | 26 (41.9%) | 42 (50.0%) |
Observational study | | | | .74
No = 0 | 38 (26.0%) | 17 (27.4%) | 21 (25.0%) |
Yes = 1 | 108 (74.0%) | 45 (72.6%) | 63 (75.0%) |
PDF downloaded (log) | | | | .82
Mean (SD) | 5.57 (2.12) | 5.52 (2.28) | 5.60 (2.02) |
Country—China | | | | .62
No = 0 | 109 (74.7%) | 45 (72.6%) | 64 (76.2%) |
Yes = 1 | 37 (25.3%) | 17 (27.4%) | 20 (23.8%) |
Country—United States | | | | .81
No = 0 | 114 (78.1%) | 49 (79.0%) | 65 (77.4%) |
Yes = 1 | 32 (21.9%) | 13 (21.0%) | 19 (22.6%) |
Country—Europe | | | | .14
No = 0 | 101 (69.2%) | 47 (75.8%) | 54 (64.3%) |
Yes = 1 | 45 (30.8%) | 15 (24.2%) | 30 (35.7%) |
Last author h-index | | | | .87
<30 = 0 | 115 (78.8%) | 49 (79.0%) | 66 (78.6%) |
>30 = 1 | 31 (21.2%) | 13 (21.0%) | 18 (21.4%) |
Citation counts^a | | | | .01^a
Mean (SD) | 3.33 (11.4) | 0.633 (1.96) | 5.31 (14.7) |
^a Citation counts: citation counts within the first 2 weeks of posting on the preprint server.
Confidence scores derived from metadata of 146 preprints
For the multivariate survival mixture model, the estimated coefficients of the cure probability model and the PHs model, with corresponding 95% CIs and P-values, are presented in Table 2. The effect sizes of the mixture model are the estimated coefficients, denoted as γ, in the model for the publication probability π(z) defined in the “Statistical analysis” section. For example, the effect size of the “country—China” variable was 0.53, which can be interpreted as follows: keeping all other variables constant, the odds of eventual publication for a preprint using a study cohort from China were 1.70 (=exp(0.53)) times the odds for a preprint using data from countries other than China, the United States, and Europe, or from multiple countries. However, this effect was not statistically significant, with a P-value of .93 and a confidence interval covering zero. Across all the variables, citation count was the only variable whose 95% CI excluded zero (P-value of .07). In the sensitivity analysis, we excluded one variable at a time when calculating the confidence score; the results showed that citation counts had the greatest impact on the performance of the predictive model.
Table 2.
Estimated effect sizes of the variables from the survival mixture model.
Variable | Effect size (95% CI) of mixture model | P-value | Effect size (95% CI) of PH model | P-value |
---|---|---|---|---|
RCT | 4.44 (-14.42, 23.3) | .64 | −0.15 (-1.34, 1.03) | .80 |
Median age | −0.17 (-0.41, 0.08) | .18 | 0.04 (-0.02, 0.10) | .15 |
Sample size | −0.37 (-1.82, 1.07) | .61 | 0.19 (-0.26, 0.64) | .41 |
Single center | 0.45 (-5.77, 6.66) | .89 | 0.39 (-0.79, 1.57) | .51 |
Preliminary results | 1.60 (-4.58, 7.77) | .61 | 0.31 (-0.82, 1.44) | .59 |
Adjusted analysis | 1.48 (-5.35, 8.32) | .67 | −0.75 (-1.89, 0.39) | .20 |
Observational study | 2.64 (-8.12, 13.39) | .63 | −0.49 (-1.77, 0.79) | .45 |
PDF downloaded counts | −0.18 (-2.33, 1.97) | .87 | −0.26 (-0.61, 0.10) | .15 |
Country—China | 0.53 (-11.76, 12.82) | .93 | −1.04 (-3.03, 0.95) | .30 |
Country—United States | 1.50 (-9.6, 12.59) | .79 | −0.97 (-3.27, 1.32) | .41 |
Country—Europe | 3.13 (-5.55, 11.8) | .48 | −0.31 (-1.86, 1.23) | .69 |
Last author h-index | 1.97 (-1.83, 5.76) | .31 | −0.10 (-1.10, 0.89) | .84 |
Citation counts^a | 4.29 (0.48, 9.07) | .07 | 0.11 (-0.05, 0.28) | .18
^a Citation counts: citation counts within the first 2 weeks of posting on the preprint server.
Similarly, the effect sizes of the PHs regression model are the estimated coefficients, denoted as β, in the survival component of the survival mixture model, and are interpreted as log hazard ratios. To illustrate, consider the “preliminary results” variable: the hazard of being published for a preprint that presents preliminary results was about 1.36 (=exp(0.31)) times the hazard of a preprint that either does not present preliminary results or makes no mention of them, though this effect was not statistically significant.
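The exponentiated coefficients quoted in the two paragraphs above follow directly from the point estimates in Table 2:

```r
exp(0.53)  # ~1.70: odds ratio of eventual publication, China vs the reference countries
exp(0.31)  # ~1.36: hazard ratio of publication, preliminary results vs not
```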
Predictive performance of confidence scores
Figure 5a shows the separation of confidence scores plotted against actual publication status. The median confidence score for unpublished preprints was 0.09, while for published preprints it was 0.98, which was significantly higher. Figure 5b presents the in-sample receiver operating characteristic (ROC) curve of the predictive model, with an area under the curve (AUC) of 0.95 (95% CI, 0.92-0.98). Due to the limited number of total studies, we conducted leave-one-out cross-validation instead of splitting the data into training and test sets. The AUC value was 0.83 (95% CI, 0.75-0.89). Overall, our pilot study has shown that the confidence score has the potential to be a useful predictive measure of the likelihood of publication, even with a limited number of features extracted from preprints. The detailed publication status of 146 preprints along with the confidence scores obtained from the survival mixture model is presented in Appendix Figure SA2.
Figure 5.
Predictive performance of confidence scores. (a) Box plot of estimated confidence score, stratified by publication status; (b) ROC curve of confidence score in predicting the publication status.
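For completeness, the in-sample discrimination reported above can be recomputed from the estimated scores and the observed publication status with a standard ROC routine; the sketch below uses the pROC package purely as an illustration (the authors' tooling is not specified), with score and published as assumed vector names.

```r
library(pROC)

# score: confidence scores for the 146 preprints; published: 0/1 status at April 30, 2021
roc_obj <- roc(response = published, predictor = score, quiet = TRUE)
auc(roc_obj)     # in-sample area under the ROC curve
ci.auc(roc_obj)  # 95% CI for the AUC (DeLong method by default)
```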
Evaluation of proposed confidence score
Early in the pandemic, the US Food and Drug Administration (FDA) issued an emergency use authorization (EUA) for hydroxychloroquine in COVID-19, but numerous studies subsequently showed that hydroxychloroquine does not improve survival in COVID-19 patients. As a result, the NIH stopped clinical trials of hydroxychloroquine for COVID-19 in mid-2020, and the FDA revoked the EUA on June 15, 2020. The lack of effectiveness of hydroxychloroquine in COVID-19 provides a clear example of the importance of evidence-based medicine, as any difference in risk between the treatment group and the control group (ie, those receiving placebo or standard care) was likely to be small.28
Within the 146 preprints, we identified 9 that examined the effectiveness of hydroxychloroquine on the mortality rate of critically ill COVID-19 patients using both a treatment arm and a control arm. No eligible two-arm studies on hydroxychloroquine were identified among the 19 peer-reviewed papers that were not posted on any server before publication. We conducted a conventional meta-analysis and applied the proposed confidence-score-based method to compare the results. The conventional meta-analysis found an estimated risk ratio of 1.65 (95% CI, 0.92-2.93) for mortality between the 2 arms, as shown in Figure 6. The proposed method, which used the confidence score to impute publication status, found a risk ratio of 1.36 (95% CI, 0.75-2.44). Our analyses showed that properly incorporating preprints led to an estimated risk ratio closer to 1. Although the results were not statistically significant, the direction of the effect is consistent with the emerging clinical consensus that hydroxychloroquine is not effective for COVID-19.
Figure 6.
Meta-analysis results in terms of a cumulative forest plot based on posted date, using a random effects model and the proposed method of estimating the overall risk ratio of hydroxychloroquine effectiveness versus placebo.
Discussion
In this study, we present a method for incorporating preprints into systematic reviews and meta-analyses based on the probability of being published. Our approach uses a survival mixture model to predict the probability that a preprint will eventually be published, which we refer to as the confidence score. The confidence score is then used as a weight in a multiple imputation process during the meta-analysis. To demonstrate the effectiveness of our method, we conducted systematic reviews of 146 preprints meeting our inclusion criteria, calculated the confidence scores, and used a case study to validate the confidence score. Our proposed approach provides a solution for integrating preprints into evidence synthesis by considering their probability of publication using confidence scores.
Our proposed method simplifies the process of living systematic reviews by incorporating preprints based on their publication likelihood. Traditional living systematic reviews and cumulative meta-analyses demand significant long-term commitments from large research teams. For example, Siemieniuk et al6 published and maintained a living systematic review of network meta-analyses comparing 23 treatments for COVID-19, which involved a team of 58 researchers across 33 institutions. Our framework addresses this challenge by enabling high-quality evidence synthesis that includes preprints, guided by their likelihood of publication. This approach advances beyond the arbitrary exclusion of certain preprints or the naïve inclusion of all preprints, streamlining the synthesis process and conserving valuable time and resources.
Alternatively, the proposed framework can be extended to a resource-intensive living systematic review. By continuously monitoring evidence through living systematic reviews, preprints can be incorporated using a data-driven survival mixture model to enhance the validity of results. This approach is well-suited for cumulative meta-analyses, which allow for examining temporal trends in treatment effects.29 Although the preprint search is labor-intensive, involving extensive document searching and feature extraction, it necessitates less domain expertise due to the employment of confidence scores that indicate the likelihood of publication. This data-driven approach facilitates the systematic inclusion of preprints into meta-analyses based on their confidence scores, thereby saving efforts in the review and selection process. To further save efforts in data extraction, we plan to leverage existing natural language processing (NLP) tools, applying our repository of manually extracted data to refine and optimize these models. This integration will elevate the automation of data extraction and bolster the breadth and depth of features identified, leading to an enhanced research methodology. As we collect more data points and expand the feature set, we have the potential to improve the performance of our predictive model, by increasing its statistical power, generalizability, and representativeness. Once the predictive model performs well, its confidence scores can guide researchers in reviewing a subset of preprints, reducing the labor involved in the systematic review process.
Another advantage is that our proposed method is not limited to the COVID-19 domain. While we chose to apply it within the context of COVID-19 due to the significant “infodemic” surrounding COVID-19-related literature in recent years, the method offers broad adaptability beyond COVID-19 because of its inherent flexibility in incorporating variables and its foundation in data-driven principles. As we progress, the extraction of additional features adapted to specific fields will enable the creation of tailored feature sets for constructing domain-specific confidence scores. For optimal performance in these varied domains, the model will require retraining to align with their distinct characteristics.
Our proposed method and case studies have limitations that need further investigation. First, the use of citation count as a measure of peer recognition in the confidence score is subjective and may be influenced by different types of citations. Future research should explore distinguishing between citation types and considering their specific context in the predictive model. Beyond citation counts and the h-index of the last author, other factors, such as the number of authors, the number of institutions involved, and funding sources, are potential features for predicting the confidence score in future work. Second, publication could be a better indicator of quality and validity than the confidence score. The proposed method can help filter out studies with a low probability of publication and address publication bias, but the quality of the journals publishing these preprints should also be taken into account. Although incorporating preprints can mitigate publication bias to a certain extent, because more manuscripts that have not undergone peer review can be included in systematic reviews and meta-analyses, the remaining publication bias is not negligible. We are currently engaged in a related project developing a framework that combines the confidence score with a further publication bias quantification and correction step, and we will report this investigation in the near future. We also acknowledge that preprints are often posted in multiple updated versions. It would be of great interest to extend our confidence score to be time-varying, reflecting the most up-to-date version of the preprints in a given study. We envision that large language models can play a critical role in these important extensions.
Additionally, our inclusion of both RCTs and observational studies in the case studies may impact the validity of efficacy estimates. However, including both types of studies allowed for larger sample sizes and more robust conclusions. Furthermore, our case studies focused on all-cause mortality but had varying lengths of follow-up. To ensure an adequate sample size, we did not differentiate between follow-up lengths, but future subgroup meta-analyses can address this as more studies become available. Lastly, the data collection process was supported by the Summer 2020 Penn Undergraduate Research Mentoring Program (PURM). The limited duration of this program constrains the sample size and currency of the dataset; updating the sample and data in the future would strengthen the robustness and generalizability of the findings.
To increase the discoverability of early NIH research results, the NLM launched the NIH Preprint Pilot project to make eligible preprints resulting from NIH-funded research available via PMC. However, one main concern is the perception or possibility of low-quality content being added to PubMed.30 A safeguard can therefore be implemented to ensure that preprints are not confused with peer-reviewed articles. While NLM has established eligibility criteria for including a preprint server, there is no quality control for individual preprints. To this end, the confidence score proposed in this article may assist users in filtering out “very bad” preprints during screening. Researchers can also use it when performing a literature survey, by finding good preprints and (manually) validating their novelty. In addition, authors can use the score to obtain initial feedback to improve a preprint, such as adding missing important components (eg, sample size) and highlighting whether it was a multi-center study design.
It is important to recognize that publication itself is never a perfect proxy for the quality and validity of a study. We do not recommend replacing existing measures of study quality with the confidence score. Rather, we propose the confidence score as a novel data-driven metric derived from preprints, serving as a complementary measure of the likelihood of publication. In particular, the confidence score is completely data-driven, obtained quantitatively from the life cycles of manuscripts along with covariate information. In contrast, study quality measures are subjective to some extent. While employing multiple independent reviewers to assess studies may help partially address this issue, a low level of agreement between reviewers and authors in their assessments has been reported.31 Moreover, owing to missing information (eg, details about blinding and concealment) in the original articles, many domains of risk of bias assessments can be unclear,32 leaving large uncertainties about how they might affect evidence synthesis. The confidence score does not have such issues, as it is derived from a different perspective, ie, the likelihood of a study eventually being published.
Scientists issue preprints to increase the discoverability of early research results. However, preprints may offer lower-quality information than work that has been formally peer-reviewed. In this work, we proposed a method for incorporating preprints into systematic reviews and meta-analyses based on their probability of publication. As a starting point, this study represents a proof-of-concept effort to assess a preprint’s probability of publication, rather than a mark of its quality, aiming to enhance the rigor of research synthesis and contribute to a more balanced scientific dialogue. While we focused on searching for and predicting COVID-19-related articles, the proposed method can be used for other research topics, especially in quickly evolving fields such as vaccine effectiveness, cardiology, and oncology. Taken together, we hope our system will help researchers speed dissemination, establish priority, obtain feedback, and offset publication bias.
Conclusion
In conclusion, the confidence score we have proposed holds promise in enhancing systematic reviews of evidence pertaining to COVID-19 and other clinical conditions. By offering a data-driven approach, it enables the inclusion of unpublished manuscripts, thereby improving the overall comprehensiveness and reliability of such reviews.
Contributor Information
Jiayi Tong, The Center for Health Analytics and Synthesis of Evidence (CHASE), Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States.
Chongliang Luo, Division of Public Health Sciences, Washington University School of Medicine in St Louis, St Louis, MO 63110, United States.
Yifei Sun, Department of Biostatistics, Columbia University, New York City, NY 10032, United States.
Rui Duan, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Cambridge, MA 02115, United States.
M Elle Saine, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States.
Lifeng Lin, Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ 85724, United States.
Yifan Peng, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 11101, United States.
Yiwen Lu, The Center for Health Analytics and Synthesis of Evidence (CHASE), Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States; The Graduate Group in Applied Mathematics and Computational Science, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, United States.
Anchita Batra, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States.
Anni Pan, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States.
Olivia Wang, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States.
Ruowang Li, Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, CA, United States.
Arielle Marks-Anglin, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States.
Yuchen Yang, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States.
Xu Zuo, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States.
Yulun Liu, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, United States.
Jiang Bian, Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States.
Stephen E Kimmel, Department of Epidemiology, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL 32610, United States.
Keith Hamilton, Department of Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA 19104, United States.
Adam Cuker, Department of Medicine and Department of Pathology & Laboratory Medicine, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States.
Rebecca A Hubbard, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States.
Hua Xu, Section of Biomedical Informatics & Data Science, Yale School of Medicine, New Haven, CT 06510, United States.
Yong Chen, The Center for Health Analytics and Synthesis of Evidence (CHASE), Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States; The Graduate Group in Applied Mathematics and Computational Science, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, United States; Leonard Davis Institute of Health Economics, Penn Medicine, Philadelphia, PA 19104, United States; Center for Evidence-based Practice (CEP), Philadelphia, PA 19104, United States; Penn Institute for Biomedical Informatics (IBI), Philadelphia, PA 19104, United States.
Author contributions
J.T. and Y.C. developed the methods; J.T., A.B., A.P., O.W., and M.E.S. collected the data from PubMed and preprints server; Y.C., A.C., and H.X. guided the implementation of case studies; Y.C. conceived the idea; J.T. and Y.C. developed and implemented the methods; J.T. conducted the analyses; All authors interpreted the results and provided instructive comments; J.T. and Y.C. drafted the main manuscript. All authors have approved the manuscript.
Supplementary material
Supplementary material is available at Journal of the American Medical Informatics Association online.
Funding
This work was supported in part by National Institutes of Health (1R01LM014344, 1R01AG077820, R01LM012607, R01AI130460, R01AG073435, R56AG074604, R01LM013519, R56AG069880, U01TR003709, RF1G077820, R21AI167418, R21EY034179). This work was supported partially through Patient-Centered Outcomes Research Institute (PCORI) Project Program Awards (ME-2019C3-18315 and ME-2018C3-14899). All statements in this report, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute (PCORI), its Board of Governors or Methodology Committee.
Conflicts of interest
None declared.
Data availability
The data that support the findings of this study are available from the corresponding author upon request.
References
- 1. Velavan TP, Meyer CG. The COVID-19 epidemic. Trop Med Int Health. 2020;25(3):278-280. 10.1111/tmi.13383
- 2. Coronavirus in the U.S.: Latest Map and Case Count—The New York Times. Accessed March 31, 2021. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
- 3. LitCovid—NCBI—NLM—NIH. Accessed April 1, 2021. https://www.ncbi.nlm.nih.gov/research/coronavirus/
- 4. medRxiv COVID-19 SARS-CoV-2 Preprints from medRxiv and bioRxiv. Accessed April 1, 2021. https://connect.medrxiv.org/relate/content/181
- 5. PROSPERO. Accessed September 5, 2021. https://www.crd.york.ac.uk/prospero/#searchadvanced
- 6. Siemieniuk RA, Bartoszko JJ, Zeraatkar D, et al. Drug treatments for covid-19: living systematic review and network meta-analysis. BMJ. 2020;370:m2980. 10.1136/bmj.m2980
- 7. Sterne JAC, Murthy S, Diaz JV, et al.; WHO Rapid Evidence Appraisal for COVID-19 Therapies (REACT) Working Group. Association between administration of systemic corticosteroids and mortality among critically ill patients with COVID-19: a meta-analysis. J Am Med Assoc. 2020;324(13):1330-1341.
- 8. Axfors C, Schmitt AM, Janiaud P, et al. Mortality outcomes with hydroxychloroquine and chloroquine in COVID-19 from an international collaborative meta-analysis of randomized trials. Nat Commun. 2021;12(1):3001-3013.
- 9. NOT-OD-17-050: Reporting Preprints and Other Interim Research Products. Accessed May 22, 2023. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-17-050.html
- 10. How Science Moved Beyond Peer Review During The Pandemic | FiveThirtyEight. Accessed December 17, 2022. https://fivethirtyeight.com/features/how-science-moved-beyond-peer-review-during-the-pandemic/
- 11. RECOVERY Collaborative Group; Horby P, Lim WS, Emberson JR, et al. Dexamethasone in hospitalized patients with Covid-19. N Engl J Med. 2021;384(8):693-704.
- 12. Cochrane Handbook for Systematic Reviews of Interventions—Google Books. Accessed December 17, 2022. https://books.google.com/books?hl=en&lr=&id=cTqyDwAAQBAJ&oi=fnd&pg=PR3&dq=Higgins+JP,+Thomas+J,+Chandler+J,+et+al.+Cochrane+handbook+for+systematic+reviews+of+interventions+version+6.3+(updated+February+2022):+Cochrane%3B+2022.+&ots=tvmGy5zBon&sig=drjwieia7go35-qJzVsShm1kxGY#v=onepage&q&f=false
- 13. Gianola S, Jesus TS, Bargeri S, et al. Characteristics of academic publications, preprints, and registered clinical trials on the COVID-19 pandemic. PLoS One. 2020;15(10):e0240123.
- 14. Kirkham JJ, Penfold NC, Murphy F, et al. Systematic examination of preprint platforms for use in the medical and biomedical sciences setting. BMJ Open. 2020;10(12):e041849.
- 15. Callaway J. Librarian Reserve Corps: literature indexing and metadata enhancement (LIME) observations from a year in the field. JoHILA. 2021;2(1):35-45.
- 16. Fraser N, Brierley L, Dey G, et al. The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape. PLoS Biol. 2021;19(4):e3000959.
- 17. Oikonomidi T, Boutron I, Pierre O, et al.; COVID-19 NMA Consortium. Changes in evidence for studies assessing interventions for COVID-19 reported in preprints: meta-research study. BMC Med. 2020;18(1):402.
- 18. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. 10.1136/BMJ.N71
- 19. Falci SGM, Marques LS. CONSORT: when and how to use it. Dental Press J Orthod. 2015;20(3):13-15.
- 20. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. Int J Surg. 2012;10(1):28-55.
- 21. Cuschieri S. The STROBE guidelines. Saudi J Anaesth. 2019;13(Suppl 1):S31-S34.
- 22. Sterne JAC, Savović J, Page MJ, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898. 10.1136/BMJ.L4898
- 23. Bero L, Lawrence R, Leslie L, et al. Cross-sectional study of preprints and final journal publications from COVID-19 studies: discrepancies in results reporting and spin in interpretation. BMJ Open. 2021;11(7):e051821. 10.1136/bmjopen-2021-051821
- 24. Othus M, Barlogie B, LeBlanc ML, et al. Cure models as a useful statistical tool for analyzing survival. Clin Cancer Res. 2012;18(14):3731-3736. 10.1158/1078-0432.CCR-11-2859
- 25. Sy JP, Taylor JMG. Estimation in a Cox proportional hazards cure model. Biometrics. 2000;56(1):227-236.
- 26. Berkson J, Gage RP. Survival curve for cancer patients following treatment. J Am Stat Assoc. 1952;47(259):501-515.
- 27. CRAN—Package smcure. Accessed March 31, 2021. https://cran.r-project.org/web/packages/smcure/index.html
- 28. NIH Halts Clinical Trial of Hydroxychloroquine | National Institutes of Health (NIH). Accessed April 1, 2021. https://www.nih.gov/news-events/news-releases/nih-halts-clinical-trial-hydroxychloroquine
- 29. Lau J, Antman EM, Jimenez-Silva J, et al. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med. 1992;327(4):248-254.
- 30. Funk K, Zayas-Cabán T, Beck J. Phase 1 of the NIH preprint pilot: testing the viability of making preprints discoverable in PubMed Central and PubMed. bioRxiv. 10.1101/2022.12.12.520156, December 13, 2022, preprint: not peer reviewed.
- 31. Lo CKL, Mertz D, Loeb M. Newcastle-Ottawa Scale: comparing reviewers’ to authors’ assessments. BMC Med Res Methodol. 2014;14(1):45.
- 32. Hartling L, Ospina M, Liang Y, et al. Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. BMJ. 2009;339:b4012.