Early in the Epidemic: Impact of preprints on global discourse of 2019-nCoV transmissibility

Maimuna S Majumder; Kenneth D Mandl

doi:10.2139/ssrn.3536663

[Preprint]. 2020 Feb 12:3536663. [Version 1] doi: 10.2139/ssrn.3536663

PMCID: PMC7366820 PMID: 32714103

Since it was first reported by the World Health Organization (WHO) in early January 2020, nearly 43,000 cases of a novel coronavirus (2019–nCoV) have been diagnosed in China with exportation events to over 20 countries [1]. Given the novelty of the causative pathogen, scientists have rushed to fill epidemiological, virological, and clinical knowledge gaps – resulting in over 50 new studies about the virus between January 10 and January 30 alone [2]. However, in an era where the immediacy of information has become an expectation of decision-makers and the general public alike, many of these studies have been shared first in the form of preprint papers – prior to peer review.

Over the past three decades, preprint servers have become commonplace in the scientific publication ecosystem, and 2019–nCoV has prompted a seemingly unprecedented use of these platforms [3]. Though peer review is critical for the validation of science, the ongoing outbreak has showcased the rapidity with which preprints can disseminate information during emergencies.

Here, we use both preprint and peer-reviewed studies that estimated the transmissibility potential (i.e. basic reproduction number, R₀) of 2019–nCoV on or prior to February 1, 2020 to investigate the role preprints have had in information dissemination during the ongoing outbreak. We also analyze the agreement of preprint estimates as compared against those presented by peer-reviewed studies and propose a consensus-based approach for evaluating the validity of preprint findings during public health crises.

What We Did

To conduct our analysis, we collected publicly available data from scientific studies, news reports, and search trends pertaining to 2019–nCoV and its basic reproduction (or reproductive) number (R₀). R₀ is defined as the average number of secondary infections that a single case generates in a fully susceptible population; estimates of R₀ can therefore provide decision-makers with insight into the epidemic potential of a given outbreak.
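To make the definition of R₀ concrete, the following minimal sketch shows how the expected number of new cases changes from one generation of infection to the next. It assumes a deliberately simplified setting that is not taken from any of the studies analyzed here: a single index case, a population that remains fully susceptible, and no interventions. The function name and the R₀ values used are illustrative only.

```python
from typing import List


def expected_cases_by_generation(r0: float, generations: int) -> List[float]:
    """Expected new cases in each generation, starting from one index case.

    Simplifying assumption: every case causes r0 secondary infections on
    average and the susceptible population is never depleted.
    """
    return [r0 ** g for g in range(generations + 1)]


# Hypothetical values: R0 above 1 grows, R0 below 1 dies out.
growing = expected_cases_by_generation(2.0, 3)
declining = expected_cases_by_generation(0.5, 2)
print(growing)
print(declining)
```

Under these assumptions, any R₀ above 1 implies growth in each successive generation, which is why early estimates of R₀ carry so much weight in assessing epidemic potential.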

Relevant news reports and search trends were discovered via MediaCloud and Google Search Trends respectively and served as a proxy indicator for information dissemination [4, 5]. Meanwhile, relevant scientific studies were discovered through a combination of searches executed via Google Scholar and – to address possible delays in indexing – four popular public servers (i.e. arXiv, bioRxiv, medRxiv, and SSRN) that we believe are representative of the relevant preprint literature. Search terms and specifications for each data source are outlined in Table 1. All studies discovered via Google Scholar, arXiv, bioRxiv, medRxiv, and SSRN were manually checked for relevance to the topic area of interest. Only studies that included estimates for the basic reproduction number associated with 2019–nCoV in the body of the text were retained.

Table 1. Search terms and specifications for data discovery by data source.

Exact search terms are shown with Boolean operators as specified by each data source. Relevant studies were collected through February 1, 2020; due to indexing delays on the Google Scholar platform, the preprint servers arXiv, bioRxiv, medRxiv, and SSRN were used (in addition to Google Scholar) for study collection. Data for MediaCloud and Google Search Trends were collected from January 1, 2020 through February 9, 2020 to allow for baseline and posterior observation of news media and search interest in the topic area. Though geographic ranges of news media and search trends were specifiable, worldwide catchment was selected to ensure that international searches and media were captured in discovery.

Data Source	Search Terms	Date Range	Geographic Range
Google Scholar	• coronavirus reproduction • coronavirus reproductive	February 1, 2019 – February 1, 2020	not specifiable
Preprint Servers	• coronavirus reproduction • coronavirus reproductive	January 1, 2020 – February 1, 2020	not specifiable
MediaCloud	coronavirus AND (reproductive OR reproduction)	January 1, 2020 – February 9, 2020	worldwide (global)
Google Search Trends	coronavirus reproductive + coronavirus reproduction	January 1, 2020 – February 9, 2020	all media (global)
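The Boolean query listed for MediaCloud in Table 1 can be read as a simple keyword filter. The sketch below is only an illustration of that logic applied to a piece of text; it is not MediaCloud's query engine, and the function name and the example headlines are hypothetical.

```python
import re


def matches_query(text: str) -> bool:
    """Check: coronavirus AND (reproductive OR reproduction).

    Matching is done on whole lowercase words, which is an assumption of
    this sketch rather than a documented behavior of any search platform.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    has_topic = "coronavirus" in words
    has_term = bool(words & {"reproductive", "reproduction"})
    return has_topic and has_term


# Hypothetical headlines used only to exercise the filter.
print(matches_query("New coronavirus reproduction number estimated"))
print(matches_query("Coronavirus case counts continue to rise"))
```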

After this initial data discovery phase, date of first publication, publication platform, review status (i.e. preprint vs. peer-reviewed), and methodological details were manually curated from each of the 11 individual studies (Table 2) [6-16]. R₀ estimates were also extracted from each study for further analysis. In the event of multiple R₀ estimates – due either to preprint revisions following the first version or the use of multiple approaches in a single study – each estimate was recorded and treated as a separate entry to represent all available knowledge at any given point in time (Table 2).

Table 2. Metadata collected for all R₀ estimates [6-16].

For preprints that were revised before publication of the first relevant peer-reviewed study on January 29, the version number is indicated between parentheses as (n). When multiple R₀ estimates were presented in a single study due to the use of multiple approaches, the version number is followed by a single decimal place to indicate the approach used as (n.n). If a first author published more than one relevant independent study before February 1, the version number is followed immediately by an alphabetical marker (ordered by date of publication) as (nx).

ID	Date of Publication	Publication Platform	Modeling Method	Temporal Data Used	R₀ Range Presented
Majumder and Mandl (1)	23-Jan-20	SSRN (preprint)	incidence decay + exponential adjustment	confirmed case counts	point estimates
Read et al. (1)	24-Jan-20	medRxiv (preprint)	deterministic compartmental metapopulation	confirmed case counts; travel data	95% CI
Riou and Althaus (1)	24-Jan-20	bioRxiv (preprint)	stochastic simulations	none	90% high density interval
Tang et al.	24-Jan-20	SSRN (preprint)	deterministic compartmental	confirmed case counts	95% CI
Zhao et al. (1a.1)	24-Jan-20	bioRxiv (preprint)	exponential growth model	confirmed case counts	95% CI
Zhao et al. (1a.2)	24-Jan-20	bioRxiv (preprint)	exponential growth	confirmed case counts	95% CI
Majumder and Mandl (2)	26-Jan-20	SSRN (preprint)	incidence decay + exponential adjustment	confirmed case counts	point estimates
Read et al. (2)	27-Jan-20	medRxiv (preprint)	deterministic compartmental metapopulation	confirmed case counts; travel data	95% CI
Zhou et al. (1.1)	28-Jan-20	arXiv (preprint)	deterministic compartmental	confirmed case counts	point estimates
Zhou et al. (1.2)	28-Jan-20	arXiv (preprint)	deterministic compartmental	estimated case count based off of exportation events	point estimates
Li et al.	29-Jan-20	NEJM (peer-reviewed)	renewal equations	confirmed case counts	95% CI
Riou and Althaus (2)	30-Jan-20	Eurosurveillance (peer-reviewed)	stochastic simulations	none	90% high density interval
Zhao et al. (2a.1)	30-Jan-20	IJID (peer-reviewed)	exponential growth	confirmed case counts	95% CI
Zhao et al. (2a.2)	30-Jan-20	IJID (peer-reviewed)	exponential growth	confirmed case counts	95% CI
Wu, Leung, and Leung	31-Jan-20	Lancet (peer-reviewed)	Markov chain Monte Carlo	confirmed case counts	95% CrI
Zhao et al. (1b)	31-Jan-20	JCM (peer-reviewed)	simulation	confirmed case counts	95% CI

Given that the first known preprint estimates for R₀ were posted to SSRN by us on January 23, search trend fractions and news report volume were plotted between January 23 and February 1 (Figure 1). Baseline data for both sources prior to January 23, 2020 yielded negligible interest and volume respectively, and data collected through February 9, 2020 demonstrated depreciating interest and volume following the catchment window in Figure 1 [4, 5]. To illustrate when each of the 11 relevant studies became available to the public, indicator bars were overlaid against the search trend and news report data by date of publication (Figure 1).

Figure 1. Search and news media interest in the basic reproduction number (R₀) associated with 2019–nCoV as a function of time. Indicator bars demonstrate when 11 different studies – all of which estimated the R₀ associated with 2019–nCoV – were made available. Data are plotted from January 23, 2020 (i.e. the date of publication for the first study) through February 1, 2020. Though not shown here, search and news media interest prior to January 23, 2020 were negligible in the topic area, and interest continued to depreciate from February 2, 2020 through February 9, 2020 [4, 5].

We then plotted each of the 16 R₀ estimates produced by the 11 studies in Table 2, including both the mean and the estimate range (e.g. 95% CI or 95% CrI) presented. Estimates were ordered by date of publication and alphabetically within each date, offering a side-by-side comparison of preprint versus peer-reviewed results; averages and 95% confidence intervals were also computed for both groups (Figure 2).
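The group-level summary described above can be sketched as follows. The text does not state which formula was used for the 95% confidence intervals, so this example assumes a normal approximation around the mean of the point estimates; with only a handful of estimates, a t-based interval would be wider. The function name and the input values are hypothetical and are not the estimates from Table 2.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_with_ci(values: Sequence[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for a normal-approximation interval.

    z = 1.96 corresponds to a 95% interval; this choice is an assumption
    of the sketch, not a detail reported in the text.
    """
    if len(values) < 2:
        raise ValueError("at least two estimates are needed")
    mean = statistics.fmean(values)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * sem, mean + z * sem


# Hypothetical point estimates for one group of studies.
mean, lower, upper = mean_with_ci([2.0, 3.0, 4.0])
print(f"{mean:.2f} [95% CI: {lower:.2f}, {upper:.2f}]")
```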

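Figure 2. Basic reproduction number (R₀) mean and range estimates from 11 different studies of 2019–nCoV as a function of time. Ranges presented vary by study (e.g. 95% CI or 95% CrI) and are presented in Table 2.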

What We Found

Google Search Trends and MediaCloud data suggest that both general (i.e. search) interest and news media interest in the basic reproduction number (R₀) associated with 2019–nCoV peaked prior to the publication of relevant peer-reviewed studies. In the selected time frame, search interest peaked on January 27 after a sharp increase between January 23 and January 25, immediately following the publication of five early preprint studies – all of which estimated R₀ – on bioRxiv, medRxiv, and SSRN. Meanwhile, news media interest peaked on January 28, coinciding with a sixth preprint study published on arXiv (Figure 1). The first peer-reviewed estimates were then published by Li et al. on January 29 at 5:00 PM EST – as is standard for The New England Journal of Medicine – followed by four additional peer-reviewed studies in Eurosurveillance, The International Journal of Infectious Diseases, The Lancet, and Journal of Clinical Medicine through February 1 [12, 17].

Average R₀ estimates for the preprint group and the peer-reviewed group were 3.61 [95% CI: 2.77, 4.45] and 2.54 [95% CI: 2.17, 2.91] respectively – demonstrating overlap in 95% confidence intervals despite a wide diversity of modeling methods and data sources used both in-group and across-group (Table 2). Though the average for the preprint group was higher than that of the peer-reviewed group, this effect was driven primarily by two outlier estimates (R₀ above the upper bound of the group's 95% CI), as depicted in Figure 2 [9, 10]. Exclusion of these two estimates using a consensus-based approach based on the 95% confidence interval yielded an average R₀ estimate of 3.02 [95% CI: 2.65, 3.39] for the preprint group.
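One reading of the consensus-based exclusion described above is sketched below: compute the group mean and its 95% confidence interval, drop any estimate above the upper bound, and recompute the average on what remains. The normal-approximation interval, the function name, and the input values are all assumptions made for illustration; they are not the estimates or the exact procedure from the analysis.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def exclude_above_upper_bound(
    values: Sequence[float], z: float = 1.96
) -> Tuple[List[float], float]:
    """Drop estimates above the group's upper CI bound, then re-average.

    Returns (kept_estimates, mean_of_kept). The normal approximation
    (z = 1.96) is an assumption of this sketch.
    """
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    upper = mean + z * sem
    # Keep only estimates at or below the upper bound of the interval.
    kept = [v for v in values if v <= upper]
    return kept, statistics.fmean(kept)


# Hypothetical group with one unusually high estimate.
kept, consensus_mean = exclude_above_upper_bound([2.0, 2.5, 3.0, 3.0, 6.5])
print(kept, round(consensus_mean, 3))
```

Because the filter is one-sided, it only removes unusually high estimates, mirroring the two high outliers discussed in the text; a symmetric rule would also need a lower-bound check.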

Notably, two of the studies in the peer-reviewed group had previously been published as preprints [13, 14]. Though estimates presented by Riou and Althaus remained unchanged following peer review, estimates presented by Zhao et al. were higher prior to peer review than they were following it.

What It Means

Our findings suggest that preprints, owing to the rapidity of their release, may be driving discourse related to the ongoing 2019–nCoV outbreak more than peer-reviewed literature in the same topic area. Though our analysis focuses on search trend and news media data as a measure of general discourse, it is likely that preprints are also influencing policymaking discussions, given that the WHO announced on January 26, 2020 that it would be creating a repository of relevant studies – including those that have not yet been peer-reviewed [18].

Nevertheless, despite the advantages of speedy information delivery, lack of peer review can also translate into issues of credibility and misinformation – intentional and unintentional alike. This drawback was highlighted during the ongoing outbreak by the high-profile withdrawal of a virology study from the preprint server bioRxiv, which erroneously claimed that 2019–nCoV contained HIV “insertions” [19]. Thankfully, the very fact that this study was withdrawn showcases the power of “public” peer review during emergencies; the withdrawal itself appears to have been prompted by public outcry from dozens of scientists around the globe who had access to the study because it was placed on a public server [20]. Even so, instances like this one showcase the need for caution when considering the science put forth by any one preprint.

With this in mind, taking multiple studies into consideration as presented in our analysis can help operationalize the kind of caution necessitated by preprints while simultaneously allowing for important, robust insights prior to the publication of a peer-reviewed study in the same topic area. This kind of collective, consensus-based approach is arguably easiest when the research of interest is quantitative in nature; given that many critical epidemiological parameters that inform decision-making (e.g. incubation period and generation time) are quantitative, our proposed approach could work well in these contexts too.

With such caution in mind, our work demonstrates the powerful role preprints can play during public health crises due to the timeliness with which they can disseminate new information. While primacy and peer-reviewed publications are key metrics in scientific advancement, the impact of preprints on discourse and decision-making pertaining to the ongoing 2019–nCoV outbreak suggests that we must rethink how we reward and recognize community contributions during current and future public health crises.

Acknowledgments

Funding Statement – This work was supported in part by grant T32HD040128 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Footnotes

Conflicts of Interest – The authors declare no conflicts of interest.

References

[1]. Novel Coronavirus (2019-nCoV): Situation Report – 22. The World Health Organization, February 2020. (https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200211-sitrep-22-ncov.pdf).
[2]. China coronavirus: how many papers have been published? Nature, January 2020. (https://www.nature.com/articles/d41586-020-00253-8).
[3]. Preprints can fill a void in times of rapidly changing science. STAT, January 2020. (https://www.statnews.com/2020/01/31/preprints-fill-void-rapidly-changing-science/).
[4]. MediaCloud. February 2020. (https://mediacloud.org/).
[5]. Google Search Trends. February 2020. (https://trends.google.com/).
[6]. Majumder MS, Mandl KD. Early Transmissibility Assessment of a Novel Coronavirus in Wuhan, China. SSRN 2020; doi: 10.2139/ssrn.3524675 (https://tinyurl.com/rkzrmtv [v1], https://tinyurl.com/tbu796z [v2]).
[7]. Read JM, Bridgen JRE, Cummings DAT, et al. Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions. medRxiv 2020; doi: 10.1101/2020.01.23.20018549 (https://tinyurl.com/vt9nmsw [v1], https://tinyurl.com/soroa5o [v2]).
[8]. Riou J, Althaus CL. Pattern of early human-to-human transmission of Wuhan 2019-nCoV. bioRxiv 2020; doi: 10.1101/2020.01.23.917351
[9]. Tang B, Wang X, Li Q, et al. Estimation of the Transmission Risk of 2019-nCov and Its Implication for Public Health Interventions. SSRN 2020; doi: 10.2139/ssrn.3525558
[10]. Zhao S, Ran J, Musa SS, et al. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. bioRxiv 2020; doi: 10.1101/2020.01.23.916395 (https://www.biorxiv.org/content/10.1101/2020.01.23.916395v1 [v1]).
[11]. Zhou T, Liu Q, Yang Z, et al. Preliminary prediction of the basic reproduction number of the Wuhan novel coronavirus 2019-nCoV. arXiv 2020. (https://arxiv.org/abs/2001.10530v1 [v1]).
[12]. Li Q, Guan X, Wu P, et al. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia. NEJM 2020; doi: 10.1056/NEJMoa2001316
[13]. Riou J, Althaus CL. Pattern of early human-to-human transmission of Wuhan 2019 novel coronavirus (2019-nCoV), December 2019 to January 2020. Euro Surveill 2020; 25(4); doi: 10.2807/1560-7917.ES.2020.25.4.2000058
[14]. Zhao S, Lin Q, Ran J, et al. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. Int J Infect Dis 2020; doi: 10.1016/j.ijid.2020.01.050
[15]. Wu JT, Leung K, Leung GM. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet 2020; doi: 10.1016/S0140-6736(20)30260-9
[16]. Zhao S, Musa SS, Lin Q, et al. Estimating the Unreported Number of Novel Coronavirus (2019-nCoV) Cases in China in the First Half of January 2020: A Data-Driven Modelling Analysis of the Early Outbreak. J Clin Med 2020; 9(2); doi: 10.3390/jcm9020388
[17]. Frequently Asked Questions. The New England Journal of Medicine, February 2020. (https://www.nejm.org/media-center/frequently-asked-questions).
[18]. New Coronavirus (2019-nCoV). The World Health Organization, January 2020. (https://twitter.com/WHO/status/1221475167869833217).
[19]. Pradhan P, Pandey AK, Mishra A, et al. Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag (WITHDRAWN). bioRxiv 2020; doi: 10.1101/2020.01.30.927871
[20]. Quick retraction of a faulty coronavirus paper was a good moment for science. STAT, February 2020. (https://www.statnews.com/2020/02/03/retraction-faulty-coronavirus-paper-good-moment-for-science/).
