Abstract
This study describes views, downloads, Altmetric scores, and citations of articles published as preprints and differences in Altmetric scores and citations of published articles by prior preprint status.
Preprints are versions of articles that are made publicly available prior to peer-reviewed publication, are widely used in physical sciences, and are now emerging in life sciences. Preprints provide immediate access to new information; however, articles not formally peer reviewed may contain errors in methods, results, or interpretation.
As preprints in medicine are debated, data on how preprints are used, cited, and published are needed. We evaluated views and downloads and Altmetric scores and citations of preprints and their publications. We also assessed whether Altmetric scores and citations of published articles correlated with prior preprint posting.
Methods
We downloaded all information from the preprint repository bioRxiv on all preprints posted between November 7, 2013, and January 17, 2017 (including publication status), and all data from Altmetric and CrossRef. Altmetric records mentions of articles in the media and creates an “attention score,” with a score of more than 20 corresponding to articles in the top 5%. CrossRef records citation counts. We similarly downloaded all data from PubMed, Altmetric, and CrossRef on September 16, 2017, for all article publications of preprints. The probability of publication was analyzed with Kaplan-Meier estimates. Published articles were compared with their preprints using the sign test.
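The Kaplan-Meier estimate mentioned above can be illustrated with a minimal sketch. It assumes each preprint contributes a follow-up time in months and a flag for whether it was published (the event) or was still unpublished at the data cutoff (censored). The function names and the toy data are hypothetical; the original analysis was done in R and its code is not shown in the source.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: follow-up time per preprint (e.g. months since posting)
    events:    True if the preprint was published at that time,
               False if it was still unpublished (censored)
    """
    n_at_risk = len(durations)
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    all_counts = Counter(durations)
    survival = 1.0  # estimated probability of remaining unpublished
    curve = []
    for t in sorted(all_counts):
        d = event_counts.get(t, 0)
        if d:
            survival *= 1 - d / n_at_risk
            curve.append((t, survival))
        # Everyone observed at time t (event or censored) leaves the risk set.
        n_at_risk -= all_counts[t]
    return curve


def probability_published_by(curve, t):
    """Estimated cumulative probability of publication by time t."""
    survival = 1.0
    for time, s in curve:
        if time > t:
            break
        survival = s
    return 1 - survival


# Toy data (hypothetical, not from the study).
months = [3, 6, 6, 12, 12, 18, 24, 24]
published = [True, True, False, True, False, True, False, False]
curve = kaplan_meier(months, published)
print(probability_published_by(curve, 12))
```

Censoring is what separates this estimate from a raw proportion: preprints posted shortly before the cutoff count toward the risk set only for the time they were actually observed.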
We also randomly selected 30% of the articles with a preprint on bioRxiv; 30% was chosen to balance power vs computational processing time. PubMed was then searched for up to 5 research articles without a preprint matched for being published in the same journal during the same period as each of the selected articles. Matched articles were compared for Altmetric scores and CrossRef citations using the Friedman test. Statistical significance for 2-tailed P values was claimed at P < .005 based on prior recommendations. Data search, extraction, and analyses used R version 3.4.1 (R Foundation for Statistical Computing).
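The sampling and matching step described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the record fields (`id`, `journal`, `period`), the function name, and the rule for what counts as "the same period" are all hypothetical, since the source does not define them.

```python
import random


def sample_and_match(preprinted, candidates, fraction=0.3, max_controls=5, seed=0):
    """Randomly select a fraction of articles with a preprint and match each
    to up to `max_controls` articles without a preprint from the same journal
    and period.

    Each article is a dict with hypothetical keys "id", "journal", "period".
    Returns {selected_article_id: [matched control articles]}.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    k = round(len(preprinted) * fraction)
    selected = rng.sample(preprinted, k)
    matched = {}
    for article in selected:
        pool = [
            c for c in candidates
            if c["journal"] == article["journal"] and c["period"] == article["period"]
        ]
        # Take all available controls when fewer than max_controls exist.
        matched[article["id"]] = rng.sample(pool, min(max_controls, len(pool)))
    return matched
```

Allowing fewer than 5 controls per article is consistent with the "up to 5" wording; each selected article and its controls then form one matched block for the comparison.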
Results
Of 7760 preprints, 7750 were unique. Preprint availability on bioRxiv increased over time (from a median of 54/month in 2013 to median of 392/month in 2016; Table). The bioRxiv-defined disciplines with the most preprints were bioinformatics (15.8%), evolutionary biology (13.7%), neuroscience (12.6%), and genomics (11.8%); only 3 preprints were labeled as clinical trials.
Table. Descriptive Characteristics of Preprints in bioRxiv.
Characteristic | All Years | 2013 | 2014 | 2015 | 2016 |
---|---|---|---|---|---|
Posting rate, median/mo | 127 | 54 | 76 | 147 | 392 |
Publication type, No. (%)^a | | | | | |
New results | 7424 (96) | 99 (92) | 850 (96) | 1679 (95) | 4529 (96) |
Confirmatory | 221 (3) | 1 (1) | 30 (3) | 59 (3) | 119 (3) |
Contradictory | 105 (1) | 8 (7) | 6 (1) | 28 (2) | 60 (1) |
Discipline, No. (%)^a | | | | | |
Bioinformatics | 1223 (15.8) | 14 (13) | 160 (18) | 312 (18) | 708 (15) |
Evolutionary biology | 1064 (13.7) | 17 (16) | 177 (20) | 325 (18) | 526 (11) |
Neuroscience | 974 (12.6) | 9 (8) | 62 (7) | 146 (8) | 701 (15) |
Genomics | 913 (11.8) | 11 (10) | 130 (15) | 235 (13) | 506 (11) |
Genetics | 651 (8.4) | 9 (8) | 71 (8) | 187 (11) | 363 (8) |
Ecology | 404 (5.2) | 9 (8) | 57 (6) | 94 (5) | 232 (5) |
Microbiology | 354 (4.6) | 3 (3) | 14 (2) | 60 (3) | 259 (6) |
Other | 2162 (27.9) | 36 (33) | 215 (24) | 407 (23) | 1413 (30) |
Versions, median (IQR)^b | 1 (1-2) | 1 (1-2) | 1 (1-2) | 1 (1-2) | 1 (1-2) |
Publications, No. (%) | 2628 (34) | 63 (58) | 525 (59) | 963 (55) | 1077 (23) |
Abstract Views | | | | | |
First month, median (IQR) | 402 (240-675) | 45 (14-191) | 227 (127-407) | 328 (204-568) | 472 (301-770) |
Overall, median (IQR) | 924 (618-1499) | 1467 (952-2146) | 1213 (837-2075) | 1064 (731-1743) | 844 (573-1348) |
PDF Downloads | | | | | |
First month, median (IQR) | 74 (39-142) | 52 (32-90) | 54 (27-108) | 64 (34-131) | 83 (45-153) |
Overall, median (IQR) | 321 (173-596) | 703 (509-1331) | 610 (412-1046) | 450 (285-782) | 248 (141-441) |
Altmetric Scores^c,d | | | | | |
Preprints, median (IQR) | 7 (3-16) | 3 (0.4-13) | 7 (2-16) | 9 (3-17) | 7 (3-15) |
Published articles, median (IQR) | 8 (3-22) | 0 (0-0)^e | 6 (2-22) | 9 (3-22) | 8 (3-22) |
CrossRef Citations^d,f | | | | | |
Preprints, median (IQR) | 0 (0-0) | 0 (0-0) | 0 (0-0) | 0 (0-0) | 0 (0-0) |
Published articles, median (IQR) | 5 (2-12) | 6 (6-6)^e | 17 (7.5-33.5) | 9 (3-17) | 4 (1-8) |
Abbreviation: IQR, interquartile range.
^a Publication type was categorized and determined by bioRxiv. Similarly, discipline is determined by bioRxiv using criteria undisclosed on its website.
^b Versions refers to how many versions of a preprint have been available on bioRxiv so far.
^c Altmetric score refers to the score for media attention given to each preprint by Altmetric.
^d Numbers in parentheses refer to the first and third quartiles.
^e The 2013 Altmetric scores and CrossRef citation counts do not have a range for published articles because our database only had 1 published preprint in 2013.
^f Citations refer to citation count as reported in CrossRef.
The median number of preprint abstract views was 924 (range, 6-192 570) and the median number of preprint PDF downloads was 321 (range, 2-151 520). The median Altmetric score was 7.3 (range, 0-2506) and the median CrossRef citation count was 0 (range, 0-55); 18.2% (1414/7750) of preprints achieved an Altmetric score of more than 20. Of 7750 preprints, 2628 (34%) were published as articles in a peer-reviewed journal. The probability of publication in the peer-reviewed literature was 48% within 12 months and 55.5% within 24 months.
The median Altmetric score of the published articles was 8.8, and that of their respective preprints was 8.4 (median pairwise difference, −0.3; interquartile range [IQR], −5.4 to 7.7; P = .17). The median number of citations for published articles was 5, and that for their respective preprints was 0 (median pairwise difference, 5 [IQR, 2 to 12]; P < .001).
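The paired comparison reported above uses the sign test, which considers only the direction of each within-pair difference. A minimal sketch of an exact two-sided version follows; the function name and the `(preprint_value, published_value)` pair layout are assumptions made for illustration, and the study's own R code is not shown in the source.

```python
from math import comb
from statistics import median


def sign_test(pairs):
    """Exact two-sided sign test on (preprint_value, published_value) pairs.

    Ties (zero differences) are dropped. Under the null hypothesis, positive
    and negative differences are equally likely, so the count of positive
    differences follows Binomial(n, 0.5).
    """
    diffs = [published - preprint for preprint, published in pairs
             if published != preprint]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(d > 0 for d in diffs)
    tail = min(positives, n - positives)
    # Double the smaller tail probability and cap it at 1.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def median_pairwise_difference(pairs):
    """Median of published minus preprint values across all pairs."""
    return median(published - preprint for preprint, published in pairs)
```

Because the test ignores the size of each difference, it suits skewed measures such as citation counts and Altmetric scores, where a few articles have very large values.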
The sample of 776 published articles with preprints was matched to 3647 published articles without preprints. Published articles with preprints had significantly higher Altmetric scores than published articles without preprints (median, 9.5 [IQR, 3.1 to 35.3] vs 3.5 [IQR, 0.8 to 12.2], respectively; between-group difference, 4 [IQR, 0 to 15]; P < .001) and received more citations (median, 4 [IQR, 1 to 10] vs 3 [IQR, 1 to 7]; between-group difference, 1 [IQR, −1 to 5]; P < .001).
Discussion
The number of preprints posted on bioRxiv rapidly increased between 2013 and 2016, much more than the increase in MEDLINE-indexed publications during the same period (1.2%). Although preprints were not well cited, 18% had Altmetric scores in the top 5% and 48% were estimated to reach peer-reviewed publication within 1 year. Articles with a preprint received higher Altmetric scores and more citations than articles without a preprint. These results add to a previous report from bioRxiv by also quantifying social media attention and citations received by preprints and published articles and comparing articles with and without preprints.
This analysis has limitations. First, it was limited to a few years during which preprint posting has rapidly evolved; patterns may change over time. Second, only a short time was available for preprints to be published; the rate of publication is therefore an underestimate. Third, the association between Altmetric scores and citations in articles with and without preprints may not be causal because differences between authors choosing to post or not to post a preprint were not considered.
Section Editor: Jody W. Zylke, MD, Deputy Editor.
References
- 1. Butler D. Biologists join physics preprint club. Nature. 2003;425(6958):548.
- 2. Bourne PE, Polka JK, Vale RD, Kiley R. Ten simple rules to consider regarding preprint submission. PLoS Comput Biol. 2017;13(5):e1005473.
- 3. Annesley T, Scott M, Bastian H, et al. Biomedical journals and preprint services: friends or foes? Clin Chem. 2017;63(2):453-458.
- 4. Benjamin DJ, Berger JO, Johannesson M, et al. Redefine statistical significance [published online September 1, 2017]. Nat Hum Behav. doi: 10.1038/s41562-017-0189-z
- 5. US National Library of Medicine. Yearly citation totals from 2017 MEDLINE/PubMed baseline: 26,759,399 citations found. https://www.nlm.nih.gov/bsd/licensee/2017_stats/2017_Totals.html. Accessed November 29, 2017.
- 6. Inglis JR, Sever R. bioRxiv: a progress report. http://asapbio.org/biorxiv. Accessed September 18, 2017.