2020 Jul 24;125(1):795–812. doi: 10.1007/s11192-020-03632-0

Characteristics of scientific articles on COVID-19 published during the initial 3 months of the pandemic

Nicola Di Girolamo 1,2, Reint Meursinge Reynders 3,4
PMCID: PMC7380499  PMID: 32836530

Abstract

The COVID-19 pandemic has been characterized by an unprecedented number of published scientific articles. The aim of this study is to assess the type of articles published during the first 3 months of the COVID-19 pandemic and to compare them with articles published during the 2009 H1N1 swine influenza pandemic. Two operators independently extracted and assessed all articles on COVID-19 and on H1N1 swine influenza that had an abstract and were indexed in PubMed during the first 3 months of these pandemics. Of the 2482 articles retrieved on COVID-19, 1165 were included. Over half of them were secondary articles (590, 50.6%). Common primary articles were: human medical research (340, 59.1%), in silico studies (182, 31.7%) and in vitro studies (26, 4.5%). Of the human medical research, the vast majority were observational studies and case series, followed by single case reports and one randomized controlled trial. Secondary articles were mainly reviews, viewpoints and editorials (373, 63.2%). Limitations were reported in 42 out of 1165 abstracts (3.6%), with 10 abstracts reporting actual methodological limitations. In a similar timeframe, there were 223 articles published on the H1N1 pandemic in 2009. During the COVID-19 pandemic there was a higher prevalence of reviews and guidance articles and a lower prevalence of in vitro and animal research studies compared with the H1N1 pandemic. In conclusion, compared to the H1N1 pandemic, the majority of early publications on COVID-19 do not provide new information, possibly diluting the original data published on this disease and consequently slowing down the development of a valid knowledge base on this disease. Also, only a negligible number of published articles report limitations in their abstracts, hindering a rapid interpretation of their shortcomings. Researchers, peer reviewers, and editors should take action to flatten the curve of secondary articles.

Electronic supplementary material

The online version of this article (10.1007/s11192-020-03632-0) contains supplementary material, which is available to authorized users.

Keywords: Covid-19, Coronavirus, SARS-nCoV-2, Study design, Research quality, Healthcare policy

Introduction

The WHO was informed on December 31st 2019 that a number of patients were hospitalized for a pneumonia of unknown etiology in Wuhan City, China (WHO 2020). In the following week, molecular diagnostic techniques identified a novel coronavirus (SARS-CoV-2) as responsible for the pneumonia (WHO 2020). That was the first known outbreak of the disease that was later named COVID-19. SARS-CoV-2 has high transmissibility and an asymptomatic incubation period, during which transmission may occur (Huang et al. 2020; Rothe et al. 2020). Due to these characteristics, up to June 19th 2020, more than 200 countries have been affected by this disease (Centers for Disease Control and Prevention 2020), resulting in the most relevant pandemic in recent history.

Past coronavirus outbreaks have led to prolific publishing on these health issues (Kagan et al. 2020). Similar surges in publication numbers were seen with earlier outbreaks of viral diseases like SARS, MERS, Ebola, and Swine Flu, which then dropped drastically when these diseases were contained (Kagan et al. 2020). The production of a large bulk of literature in the early phases of such outbreaks can create a severe burden for policy makers who need to make rapid evidence-based decisions for controlling the pandemic. They have to scrutinize large quantities of scientific publications to assess what original research has been published on this topic and appraise the quality of this research. It is especially important to distinguish articles that report novel information from articles that summarize or comment on existing information, i.e. primary versus secondary articles.

In this research study we have replicated this process and report on the characteristics of articles published in the first trimester of the COVID-19 pandemic. Patients, health care professionals, policy makers, and the general public want to know what has been published on this health issue and what quality of research was available for decision making. Researchers, editors, peer reviewers, and publishing companies get an insight into the quantity and quality of articles that they contributed. The purpose of the present meta-epidemiological study is to identify the proportion of primary and secondary articles, to identify the proportion of studies that report limitations in their abstracts and to compare publishing patterns during COVID-19 and during the only other pandemic of the XXI century, the 2009 H1N1 swine influenza.

Methods

We performed a cross-sectional study of articles published during the initial period of the COVID-19 pandemic. We adopted the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement (von Elm et al. 2007) for reporting this study and included its checklist (Additional file 1). We implemented two changes compared with our original protocol. We did not assess whether studies originated as multi center research projects, because we realized that this information could not be extracted reliably from every article. To fulfill the request of one of the peer reviewers of this manuscript we included a new section: ‘Calculation of articles per population, per gross domestic product (GDP) and per declared COVID-19 cases’.

Eligibility criteria and search strategy

All articles retrieved on Medline through searching PubMed with the string “(COVID-19 OR COVID)” on April 2nd 2020 at 9:00 pm Central Standard Time, after application of the filter ‘Abstract’, were eligible for inclusion in the study. The full search strategy is given in Additional file 2A. Any type of article published on COVID-19 was eligible. This implies that a broad spectrum of articles, ranging from letters to the editors to randomized controlled trials, was eligible for inclusion. Articles were eligible if they included any terminology related to SARS-CoV-2 (including but not limited to: SARS-CoV-2, COVID, COVID-19, novel coronavirus 2019) in the title, abstract or full-text. For example, in vitro articles in which other viruses (e.g., MERS-CoV, SARS-CoV) were used as a proxy for SARS-CoV-2 were still eligible for inclusion in the study when the authors mentioned SARS-CoV-2 or synonyms in the manuscript. No eligibility criteria were applied to specific participants, interventions, comparators, outcomes, endpoints, language or settings of the articles. Articles that did not present an English abstract, as well as correspondence to previous research studies and errata, were excluded.

Selection of articles and data extraction

We extracted from each article the following information: ‘title’, ‘abstract’, ‘DOI’, ‘number of authors’, ‘journal’, ‘date of creation’, ‘first author’, ‘country of the first institution of the first author’, ‘article type’ (primary/secondary, defined below), ‘study design’ (defined below), ‘number of patients included’ (only for human medical research), ‘presence of objective in the abstract’, ‘presence of limitation in the abstract’, ‘main conclusion’ (Additional file 2B). Two operators (ND and RMR) conducted the selection of articles and data extraction procedures independently. These procedures were pilot tested on 40 articles to calibrate both operators and to fine-tune the data extraction forms. Disagreements during the selection of articles and data extraction procedures were resolved through discussions between both operators. Consultation with a third operator in the case of persisting disagreements was not necessary.

Classifications of the included articles

We used a multi-step approach to classify each article included in the study. The overarching final classification was whether an article was primary, i.e., adding original scientific information to the literature, or secondary. Primary articles refer to original research studies and secondary articles refer to perspectives and syntheses of the available knowledge on COVID-19 such as viewpoints, commentaries, guidelines, reviews etc. (Table 1). Our classification of included articles was not exclusively based on the labels assigned to these articles, because study designs are often mislabeled by the authors themselves (Esene et al. 2014). We therefore first assessed the validity of such labeling by evaluating the study design in the full-text, before making our final classification of a study. Primary articles were divided into five categories, i.e., human medical research, in silico, in vitro, animal research and human non-medical research, and then into subcategories (Table 1). Many published articles included multiple analytical steps and could therefore represent more than one of these categories. For example, in a study samples could be obtained from several patients (‘human medical research’), then transferred to a petri dish and cultured (‘in vitro research’), and the results of the growth could then be modelled using computer simulation (‘in silico research’). Since the purpose of the present meta-epidemiological study is to define the amount of information obtained that is actually relevant for healthcare policy makers and clinicians, the categorization of the articles was performed considering the theoretical order of evidence provided by different study settings, i.e., human medical research > animal research > in vitro research > in silico research. Therefore, if a study could fit in multiple categories, we assigned the highest category based on that order. In the example above, the study would have been categorized as ‘human medical research’. Similarly, a study including abundant in vitro (or in silico) research and a final part on an animal model would have been categorized as ‘animal research’.
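The assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the stated order of evidence, not the procedure the operators actually followed (classification was done manually on the full text). The names `EVIDENCE_ORDER` and `assign_category` are hypothetical, and ‘human non-medical research’ is left out because the text does not state its position in the order.

```python
# Order of evidence as stated in the text, highest first.
# 'human non-medical research' is omitted: its rank is not given in the source.
EVIDENCE_ORDER = [
    "human medical research",
    "animal research",
    "in vitro research",
    "in silico research",
]


def assign_category(components):
    """Return the highest-ranked category among those a study contains."""
    present = set(components)
    for category in EVIDENCE_ORDER:
        if category in present:
            return category
    raise ValueError(f"no recognised category in {components!r}")


# The worked example from the text: patient samples, cultured, then modelled.
example = ["in silico research", "in vitro research", "human medical research"]
print(assign_category(example))
```

A study with in vitro work and a final animal model would be passed as `["in vitro research", "animal research"]` and returned as ‘animal research’, in line with the second example in the text.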

Table 1.

Criteria employed to classify the included articles

| Article type | Study design | Description |
| --- | --- | --- |
| Primary articles | Human medical research | Human medical research refers to articles reporting information on 1 or more human patient/s. In order to be classified as human medical research, an article would need to report individual patient data. Articles in this category were further subcategorized into ‘randomized controlled trials (RCTs)’, ‘observational studies and case series’, and ‘case reports’, based on the following key: articles including a single case were categorized as ‘case reports’; articles including 2 or more cases where no randomization was performed were categorized as ‘observational studies and case series’; articles including 2 or more cases where randomization of a treatment was performed were categorized as ‘RCTs’. We extracted the total number of patients included in human medical research studies. |
| | In silico research | Primary articles were classified as ‘in silico research’ if they reported the results of any type of computer-based research. Articles in this category were further subcategorized into ‘epidemiological modelling’, ‘biology/biochemistry/bioinformatics studies’, and ‘social media studies’, based on the following key: articles focusing on exploiting platforms or other online tools, such as Google Trends, in order to extrapolate information or generate any sort of prediction were categorized as ‘social media studies’; articles using published or original data to calculate the spreading or impact of COVID-19, including but not limited to epidemiological models and calculations, were classified as ‘epidemiological modelling’; articles using published or original data in order to generate original information solely using computer processing in the field of biology, biochemistry and bioinformatics were classified as ‘biology/biochemistry/bioinformatics studies’. If any part of the work performed by researchers was done without the computer, the study was instead included in the categories in vitro, human medical research or animal research. |
| | In vitro research | Primary articles were classified as ‘in vitro research’ if they reported the results of any type of laboratory-based or in vitro research without inclusion of human or animal subjects. Articles in this category were further subcategorized into ‘development/performance of diagnostic technology’, ‘virus-host interaction’, ‘genomic studies’, ‘pharmacological activity in vitro’, and ‘viral isolation/transport/elimination’ based on their primary objective and results. |
| | Animal research | Research including animal subject/s refers to original clinical research on 1 or more animal subject/s. This category was not further subdivided and individual results of the included studies were reported. |
| | Human non-medical research | Primary articles were classified as ‘human non-medical research’ if they reported the result of surveys or studies performed on healthcare professionals. Articles classified as a ‘survey’ were subdivided into ‘surveys to health professionals’ and ‘surveys to the general public’ depending on the population surveyed. |
| Secondary articles | Systematic review | Secondary articles were classified as ‘systematic review’ if they reported the results of a systematic search, whether a narrative review or a meta-analysis was present. |
| | Review/viewpoint/editorial/letter/news | Secondary articles were classified as ‘review/viewpoint/editorial/letter/news’ if they reported a narrative or graphical representation of previously performed research or previously published information. Since we found it difficult to unequivocally distinguish between articles in this category, the articles were not further subdivided. |
| | Guideline/guidance/recommendation | Secondary articles were classified as ‘guideline/guidance/recommendation’ if they reported guidelines or recommendations either on the basis of personal experience, a review of the literature, or a combination of them. This category was further classified into ‘indications for specific department/disease/procedure’ if they reported clinical recommendations for a subset of health professionals and ‘indications for lay public’ if they reported recommendations/indications for the general public. |
| | Correspondence to previous research | Secondary articles were classified as ‘correspondence to previous research’ if they reported any type of direct commentary to a recently published article. |
| | Erratum/correction | Secondary articles were classified as ‘erratum/correction’ if they reported a mistake with or without a correction for a previously published article. |

Calculation of articles per population, per gross domestic product (GDP) and per declared COVID-19 cases

Based on the suggestions of one of the reviewers during the peer-review process of the article, we extracted population, GDP and number of declared COVID-19 cases for the countries that published the most articles. The population per country for the year 2018 was extracted from The World Bank website, which makes data publicly available. The year 2018 was the most recent year available. The data ‘population, total’ was extracted. Country GDP for the year 2019 was extracted from the World Economic Outlook dataset, available online on the International Monetary Fund website. The number of declared cases on March 2nd 2020 (1 month prior to our data extraction) was extracted from the data published by the European Centre for Disease Prevention and Control (ECDC), publicly available online. The number of articles published per million inhabitants was calculated by dividing the total number of articles published by each country by the country population and multiplying it by 1,000,000. The number of articles published per GDP unit was calculated by dividing the total number of articles published by each country by the country GDP. The number of articles published per 100 declared cases was calculated by dividing the total number of articles published by each country by the number of cases and multiplying it by 100.
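The three ratios described above are simple enough to sketch directly. The snippet below is an illustration rather than the authors' own script; the function names are hypothetical, and the input figures for Italy are taken from Table 2 (population in thousands, GDP in the unit used in that table).

```python
def articles_per_million(articles, population_thousands):
    """Articles per million inhabitants; population is given in thousands."""
    population = population_thousands * 1_000
    return articles / population * 1_000_000


def articles_per_gdp_unit(articles, gdp):
    """Articles per GDP unit, in whatever unit the GDP column uses."""
    return articles / gdp


def articles_per_100_cases(articles, cases):
    """Articles per 100 declared cases."""
    return articles / cases * 100


# Italy, as listed in Table 2: 77 articles, 60,421.76 thousand inhabitants,
# GDP 0.3, and 1,689 declared cases on March 2nd 2020.
italy_per_million = articles_per_million(77, 60_421.76)
italy_per_gdp = articles_per_gdp_unit(77, 0.3)
italy_per_100_cases = articles_per_100_cases(77, 1_689)
print(round(italy_per_million, 3), round(italy_per_gdp, 2), round(italy_per_100_cases, 2))
```

These values can be compared against the corresponding Italy row of Table 2.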

Abstract assessment

We screened all abstracts to assess whether the objectives and the limitations of the article were reported or not. An abstract was defined as any type of information reported in the area for abstracts in PubMed. Objectives were defined as ‘reported’ when the abstract reported any type of statement that explained the purpose of the article. Limitations were defined as ‘reported’ when the abstract reported any type of statement that explained limitation(s) of the article.

Limitations were further subdivided in ‘methodological limitation’ and ‘general limitation’; articles were classified as reporting a ‘methodological limitation’ when they stated in the abstract the presence of at least 1 limitation inherent to the article design (e.g., “due to the inclusion of a convenient sample this report is at risk of selection bias”); articles were classified as reporting a ‘general limitation’ when they stated in the abstract the presence of a limitation that was not inherent to the article’s design (e.g., “more evidence is needed”, “further research on the topic is warranted”).

Selection, extraction & classification of articles on H1N1 2009 pandemic

We performed a search and extraction in an analogous way for articles published during the early phases of the H1N1 2009 pandemic. We performed a search on Medline through PubMed with the string “H1N1”. We applied the text availability filter “Abstract” and ordered the articles by date of publication. Our search strategy is reported in Additional file 2C. We exported all the retrieved articles to a .csv file through the “Save” function. We established which was the first published article on the H1N1 2009 pandemic (Centers for Disease Control and Prevention 2009) based on a CDC summary (Centers for Disease Control and Prevention 2010). We then included three full months of publications, i.e. from April 25th 2009 to July 25th 2009. Similar to articles related to COVID-19, articles were eligible for inclusion if they reported terminology related to “H1N1”, “swine flu” or “the current pandemic”, among others. From the included articles, country of origin, language of full-text, type of study and study design were extracted in a similar fashion as was done for the COVID-19 articles. The selection, extraction and classification of articles were performed independently by two operators (ND and RMR) and disagreements were resolved by consensus.

Outcomes and prioritization

The primary outcomes of this meta-epidemiological study were:

  • The proportion of primary articles over the total number of articles with an abstract published during the first 3 months of the COVID-19 pandemic.

  • The proportion of articles reporting limitations in their abstracts.

  • The proportion of article types during COVID-19 and during the 2009 H1N1 swine flu pandemic.

The associations of any of these outcomes with other individual article characteristics were secondary outcomes.

Statistical analysis

Descriptive statistics are expressed as medians with interquartile ranges (IQR) and ranges or absolute counts and percentages. Multivariable logistic regression models were developed to explore the factors associated with the primary outcomes and provide odds ratios adjusted for confounders. Variables considered clinically relevant were entered in the models regardless of their statistical significance. Goodness of fit was assessed with the Hosmer–Lemeshow test and Nagelkerke R squared was used as a measure of predictive power. The first multivariable logistic regression had primary vs secondary articles as the dependent variable and included the country of publication (limited to the 11 countries with the most publications), the language of full-text (English/Other languages), the number of authors, and the number of days from the start of the pandemic as predictor variables. The initial model had a significant Hosmer–Lemeshow test (P = 0.004) and a low Nagelkerke R squared (0.27), due to non-linearities in the number of authors variable. The model was rebuilt after binning the variable (0 authors, 1–2 authors, 3–5 authors, 6–10 authors, > 11 authors). The new model had a non-significant Hosmer–Lemeshow test (P = 0.14) and a higher Nagelkerke R squared (0.33) and was retained. A univariable logistic regression model was built including COVID-19 articles vs H1N1 articles as the dependent variable and including article type as predictor variable. A multiple linear regression model was built including number of articles per country as the dependent variable and country population, country GDP and country declared COVID-19 cases as predictor variables.
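The binning step applied to the number of authors can be sketched as follows. This is a minimal illustration, not the SPSS or R code used for the analysis. The function name `bin_authors` is hypothetical, and because the stated bins leave the placement of exactly 11 authors open, the sketch assumes that 11 authors belong to the top bin.

```python
def bin_authors(n_authors):
    """Map a number of authors to the bins listed in the text.

    Assumption: articles with exactly 11 authors fall in the top bin,
    since the bins as stated (6-10, then > 11) do not cover that value.
    """
    if n_authors < 0:
        raise ValueError("number of authors cannot be negative")
    if n_authors == 0:
        return "0"
    if n_authors <= 2:
        return "1-2"
    if n_authors <= 5:
        return "3-5"
    if n_authors <= 10:
        return "6-10"
    return "11 or more"


# Example: a few author counts and the bin each one falls into.
for n in (0, 1, 4, 10, 65):
    print(n, "->", bin_authors(n))
```

Treating the bins as categories lets the model estimate a separate odds ratio per bin instead of forcing a single linear effect per additional author, which is the non-linearity the text refers to.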

Data analyses were performed and figures were produced using SPSS (version 24, IBM) and R 3.6.3 (R Core Team, 2020, www.R-project.org/). All P values were two-tailed, with nominal statistical significance claimed for P < 0.05.

Results

Results of the search

The results of our search are presented in a flow diagram (Fig. 1). Our search yielded 2482 articles. After excluding articles without an abstract, 1215 articles remained. We excluded 50 articles based on the following rationale: duplicate articles (n = 13), articles that were not on COVID-19 (13), articles without an English abstract (6), letters to previous papers (8), errata (2), local morbidity reports (7), and a statement of the WHO (1). We included a total of 1165 articles on COVID-19 in the study.
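As a quick consistency check, the counts in this paragraph can be tallied in a few lines. The sketch below only re-adds the figures reported above; the variable names are hypothetical.

```python
# Counts as reported in the text.
retrieved_total = 2482
with_abstract = 1215

exclusions = {
    "duplicate articles": 13,
    "not on COVID-19": 13,
    "no English abstract": 6,
    "letter to previous papers": 8,
    "erratum": 2,
    "local morbidity reports": 7,
    "statement of the WHO": 1,
}

excluded = sum(exclusions.values())
included = with_abstract - excluded
without_abstract = retrieved_total - with_abstract  # derived, not stated in the text

print(excluded, included, without_abstract)
```

The excluded and included totals agree with the 50 exclusions and 1165 included articles stated in the text.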

Fig. 1.

Modified PRISMA flow diagram showing the article inclusion process

Characteristics of published articles

Four countries contributed three quarters (871, 74.8%) of the included articles: approximately half of the included articles (588, 50.5%) came from China, 168 articles (14.4%) from the United States, 77 articles (6.6%) from Italy, and 38 articles (3.3%) from the United Kingdom. The remaining 294 articles (25.2%) originated, in decreasing numbers, in Japan, Singapore, Korea, India, France, Germany, Taiwan, and other countries (Table 2). When considering publications per million population, Singapore (4.43), Italy (1.27) and Taiwan (0.68) were the most prolific countries. When considering publications per GDP point, Italy (256.7), China (96.4) and the United States (73.0) were the most prolific countries. When considering publications per 100 declared COVID-19 cases, India (700.0), the United States (188.8), and the United Kingdom (105.6) were the most prolific countries (Table 2).

Table 2.

Number of published articles per country, per country population, per country GDP and per country cases

| Country | Number of articles published | Population (thousands, 2018) | Number of published articles per million persons | GDP (2019) | Number of published articles per GDP | Cases declared (March 2nd 2020) | Number of published articles per 100 cases |
| --- | --- | --- | --- | --- | --- | --- | --- |
| China | 588 | 1,392,730.00 | 0.422 | 6.1 | 96.39 | 80,134.00 | 0.73 |
| United States | 168 | 326,687.50 | 0.514 | 2.3 | 73.04 | 89.00 | 188.76 |
| Italy | 77 | 60,421.76 | 1.274 | 0.3 | 256.67 | 1,689.00 | 4.56 |
| United Kingdom | 38 | 66,460.34 | 0.572 | 1.4 | 27.14 | 36.00 | 105.56 |
| Japan | 26 | 126,529.10 | 0.205 | 0.7 | 37.14 | 254.00 | 10.24 |
| Singapore | 25 | 5,638.68 | 4.434 | 0.7 | 35.71 | 106.00 | 23.58 |
| Korea, Rep. | 24 | 51,606.63 | 0.465 | 2 | 12 | 4,335.00 b | 0.55 |
| India | 21 | 1,352,617.33 | 0.016 | 4.2 | 5 | 3.00 | 700 |
| Germany | 18 | 82,905.78 | 0.217 | 0.6 | 30 | 129.00 | 13.95 |
| France | 18 | 66,977.11 | 0.269 | 1.3 | 13.85 | 130.00 | 13.85 |
| Taiwan | 16 | 23,588.93 a | 0.678 | 2.7 | 5.93 | 40.00 | 40 |
| World | 1165 | 7,592,886.80 | 0.153 | 2.9 | 401.72 | 88,416.00 | 1.32 |

Country population for the year 2018 was extracted from The World Bank website. Country GDP for the year 2019 was extracted from the dataset World Economic Outlook. Number of declared cases on March 2nd 2020 (1 month prior to our data extraction) was extracted from the data published by the European Centre for Disease Prevention and Control (ECDC)

a Unavailable from The World Bank Data. Retrieved from Taiwan Government statistics website

b Unavailable from the European Centre for Disease Prevention and Control. Retrieved from Worldometer

Half of the included articles (578, 49.6%) came from 49 individual journals (range of published articles per journal, 5–70). The full-text of 1000 of the 1165 articles (85.8%) was in English. Of the remaining articles, 152 full-texts (13.0%) were in Chinese, 6 (0.5%) in Spanish, 5 (0.4%) in German, and 2 (0.2%) in French. Articles included an average of 7.4 authors (SD: 6.98), ranging from 0 to 65 authors.

We identified 575 (49.4%) primary and 590 (50.6%) secondary articles. Of the primary articles, 340 were human medical research (59.1%), 182 were in silico studies (31.7%), 26 were in vitro studies (4.5%), 20 were human non-medical research (3.5%), and 7 were animal research (1.2%). Of the secondary articles, the majority were reviews, viewpoints and editorials (373, 63.2%). The second largest category was guidelines or guidance articles, including 193 articles (32.7%), of which 169 were indications for specific departments, patients or procedures. We included 23 systematic reviews (3.9%) and 1 protocol (0.2%).

Based on the multivariable logistic regression model, secondary articles were more likely to be published in a language other than English (aOR 3.02, 95% CI 1.99 to 4.58), to be published at a later stage of the pandemic (aOR 1.01, 95% CI 1.00 to 1.02), to include a lower number of authors (multiple aORs, Table 3), and to be published by authors from India, Italy, Singapore, Germany, and Taiwan (multiple aORs, reference: China; Table 3). Among the 20 journals that published the most articles on COVID-19, there was a wide variation in the frequency of primary vs secondary articles (Fig. 2). Based on the multiple regression model [F(3,7) = 29.4, P < 0.001, R2 = 0.93], when adjusting for country population and GDP, the number of cases declared at the start of March significantly predicted the total number of articles published up to early April, with an increase of 6.7 articles (95% CI 4.1 to 9.4; P < 0.001) for each increase of 1000 cases.

Table 3.

Results of multivariable logistic regression analysis to determine factors associated with primary article publication in 1165 articles published in the early stages of COVID-19 pandemic

| Variable | Primary, No. (%) | Secondary, No. (%) | aOR | 95% CI | P value |
| --- | --- | --- | --- | --- | --- |
| Days since Jan 1st 2020 | 76 ± 27 days | 80 ± 22 days | 1.012 | 1.004 to 1.020 | 0.004 |
| Language of full text | | | | | |
| English | 513 (51.3%) | 487 (48.7%) | 3.02 | 1.99 to 4.58 | < 0.001 |
| Other than English | 62 (37.6%) | 103 (62.4%) | Reference | | |
| Number of authors | | | | | |
| None | 5 (16.7%) | 25 (83.3%) | 15.44 | 5.16 to 46.17 | < 0.001 |
| 1-2 | 39 (18.3%) | 174 (81.7%) | 13.12 | 8.02 to 21.46 | < 0.001 |
| 3-5 | 119 (36.8%) | 204 (63.2%) | 5.08 | 3.45 to 7.50 | < 0.001 |
| 6-10 | 228 (65.0%) | 123 (35.0%) | 1.58 | 1.08 to 2.32 | 0.018 |
| >11 | 184 (74.2%) | 64 (25.8%) | Reference | | |
| Country | | | | | |
| China | 356 (60.5%) | 232 (39.5%) | Reference | | |
| United States | 59 (35.1%) | 109 (64.9%) | 2.05 | 1.34 to 3.14 | 0.001 |
| Italy | 20 (26.0%) | 57 (74.0%) | 5.68 | 3.16 to 10.19 | < 0.001 |
| United Kingdom | 15 (39.5%) | 23 (60.5%) | 1.77 | 0.82 to 3.80 | 0.145 |
| Japan | 22 (84.6%) | 4 (15.4%) | 0.223 | 0.069 to 0.724 | 0.013 |
| Singapore | 9 (36.0%) | 16 (64.0%) | 4.17 | 1.70 to 10.27 | 0.02 |
| Korea | 17 (70.8%) | 7 (29.2%) | 0.402 | 0.142 to 1.138 | 0.09 |
| India | 4 (19.0%) | 17 (81.0%) | 7.17 | 2.23 to 23.04 | 0.001 |
| Germany | 5 (27.8%) | 13 (72.2%) | 3.73 | 1.17 to 11.88 | 0.001 |
| France | 8 (44.4%) | 10 (55.6%) | 2.12 | 0.73 to 6.16 | 0.17 |
| Taiwan | 6 (37.5%) | 10 (62.5%) | 3.01 | 1.01 to 9.03 | 0.049 |
| Others | 54 (37.0%) | 92 (63.0%) | 1.97 | 1.27 to 2.05 | 0.002 |

Continuous data are reported as Median ± IQR. Binary data are reported as number of observed events (percentage over the total). Hosmer–Lemeshow test: Chi square = 12.2; P = 0.14. Nagelkerke R squared: 0.33. aOR adjusted odds ratios, CI Confidence intervals

Fig. 2.

Bar plot showing the percentage of primary articles (white boxes) and secondary articles (grey boxes) from the 20 journals that published the most articles on COVID-19 in the first 3 months of the pandemic. Each bar represents all the articles published by each journal, with the number of articles shown in each box. The bar labelled as “Others” includes all remaining journals that had fewer than 10 publications each

Classification of primary articles

Human medical research consisted of 281 observational studies or case series (82.6%), 58 single case reports (17.1%), and 1 randomized controlled trial (0.3%). Human medical research included a median of 23 patients (IQR: 85), ranging from 1 to 72,314 patients. When only observational studies and case series were considered, the median number of patients included was 38 (IQR: 106). The only RCT included in the study enrolled 199 patients.

In silico research consisted of 109 studies on epidemiological modelling (59.9%), 64 studies on biochemistry, biology, bioinformatics or molecular modelling (35.2%), 5 studies evaluating or exploiting social media (2.7%), 3 studies on economical modelling (1.6%) and 1 description of an open database for viral trends (0.5%).

In vitro research consisted of 7 studies on the development or performance of diagnostic technology (26.9%), 7 studies on virus-host interactions (26.9%), 6 studies on gene expression or genomics (23.1%), 3 studies on pharmacological activity of compounds (11.5%), and 3 studies on viral isolation, transport or elimination (11.5%).

Animal research consisted of 4 studies that included mice (1 immunization with SARS-CoV S, 1 pharmacokinetics of an α-ketoamide inhibitor, 1 viral challenge with HCoV-OC43 and treatment with EK1C4, 1 hepatectomy and consequent gene expression) (57.1%), 1 study on hamsters challenged with SARS-CoV-2 (14.3%), 1 study on macaques challenged with MERS-CoV and treated with GS-5734 (14.3%), and 1 study on the presence of SARS-CoV-2-related coronaviruses in Malayan pangolins (14.3%).

Human non-medical research consisted of 15 surveys, of which 8 were on health professionals (40.0%) and 7 on the lay public (35.0%), 2 surveys of healthcare facilities (10.0%), 1 development of a psychological scale (5.0%), 1 RCT on medical professionals (5.0%), and 1 simulation of an outbreak in a hospital (5.0%).

Reporting of limitations in the abstract

Limitations were reported in 42 out of 1165 abstracts (3.6%). Ten abstracts reported methodological limitations, i.e., limitations related to the study design, and the remaining 32 abstracts reported general limitations, such as the current lack of evidence on COVID-19 or the need for further studies on COVID-19. Limitations were reported in 5 out of 23 systematic reviews (21.7%) and 2 out of 20 human non-medical research articles (10.0%). All other manuscript types had a frequency of reporting limitations between 0% and 3.8%.

Comparison with early publications during the 2009 H1N1 swine influenza pandemic

Our search for articles on the 2009 H1N1 swine influenza during the first 3 months of that pandemic yielded 434 articles. After excluding articles without an abstract, 239 articles remained. We excluded 16 articles that did not mention H1N1 or swine influenza in the full text. We included a total of 223 articles published at the early stage of the 2009 H1N1 pandemic in the study. Eight countries contributed three quarters (166, 74.4%) of the included articles, with approximately one-third of the articles coming from the United States (75, 33.6%) and one tenth of them coming from China (24, 10.8%). Almost all the articles (215, 96.4%) had an English full text. Based on our previous classification, there were 179 primary articles (80.3%) and 44 secondary articles (19.7%). The primary articles included 71 human medical research articles (39.7%), 36 animal research articles (20.1%), 33 in vitro studies (18.4%), 30 in silico studies (16.8%), and 9 human non-medical research articles (5.0%). Of the human medical research, 66 were observational studies and case series (93.0%), 3 were RCTs (4.2%), and 2 were single case reports (2.8%). The secondary articles were mainly reviews, viewpoints and editorials (38, 86.4%), with a few guidelines or guidance articles (5, 11.4%) and 1 systematic review (2.3%) (Table 4).

Table 4.

Type of studies published in the early stages of COVID-19 pandemic and of 2009 H1N1 pandemic

| Type of study | COVID-19 pandemic, No. (%) | H1N1 2009 pandemic, No. (%) | OR | 95% CI | P value |
|---|---|---|---|---|---|
| Human medical research | 340 (29.2%) | 71 (31.8%) | Reference | | |
| - Single case reports | 58 (17.1%) | 2 (2.8%) | | | |
| - Observational studies | 281 (82.6%) | 66 (93.0%) | | | |
| - Randomized controlled trials | 1 (0.3%) | 3 (4.2%) | | | |
| Human non-medical research | 20 (1.7%) | 9 (4.0%) | 0.46 | 0.20 to 1.06 | 0.07 |
| In silico | 182 (15.6%) | 30 (13.5%) | 1.27 | 0.80 to 2.01 | 0.32 |
| In vitro | 26 (2.2%) | 33 (14.8%) | 0.16 | 0.09 to 0.29 | < 0.001 |
| Animal research | 7 (0.6%) | 36 (16.1%) | 0.04 | 0.02 to 0.09 | < 0.001 |
| Guidelines/guidance | 193 (16.6%) | 5 (2.2%) | 8.06 | 3.20 to 20.30 | < 0.001 |
| Review/viewpoint/editorial/letter/news | 373 (32.0%) | 38 (17.0%) | 2.05 | 1.35 to 3.12 | 0.001 |
| Systematic review | 23 (2.0%) | 1 (0.4%) | 4.80 | 0.64 to 36.15 | 0.13 |
| Protocol | 1 (0.1%) | 0 (0%) | NA | NA | NA |

Results of univariable logistic regression analysis are reported. Binary data are reported as number of observed events (percentage over the total articles for each pandemic; for human medical research subcategories, the percentage is calculated over the number of human medical research articles). OR odds ratio, CI confidence interval, NA not applicable

In the univariable logistic regression model, with human medical research as the reference category, the odds of being published during COVID-19 were 8 times higher for guideline articles (OR 8.1, 95% CI 3.2 to 20.3) and 2 times higher for reviews (OR 2.0, 95% CI 1.3 to 3.1), while the odds of being published during H1N1 were 24 times higher for animal research (OR 24.6, 95% CI 10.5 to 57.6) and 6 times higher for in vitro research (OR 6.1, 95% CI 3.4 to 10.8) (Table 4; Fig. 3).
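To illustrate how odds ratios of this kind relate to the counts in Table 4, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2x2 table. It assumes that article type was the only predictor and that human medical research was the reference category, in which case a logistic regression gives the same estimates as this closed-form calculation. The function name `odds_ratio_wald` and the use of z = 1.96 are assumptions made for illustration; the software and exact method used by the authors are not stated in this section.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: articles of the category of interest, COVID-19
    b: articles of the category of interest, H1N1
    c: reference-category articles, COVID-19
    d: reference-category articles, H1N1
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Reference category from Table 4: human medical research (COVID-19, H1N1).
REFERENCE = (340, 71)

# Guidelines/guidance: odds of appearing during COVID-19 rather than H1N1.
or_g, lo_g, hi_g = odds_ratio_wald(193, 5, *REFERENCE)
print(f"Guidelines/guidance: OR {or_g:.2f} (95% CI {lo_g:.2f} to {hi_g:.2f})")

# Animal research: invert the ratio to express the odds of appearing during
# H1N1; the confidence bounds swap places when inverted.
or_a, lo_a, hi_a = odds_ratio_wald(7, 36, *REFERENCE)
print(f"Animal research (H1N1 vs COVID-19): OR {1 / or_a:.1f} "
      f"(95% CI {1 / hi_a:.1f} to {1 / lo_a:.1f})")
```

Under these assumptions the guideline estimate comes out close to the tabulated OR of 8.06, and inverting the animal research estimate gives an OR of about 24.6, in line with the values quoted in the text. Small differences in the last decimal of the interval bounds can arise from the choice of z and from rounding.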

Fig. 3.

Relationship between days from the start of the pandemics, number of articles published and type of articles during the COVID-19 pandemic (red circles) and the 2009 H1N1 swine flu pandemic (green circles). The x axes report the days from the start of the COVID-19 pandemic (top axis) and the days from the first publication for each pandemic (bottom axis). The y axis reports the total number of articles published for each pandemic. Circle sizes were assigned arbitrarily to represent different levels of clinical evidence, with larger circles indicating articles of higher value (circle size: 1: secondary articles and human non-medical research; 7: in silico research; 8: in vitro research; 9: animal research; 11: case reports; 14: observational studies and case series; 17: randomized controlled trials; 20: systematic reviews). The values were jittered along the y axis to reduce overlap of data points.

Discussion

Principal findings of the study related to the COVID-19 pandemic

This meta-epidemiological study is novel in having assessed the characteristics of scientific articles published during the initial 3 months of the COVID-19 pandemic. Our study has five key findings. First, over half (50.6%) of all 1165 included articles were secondary articles. Perspectives and syntheses have an important role in scientific research, but one secondary article for each primary article could be redundant. Second, human medical research constituted 29.2% (340/1165) of the included articles. This implies that a large body of articles is not relevant for health care policy makers. Having to identify the human medical research studies slows down the evidence-based decision-making process, because a large bulk of literature has to be filtered out first. This selection process is particularly time consuming, because it often cannot be done by reading titles and abstracts alone. Third, all except one (339/340) of the human medical research studies were observational studies or case reports. This implies that policy makers have to rely predominantly on studies that get a low-certainty (or quality) rating according to the GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach (Schünemann et al. 2019). Fourth, only 3.6% (42/1165) of all included articles reported limitations in their abstracts. Reporting limitations is an important warning sign for end-users of research articles and is an obligatory item in the reporting of abstracts of systematic reviews (Beller et al. 2013). Fifth, about half of all included articles originated in China, i.e., 50.5% (588/1165). A high prevalence of articles from China was expected because the COVID-19 outbreak started in that country, but this statistic is disproportionate to the much higher COVID-19 infection and death rates in other countries. When the number of publications was evaluated per inhabitant, per GDP unit and per COVID-19 case, a different country emerged as the most prolific on each measure.
China ranked second for the number of articles per GDP unit, but was one of the countries with the lowest number of published articles per COVID-19 case, with 0.73 articles published per 100 confirmed cases, and its production was average relative to its number of inhabitants. Italy was the third most prolific country overall, the second in terms of articles per inhabitant, the first in terms of articles per GDP unit and the second to last in terms of articles per confirmed case. The country that had the most publications per number of cases, India, was also the country with the highest prevalence of secondary articles. Overall, we found an association between the number of cases declared in early March by a country and the number of articles published 1 month later, when adjusting for GDP and population. This association should be interpreted with caution, considering that the analysis was performed as a deviation from the original protocol in light of a suggestion from a reviewer.

Comparison with the H1N1 pandemic

We observed several differences in the type of articles published during the H1N1 and COVID-19 pandemics. The most obvious is the striking difference in the proportion of secondary articles published during the two pandemics. Less than 20% of the articles were secondary in the early H1N1 pandemic, while during the COVID-19 pandemic over 50% of the articles were secondary. This difference was mostly related to the higher percentage of narrative reviews, editorials and guidelines. The proportion of clinical reports was similar overall in the two pandemics, with a higher proportion of case reports and a lower proportion of observational studies and randomized trials during COVID-19. Both in vitro and animal research were more prominent during the H1N1 pandemic. The larger proportion of animal research published during H1N1 could be related to the tight connection of that pandemic with farm animals, or to the increasing amount of regulation of laboratory animal research since then, such as Directive 2010/63/EU (2010).

Comparison with other studies

The exponential growth of publications identified in this paper during the first 3 months of the COVID-19 pandemic was also found in past viral outbreaks such as SARS, MERS, Ebola, and Swine Flu (Kagan et al. 2020). This high publication rate dropped dramatically upon containment of these diseases. Gori et al. (2020) identified a high proportion of secondary literature in the first 30 days of the COVID-19 pandemic. However, their findings cannot be directly compared with ours because they used different methods, had a much smaller sample size (234 papers versus 1165 in our sample), measured mostly different outcomes and at different time points (1 month versus 3 months in our sample).

Strengths and weaknesses

The strengths of this meta-epidemiological study are: (1) this is the first research study that assessed the characteristics of articles on COVID-19 listed in PubMed in the first 3 months since the outbreak of the coronavirus pandemic; (2) all study selection and data extraction procedures were conducted by two methodologists independently and all raw data were reported in additional files; (3) the manuscript was reported according to the STROBE checklist. The limitations of this study are: (1) eligible articles were searched exclusively in PubMed, which could have biased our outcomes (Lefebvre et al. 2008; Halladay et al. 2015), and the total body of literature on COVID-19 is expected to be larger; (2) articles without an abstract in English were not included, and it is possible that the proportion of primary and secondary articles would be different when considering articles published in other languages without an English abstract; (3) the discussion section of each article was not assessed to retrieve limitations that were not reported in the abstracts.

Implications and future research

The exponential surge in scientific publishing was expected with the outbreak of a pandemic of an unknown virus. Finding mostly observational studies among the human medical research studies in this body of articles was also not surprising. However, having to filter out half of the literature because it does not produce new research data is problematic, especially when almost 2500 new articles on COVID-19 were indexed in PubMed in the first 3 months of this pandemic. Researchers, peer reviewers, editors, and publishing companies are responsible for this large body of literature. They should aim at flattening the publication curve, for example by tightening their acceptance criteria. This strategy could also help to improve the overall research quality (Sarewitz 2016). Labeling publications as ‘secondary article’ in the abstract could become an initial obligatory item for all publications that do not produce original research.

Future research studies on outbreaks of diseases should start with the consultation of a wide body of stakeholders to develop and prioritize research questions. Such research could explore (1) our statistics at later time points; (2) quality assessments of the conduct and reporting of research studies on COVID-19; (3) factors that could be implemented to control the quantity and quality of publications; (4) the impact of the development of a vaccine for COVID-19 on the publication curve; and (5) how to rapidly synthesize literature in times of a pandemic. Further, high quality systematic reviews and guidelines for the prevention and management of COVID-19 will be necessary once the disease is contained. This will be key to control new outbreaks of COVID-19 and other diseases.

Conclusions

We showed that as compared to the most recent pandemic (2009 H1N1), there is an overwhelming amount of information published on COVID-19. Due to the large body of non-original articles (about half) published in the early phases of the pandemic, the original information published has been diluted. This can slow down the development of a valid knowledge base on COVID-19 and the pertinent strategies to deal with this disease. Also, a negligible number of published articles reported limitations in the abstracts, potentially facilitating overemphasis of the article findings or recommendations. Researchers, peer reviewers, and editors should take action to flatten the publication curve and start labeling non-original research articles as secondary articles.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Authors' contributions

NDG: Conceptualization, Methodology, Investigation, Formal analysis, Writing-Original Draft, Writing-Review & Editing. RMR: Conceptualization, Methodology, Investigation, Writing-Original Draft, Writing-Review & Editing.

Funding

No funding received.

Availability of data and material (data transparency)

The database including all data used will be available on Open Science Framework after a period of 6 months from publication during which the authors will still be working on other publications based on this database.

Compliance with ethical standards

Conflicts of interest

The authors declare no competing interests nor conflict of interests.

Code availability

No code was generated during this research.

Protocol Registration

Our protocol was registered in Open Science Framework: https://osf.io/eanzr.

Footnotes

The protocol was registered in Open Science Framework as: ‘Characteristics of scientific articles on COVID-19 published during the initial 3 months of the pandemic: protocol for a meta-epidemiological study’.

Contributor Information

Nicola Di Girolamo, Email: nicoladiggi@gmail.com.

Reint Meursinge Reynders, Email: reyndersmail@gmail.com.

References

  1. Beller EM, Glasziou PP, Altman DG, Hopewell S, Bastian H, Chalmers I, et al. PRISMA for abstracts: reporting systematic reviews in journal and conference abstracts. PLoS Medicine. 2013;10(4):1001419. doi: 10.1371/journal.pmed.1001419.
  2. Centers for Disease Control and Prevention (CDC) Swine influenza A (H1N1) infection in two children–Southern California, March-April 2009. Morbidity and Mortality Weekly Report. 2009;58(15):400–402.
  3. Centers for Disease Control and Prevention (CDC). (2010). The 2009 H1N1 Pandemic: Summary Highlights, April 2009–April 2010. https://www.cdc.gov/h1n1flu/cdcresponse.htm. Accessed 11th April 2020.
  4. Centers for Disease Control and Prevention (CDC). (2020). World Map. [online]. https://www.cdc.gov/coronavirus/2019-ncov/global-covid-19/world-map.html. Accessed 19th June 2020.
  5. Directive 2010/63/EU of the European Parliament and of the Council of 22 September 2010 on the protection of animals used for scientific purposes. Official Journal of the European Union. 2010;276:33–79.
  6. Esene IN, Ngu J, El Zoghby M, Solaroglu I, Sikod AM, Kotb A, et al. Case series and descriptive cohort studies in neurosurgery: the confusion and solution. Child’s Nervous System. 2014;30(8):1321–1332. doi: 10.1007/s00381-014-2460-1.
  7. Gori, D., Boetto, E., & Fantini, M. P. (2020). The early scientific literature response to the novel Coronavirus outbreak: who published what? [online]. https://www.medrxiv.org/content/10.1101/2020.03.25.20043315v1. Accessed 11th April 2020.
  8. Halladay CW, Trikalinos TA, Schmid IT, Schmid CH, Dahabreh IJ. Using data sources beyond PubMed has a modest impact on the results of systematic reviews of therapeutic interventions. Journal of Clinical Epidemiology. 2015;68(9):1076–1084. doi: 10.1016/j.jclinepi.2014.12.017.
  9. Huang L, Zhang X, Zhang X, Wei Z, Zhang L, Xu J, et al. Rapid asymptomatic transmission of COVID-19 during the incubation period demonstrating strong infectivity in a cluster of youngsters aged 16-23 years outside Wuhan and characteristics of young patients with COVID-19: a prospective contact-tracing study. Journal of Infection. 2020;80(6):e1–e13. doi: 10.1016/j.jinf.2020.03.006.
  10. Kagan, D., Moran-Gilad, J., & Fire, M. (2020). Scientometric trends for coronaviruses and other emerging viral infections. [online]. https://www.biorxiv.org/content/10.1101/2020.03.17.995795v2. Accessed 11th April 2020.
  11. Lefebvre C, Eisinga A, McDonald S, Paul N. Enhancing access to reports of randomized trials published world-wide–the contribution of EMBASE records to the Cochrane Central Register of Controlled Trials (CENTRAL) in The Cochrane Library. Emerging Themes in Epidemiology. 2008;5(1):13. doi: 10.1186/1742-7622-5-13.
  12. Rothe C, Schunk M, Sothmann P, Bretzel G, Froeschl G, Wallrauch C, et al. Transmission of 2019-nCoV infection from an asymptomatic contact in Germany. New England Journal of Medicine. 2020;382(10):970–971. doi: 10.1056/NEJMc2001468.
  13. Sarewitz D. The pressure to publish pushes down quality. Nature. 2016;533(7602):147. doi: 10.1038/533147a.
  14. Schünemann HJ, Higgins JP, Vist GE, Glasziou P, Akl EA, Skoetz N, et al. Completing ‘Summary of findings’ tables and grading the certainty of the evidence. Cochrane Handbook for Systematic Reviews of Interventions. 2019;2019:375–402. doi: 10.1002/9781119536604.ch14.
  15. Von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: Guidelines for reporting observational studies. Annals of Internal Medicine. 2007;147(8):573–577. doi: 10.7326/0003-4819-147-8-200710160-00010.
  16. WHO. (2020). WHO Timeline-COVID-19. [online]. https://www.who.int/news-room/detail/27-04-2020-who-timeline---covid-19. Accessed 19th June 2020.


Articles from Scientometrics are provided here courtesy of Nature Publishing Group
