PLoS Biol. 2025 Jan 30;23(1):e3002999. doi: 10.1371/journal.pbio.3002999

Linking citation and retraction data reveals the demographics of scientific retractions among highly cited authors

John P A Ioannidis 1,2,3,4,5,*, Angelo Maria Pezzullo 5,6, Antonio Cristiano 5,6, Stefania Boccia 6, Jeroen Baas 7
Editor: Anita Bandrowski
PMCID: PMC11781634  PMID: 39883670

Abstract

Retractions are becoming increasingly common but still account for a small minority of published papers. It would be useful to generate databases where the presence of retractions can be linked to impact metrics of each scientist. We have thus incorporated retraction data in an updated Scopus-based database of highly cited scientists (top 2% in each scientific subfield according to a composite citation indicator). Using data from the Retraction Watch database (RWDB), retraction records were linked to Scopus citation data. Of 55,237 items in RWDB as of August 15, 2024, we excluded non-retractions, retractions clearly not due to any author error, retractions where the paper had been republished, and items not linkable to Scopus records. Eventually, 39,468 eligible retractions were linked to Scopus. Among 217,097 top-cited scientists in career-long impact and 223,152 in single recent year (2023) impact, 7,083 (3.3%) and 8,747 (4.0%), respectively, had at least 1 retraction. Scientists with retracted publications had younger publication age, higher self-citation rates, and larger publication volume than those without any retracted publications. Retractions were more common in the life sciences and rare or nonexistent in several other disciplines. In several developing countries, very high proportions of top-cited scientists had retractions (highest in Senegal (66.7%), Ecuador (28.6%), and Pakistan (27.8%) in career-long citation impact lists). Variability in retraction rates across fields and countries suggests differences in research practices, scrutiny, and ease of retraction. Addition of retraction data enhances the granularity of top-cited scientists’ profiles, aiding in responsible research evaluation. However, caution is needed when interpreting retractions, as they do not always signify misconduct; further analysis on a case-by-case basis is essential. The database should hopefully provide a resource for meta-research and deeper insights into scientific practices.


Retractions are becoming increasingly common but still account for a small minority of published papers. Using a database that links retractions to the top 2% most highly cited scientists across science, this study reveals that retraction rates vary by scientific discipline and that scientists with retracted publications have a younger publication age, higher self-citation rates, and larger publication volumes.

Introduction

Retractions of publications are a central challenge for science and their features require careful study [1–3]. In empirical surveys, various types of misconduct are typically responsible for most retractions [4]. The landscape of retractions is becoming more complex with the advent of paper mills: operations that mass-produce papers that are typically fake or fabricated and in which authorship slots may be bought [5]. However, the reasons for retractions are not fully standardized, and many retraction notices are unclear about why a paper had to be withdrawn. Moreover, some retractions are clearly not due to ethical violations or author errors (e.g., they are due to publisher errors). Finally, in many cases, one may view a retraction as a sign of a responsible author who should be congratulated, rather than chastised, for taking proactive steps to correct the literature. Prompt correction of honest errors, major or minor, is a sign of responsible research practices.

The number of retracted papers per year is increasing, with more than 10,000 papers retracted in 2023 [6]. The countries with the highest retraction rates (per 10,000 papers) are Saudi Arabia (30.6), Pakistan (28.1), Russia (24.9), China (23.5), Egypt (18.8), Malaysia (17.2), Iran (16.7), and India (15.2) [6]. However, retractions also abound in highly developed countries [7]. There has also been a gradual change in the reasons for retractions over time [8]: the classic, traditional types of research misconduct (falsification, fabrication, plagiarism, and duplication), which usually involved one or a few papers at a time, have been displaced as the top reasons by large-scale, orchestrated fraudulent practices (paper mills, fake peer review, artificial-intelligence-generated content). Clinical and life sciences account for about half of the retractions that are apparently due to misconduct [9], but electrical engineering/electronics/computer science (EEECS) has an even higher proportion of retractions per 10,000 published papers [9]. Clinical and life sciences disciplines have the highest rates of retractions due to traditional reasons of misconduct, while EEECS disciplines have a preponderance of large-scale orchestrated fraudulent practices.

Here, we aimed to analyze the presence of any retracted papers for all the top-cited scientists across all 174 subfields of science. Typical impact metrics for scientists revolve around publications and their citations. However, citation metrics need to be used with caution [10] to avoid obtaining over-simplified and even grossly misleading views of scientific excellence and impact. We therefore updated and extended databases of standardized citation metrics across all scientists and scientific disciplines [11–14] to include information on retractions for each scientist. Systematic indicators of research quality and integrity are important to examine side-by-side with traditional citation impact data [15,16]. A widely visible list of highly cited scientists issued annually by Clarivate based on Web of Science no longer includes any scientists with retracted publications [17]. In our databases, which cover a much larger number of scientists with more detailed data on each, we have added information on the number of retracted publications, if any, for all listed scientists. Given the variability of the reasons behind retraction, this information can then be interpreted by assessors on a case-by-case basis, with in-depth assessment of the reasons and circumstances of each retraction.

Using our expanded databases, we aimed to answer the following questions: How commonly have top-cited scientists retracted papers? Are there any features that differentiate top-cited scientists with versus without retracted papers? Are specific scientific fields and subfields more likely to have top-cited scientists with retracted papers? Do some countries have higher rates of retractions among their top-cited scientists? Finally, how much do citations to and from retracted papers contribute to the overall citation profile of top-cited scientists? As we present these analyses, we also hope that this new resource will be useful for further meta-research studies that may be conducted by investigators on diverse samples of scientists and scientific fields.

Methods and results

To add the new information on retractions, we depended on the most reliable database of retractions available to date, the Retraction Watch database (RWDB, RRID:SCR_000654), which is also freely publicly available through CrossRef (RRID:SCR_003217). Among the 55,237 RWDB entries obtained from CrossRef (https://api.labs.crossref.org/data/retractionwatch) on August 15, 2024, we focused on the 50,457 entries where the nature of the notice is classified as “Retraction”, excluding other types (corrections, expressions of concern) that may also be covered in RWDB. From this set, we excluded entries where the paper had been retracted but then replaced by a new version (which suggests that the errors were manageable to address and that a new version represents the work in the published literature), and entries where the retraction was clearly not due to any error or wrongdoing by the authors (e.g., publisher error). Therefore, we excluded entries where the reason for retraction was listed as “Retract and Replace,” “Error by Journal/Publisher,” “Duplicate Publication through Error by Journal/Publisher,” or “Withdrawn (out of date)”; however, for the latter 3 categories, these exclusions were applied only if no additional reasons were listed that could potentially be attributed to the authors, exclusively or in part, as detailed in S1 Table. This first filtering was automated and resulted in a set of 47,964 entries.
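
In code, this first automated pass might look like the following minimal sketch (the column names `RetractionNature` and a semicolon-delimited `Reason` field are assumptions about the CrossRef export schema; the authoritative list of author-attributable exceptions is S1 Table):

```python
import pandas as pd

# Load the CrossRef export of RWDB; column names are assumed and may need
# adjusting to the actual schema.
rw = pd.read_csv("retractionwatch.csv")

# Keep only notices classified as retractions (excludes corrections and
# expressions of concern).
rw = rw[rw["RetractionNature"] == "Retraction"]

# Categories that, on their own, do not implicate the authors.
NON_AUTHOR = {
    "Error by Journal/Publisher",
    "Duplicate Publication through Error by Journal/Publisher",
    "Withdrawn (out of date)",
}

def keep(reasons) -> bool:
    """Exclude 'Retract and Replace' outright; exclude the other exempt
    categories only when no additional, potentially author-attributable
    reason is listed (per S1 Table)."""
    listed = {r.strip().lstrip("+") for r in str(reasons).split(";") if r.strip()}
    if "Retract and Replace" in listed:
        return False
    if not listed:
        return True  # no stated reason: retain for linkage
    return bool(listed - NON_AUTHOR)  # at least one reason outside the exempt set

eligible = rw[rw["Reason"].apply(keep)]  # ~47,964 entries in the paper's run
```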

We tagged articles as retracted by linking retraction records to their corresponding entries in Scopus (RRID:SCR_022559). Initially, this linking is achieved by matching the OriginalPaperDOI with a DOI in Scopus. For retracted articles without a direct DOI match, we employ an alternative strategy using the title and publication year, allowing for a 1-year discrepancy due to variations in the recorded publication year. To enhance the accuracy of the linking process, we perform data sanitization on both databases: DOIs are standardized by removing redundant prefixes and extraneous characters, and titles are normalized by stripping all non-alphanumeric characters and converting to lowercase. Additionally, to avoid erroneous matches, especially with shorter titles, we impose a minimum length requirement of 32 characters for title matching. The code that demonstrates the linking strategy is published alongside the data set at https://elsevier.digitalcommonsdata.com/datasets/btchxktzyw/7.
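
The published code at that link is authoritative; purely for orientation, a condensed re-implementation of the matching rules described above might look as follows (the record fields and lookup structures are assumptions for illustration):

```python
import re

def norm_doi(doi: str) -> str:
    """Standardize a DOI: lowercase, trim, and strip resolver prefixes."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://)?(dx\.)?doi\.org/", "", doi)

def norm_title(title: str) -> str:
    """Normalize a title: lowercase, alphanumeric characters only."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def link(rw_records, scopus_by_doi, scopus_by_title):
    """Match RWDB entries to Scopus IDs: DOI first, then normalized
    title + year with a +/- 1-year tolerance; titles shorter than
    32 characters are never matched on title alone."""
    matches = {}
    for rec in rw_records:  # dicts with RecordID, OriginalPaperDOI, Title, Year
        doi = norm_doi(rec.get("OriginalPaperDOI") or "")
        if doi and doi in scopus_by_doi:
            matches[rec["RecordID"]] = scopus_by_doi[doi]
            continue
        title = norm_title(rec.get("Title") or "")
        if len(title) < 32:  # avoid spurious matches on short titles
            continue
        for year in (rec["Year"] - 1, rec["Year"], rec["Year"] + 1):
            if (title, year) in scopus_by_title:
                matches[rec["RecordID"]] = scopus_by_title[(title, year)]
                break
    return matches
```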

Linking the retractions using the digital object identifier (DOI) of the original paper resulted in 38,364 matches. For entries where a DOI match was not possible, we attempted to link records using a combination of the title and the year derived from the date of the original article, allowing for a +/− 1-year variation; this yielded 1,104 additional matches. The linkage process thus resulted in a total of 39,468 matched records (Fig 1).

Fig 1. Flow diagram for linkage of retractions.


Calculation of the composite citation indicator and ranking of the scientists accordingly within their primary subfield (using the Science-Metrix classification of 20 fields and 174 subfields) were performed in the current iteration with exactly the same methods as in previous iterations (described in detail in references [11–13]). Career-long impact counts citations received cumulatively across all years to papers published at any time, while single most recent year impact counts only citations received in 2023 to papers published at any time.
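
For readers unfamiliar with the composite indicator of references [11–13], it sums six log-transformed citation indicators: total citations; the h-index; the Schreiber hm-index; and citations to papers as single, single + first, and single + first + last author. A minimal sketch, with column names assumed for illustration:

```python
import numpy as np
import pandas as pd

# The six indicators of refs [11-13]: total citations (nc), h-index (h),
# Schreiber hm-index (hm), and citations to papers as single (ncs),
# single + first (ncsf), and single + first + last author (ncsfl).
INDICATORS = ["nc", "h", "hm", "ncs", "ncsf", "ncsfl"]

def composite(df: pd.DataFrame) -> pd.Series:
    """Log-based composite: sum of ln(1 + x) for each indicator,
    rescaled by the maximum observed value of that indicator."""
    score = pd.Series(0.0, index=df.index)
    for col in INDICATORS:
        score += np.log(1 + df[col]) / np.log(1 + df[col].max())
    return score

# Ranking within the primary Science-Metrix subfield (column name assumed):
# df["rank"] = composite(df).groupby(df["subfield"]).rank(ascending=False)
```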

The new updated release of the databases includes 217,097 scientists who are among the top 2% of their primary scientific subfield in career-long citation impact and 223,152 scientists who are among the top 2% in single most recent year (2023) citation impact. These numbers also include some scientists (2,789 and 6,325 in the 2 data sets, respectively) who may not be in the top 2% of their primary scientific subfield but are among the 100,000 top-cited across all scientific subfields combined. Among the top-cited scientists, 7,083 (3.3%) and 8,747 (4.0%), respectively, in the 2 data sets have at least 1 retracted publication, and 1,710 (0.8%) and 2,150 (1.0%), respectively, have 2 or more retracted publications. As shown in Fig 2, the distribution of the number of linked eligible retractions per author follows a power law.

Fig 2. Distribution of the number of retractions in top-cited scientists with at least 1 retraction.


(A) Database of top-cited authors based on career-long impact. (B) Database of top-cited authors based on single recent year (2023) impact. The data underlying this figure can be found in S1 Data.
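
A quick, informal way to check the power-law shape reported in Fig 2 is to regress log frequency on log retraction count; a rigorous fit would instead use maximum-likelihood estimators (e.g., via the `powerlaw` Python package). A minimal sketch:

```python
import numpy as np
from collections import Counter

def loglog_slope(retractions_per_author):
    """Least-squares slope of the log-log frequency plot of retraction
    counts per author; an approximately straight line with negative
    slope is consistent with a power law."""
    freq = Counter(retractions_per_author)  # k -> number of authors with k retractions
    k = np.array(sorted(freq), dtype=float)
    f = np.array([freq[int(v)] for v in k], dtype=float)
    slope, _ = np.polyfit(np.log(k), np.log(f), 1)
    return slope

# Toy heavy-tailed sample: many single retractions, few large counts.
print(loglog_slope([1] * 5000 + [2] * 900 + [3] * 300 + [5] * 60 + [10] * 8))
```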

Table 1 shows the characteristics of top-cited scientists who have any retracted publications versus those who have none. As shown, top-cited scientists with retracted publications tend to have younger publication ages, a higher proportion of self-citations, a higher h/hm index ratio (indicating more extensive co-authorship), slightly better ranking, and a higher total number of publications (p < 0.001 by Mann–Whitney U test, run in R version 4.4.0 (RRID:SCR_001905), for all indicators in both the career-long impact data set and the single recent year data set, except for publication age and absolute ranking in the subfield in the single recent year data set). However, except for the number of papers published, the differences are small or modest in absolute magnitude. The proportion of scientists with retractions is highest, though, at the extreme top of the ranking: among the top 1,000 scientists with the highest composite indicator values, the proportion with at least 1 retraction is 13.8% and 11.1% in the career-long and single recent year impact lists, respectively.
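
The paper ran the test in R 4.4.0; an equivalent check in Python, with synthetic stand-ins for the two groups (the medians mimic the reported publication counts of 270 vs. 160), might look like this:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Synthetic stand-ins; in the real analysis these would be the per-scientist
# indicator values for scientists with vs. without retractions.
retracted = rng.lognormal(mean=np.log(270), sigma=0.7, size=7_083)
others = rng.lognormal(mean=np.log(160), sigma=0.7, size=10_000)

u_stat, p_value = mannwhitneyu(retracted, others, alternative="two-sided")
print(f"U = {u_stat:.0f}, p = {p_value:.3g}")
```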

Table 1. Characteristics of top-cited scientists with and without retracted publications, with Mann–Whitney U tests.

| Characteristic, median (IQR) | Career-long impact: Retracted (N = 7,083) | Career-long impact: Others (N = 210,014) | p-value | Single recent year (2023) impact: Retracted (N = 8,747) | Single recent year (2023) impact: Others (N = 214,405) | p-value |
|---|---|---|---|---|---|---|
| Publication start | 1989 (1981–1997) | 1987 (1977–1996) | <0.00001 | 1997 (1987–2005) | 1997 (1987–2006) | 0.2 |
| Self-citations (%) | 12.9 (9.6–17.6) | 11.7 (7.5–16.9) | <0.00001 | 9.1 (5.6–14) | 8.8 (4.8–14.2) | <0.00001 |
| h-index/hm-index ratio* | 2.4 (2–2.8) | 2.1 (1.7–2.6) | <0.00001 | 2.1 (1.8–2.6) | 2 (1.7–2.5) | <0.00001 |
| Ranking in subfield | 973 (342–2,128.5) | 1,011 (381–2,150) | 0.0007 | 1,029 (367–2,274) | 1,025 (388–2,170) | 0.69 |
| Percentile ranking in subfield | 0.008 (0.003–0.014) | 0.011 (0.005–0.016) | <0.00001 | 0.009 (0.003–0.015) | 0.011 (0.006–0.016) | <0.00001 |
| Number of total published items | 270 (170–426) | 160 (100–253) | <0.00001 | 228 (135–377) | 139 (79–234) | <0.00001 |

* Data on the h-index/hm-index include self-citations. The Schreiber hm index is constructed in the same way as the Hirsch h-index but also accounts for co-authorship. The more extensive the co-authorship, the more the hm index deviates from (and becomes smaller than) the h-index.
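
To make the footnote concrete, here is a minimal sketch of both indices (the `papers` input format is an assumption for illustration):

```python
def h_index(citations):
    """Hirsch h: largest h such that h papers have at least h citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def hm_index(papers):
    """Schreiber hm: like h, but each paper occupies a fractional
    'effective rank' of 1/n_authors in the citation-sorted list.
    `papers` is an iterable of (citations, n_authors) pairs."""
    hm = r_eff = 0.0
    for cites, n_authors in sorted(papers, key=lambda p: p[0], reverse=True):
        r_eff += 1.0 / n_authors
        if cites >= r_eff:
            hm = r_eff
    return hm

# Four well-cited, 10-author papers: h = 4 but hm = 0.4, so heavy
# co-authorship inflates the h/hm ratio.
papers = [(50, 10), (40, 10), (30, 10), (20, 10)]
print(h_index(c for c, _ in papers), hm_index(papers))
```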

Table 2 shows the proportion of top-cited scientists with retracted publications across the 20 major fields into which science is divided according to the Science-Metrix classification; information on the more detailed 174 subfields appears in S2 Table. The proportion of scientists with retractions varies widely across major fields, ranging from 0% to 5.5%. Clinical Medicine and Biomedical Research have the highest rates (4.8% to 5.5%). Enabling & Strategic Technologies, Chemistry, and Biology have rates close to the average of all sciences combined. All other fields have low to very low (or even zero) rates of scientists with retractions. When the 174 Science-Metrix subfields were considered, the highest proportions of top-cited scientists with at least 1 retracted paper were seen in Complementary & Alternative Medicine, Oncology & Carcinogenesis, and Pharmacology & Pharmacy (10.5%, 9.9%, and 9.4%, respectively, of top-cited scientists based on single recent year impact).

Table 2. Top-cited scientists with and without retracted publications according to their main field.

| Main field | Career-long impact: Retracted (N = 7,083) | Career-long impact: Others (N = 210,014) | Single recent year impact: Retracted (N = 8,747) | Single recent year impact: Others (N = 214,405) |
|---|---|---|---|---|
| Agriculture, Fisheries & Forestry | 99 (1.4%) | 7,166 (98.6%) | 172 (2.3%) | 7,203 (97.7%) |
| Biology | 222 (2.6%) | 8,434 (97.4%) | 300 (3.5%) | 8,363 (96.5%) |
| Biomedical Research | 846 (5.0%) | 16,052 (95.0%) | 847 (5.1%) | 15,843 (94.9%) |
| Built Environment & Design | 34 (2.7%) | 1,209 (97.3%) | 40 (3.1%) | 1,263 (96.9%) |
| Chemistry | 462 (3.1%) | 14,449 (96.9%) | 624 (4.1%) | 14,565 (95.9%) |
| Clinical Medicine | 3,249 (4.8%) | 64,590 (95.2%) | 3,769 (5.5%) | 64,574 (94.5%) |
| Communication & Textual Studies | 2 (0.2%) | 1,072 (99.8%) | 4 (0.3%) | 1,193 (99.7%) |
| Earth & Environmental Sciences | 157 (2.1%) | 7,231 (97.9%) | 216 (2.8%) | 7,526 (97.2%) |
| Economics & Business | 59 (1.4%) | 4,078 (98.6%) | 111 (1.8%) | 6,171 (98.2%) |
| Enabling & Strategic Technologies | 654 (3.6%) | 17,663 (96.4%) | 906 (4.4%) | 19,790 (95.6%) |
| Engineering | 432 (2.5%) | 16,686 (97.5%) | 565 (3.3%) | 16,631 (96.7%) |
| Historical Studies | 0 (0.0%) | 1,081 (100.0%) | 1 (0.1%) | 1,073 (99.9%) |
| Information & Communication Technologies | 275 (1.8%) | 14,812 (98.2%) | 475 (3.1%) | 14,700 (96.9%) |
| Mathematics & Statistics | 48 (1.8%) | 2,645 (98.2%) | 80 (2.9%) | 2,639 (97.1%) |
| Philosophy & Theology | 0 (0.0%) | 523 (100.0%) | 2 (0.4%) | 524 (99.6%) |
| Physics & Astronomy | 361 (1.8%) | 19,619 (98.2%) | 431 (2.3%) | 18,576 (97.7%) |
| Psychology & Cognitive Sciences | 98 (2.5%) | 3,773 (97.5%) | 108 (2.6%) | 4,036 (97.4%) |
| Public Health & Health Services | 65 (1.7%) | 3,776 (98.3%) | 66 (1.7%) | 3,803 (98.3%) |
| Social Sciences | 20 (0.4%) | 5,043 (99.6%) | 30 (0.5%) | 5,818 (99.5%) |
| Visual & Performing Arts | 0 (0.0%) | 112 (100.0%) | 0 (0.0%) | 114 (100.0%) |

Retraction rates among top-cited scientists also vary across the 20 countries that host the most top-cited authors (Table 3), with higher rates observed in India (9.2% career-long; 8.6% single recent year impact), China (8.2%; 6.7%), and Taiwan (5.2%; 5.7%), and lower rates observed in Israel (1.7%; 2.0%), Belgium (2.1%; 2.1%), and Finland (2.2%; 2.2%). Some countries with few top-cited authors (not among the 20 shown in Table 3) have strikingly high rates of scientists with retractions: countries that exceed 10% in either the career-long or the single recent year top-cited list appear in S3 Table. The highest proportions of top-cited scientists with retractions were seen in Senegal (66.7%), Ecuador (28.6%), and Pakistan (27.8%) in the career-long impact list, and in Kyrgyzstan (50%), Senegal (41.7%), Ecuador (28%), and Belarus (26.7%) in the single recent year impact list. Nevertheless, the total number of top-cited authors in Senegal, Ecuador, Kyrgyzstan, and Belarus is very small, so these percentages should be interpreted with caution.

Table 3. Top-cited scientists with and without retracted publications according to country.

| Country | Career-long impact: Retracted | Career-long impact: Others | Single recent year impact: Retracted | Single recent year impact: Others |
|---|---|---|---|---|
| United States | 2,332 (2.8%) | 81,870 (97.2%) | 2,186 (3.1%) | 69,206 (96.9%) |
| United Kingdom | 430 (2.2%) | 19,218 (97.8%) | 428 (2.4%) | 17,127 (97.6%) |
| Germany | 336 (2.9%) | 11,236 (97.1%) | 309 (3.0%) | 10,111 (97.0%) |
| China | 877 (8.2%) | 9,810 (91.8%) | 1,813 (6.7%) | 25,352 (93.3%) |
| Canada | 241 (2.6%) | 9,024 (97.4%) | 223 (2.7%) | 7,962 (97.3%) |
| Japan | 362 (4.4%) | 7,899 (95.6%) | 254 (4.5%) | 5,354 (95.5%) |
| Australia | 178 (2.4%) | 7,270 (97.6%) | 201 (2.5%) | 7,833 (97.5%) |
| France | 151 (2.2%) | 6,770 (97.8%) | 152 (2.6%) | 5,630 (97.4%) |
| Italy | 254 (4.1%) | 6,017 (95.9%) | 300 (3.9%) | 7,318 (96.1%) |
| Netherlands | 123 (2.7%) | 4,392 (97.3%) | 116 (2.6%) | 4,419 (97.4%) |
| Spain | 103 (2.9%) | 3,405 (97.1%) | 127 (3.2%) | 3,880 (96.8%) |
| Switzerland | 84 (2.4%) | 3,347 (97.6%) | 82 (2.4%) | 3,323 (97.6%) |
| Sweden | 84 (2.5%) | 3,269 (97.5%) | 78 (3.0%) | 2,566 (97.0%) |
| India | 270 (9.2%) | 2,669 (90.8%) | 462 (8.6%) | 4,889 (91.4%) |
| South Korea | 120 (5.1%) | 2,246 (94.9%) | 186 (5.3%) | 3,313 (94.7%) |
| Denmark | 46 (2.2%) | 2,068 (97.8%) | 53 (2.6%) | 1,960 (97.4%) |
| Israel | 36 (1.7%) | 2,057 (98.3%) | 32 (2.0%) | 1,590 (98.0%) |
| Belgium | 41 (2.1%) | 1,956 (97.9%) | 42 (2.1%) | 1,965 (97.9%) |
| Taiwan | 91 (5.2%) | 1,668 (94.8%) | 80 (5.7%) | 1,327 (94.3%) |
| Finland | 32 (2.2%) | 1,413 (97.8%) | 26 (2.2%) | 1,153 (97.8%) |

The new iteration of the 2 top-cited scientists’ data sets also includes information on the number of citations received (overall and in the single recent year, respectively) by the retracted papers of each scientist. Among scientists with at least 1 retraction, the range is 0 to 7,491 citations, with a median (IQR) of 25 (6 to 80), in the career-long data set, and 0 to 832, with a median (IQR) of 1 (0 to 4), in the single recent year data set. A total of 114 scientists in the career-long data set have received more than 1,000 citations to their retracted papers, and for 230 (0.1%) and 260 (0.1%) scientists in the 2 data sets, the citations to retracted papers account for more than 5% of their citations.

Furthermore, information is provided for each scientist on the number of citations that they received from any of the retracted papers. In the career-long data set, the range is 0 to 1,974, with a median (IQR) of 0 (2 to 5), and in the single recent year data set, the range is 0 to 180, with a median (IQR) of 0 (0 to 0). A total of 5 scientists in the career-long data set have received more than 1,000 citations from papers that have been retracted, and for 14 and 7 scientists in the 2 data sets, the citations received from retracted papers account for more than 5% of their citations (overall and in the single recent year, respectively).
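
A sketch of how such per-scientist shares could be computed from a citation-pair table (the table layout and column names are assumptions for illustration; the actual computation used the Scopus citation data):

```python
import pandas as pd

def retraction_citation_shares(pairs: pd.DataFrame, retracted_ids: set,
                               total_citations: pd.Series) -> pd.DataFrame:
    """Per-scientist citation counts involving retracted papers.
    `pairs` has one row per citation with columns citing_paper,
    cited_paper, cited_author; `total_citations` is indexed by author."""
    # Citations received BY a scientist's retracted papers.
    to_retracted = (pairs[pairs["cited_paper"].isin(retracted_ids)]
                    .groupby("cited_author").size())
    # Citations a scientist received FROM papers that were retracted.
    from_retracted = (pairs[pairs["citing_paper"].isin(retracted_ids)]
                      .groupby("cited_author").size())
    out = pd.DataFrame({"to_retracted": to_retracted,
                        "from_retracted": from_retracted}).fillna(0)
    out["share_to"] = out["to_retracted"] / total_citations
    out["share_from"] = out["from_retracted"] / total_citations
    return out  # e.g., flag authors with share_to > 0.05 or share_from > 0.05
```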

Discussion

We hope that the addition of the retraction data will improve the granularity of the information provided on each scientist in the new, expanded database of top-cited scientists. A more informative profile may be obtained by examining not only the citation indicators but also retracted papers, the proportion of self-citations, evidence of extremely prolific publishing behavior [18] (see the detailed data that can be linked to the top-cited scientists’ database, published at https://elsevier.digitalcommonsdata.com/datasets/kmyvjk3xmd/2), as well as responsible research indicators, such as data and code sharing and protocol registration, that are becoming increasingly available [15,16].

The data suggest that approximately 4% of top-cited scientists have at least 1 retraction. This is a conservative estimate, and the true rate may be higher, since some retractions concern titles not covered by Scopus or could not be linked in our data set linkage. Proportions of scientists with retractions are substantially higher at the extreme top of the most-cited rankings. Top-cited scientists with retracted publications exhibit higher levels of collaborative co-authorship and a higher total number of published papers. High productivity and more extensive co-authorship may be associated with less control over what gets published, or may reflect proficiency in gaming the system (e.g., honorary authorship obtained as a department chair). Nevertheless, the higher publication output of scientists with retractions might simply reflect that the more one publishes, the greater the chance of eventually encountering a retraction.

More than half of the top-cited scientists with retractions were in medicine and the life sciences. However, high rates were also seen in several other fields. A previous mapping of retractions due to misconduct [9] had found the highest rate of retracted papers in EEECS, at 18 per 10,000, double the rate for the life sciences. The EEECS area corresponds in our mapping to scientific domains where we also found high concentrations of top-cited scientists with retracted papers, although the rates were lower than those in Clinical Medicine and Biomedical Research. It is possible that medical and life science retractions are more likely to involve top-cited authors, while EEECS retracted papers mostly have authors who do not reach top-cited status. EEECS retractions have a large share of artificial-intelligence-generated content and fake peer review [9]. Therefore, it is likely that such fraudsters aim for more modest citation records, or that they are exposed before they reach highly cited status, although exceptions do exist [9]. Many scientific fields have minimal or no track record of retractions, while some subfields, such as alternative medicine, cancer research, and pharmacology, exhibit retraction rates double those of the life sciences overall. These differences might reflect increased scrutiny and better detection of misconduct and major errors in fields with consequences for health; differences in the intensity and types of post-publication review practices [19]; and the fact that quantifiable data and images in the life sciences are easier to assess for errors and fraud than many constructs in the social sciences.

Many developing countries have extremely high rates of top-cited authors with retracted papers. This may reflect problematic research environments and incentives in these countries, several of which are also rapidly growing their overall productivity [3,20–23]. The countries where we detected the highest rates of top-cited authors with retractions largely overlap with the countries that also have the highest number of retracted papers per 10,000 publications according to a previous mapping of retractions due to misconduct [9]. In fact, some of these countries, such as India, China, Pakistan, and Iran, also have a large share of implausibly hyperprolific authors [18]. It would be interesting to see whether removing some of the productivity incentives may reduce the magnitude of the problem in these countries.

As previously documented, several retracted papers have been cited considerably and, unfortunately, some continue to be cited even after their retraction [24,25]; these citations typically suggest that the citing authors are unaware of the retraction, rather than citing the paper to comment on its retraction. This is a problem that can, and hopefully will, be fixed.

Among top-cited authors, a small number have received a very large number of citations to their retracted papers. However, these citations make a relatively small proportional contribution to the overall very high total citation counts of these scientists. The same applies to citations received from retracted papers: some highly cited authors may have received a substantial number of citations from retracted papers, but these represent a very small proportion of their total citations. Nevertheless, within paper mills, fake papers may repeatedly use the same citations to known, influential authors and papers that are already heavily cited in the literature. It is possible that most paper mill products remain undetected and have not yet been retracted from the literature.

We expect that the new, expanded database may foster further research on citation and retraction indicators, with expanded linkage to yet more research indicators. We caution that, even though we excluded retractions that attributed no fault to the authors, we cannot be confident that all the included retractions involved some error, let alone misconduct, by the authors. Some retraction notices are vague, and the separation of author-related versus author-unrelated reasons may not be perfect. Even for types of reasons that seem author-related, exceptions may exist; e.g., in partial fake peer review, it could be that the editors unexpectedly invited a fake referee or encountered review mills [26]. Moreover, sometimes not all authors may have been responsible for what led to the retraction. Therefore, any further analyses that focus on individual author profiles, rather than aggregate, group-level analyses, should exercise due caution in dissecting the features and circumstances surrounding each retraction. Unfortunately, these are often not presented in sufficient detail to allow safe judgments [4,27].

Moreover, inaccuracies are possible in the merged data set. As discussed previously, Scopus has high precision and recall [28], but some errors do exist in author ID files. In the past, some Asian author IDs had very high numbers of papers because more than one author was merged into the same ID file. However, this is no longer the case, and disambiguation of Asian names in Scopus is currently as good as, or even better than, that of European/American names [28]. Errors may also occur in the attribution of affiliations. Finally, considering the vast size of these data sets, with potential duplication and similarity of names, ensuring that no scientist is incorrectly associated with a retracted paper is virtually impossible. Users of these data sets and/or Scopus can improve author profile accuracy by offering corrections directly to Scopus through the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/). Most importantly, we make no judgment calls in our databases on the ethical nature of the retractions, e.g., whether they represent misconduct or honest error by the authors; similarly, we do not comment on whether the retractions may be fair or not. Some retractions may still be contested between authors and editors and/or may even be the subject of ongoing legal proceedings. We urge users of these data to examine very carefully the evidence and circumstances surrounding each retraction and its nature.

Supporting information

S1 Data. Data underlying Fig 2.

(DOCX)

S1 Table. List of author-attributable reasons used to filter journal error and withdrawn (out of date) exceptions.

(DOCX)

S2 Table. Top-cited scientists with and without retracted publications according to their primary subfield.

(DOCX)

S3 Table. Top-cited scientists with and without retracted publications in countries with high (>10%) retraction prevalence.

(DOCX)


Acknowledgments

This work uses Scopus data provided by Elsevier. We are thankful to Alison Abritis and Ivan Oransky for constructive comments.

Data Availability

The full datasets are available at https://doi.org/10.17632/btchxktzyw.7.

Funding Statement

The work of AC has been supported by the European Network Staff Exchange for Integrating Precision Health in the Healthcare Systems project (Marie Skłodowska-Curie Research and Innovation Staff Exchange no. 823995).  The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Marcus A, Oransky I. Is there a retraction problem? And, if so, what can we do about it? In: Jamieson KH, Kahan DM, Scheufele DA, editors. The Oxford Handbook of the Science of Science Communication. Oxford University Press; 2017.
2. Oransky I. Retractions are increasing, but not enough. Nature. 2022;608(7921):9. doi: 10.1038/d41586-022-02071-6
3. Oransky I. Volunteer watchdogs pushed a small country up the rankings. Science. 2018;362(6413):395. doi: 10.1126/science.362.6413.395
4. Hwang SY, Yon DK, Lee SW, Kim MS, Kim JY, Smith L, et al. Causes for retraction in the biomedical literature: a systematic review of studies of retraction notices. J Korean Med Sci. 2023;38(41):e333. doi: 10.3346/jkms.2023.38.e333
5. Candal-Pedreira C, Ross JS, Ruano-Ravina A, Egilman DS, Fernández E, Pérez-Ríos M. Retracted papers originating from paper mills: cross sectional study. BMJ. 2022:e071517. doi: 10.1136/bmj-2022-071517
6. Van Noorden R. More than 10,000 research papers were retracted in 2023—a new record. Nature. 2023;624(7992):479–481. doi: 10.1038/d41586-023-03974-8
7. Freijedo-Farinas F, Ruano-Ravina A, Pérez-Ríos M, Ross J, Candal-Pedreira C. Biomedical retractions due to misconduct in Europe: characterization and trends in the last 20 years. Scientometrics. 2024;129:2867–2882.
8. Li M, Chen F, Tong S, Yang L, Shen Z. Amend: an integrated platform of retracted papers and concerned papers. J Data Inf Sci. 2024;9(2):41–55.
9. Li M, Shen Z. Science map of academic misconduct. Innovation (Camb). 2024;5(2):100593. doi: 10.1016/j.xinn.2024.100593
10. Oransky I. Why misconduct could keep scientists from earning Highly Cited Researcher designations, and how our database plays a part. 2022. Available from: https://retractionwatch.com/2022/11/15/why-misconduct-could-keep-scientists-from-earning-highly-cited-researcher-designations-and-how-our-database-plays-a-part/
11. Hicks D, Wouters P, Waltman L, de Rijcke S, Rafols I. Bibliometrics: the Leiden Manifesto for research metrics. Nature. 2015;520(7548):429–431. doi: 10.1038/520429a
12. Ioannidis JPA, Baas J, Klavans R, Boyack KW. A standardized citation metrics author database annotated for scientific field. PLoS Biol. 2019;17(8):e3000384. doi: 10.1371/journal.pbio.3000384
13. Ioannidis JPA, Klavans R, Boyack KW. Multiple citation indicators and their composite across scientific disciplines. PLoS Biol. 2016;14(7):e1002501. doi: 10.1371/journal.pbio.1002501
14. Ioannidis JPA, Boyack KW, Baas J. Updated science-wide author databases of standardized citation indicators. PLoS Biol. 2020;18(10):e3000918. doi: 10.1371/journal.pbio.3000918
15. Ioannidis JPA. October 2023 data-update for “Updated science-wide author databases of standardized citation indicators.” Elsevier Data Repository, V6; 2023. Available from: https://elsevier.digitalcommonsdata.com/datasets/btchxktzyw/7. Last accessed December 13, 2024.
16. Ioannidis JPA, Maniadis Z. In defense of quantitative metrics in researcher assessments. PLoS Biol. 2023;21(12):e3002408. doi: 10.1371/journal.pbio.3002408
17. Ioannidis JPA, Maniadis Z. Quantitative research assessment: using metrics against gamed metrics. Intern Emerg Med. 2024;19(1):39–47. doi: 10.1007/s11739-023-03447-w
18. Ioannidis JPA, Collins TA, Baas J. Evolving patterns of extreme publishing behavior across science. Scientometrics. 2024 Jul:1–4.
19. Hardwicke TE, Thibault RT, Kosie JE, Tzavella L, Bendixen T, Handcock SA, et al. Post-publication critique at top-ranked journals across scientific disciplines: a cross-sectional assessment of policies and practice. R Soc Open Sci. 2022;9(8):220139. doi: 10.1098/rsos.220139
20. Catanzaro M. Saudi universities entice top scientists to switch affiliations—sometimes with cash. Nature. 2023;617(7961):446–447. doi: 10.1038/d41586-023-01523-x
21. Bhattacharjee Y. Citation impact. Saudi universities offer cash in exchange for academic prestige. Science. 2011;334(6061):1344–1345. doi: 10.1126/science.334.6061.1344
22. Rodrigues F, Gupta P, Khan AP, Chatterjee T, Sandhu NK, Gupta L. The cultural context of plagiarism and research misconduct in the Asian region. J Korean Med Sci. 2023;38(12):e88. doi: 10.3346/jkms.2023.38.e88
23. Rathore FA, Waqas A, Zia AM, Mavrinac M, Farooq F. Exploring the attitudes of medical faculty members and students in Pakistan towards plagiarism: a cross sectional survey. PeerJ. 2015;3:e1031. doi: 10.7717/peerj.1031
24. Hsiao TK, Schneider J. Continued use of retracted papers: temporal trends in citations and (lack of) awareness of retractions shown in citation contexts in biomedicine. Quant Sci Stud. 2021;2(4):1144–1169.
25. Marcus A, Abritis AJ, Oransky I. How to stop the unknowing citation of retracted papers. Anesthesiology. 2022;137(3):280–282. doi: 10.1097/ALN.0000000000004333
26. Oviedo-García MÁ. The review mills, not just (self-)plagiarism in review reports, but a step further. Scientometrics. 2024;129(9):5805–5813.
27. Wager E, Williams P. Why and how do journals retract articles? An analysis of Medline retractions 1988–2008. J Med Ethics. 2011;37(9):567–570. doi: 10.1136/jme.2010.040964
28. Baas J, Schotten M, Plume A, Côté G, Karimi R. Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies. Quant Sci Stud. 2020;1(1):377–386.

Decision Letter 0

Roland G Roberts

30 Sep 2024

Dear John,

Thank you for submitting your manuscript entitled "Updated science-wide author databases of standardized citation indicators including retraction data" for consideration as a Meta-Research Article by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I'm writing to let you know that we would like to send your submission out for external peer review.

IMPORTANT: In order to help maximise our chances of recruiting appropriate reviewers (and probably maximising the chances of a positive outcome?), we strongly suggest that you slightly re-frame the article before uploading the additional metadata (see next paragraph). SPECIFICALLY, while we recognise the popularity of your database, we think that it would be better to lead with the retraction analysis, leaving the database update to be a secondary aspect. I think this could be relatively easily done, with a tweak to the Title, and then re-ordering the relevant elements of the Abstract and Introduction. I don't think any changes would be needed in the rest of the manuscript. In answer to your question about Meta-Research Article versus Update Article, we would definitely keep it as a Meta-Research Article. If the afore-mentioned re-framing is likely to take more than a week, let me know, and we can "reject" and then allow a new submission when you're ready (simply a formality).

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. After your manuscript has passed the checks it will be sent out for review. To provide the metadata for your submission, please Login to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e. by Oct 02 2024 11:59PM.

If your manuscript has been previously peer-reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like us to consider previous reviewer reports, please edit your cover letter to let us know and include the name of the journal where the work was previously considered and the manuscript ID it was given. In addition, please upload a response to the reviews as a 'Prior Peer Review' file type, which should include the reports in full and a point-by-point reply detailing how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Roli

Roland Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Decision Letter 1

Roland G Roberts

15 Nov 2024

Dear John,

Thank you for your patience while your manuscript "Retractions among highly-cited authors in science-wide author databases" went through peer-review at PLOS Biology. Your manuscript has now been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by four independent reviewers.

You'll see that reviewer #1 says that the study is new and important, and simply has a few questions about methodology and suggestions for some helpful diagrams. Reviewer #2 is also positive, but wants you to discuss more of the prior literature on retractions, to better justify a claim, and to point out that some retractions may not be down to the authors themselves. Reviewer #3 wants you to formulate clearer research questions, questions the point of discussing countries with very low publication rates, questions the rationale behind looking at citations in such a recent year as 2023, and wants more detail about name disambiguation. Reviewer #4 says that the paper is important, but thinks that it needs re-framing and clearer motivation (is it about the database, or is it about the retractions?), and wants more clarity on where the responsibility for retraction lies.

IMPORTANT: My diagnosis here is that the concerns raised by reviewer #3 and #4 are a natural consequence of my previous request that you do a "quick and dirty" cosmetic re-framing of the paper before review, as we were much more interested in the retraction analysis than in the database per se, and wanted the reviewers to focus on that aspect. The reviewers seem to detect the resulting disconnect, so I see revision as an opportunity for you to complete the process of re-framing around the retraction aspect (e.g. by including clear research questions, as the reviewers suggest). Obviously the other concerns raised by the reviewers should also be addressed (and for clarity, we're still interested in the updated database, but the retraction analysis should take centre stage).

In light of the reviews, which you will find at the end of this email, we are pleased to offer you the opportunity to address the comments from the reviewers in a revision that we anticipate should not take you very long. We will then assess your revised manuscript and your response to the reviewers' comments with our Academic Editor aiming to avoid further rounds of peer-review, although might need to consult with the reviewers, depending on the nature of the revisions.

We expect to receive your revised manuscript within 1 month. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we withdraw the manuscript.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted " file type.

*Resubmission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this resubmission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roli

Roland Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

----------------------------------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

[identifies himself as David B Resnik]

This article develops databases that link retraction and citation data. The database can be useful for future research on retractions. The study is well designed and executed. The information provided is new and important. I have just a few questions/suggestions.

1. How was the linkage achieved? Was this automated? Done manually by members of the research team reviewing articles?

2. Is this database publicly available? Where? Searchable?

3. You might want to include some diagrams as visual aids to the reader, such as the steps in your search and linkage of records.

Reviewer #2:

In their study, Ioannidis et al. conducted a bibliometric analysis of retractions among highly-cited authors, highlighting a surprising ratio of highly-cited scientists having at least one retraction. This study is generally interesting and holds some value. However, the work does suffer from some issues.

Firstly, the article primarily focuses on the retractions of highly-cited authors, but the background is too brief to provide a comprehensive understanding of the retraction landscape. The authors should consider reviewing the latest relevant articles for a broader background, such as the following literature: https://www.nature.com/articles/d41586-023-03974-8, 10.1007/s11192-024-04992-7, 10.2478/jdis-2024-0012, 10.1016/j.xinn.2024.100593, 10.1016/j.heliyon.2024.e38620, as well as further literature. This could enhance the significance of their work.

Secondly, the authors claimed that retractions are more prevalent in the life sciences compared to other fields. The authors may not reach this conclusion based solely on the proportion of top-cited scientists with retracted publications. This requires literature support or a comparison of retraction rates.

Thirdly, the criteria used to filter journal errors may be inadequate. For instance, instances of partial fake peer review may not necessarily be linked to the authors. It could be due to factors like editors unexpectedly inviting a fake referee or encountering review mills (see 10.1007/s11192-024-05125-w). Additionally, some reasons for retractions, such as Falsification/Fabrication of Data, Contamination of Cell Lines/Tissues, Contamination of Materials, Duplication of Text, among others, were not explicitly categorized as author errors.

Lastly, there are issues with the citation style in the references. For example, "Oransky I. Volunteer watchdogs pushed a small country up the rankings. Science (1979). 2018;362(6413):395" requires correction. Please review the references for accuracy.

Reviewer #3:

This manuscript describes an update of a dataset of highly cited authors. The update also includes the addition of the number of retracted papers and their number of citations by the highly cited authors. I'm missing the research question of this manuscript. It might help to formulate clear research questions, e.g., how high is the percentage of highly cited authors with retracted papers, or do the retracted papers make some authors highly cited? I don't see the connection to biology. Maybe, the authors can include a research question that is related to biology. More specific comments follow below.

The authors state in the abstract without providing any reason: "It would be useful to generate databases where the presence of retractions can be linked to impact metrics of each scientist." They continue: "We have thus incorporated retraction data in an updated a Scopus-based database of highly-cited scientists (top-2% in each scientific subfield according to a composite citation indicator)." This is the other way around compared to the preceding sentence.

Also, in the abstract, the authors state: "In several developing countries, very high proportions of top-cited scientists had retractions (highest in Senegal (66.7%), Ecuador (28.6%) and Pakistan (27.8%) in career-long citation impact lists). Variability in retraction rates across fields and countries suggests differences in research practices, scrutiny, and ease of retraction." Especially, in the case of Senegal and Ecuador, this is statistics on very small numbers.

I do not see the benefit of analyzing the single most recent year (i.e., 2023) as this year's publications have had far too little time to be cited and generate impact.

The h/hm index has been mentioned (e.g., on page 6) but was not explained.

On page 9, the authors state: "Many developing countries have extremely high rates of top-cited authors with retracted papers. This may reflect problematic research environments and incentives in these countries, several of which are also rapidly growing their overall productivity (3,16-19). In fact, some of these countries such as India, China, Pakistan and Iran also have a large share of implausibly hyperprolific authors (14). It would be interesting to see if removing some of the productivity incentives may reduce the magnitude of the problem in these countries." I wonder if part of the "implausibly hyperprolific authors" and "extremely high rates of top-cited authors with retracted papers" might be due to a problem of the author name disambiguation. Also, no details regarding the author name disambiguation method were provided, like for the indicators used for ranking scholars.

Reviewer #4:

[identifies himself as Sean C. Rife]

The submitted manuscript investigates the increasing prevalence of retractions in scientific literature, which, despite their growth, still represent a small fraction of published works. By linking retraction data from the Retraction Watch database with citation metrics from Scopus, the authors found that a notable percentage of highly cited scientists had at least one retraction, with notable variations across disciplines and countries. The authors note that retractions are more frequent in the life sciences and highlight the necessity for careful interpretation of retraction data, as they do not always indicate misconduct, thereby providing a valuable resource for understanding scientific practices and enhancing research evaluation. I think this is an important paper that warrants publication. However, it could be improved in a number of ways, which I outline below. I also wonder if this paper might be better suited for PLOS ONE (although it would not qualify in my mind as suitable with only minor revisions, hence my lack of response to the earlier question of whether it would be "suitable for another PLOS journal with only minor revisions"), given its application to a wide array of scientific fields.

Broadly, my concern is that I didn't get a clear understanding of the purpose of the paper. Is it supposed to elucidate the extent to which highly-cited authors have their papers retracted (and associated variables such as field, region, etc.), describe a newly-published dataset, or both? At present it reads like it is straddling the line between the two, which makes it somewhat difficult to follow. This could, perhaps, be improved by adding a simple statement outlining the purposes of the paper explicitly, early on, but I think the paper would benefit from a more thorough revision that makes the purpose clear at every stage.

I was also somewhat confused by the focus in various places on responsibility on the part of authors. The Method section might benefit from a brief explication of the authors' intentions with the filtering they applied. I presume the goal is to limit the analyses to instances in which the author(s) in question are responsible for errors or malfeasance, but then on p. 10 the authors state that they "make no judgment calls in our databases on the ethical nature of the retractions"; but then, do they not - at least implicitly - do so in the paper? Or am I assuming too much? This is also complicated (as the authors note in multiple instances) by the fact that many of the authors they identify may not be responsible for the elements of the papers that justified their retraction.

A few minor points:

- The authors note on p. 9 that retracted works often continue to be cited after they have been retracted. This is certainly problematic to the extent that the citing authors are unaware of the retraction, but there are also valid reasons to knowingly cite a retracted work (e.g., to discuss the nature/implications/etc. of the retraction).

- The authors discuss paper mills in a number of places. A definition would be helpful.

- The authors mention that some authors may be able to "game the system" re: publishing. An example would be helpful.

- The authors note that retractions are more common in the life sciences and note that this might be due to increased scrutiny in these fields. This should probably be stated as a higher percentage, as a simple higher rate could be due to base rates (this is reflected elsewhere in the manuscript - just thinking it should be stated as a higher percentage here).

Decision Letter 2

Roland G Roberts

12 Dec 2024

Dear John,

Thank you for your patience while we considered your revised manuscript "Retractions among highly-cited authors in science-wide author databases" for publication as a Meta-Research Article at PLOS Biology. This revised version of your manuscript has been evaluated by the PLOS Biology editors and the Academic Editor.

Based on our Academic Editor's assessment of your revision, we are likely to accept this manuscript for publication, provided you satisfactorily address the following data and other policy-related requests.

IMPORTANT - please attend to the following:

a) Please change your Title to something more explicit, including an active verb. We suggest the following: "Linking citation and retraction data reveals the demographics of scientific retractions among highly-cited authors"

b) You say that you received no specific funding for this work. Can you please confirm that this is indeed the case?

c) Please address my Data Policy requests below; specifically, we need you to supply the numerical values underlying Figs 2AB, as a supplementary data file.

d) Please cite the location of the data clearly in the legend to Figure 2, e.g. “The data underlying this Figure can be found in S1 Data.”

e) The Academic Editor wants you to include some RRIDs to improve long-term "findability" of the information. Specifically, they suggest the following instances: "To add the new information on retractions, we depended on the most reliable database of retractions available to date, the Retraction Watch database (RWDB, RRID:SCR_000654) which is also publicly freely available through CrossRef (RRID:SCR_003217)." and "Following this filtering process, we linked the retraction records to Scopus (RRID:SCR_022559) using the digital object identifier (DOI) of the original paper..."

f) Where you say "...publications (p<0.001 by Mann-Whitney U..." please report the following (an illustrative sketch of how these values might be computed appears after this list):

• which tool you ran your stats with (and the RRID and version of the tool)

• U-statistic (or z-statistics for large groups), exact p-values, sample and group sizes

• effect size

• descriptive statistics
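
For illustration only, the following minimal Python sketch (using SciPy and NumPy) shows one way the requested values might be computed; the simulated data, group sizes, and the choice of the rank-biserial correlation as the effect size are assumptions for demonstration, not the authors' actual analysis.

    # Hypothetical sketch: full Mann-Whitney U report (statistic, exact
    # p-value, sample sizes, effect size, descriptive statistics).
    # The data below are simulated stand-ins, not the study's dataset.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    with_retraction = rng.poisson(300, size=500)      # e.g., publication counts
    without_retraction = rng.poisson(220, size=5000)

    res = stats.mannwhitneyu(with_retraction, without_retraction,
                             alternative="two-sided")
    n1, n2 = len(with_retraction), len(without_retraction)

    # Rank-biserial correlation, a simple effect size for Mann-Whitney U:
    # r = 1 - 2U / (n1 * n2).
    rank_biserial = 1 - 2 * res.statistic / (n1 * n2)

    print(f"U = {res.statistic:.0f}, p = {res.pvalue:.3g}, n1 = {n1}, n2 = {n2}")
    print(f"rank-biserial r = {rank_biserial:.3f}")
    print(f"medians: {np.median(with_retraction):.0f} vs "
          f"{np.median(without_retraction):.0f}")

For large groups SciPy uses a normal approximation by default (from which a z-statistic can be reported); method="exact" can be requested for small samples.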

g) There's a typo in some of the new text ("Carrer-long impact counts" instead of "Career-long impact counts").

h) Please make any custom code available, either as a supplementary file or as part of a DOI'd data deposition (e.g. in Zenodo). For example, I see that you describe the linkage of RetractionWatch entries to Scopus as being automated, so there is presumably a pipeline that performed this linkage? It would also be helpful if a more detailed description of how this linkage was performed were included in the manuscript itself.
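
As a purely illustrative aid, a DOI-based linkage of this kind might look like the short pandas sketch below; the file names and column labels (retraction_watch.csv, OriginalPaperDOI, doi) are hypothetical placeholders, not the authors' actual pipeline.

    # Hypothetical sketch of a DOI-based join between Retraction Watch
    # records and Scopus records; file names and columns are assumed.
    import pandas as pd

    rw = pd.read_csv("retraction_watch.csv")     # assumed RWDB export
    scopus = pd.read_csv("scopus_records.csv")   # assumed Scopus export

    # Normalize DOIs before joining: lowercase, strip whitespace and any
    # "https://doi.org/" prefix, so formatting variants still match.
    def normalize_doi(s: pd.Series) -> pd.Series:
        return (s.astype(str).str.strip().str.lower()
                 .str.replace(r"^https?://(dx\.)?doi\.org/", "", regex=True))

    rw["doi_norm"] = normalize_doi(rw["OriginalPaperDOI"])
    scopus["doi_norm"] = normalize_doi(scopus["doi"])

    linked = rw.merge(scopus, on="doi_norm", how="inner",
                      suffixes=("_rw", "_scopus"))
    print(f"linked {len(linked)} of {len(rw)} retraction records")

Archiving a script of this kind, together with the DOI normalization rules used, would document the linkage step that the manuscript describes as automated.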

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable; if not applicable, please do not delete your existing 'Response to Reviewers' file)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://plos.org/published-peer-review-history/

*Press*

Should you, your institution's press office, or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Roli

Roland Roberts, PhD

Senior Editor

rroberts@plos.org

PLOS Biology

------------------------------------------------------------------------

DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., Excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single figure, or even of several figures, can be included as multiple sheets in one Excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: Fig 2AB. NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

IMPORTANT: Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

------------------------------------------------------------------------

CODE POLICY

Per journal policy, if you have generated any custom code during the course of this investigation, please make it available without restrictions. Please ensure that the code is sufficiently well documented and reusable, and that your Data Statement in the Editorial Manager submission system accurately describes where your code can be found.

Please note that we cannot accept sole deposition of code in GitHub, as this could be changed after publication. However, you can archive this version of your publicly available GitHub code to Zenodo. Once you do this, it will generate a DOI number, which you will need to provide in the Data Accessibility Statement (you are welcome to also provide the GitHub access information). See the process for doing this here: https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content

------------------------------------------------------------------------

DATA NOT SHOWN?

- Please note that per journal policy, we do not allow the mention of "data not shown", "personal communication", "manuscript in preparation" or other references to data that is not publicly available or contained within this manuscript. Please either remove mention of these data or provide figures presenting the results and the data underlying the figure(s).

------------------------------------------------------------------------

Decision Letter 3

Roland G Roberts

2 Jan 2025

Dear John,

Happy New Year! Thank you for the submission of your revised Meta-Research Article "Linking citation and retraction data reveals the demographics of scientific retractions among highly-cited authors" for publication in PLOS Biology. On behalf of my colleagues and the Academic Editor, Anita Bandrowski, I'm pleased to say that we can in principle accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS: We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Best wishes,

Roli

Roland G Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Data. Data underlying Fig 2.

    (DOCX)

    pbio.3002999.s001.docx (16.6KB, docx)
    S1 Table. List of author-attributable reasons used to filter journal error and withdrawn (out of date) exceptions.

    (DOCX)

    pbio.3002999.s002.docx (16.6KB, docx)
    S2 Table. Top-cited scientists with and without retracted publications according to their primary subfield.

    (DOCX)

    pbio.3002999.s003.docx (34.2KB, docx)
    S3 Table. Top-cited scientists with and without retracted publications in countries with high (>10%) retraction prevalence.

    (DOCX)

    pbio.3002999.s004.docx (18.4KB, docx)
    Attachment

    Submitted filename: repliesretra.docx

    pbio.3002999.s005.docx (25.6KB, docx)
    Attachment

    Submitted filename: responsecomments.docx

    pbio.3002999.s006.docx (15.3KB, docx)

    Data Availability Statement

    The full datasets are available at https://doi.org/10.17632/btchxktzyw.7.

