Author manuscript; available in PMC: 2025 Nov 1.
Published in final edited form as: Eur Arch Otorhinolaryngol. 2024 Jul 17;281(11):6141–6146. doi: 10.1007/s00405-024-08834-3

AI-generated text in otolaryngology publications: a comparative analysis before and after the release of ChatGPT

Jonathan M Carnino 1, Nicholas Y K Chong 1, Henry Bayly 2, Lindsay R Salvati 2, Hardeep S Tiwana 3, Jessica R Levi 1,4
PMCID: PMC11513233  NIHMSID: NIHMS2016693  PMID: 39014250

Abstract

Purpose

This study delves into the broader implications of artificial intelligence (AI) text generation technologies, including large language models (LLMs) and chatbots, on the scientific literature of otolaryngology. By observing trends in AI-generated text within published otolaryngology studies, this investigation aims to contextualize the impact of AI-driven tools that are reshaping scientific writing and communication.

Methods

Text from 143 original articles published in JAMA Otolaryngology – Head and Neck Surgery was collected, representing periods before and after ChatGPT’s release in November 2022. The text from each article’s abstract, introduction, methods, results, and discussion were entered into ZeroGPT.com to estimate the percentage of AI-generated content. Statistical analyses, including T-Tests and Fligner-Killeen’s tests, were conducted using R.

Results

A significant increase was observed in the mean percentage of AI-generated text post-ChatGPT release, especially in the abstract (from 34.36 to 46.53%, p = 0.004), introduction (from 32.43 to 45.08%, p = 0.010), and discussion sections (from 15.73 to 25.03%, p = 0.015). Publications of authors from non-English speaking countries demonstrated a higher percentage of AI-generated text.

Conclusion

This study found that the advent of ChatGPT has significantly impacted writing practices among researchers publishing in JAMA Otolaryngology – Head and Neck Surgery, raising concerns over the accuracy of AI-created content and potential misinformation risks. This manuscript highlights the evolving dynamics between AI technologies, scientific communication, and publication integrity, emphasizing the urgent need for continued research in this dynamic field. The findings also suggest an increasing reliance on AI tools like ChatGPT, raising questions about their broader implications for scientific publishing.

Keywords: ChatGPT, Artificial intelligence (AI), Manuscript drafting, Ethics

Introduction

The realm of healthcare has undergone a remarkable transformation with the advent of Artificial Intelligence (AI), offering unprecedented opportunities to reshape the analysis of medical data, diagnostic processes, and the delivery of patient care [1–3]. With the ability to process vast amounts of complex information rapidly, AI holds the promise of enhancing medical decision-making, predicting diseases, and personalizing treatment plans [4, 5]. Notably, a surge in interest surrounding large language models (LLMs), a sub-type of AI models, has gained traction recently due to their widespread availability and user-friendly interfaces [6].

In recent years, several innovative LLMs and chatbots have been introduced to the public, including the widely discussed ChatGPT, released by OpenAI in November 2022. While ChatGPT stands out for its popularity, it is merely one part of a vast and rapidly evolving ecosystem of AI technologies that are transforming how scientific communication and manuscript drafting are approached. These tools leverage natural language processing (NLP) technology to understand and generate text, offering new avenues for scientific writing and communication [7, 8]. The introduction of advanced LLMs has sparked ethical debates and discussions surrounding integrity, plagiarism, and authorship in higher education [9–11]. Within the scientific literature, the number of PubMed publications citing AI tools has exceeded one thousand since August 2023, underscoring their growing prominence and highlighting the potential for their integration in healthcare [12–14].

Despite the excitement, the integration of various AI-driven tools in scientific discourse has triggered critical ethical discussions. Several science journals have taken a strong stance against their usage, going so far as to prohibit them due to concerns akin to those associated with plagiarism or image manipulation [15]. Furthermore, apprehensions about the quality of AI-assisted papers have been raised, as these tools occasionally produce inaccurate statements and provide different answers to the same question. This poses a significant challenge, especially as individuals lacking scientific expertise, or the time to thoroughly draft a manuscript, might be empowered to use LLMs for assistance in drafting scientific articles [16, 17].

Our interest in examining AI-generated text is driven by the need to better understand AI's influence on the field of otolaryngology within scholarly publications. This study aims to analyze articles in the high-impact journal JAMA Otolaryngology – Head and Neck Surgery (JAMA – Oto), evaluating the extent of AI-generated content both before and after the release of ChatGPT.

Methods

Data collection

Every original article published by JAMA – Oto from January 2022 to November 2022 (88 total articles) was extracted to represent the articles published before ChatGPT's release in November 2022. Other article types, such as reviews, case reports, opinions, and commentaries, were excluded from this analysis because their varied lengths and structural inconsistencies complicated direct comparisons. This timeline was chosen because ChatGPT's release marks a pivotal moment in the adoption of AI for text generation; however, many similar programs have been released since. Every original article published by JAMA – Oto from March 2023 to September 2023 (55 total articles) was extracted to represent the articles published after ChatGPT's release. Articles from December 2022 to February 2023 were excluded to avoid articles written and/or submitted before the chatbot was available. Free text from each article's abstract, introduction, methods, results, and discussion was individually entered into ZeroGPT.com, an online application that estimates the percentage of a text generated by AI [18–20]. The percentage of AI-generated text in each article was recorded, along with the country of the corresponding author listed on each manuscript.

Statistical analysis

ChatGPT was introduced on November 30, 2022. The mean percentages of AI-generated text in each section were compared, stratified by whether the publication date fell before or after ChatGPT's introduction. Prior to this comparison, the Fligner-Killeen test for homogeneity of variances was performed; this non-parametric test checks the equal-variance assumption of a standard t-test without requiring the data to be normally distributed. Based on these results, (Welch-adjusted) t-tests were conducted to compare the means. Additionally, the rate of AI-generated text based on the country of academic affiliation of the principal investigator (PI) was of interest. The means of detected AI-generated text were compared, stratified by whether the PI's country of academic affiliation was primarily English-speaking; the Fligner-Killeen test was again performed, and t-tests were conducted accordingly. A significance level of 0.05 was used for all analyses. All analyses were performed in R (version 4.3.1).
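The test-selection logic described above can be sketched as follows. This is a minimal illustration in Python using scipy.stats (the authors performed their analysis in R 4.3.1); the percentage values are made-up stand-ins for the extracted per-section detection scores, not data from the study.

```python
from scipy.stats import fligner, ttest_ind

# Hypothetical AI-detection percentages for one article section,
# split by publication date relative to ChatGPT's release.
pre_chatgpt = [34.1, 28.5, 41.0, 30.2, 36.7, 25.9]
post_chatgpt = [47.3, 52.1, 39.8, 44.6, 50.0, 45.2]

# Fligner-Killeen test: a non-parametric check of equal variances
# that does not assume normally distributed data.
_, p_var = fligner(pre_chatgpt, post_chatgpt)

# If the equal-variance assumption is violated (p < 0.05), fall back
# to Welch's t-test (equal_var=False); otherwise use the pooled t-test.
equal_var = p_var >= 0.05
t_stat, p_val = ttest_ind(pre_chatgpt, post_chatgpt, equal_var=equal_var)

print(f"Fligner-Killeen p = {p_var:.3f}, t-test p = {p_val:.3f}")
```

In R, the equivalent calls would be `fligner.test()` followed by `t.test()` with `var.equal` set from the variance-test result.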

Results

The overall mean percentage of AI-written text is presented in Table 1. Notably, the results section displayed the highest percentage of estimated AI-written text, at 49.68%, while the discussion section contained the least, at 19.30%. The findings from Fligner-Killeen's tests are detailed in Tables 2 and 3. It is important to highlight that the discussion section, when categorized by periods before and after the introduction of ChatGPT, showed unequal variances (p = 0.002), leading to an adjustment of the T-Test using the Welch approximation, as shown in Table 4. Significant differences were observed in the mean percentage of AI-generated text in the abstract, introduction, and discussion sections when comparing averages before and after ChatGPT's release, as shown in Fig. 1. Furthermore, significant disparities were found in the mean percentage of AI-generated text within the same sections when assessing whether the country of academic affiliation of the principal investigator (PI) was primarily English-speaking, as indicated in Table 5.

Table 1.

Overall mean percentage of AI-generated text in each article section

Section of Paper Mean Percentage of AI Generated Text
Abstract 39.04%
Introduction 37.30%
Methods 20.92%
Results 49.68%
Discussion 19.30%

Table 2.

Results of the Fligner-Killeen test of homogeneity of variances in each section between percentages of AI-generated text before and after the introduction of ChatGPT

Section of Paper Chi-Square Statistic Degrees of Freedom P-Value
Abstract 3.08 1 0.08
Introduction 0.00 1 0.97
Methods 0.10 1 0.75
Results 2.39 1 0.12
Discussion 9.30 1 0.002*
*

Indicates a significant result; in this case, the assumption of homogeneity of variances has been violated and an adjustment is needed

Table 3.

Results of the Fligner-Killeen test of homogeneity of variances between percentages of AI-generated text stratified by the language variable described in the methods section

Section of Paper Chi-Square Statistic Degrees of Freedom P-Value
Abstract 0.40 1 0.53
Introduction 1.70 1 0.19
Methods 0.29 1 0.59
Results 0.81 1 0.37
Discussion 2.97 1 0.08

Table 4.

Results of T-Tests comparing the means of AI-generated text before and after the introduction of ChatGPT

Section of Paper Mean % AI Generated Text (Pre-ChatGPT) Mean % AI Generated Text (Post-ChatGPT) t-statistic P-Value
Abstract 34.36% 46.53% −2.94 0.004
Introduction 32.43% 45.08% −2.62 0.010
Methods 22.66% 18.15%   1.54 0.126
Results 50.96% 47.64%   0.73 0.464
Discussion 15.73% 25.03% −2.48 0.015

Fig. 1.

Fig. 1

Boxplot of AI-generated text before and after the release of ChatGPT. Articles from before and after the release of ChatGPT were analyzed by an AI detector, ZeroGPT.com. Article abstract, introduction, and discussion sections were found to have a significantly increased percentage of AI-generated text since ChatGPT's release. Percentages were plotted based on whether the article was published before or after the release of ChatGPT (November 2022). *P < 0.05, **P < 0.01

Table 5.

Results of T-Tests comparing the mean percentages of AI-generated text stratified by the language variable described in the methods section

Section of Paper Mean % AI Generated Text (Non-English Speaking) Mean % AI Generated Text (English Speaking) t-statistic P-Value
Abstract 51.11% 35.56% 3.24 0.001
Introduction 46.79% 34.56% 2.15 0.033
Methods 24.60% 19.86% 1.39 0.168
Results 56.71% 47.65% 1.73 0.085
Discussion 25.71% 17.46% 2.03 0.044

Discussion

This study aimed to assess the impact of AI-generated text on the content of articles published in JAMA – Oto, focusing on the period before and after the release of ChatGPT in November 2022. The findings of this analysis align with the recent policy updates from JAMA and the JAMA Network journals, which advocate for transparent and responsible AI tool use in both manuscript preparation and research. As detailed in their guidelines first issued in 2023 and expanded upon in 2024 [21, 22], the emphasis on disclosing AI involvement underscores the critical nature of transparency and accountability in scholarly publishing.

This study, revealing a notable uptick in AI-generated content following ChatGPT's introduction, prompts a broader reflection on the academic community's adaptation to these policies and highlights the ongoing need for dialogue and research to effectively navigate AI's challenges and opportunities within scientific dissemination. These results suggest that researchers are likely utilizing ChatGPT for assistance in composing these specific article sections. However, it remains unclear whether authors employ AI to draft entire manuscripts or whether these chatbots are used primarily for proofreading and editing. Regardless of the purpose, there is a potential risk associated with ChatGPT, given its propensity to generate false statements, often referred to as "AI hallucinations" [17, 23].

Interestingly, authors from non-English speaking countries were found to have a higher percentage of AI-generated text than authors from English-speaking countries (Table 5). Interpreting this finding poses challenges due to the imperfect nature of current applications designed to detect AI-generated text. On one hand, it could suggest that authors from non-English speaking countries are more inclined to use AI chatbots for tasks such as translation or drafting manuscripts. However, it’s crucial to acknowledge that AI detectors have been reported to falsely accuse non-native English speakers [24]. This discrepancy may arise from the stylistic similarity between the writing style of individuals learning English as a second language and text generated by AI. Consequently, the interpretation of this finding remains uncertain, emphasizing the need for further research before considering widespread adoption of applications for detecting AI-generated text.

Moreover, the unexpectedly high detection of AI-generated text in the results sections can be partly explained by the performance characteristics of ZeroGPT.com. The tool's accuracy is compromised by the shorter length and the straightforward, data-focused language typical of results sections, which resembles AI-generated text. This finding is nonetheless surprising, since one would expect the highest percentages of AI-generated text in the more narrative sections such as the abstract, introduction, or discussion, not in the results, which typically involve direct data reporting. This highlights the need for refined methodologies in AI text detection to ensure accuracy and reliability.

Recommendations for academic journals

Based on our findings, we suggest that academic journals consider strategies to address the evolving impact of AI technologies, including the range of LLMs and chatbots beyond ChatGPT, on the content of scholarly articles. Despite existing guidelines surrounding the use of AI tools in research and the call for transparency, it is evident that authors have yet to fully embrace these policies. To address this, academic journals should collaborate with AI experts to develop and implement more refined tools for detecting AI-generated text; such collaboration would help mitigate potential biases and enhance the reliability of assessments. Creating accurate tools to detect AI-generated text is inherently difficult, and this challenge must be acknowledged. Nevertheless, initiating efforts to develop such tools is imperative for safeguarding academic integrity and ensuring the accuracy of reported scientific findings. Additionally, given the observed variations tied to authors' country of academic affiliation, journals should exercise caution in interpreting detector results and refrain from making assumptions about AI usage by authors from non-English speaking countries. Recognizing the imperfect nature of the current system, thorough screening for AI-generated text in academic articles remains crucial to uphold the accuracy of reported scientific findings. In essence, these recommendations seek to strike a balance between harnessing AI advancements and upholding the integrity and transparency of academic publishing.

Limitations

This study has several limitations that should be acknowledged. Firstly, the evaluation focused exclusively on articles published in one high-impact academic journal, JAMA – Oto, limiting the generalizability of the results to other medical disciplines or academic journals. Additionally, the exclusion of other article types such as reviews, case reports, opinions, and commentaries, due to their varied lengths and structural inconsistencies, also restricts the scope of our findings. The study's reliance on an online application, ZeroGPT.com, for estimating the percentage of AI-generated text introduces a potential source of variability and may not capture nuances in the text's quality or authenticity. Importantly, applications to detect AI-generated text are still imperfect and thus typically report some level of AI-generated text even in articles written by humans, depending on the author's style of writing. Furthermore, the study raises questions about the purpose of AI usage, whether for entire manuscript drafting or for specific sections, without providing definitive insights into authors' intentions. The detected variations in AI-generated text based on the country of academic affiliation highlight a potential bias in current AI detectors, necessitating caution in interpreting these results. Overall, these limitations underscore the need for future research to address these constraints and further explore the complex dynamics between AI, scientific discourse, and publication integrity.

Conclusion

In conclusion, our investigation into the effects of AI-text generation models, with ChatGPT being one of the many influential tools, has uncovered noteworthy insights. The surge in AI-generated text, particularly in abstracts, introductions, and discussions, following ChatGPT’s release in November 2022, suggests a substantial influence on researchers’ writing practices. However, concerns persist regarding the accuracy of AI-generated content and the potential for misinformation, commonly referred to as “AI hallucinations.” As AI tools become more widely used, it would be interesting to observe whether the percentage of AI-generated content continues to rise and how this trend further influences scientific publishing. As we navigate the evolving landscape of AI in academic publishing, our recommendations advocate for collaboration between journals and AI experts to refine detection tools, uphold transparency, and ensure the integrity of scholarly discourse. Despite the limitations of our study, it serves as a stepping-stone, urging future research to address these constraints and delve deeper into the intricate relationship between AI, scientific communication, and publication integrity.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Footnotes

Conflict of interest The authors have no financial conflicts of interest to disclose.

References

1. Yu KH, Beam AL, Kohane IS (2018) Artificial intelligence in healthcare. Nat Biomed Eng 2:719–731
2. Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S et al. (2017) Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol 2:230–243
3. Goodman RS, Patrinely JR Jr., Osterman T, Wheless L, Johnson DB (2023) On the cusp: considering the impact of artificial intelligence language models in healthcare. Med 4:139–140
4. Johnson KB, Wei WQ, Weeraratne D, Frisse ME, Misulis K, Rhee K et al. (2021) Precision medicine, AI, and the future of personalized health care. Clin Transl Sci 14:86–93
5. Davenport T, Kalakota R (2019) The potential for artificial intelligence in healthcare. Future Healthc J 6:94–98
6. Brameier DT, Alnasser AA, Carnino JM, Bhashyam AR, von Keudell AG, Weaver MJ (2023) Artificial intelligence in orthopaedic surgery: can a large language model write a believable orthopaedic journal article? J Bone Joint Surg Am 105:1388–1392
7. Dergaa I, Chamari K, Zmijewski P, Ben Saad H (2023) From human writing to artificial intelligence generated text: examining the prospects and potential threats of ChatGPT in academic writing. Biol Sport 40:615–622
8. Lechien JR, Gorton A, Robertson J, Vaira LA (2023) Is ChatGPT-4 accurate in proofread a manuscript in otolaryngology-head and neck surgery? Otolaryngol Head Neck Surg
9. King MR, chatGPT (2023) A conversation on artificial intelligence, chatbots, and plagiarism in higher education. Cell Mol Bioeng 16:1–2
10. Liebrenz M, Schleifer R, Buadze A, Bhugra D, Smith A (2023) Generating scholarly content with ChatGPT: ethical challenges for medical publishing. Lancet Digit Health 5:e105–e106
11. Ali MJ, Djalilian A (2023) Readership awareness series - paper 4: chatbots and ChatGPT - ethical considerations in scientific publications. Semin Ophthalmol 38:403–404
12. Owens B (2023) How Nature readers are using ChatGPT. Nature 615:20
13. Ruksakulpiwat S, Kumar A, Ajibade A (2023) Using ChatGPT in medical research: current status and future directions. J Multidiscip Healthc 16:1513–1520
14. Temsah MH, Altamimi I, Jamal A, Alhasan K, Al-Eyadhy A (2023) ChatGPT surpasses 1000 publications on PubMed: envisioning the road ahead. Cureus 15:e44769
15. Thorp HH (2023) ChatGPT is fun, but not an author. Science 379:313
16. Stokel-Walker C (2023) ChatGPT listed as author on research papers: many scientists disapprove. Nature 613:620–621
17. Salvagno M, Taccone FS, Gerli AG (2023) Artificial intelligence hallucinations. Crit Care 27:180
18. Bisi T, Risser A, Clavert P, Migaud H, Dartus J (2023) What is the rate of text generated by artificial intelligence over a year of publication in Orthopedics & Traumatology: Surgery & Research? Analysis of 425 articles before versus after the launch of ChatGPT in November 2022. Orthop Traumatol Surg Res 109:103694
19. Maroteau G, An JS, Murgier J, Hulet C, Ollivier M, Ferreira A (2023) Evaluation of the impact of large language learning models on articles submitted to Orthopaedics & Traumatology: Surgery & Research (OTSR): a significant increase in the use of artificial intelligence in 2023. Orthop Traumatol Surg Res 109:103720
20. Bellini V, Semeraro F, Montomoli J, Cascella M, Bignami E (2024) Between human and AI: assessing the reliability of AI text detection tools. Curr Med Res Opin 40:353–358
21. Flanagin A, Kendall-Taylor J, Bibbins-Domingo K (2023) Guidance for authors, peer reviewers, and editors on use of AI, language models, and chatbots. JAMA 330:702–703
22. Flanagin A, Pirracchio R, Khera R, Berkwits M, Hswen Y, Bibbins-Domingo K (2024) Reporting use of AI in research and scholarly publication - JAMA Network guidance. JAMA
23. Alkaissi H, McFarlane SI (2023) Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus 15:e35179
24. Mathewson TG (2023) AI detection tools falsely accuse international students of cheating. The Markup
