Over the last year, the subject of artificial intelligence (AI) in all its manifestations has generated significant Sturm und Drang throughout the world. The opinions of experts have not been in short supply, nor have they been consistent in tone. Stephen Hawking stated, “The development of full AI could spell the end of the human race” (Cellan-Jones, 2014). In contrast, Sundar Pichai stated, “AI is one of the most important things humanity is working on. It is more profound than, I dunno, electricity or fire” (Clifford, 2018).

Despite the recent flurry of interest, the history of AI dates back to the early 1950s, when Alan Turing asked the question, “Can machines think?” (Turing, 1950). Turing described an imitation game played among 3 people and proposed that a computer participating in the game would deserve to be called intelligent if it could deceive a human into believing that it was human (Turing, 1950). In 1956, John McCarthy coined the term AI and gathered a group of scientists at Dartmouth College “to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it” (Milestones, 1956). Over the next 2–3 decades, the work of Hebb, Rosenblatt, Widrow, and Hoff was critical to the development of thinking machines. These scientists developed the mathematical theories needed to design machines that think like the brain by utilizing neural networks (Eberhart and Dobbins, 1990). The development of complex neural networks, coupled with seemingly never-ending advances in the ability of computers to store and process information, enabled computers to begin to understand and process text and spoken words in a manner similar to humans. These advances required ever-larger amounts of information to serve as training material and produced machines able to beat chess grandmasters (Eschner, 2017).

The ever-increasing amount of training material available on the internet, coupled with the ability to recognize and process human language, provided the foundation for generative AI. Generative AI can be characterized as the generation of new content (text or images) in response to inquiries written in normal conversational language, using learning derived from large databases. With the release of ChatGPT in November of 2022, the wide-ranging abilities of generative AI quickly became evident. ChatGPT (and generative AI platforms such as Bard, DeepMind, Cohere, and others) will provide you, in about 3 seconds, a poem describing Duke basketball’s superiority (Figure 1) or a short paragraph on mechanisms of acantholysis with references (Figure 2). It did not take long for both scientists and the public to recognize the power of generative AI programs (Florindo, 2023).
As ChatGPT has become more widely utilized, it has become clear that this technology has many limitations. ChatGPT can write answers that read and sound convincing yet are often full of mistakes (perhaps in our basketball poem?). The ability of ChatGPT, or any generative AI program, to answer specific questions is based on the information contained in its training set. This information is most often collected (or scraped) from a wide variety of databases available on the internet; it can come from databases curated and vetted by experts or from social media sites. This approach is a major weakness of generative AI. In most generative AI platforms, the requestor has no way of validating the selection or curation of the information used to train the program. Although an answer may be well written, there is no transparency about the sources of the data or their validity. Generative AI also poses a significant risk of plagiarism, both unintended and intentional, and of violation of copyright and licensing laws (Stokel-Walker and Van Noorden, 2023). Generative AI can also perpetuate biases or misinformation present in the training set, incorporating this misinformation into an answer with a reference that may or may not be real (Stokel-Walker and Van Noorden, 2023). Zack et al (2023) used GPT-4 to answer common clinical questions and found that the answers tended to include diagnoses that stereotyped specific races and genders.
Recently, Májovský et al (2023) used ChatGPT to generate a totally fraudulent scientific manuscript describing a clinical trial and then had the manuscript reviewed by content area experts. The reviewers reported that “Overall, the generated article demonstrated a high level of technical proficiency and authenticity” and that “From a psychiatric expert point of view, the study could be considered groundbreaking…” (Májovský et al, 2023). Both reviewers, however, noted “…some concerns and specific mistakes…” It was also noted that of the 17 references cited, 4 were nonexistent and 2 were contextually wrong (Májovský et al, 2023). Such mistakes, however, may be difficult for readers, or for tools designed to detect AI-generated material, to identify. Analysis of the ChatGPT-generated manuscript using 2 AI detector software programs yielded either inconclusive or unclear evidence that the manuscript was AI generated (Májovský et al, 2023).
Despite these limitations, there are several areas in which generative AI programs have the potential to positively impact science. One area of great potential utility is using programs such as ChatGPT to improve the writing of manuscripts, speed up routine administrative tasks, and ultimately help authors write faster (Van Noorden and Perkel, 2023). Using generative AI to assist non-native English-speaking scientists in writing their manuscripts represents an opportunity to ease the dissemination of new studies and improve efficiency throughout the publication process. Generative AI has also been proposed as a way to lessen the burden of writing research grants (Parrilla, 2023). The ability to explore new hypotheses and discover previously unknown relationships between seemingly independent observations is another great potential application of generative AI.
Significant concerns have been raised that ChatGPT and other generative AI programs will, on balance, have a negative impact on scientific manuscripts; the peer review process; and, ultimately, the veracity and quality of scientific publications (Conroy, 2023a, 2023b; Florindo, 2023; Liu and Brown, 2023; Stokel-Walker and Van Noorden, 2023). Despite these concerns, generative AI is already being used by scientists in a variety of ways. In a survey of 1600 scientists conducted by Nature, the limitations of generative AI were widely recognized: over 50% stated that AI tools can promote bias or discrimination, 55% stated that fraud would become easier, and 53% stated that the use of these tools leads to more irreproducible research (Van Noorden and Perkel, 2023). Despite these misgivings, 40% of the researchers who used generative AI in their research found it essential or very useful, compared with 18% of researchers who were not currently using it. Looking forward over the next 10 years, approximately 75% of current users and 40% of nonusers felt that generative AI will become essential or very useful in their work (Van Noorden and Perkel, 2023). Perhaps the strongest evidence that the use of generative AI in scientific publications needs to be directly addressed follows from the work of Guillaume Cabanac, who has discovered the undeclared use of ChatGPT or other generative AI in over 12 articles published in peer-reviewed journals since April of 2023 (Conroy, 2023a, 2023b). Many of these papers were detected because the authors failed to delete generative AI tag lines such as “Regenerate response,” whereas others were detected because they contained nonexistent references. These findings indicate that generative AI is actively being used in the scientific community and that the issues of transparency and accountability for its use have not been adequately addressed.
However, generative AI is not the only threat to the quality and veracity of scientific publications. Lack of author accountability, poor reproducibility or replicability of published results, and manipulated or falsified data have led to a record number of papers retracted after the discovery of significant intentional or unintentional errors. Nearly 300 papers per month were retracted in 2021, and it is suspected that this represents only a fraction of the published work containing significant yet unreported scientific errors (Errington et al, 2021; Ioannidis, 2012; Oransky, 2022).
Scientific journals have implemented many standards to help address these issues. For example, all authors must document their contributions to the work; possible or real conflicts of interest must be declared by all authors; most journals require that the methodology for PubMed searches in meta-analyses be published with the paper; and the CONSORT statement is required to be published with reports of clinical trials (Butcher et al, 2022). All of these measures, and many others, are designed to improve the transparency of methods and the accountability of authors for their work. Likewise, journals use software programs such as iThenticate (https://www.ithenticate.com/) to help detect plagiarism, as well as software that flags potential manipulation of figures for further evaluation by journal staff. The rapid growth of generative AI platforms, with their potential for both positive and negative impacts, requires the scientific community to take aggressive steps to mitigate the potential damage to the scientific literature.
In response to this growing challenge, many journals have developed policies for the use of generative AI and large language models (LLMs) (Brainard, 2023; Kaebnick et al, 2023). Most journals have stated that ChatGPT and other generative AI platforms cannot be listed as authors. This is an important statement and is critical to establishing who is accountable for the published work: only humans can truly be accountable, and they must answer for any misstatements introduced through the use of generative AI. This, however, is clearly not enough. Science has decided to forbid the use of generative AI in any aspect of submitted manuscripts (Thorp, 2023). However, given the increasing use of generative AI by scientists, such a policy seems likely simply to drive authors to conceal their use of generative AI while also denying them its beneficial aspects. Other journals have not prohibited all uses of generative AI but have instead required self-declaration whenever generative AI has been utilized; some also require a description of the purpose of the use and of how the generative AI output was incorporated into the manuscript (Kaebnick et al, 2023).
In developing our approach to the role of generative AI and LLMs in JID Innovations, we will be introducing specific requirements that we believe will make our authors, editors, and reviewers fully aware of the challenges and benefits of these tools; mandate accountability; and improve transparency about how generative AI is being used.
First, ChatGPT and other generative AI or LLM platforms cannot be listed as authors. As we have discussed, a computer program cannot be held accountable; including a computational tool as an author does not relieve the human authors of accountability for all of the work in the manuscript. Second, all authors must state directly whether generative AI or an LLM was used in any aspect of the preparation of the manuscript. It is our expectation that, before this declaration is made, the corresponding author will ensure that all authors’ contributions are consistent with it. All authors must be aware that, similar to conflicts of interest, we must be transparent about our use (or nonuse) of generative AI in our work. Third, if generative AI has been utilized, the authors must state which generative AI platform was used, describe how it was used, and directly state the questions or instructions (prompts) used to interrogate the application. Was generative AI used to correct grammar and improve the readability of the first draft, or was it used to write the introduction (or generate ideas for the introduction) or the discussion of the manuscript? This requirement will increase the transparency of how the paper was written, help demonstrate the authors’ input to the generative AI material, and help reviewers and readers evaluate the veracity of the manuscript. In this way, we will require our authors to demonstrate accountability and transparency and to provide the complete information needed to reproduce and replicate their work.
However, JID Innovations will also have additional requirements for editors and reviewers of submitted manuscripts. Editors and reviewers will not be allowed to upload any portion of a manuscript onto a generative AI platform, and they will be required to state in their reviews that they adhered to this policy. This will help ensure that the confidentiality of the submitted material is maintained. We will also require editors and reviewers to document that they have not used generative AI to assess the validity of the manuscript; currently, the lack of a clear and transparent source for the information used to train generative AI programs precludes their use in assessing the quality of a manuscript under evaluation. Finally, JID Innovations will be actively exploring mechanisms to screen manuscripts for the use of generative AI in the writing of text as well as in the generation of figures. At this time, such tools are helpful but imperfect. We will be proactive in examining manuscripts for signs of generative AI use and in evaluating new approaches to ensure the integrity of the work we publish.
Although no one can be certain how generative AI will impact science and the publication of scientific results going forward, it is clear that we are at the very earliest stages of the interaction between generative AI and scientific publications. It is our collective job to be engaged and fully involved with the development of generative AI and how it is used in science. We need to fully understand the methodology, the source, and the veracity of the information used to train these applications and to insist on full transparency whenever they are utilized by our communities. It is up to us to work together to ensure that generative AI does not damage the integrity or quality of what is published in our journals. The steps outlined here will be re-evaluated regularly and modified as required by the nature and capability of generative AI as it evolves. Alan Turing proposed that the test for proving that machines can think is that “A computer would deserve to be called intelligent if it could deceive a human into believing that it was human” (emphasis added). It is our task to do all that we can to prevent such deception, and we believe that active efforts to require transparency and accountability are needed. I would add that a statement by Richard Feynman should also guide our individual evaluation and use of generative AI, our assessment of the positive and negative impacts of these tools, and our judgment of the veracity and quality of their output: “The first principle is that you must not fool yourself and you are the easiest person to fool.”
Declaration of Artificial Intelligence and Large Language Models
During the preparation of this work, the author used ChatGPT 3 to generate text for the figures. The queries used and the resulting output are given in Figure 1 (generated on October 12, 2023) and Figure 2 (generated on October 19, 2023). These were created as examples of generative artificial intelligence/large language model output.
Conflict of Interest
The author states no conflict of interest.
Footnotes
Cite this article as: JID Innovations 2023.100256
References
- Brainard J. Journals take up arms against AI-written text. Science. 2023;379:740–741. doi: 10.1126/science.adh2762.
- Butcher N.J., Monsour A., Mew E.J., Chan A.-W., Moher D., Mayo-Wilson E., et al. Guidelines for reporting outcomes in trial reports: the CONSORT-Outcomes 2022 extension. JAMA. 2022;328:2252–2264. doi: 10.1001/jama.2022.21022.
- Cellan-Jones R. Stephen Hawking warns artificial intelligence could end mankind. 2014. https://www.bbc.com/news/technology-30290540
- Clifford C. Google CEO: A.I. is more important than fire or electricity. 2018. https://www.cnbc.com/2018/02/01/google-ceo-sundar-pichai-ai-is-more-important-than-fire-electricity.html
- Conroy G. How ChatGPT and other AI tools could disrupt scientific publishing. Nature. 2023a;622:234–236. doi: 10.1038/d41586-023-03144-w.
- Conroy G. Scientific sleuths spot dishonest ChatGPT use in papers. 2023b. https://www.nature.com/articles/d41586-023-02477-w
- Eberhart R.C., Dobbins R.W. Early neural network development history: the age of Camelot. IEEE Eng Med Biol Mag. 1990;9:15–18. doi: 10.1109/51.59207.
- Errington T.M., Denis A., Perfito N., Iorns E., Nosek B.A. Challenges for assessing replicability in preclinical cancer biology. eLife. 2021;10. doi: 10.7554/eLife.67995.
- Eschner K. Computers are great at chess, but that doesn’t mean the game is ‘solved’. 2017. https://www.smithsonianmag.com/smart-news/what-first-man-lose-computer-said-about-chess-21st-century-180962046/
- Florindo F. ChatGPT: a threat or an opportunity for scientists? Perspectives of Earth and Space Scientists. 2023;4.
- Ioannidis J.P. Why science is not necessarily self-correcting. Perspect Psychol Sci. 2012;7:645–654. doi: 10.1177/1745691612464056.
- Kaebnick G.E., Magnus D.C., Kao A., Hosseini M., Resnik D., Dubljević V., et al. Editors’ statement on the responsible use of generative AI technologies in scholarly journal publishing. Hastings Cent Rep. 2023;53:3–6. doi: 10.1002/hast.1507 [published correction appears in Ethics Hum Res 2023;53:10].
- Liu N., Brown A. AI increases the pressure to overhaul the scientific peer review process. Comment on “Artificial intelligence can generate fraudulent but authentic-looking scientific medical articles: Pandora’s box has been opened”. J Med Internet Res. 2023;25. doi: 10.2196/50591.
- Májovský M., Černý M., Kasal M., Komarc M., Netuka D. Artificial intelligence can generate fraudulent but authentic-looking scientific medical articles: Pandora’s box has been opened. J Med Internet Res. 2023;25. doi: 10.2196/46924.
- Milestones D. 1956. “Artificial Intelligence Coined at Dartmouth”. https://home.dartmouth.edu/about/artificial-intelligence-ai-coined-dartmouth
- Oransky I. Retractions are increasing, but not enough. Nature. 2022;608:9. doi: 10.1038/d41586-022-02071-6.
- Parrilla J.M. ChatGPT use shows that the grant-application system is broken. Nature. 2023;623:443. doi: 10.1038/d41586-023-03238-5.
- Stokel-Walker C., Van Noorden R. What ChatGPT and generative AI mean for science. Nature. 2023;614:214–216. doi: 10.1038/d41586-023-00340-6.
- Thorp H.H. ChatGPT is fun, but not an author. Science. 2023;379:313. doi: 10.1126/science.adg7879.
- Turing A.M. Computing machinery and intelligence. Mind. 1950;LIX:433–460.
- Van Noorden R., Perkel J.M. AI and science: what 1,600 researchers think. Nature. 2023;621:672–675. doi: 10.1038/d41586-023-02980-0.
- Zack T., Lehman E., Suzgun M., Rodriguez J.A., Celi L.A., Gichoya J., et al. Coding inequity: assessing GPT-4’s potential for perpetuating racial and gender biases in healthcare. medRxiv. 2023.