Abstract
Artificial intelligence (AI)-generated content detectors are not foolproof and often introduce other problems, as shown by Desaire et al. and Liang et al. in papers published recently in Patterns and Cell Reports Physical Science. Rather than “fighting” AI with more AI, we must develop an academic culture that promotes the use of generative AI in a creative, ethical manner.
Main text
Large language models (LLMs) such as OpenAI’s generative pre-trained transformer (GPT) are among the most disruptive technologies of recent years. For years, LLMs worked behind the scenes in applications such as search engine auto-complete and machine translation.1 OpenAI changed that by introducing the public to its GPT-driven chatbot, ChatGPT, highlighting its “human-level” linguistic aptitude and its ability to interact in real time. Within a month, ChatGPT became an Internet sensation, with users asking it to produce everything from sonnets to legal briefs to computer code. Along with the excitement, however, came alarm bells and a growing public discourse about the impact of this technology on our lives and livelihoods.
Generative artificial intelligence (AI) marks a turning point for human-AI symbiosis. Other AI systems produce closed-ended outputs such as decisions, ranked lists, or descriptions; generative systems, in contrast, synthesize new content designed to be plausible with respect to the data on which they were trained. Systems such as ChatGPT and DALL-E can convincingly mimic fundamental aspects of human perception and communication. Notably, even before the rise of generative AI, there were “cognitively inspired” systems for linguistic and visual tasks, and some of them (e.g., image recognition2) reached or surpassed human-level performance. Despite this, there was relatively little discussion of whether those earlier technologies would replace human knowledge workers. What has changed with generative AI?
We can envision language and vision AI systems as lying on a spectrum from those that produce closed-ended responses to those that produce open-ended ones. Figure 1 depicts example systems. In the case of language understanding, all three take a text as input and produce a response: text classification (systems that classify the text according to a characteristic of interest, such as whether its sentiment is positive, neutral, or negative), text summarization (systems that produce a shorter description of the input text), and LLMs (systems that predict, based on the input, what words would constitute an appropriate response). In short, while all three analyze diverse input (i.e., a text of interest to the user, which could be anything), their outputs range from tightly controlled to open-ended and unpredictable3 (a minimal code sketch following Figure 1 contrasts the two extremes). The same holds for the visual understanding examples. This open-ended, unpredictable behavior is why generative AI systems are often perceived as “human like.”
Figure 1. Example AI systems used directly and indirectly by end users for language and vision tasks, from closed-ended to open-ended.
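To make the contrast concrete, the following minimal sketch runs the same input text through a closed-ended sentiment classifier and an open-ended text generator. The use of the Hugging Face transformers library and its default public models is an assumption of convenience here, not a tool discussed in the cited papers.

```python
# Minimal sketch contrasting closed-ended and open-ended language systems.
# Assumes the Hugging Face `transformers` library and its default public
# models; neither is prescribed by the articles discussed here.
from transformers import pipeline

text = "Generative AI marks a turning point for human-AI symbiosis."

# Closed-ended: the output is constrained to a fixed label set.
classifier = pipeline("sentiment-analysis")
print(classifier(text))  # e.g., [{'label': 'POSITIVE', 'score': 0.99}]

# Open-ended: the output is newly synthesized text, different on every sampled run.
generator = pipeline("text-generation", model="gpt2")
print(generator(text, max_new_tokens=30, do_sample=True)[0]["generated_text"])
```

The classifier can only ever emit one of a handful of labels, whereas the generator’s output space is effectively unbounded; that unboundedness is the “unpredictability” that makes generative systems feel human like.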
It was always anticipated that AI would transform the way we get things done, enabling us to become more efficient and to focus on creative, rather than repetitive, tasks. However, visionaries such as Licklider4 described the augmentation of human intelligence rather than the replacement of knowledge workers. With generative AI, we are now asking how those in elite professions, such as computer programmers, attorneys, or marketing executives, will be affected and potentially even replaced.
There is also concern about how generative AI will affect science and education. A recent study by Gao et al.5 found that academics could not consistently distinguish medical research abstracts generated by ChatGPT from those written by humans. Another study, on the integration of ChatGPT into education, described both excitement about its transformational effects and fears of rampant cheating.6 To protect academic integrity, there is a need to distinguish scholarship that is the genuine product of human intellect from AI-generated content, and many tools for automatically detecting AI-generated text have emerged. These detectors, however, have serious limitations and often introduce problems of their own. Two recent articles, published in Patterns and Cell Reports Physical Science, shed light on this vicious circle.
Citing the need to evaluate detectors on scientific texts, Desaire and colleagues7 assessed OpenAI’s RoBERTa-based GPT-2 detector and introduced a competing method that exploits the linguistic idiosyncrasies of academic writers. They demonstrated that it is straightforward to build a GPT detector using off-the-shelf tools and basic data science skills (a minimal sketch of this recipe follows this paragraph). They also describe a deeper problem: the “arms race” between LLMs and detectors. ChatGPT is constantly collecting data from the public and learning to please its users; eventually, it will learn to outsmart any detector.
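To illustrate how accessible such detectors are, here is a minimal stylometric classifier built with scikit-learn. This is not the feature set or model of Desaire et al.7; the hand-picked features, the toy training data, and the choice of logistic regression are all illustrative assumptions.

```python
# Minimal sketch of an off-the-shelf "GPT detector": a stylometric classifier.
# NOT the pipeline of Desaire et al.; the features, toy data, and model choice
# (logistic regression) are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def stylometric_features(text: str) -> list[float]:
    """A few simple writing-style signals: sentence-length variance,
    punctuation density, and mean word length."""
    sentences = [s for s in text.replace("?", ".").replace("!", ".").split(".") if s.strip()]
    words = text.split()
    sent_lens = [len(s.split()) for s in sentences]
    return [
        float(np.var(sent_lens)) if sent_lens else 0.0,            # human writing tends to vary more
        sum(text.count(c) for c in ";:()") / max(len(words), 1),   # punctuation density
        float(np.mean([len(w) for w in words])) if words else 0.0, # mean word length
    ]

# Toy labeled examples; a real study would use thousands of documents.
human_texts = ["Strikingly, the assay failed; we suspect (but cannot prove) contamination.",
               "Results were mixed: two replicates agreed, one did not."]
ai_texts = ["The results demonstrate a clear and consistent trend across all samples.",
            "In conclusion, the findings provide valuable insights into the topic."]

X = [stylometric_features(t) for t in human_texts + ai_texts]
y = [0, 0, 1, 1]  # 0 = human, 1 = AI-generated

clf = LogisticRegression().fit(X, y)
print(clf.predict([stylometric_features("The analysis reveals significant improvements overall.")]))
```

With a large labeled corpus and a richer feature set, this simple recipe is essentially what “off-the-shelf tools and basic data science skills” amounts to.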
Liang et al.8 demonstrated that GPT detectors introduce further concerns, such as social bias. Evaluating seven detectors, they assessed the detectors’ ability to correctly classify one set of English texts written by native Chinese speakers and another written by native English speakers. The essays by Chinese writers had a much greater false positive rate (i.e., were far more often wrongly classified as AI generated) than those by native English writers. The authors explain that non-native writers are penalized because their more limited linguistic expression yields lower perplexity, i.e., text that is less “surprising” to a language model, which is precisely the signal many detectors exploit (the sketch below shows the core of such a perplexity-based rule). When the authors used ChatGPT to enrich the Chinese-written essays, the false positives were markedly reduced.
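The following minimal sketch shows the core of a perplexity-based detector. The scoring model (GPT-2 via Hugging Face transformers) and the threshold are illustrative assumptions, not the specific detectors Liang et al.8 evaluated.

```python
# Minimal sketch of a perplexity-based AI-text detector. The scoring model
# (GPT-2 via Hugging Face transformers) and the threshold are illustrative
# assumptions, not the detectors evaluated by Liang et al.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """How 'surprised' the model is by the text (lower = more predictable)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()

THRESHOLD = 50.0  # illustrative; real detectors calibrate this on labeled data

def looks_ai_generated(text: str) -> bool:
    # Low perplexity (highly predictable text) is treated as evidence of AI
    # origin; this is exactly the signal that penalizes constrained,
    # formulaic non-native writing.
    return perplexity(text) < THRESHOLD

print(looks_ai_generated("The results of the study are good and the method is good."))
```

Because simpler, more formulaic prose is more predictable, such a rule flags it regardless of whether a human or a machine wrote it, which is the bias Liang et al. document.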
We must protect academic integrity from the misuse of generative AI, but purely technical solutions only create more problems. The crucial task is cultivating a culture that promotes the creative, transparent use of these tools. As shown by Liang et al., ChatGPT can be used in positive ways, enabling non-native English speakers to enrich their linguistic expression; it might thus help level the playing field among scholars of different linguistic backgrounds who publish in English. The most important consideration is not developing the next generation of AI-based AI detectors but defining the parameters of ethical use of these tools in academic and scientific work. This may seem a daunting task, but it is not the first time we have faced disruptive technologies: only a few decades ago, the use of pocket calculators in the classroom was considered controversial, whereas today they are essential tools for developing students’ more advanced mathematical skills.9 The key is finding ways to keep the use of generative AI transparent and creative, so as to avoid cheating and to enhance, rather than replace, human intellectual pursuits.
Acknowledgments
This work received funding from the Cyprus Research and Innovation Foundation under grant EXCELLENCE/0421/0360 (KeepA(n)I), the EU’s Horizon 2020 Research and Innovation Programme under grant agreement no. 739578 (RISE), and the government of the Republic of Cyprus through the Deputy Ministry of Research, Innovation and Digital Policy.
Declaration of interests
The author declares no competing interests.
References
- 1.Li J., Tang T., Zhao W.X., Wen J.R. Pretrained language models for text generation: A survey. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21) Survey Track. 2021:4492–4499.
- 2.He K., Zhang X., Ren S., Sun J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proceedings of the IEEE International Conference on Computer Vision. 2015:1026–1034.
- 3.Ganguli D., Hernandez D., Lovitt L., DasSarma N., Henighan T., Jones A., Joseph N., Kernion J., Mann B., Askell A., et al. Predictability and surprise in large generative models. 2022 ACM Conference on Fairness, Accountability, and Transparency. 2022:1747–1764.
- 4.Licklider J.C.R. Man-computer symbiosis. IRE Trans. Hum. Factors Electron. 1960:4–11.
- 5.Gao C.A., Howard F.M., Markov N.S., Dyer E.C., Ramesh S., Luo Y., Pearson A.T. Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers. NPJ Digit. Med. 2023;6:75. doi: 10.1038/s41746-023-00819-6.
- 6.Tlili A., Shehata B., Adarkwah M.A., Bozkurt A., Hickey D.T., Huang R., Agyemang B. What if the devil is my guardian angel: ChatGPT as a case study of using chatbots in education. Smart Learn. Environ. 2023;10:15.
- 7.Desaire H., Chua A.E., Isom M., Jarosova R., Hua D. Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools. Cell Rep. Phys. Sci. 2023;4:101426. doi: 10.1016/j.xcrp.2023.101426.
- 8.Liang W., Yuksekgonul M., Mao Y., Wu E., Zou J. GPT detectors are biased against non-native English writers. Patterns. 2023;4:100779. doi: 10.1016/j.patter.2023.100779.
- 9.Ellington A.J. A meta-analysis of the effects of calculators on students' achievement and attitude levels in precollege mathematics classes. J. Res. Math. Educ. 2003;34:433–463.