Acta Orthopaedica et Traumatologica Turcica
editorial
2024 Jan 1;58(1):1–3. doi: 10.5152/j.aott.2024.130224

Beware of Artificial Intelligence hallucinations, or should we call it confabulation?

Haluk Berk
PMCID: PMC11059964  PMID: 38525503

With the advent of generative artificial intelligence (AI), seemingly user-friendly experiences, immediate access to information, and enhanced societal productivity boomed. OpenAI L.L.C. (San Francisco, CA, USA; https://openai.com) introduced ChatGPT, an AI chatbot that utilizes the GPT-3 and GPT-4 Large Language Models (LLMs), in November 2022. A large language model is an algorithm, trained on a vast number of sources, that can read and generate “natural language” text; yet LLMs are not flawless, sometimes providing incorrect, biased, or entirely fabricated responses.1 A major problem is false information produced by an AI bot, which might mislead, misinform, or slander someone.

In academic literature, AI researchers refer to these mistakes as “hallucinations.” However, using “hallucination” to describe a machine error has become controversial because scholars feel it “humanizes” AI models. Generative AI is so new that the highly technical problems LLMs face can only be explained with metaphors borrowed from existing ideas. Benj Edwards feels that “confabulation,” although far from perfect, is a better metaphor than “hallucination.”2

Be it “hallucination” or “confabulation,” the occurrence of such an error can have serious consequences. Although warnings are visible on the opening pages of most generative AI applications, one would not expect an academic search to be full of fabricated information.

For me, it all started when I took the advice and logged in to ResearchBuddy (https://researchbuddy.app/search), simply because it was claimed to be more accurate for academic searches (Figure 1). An impressive, well-structured, 12-page review came out after a couple of seconds. Each section was supported by articles seemingly published in prestigious journals in the field (Figure 2). A highly convincing summary and conclusion followed each reference. After a detailed PubMed search, I was dismayed to discover that the articles had never actually been published. The summaries and remarks on the fictitious articles were even worse.

Figure 1. Search result page (https://researchbuddy.app/search).

Figure 2. Key literature selected by ResearchBuddy and the related summary, supported by conclusions.

An earlier tweet by OpenAI CEO Sam Altman also confirms that ChatGPT has the very same problem (Figure 3).

Figure 3. Sam Altman’s tweet on December 11th, 2022.

Borji provided a thorough overview of the risks associated with using ChatGPT, including the creation of false material, the danger of bias and discrimination, a lack of transparency and dependability, cybersecurity issues, moral ramifications, and social consequences.3

In their systematic review, Sallam et al. included 60 eligible records and showed that ethical concerns were commonly mentioned (55.0%), especially in the context of the risk of bias and plagiarism. Other concerns were the risk of incorrect/inaccurate information (33.3%), citation/reference inaccuracy (16.7%), and transparency issues.1

In conclusion, Sallam summarizes his suggestions in five steps: (1) utilize AI-identifying software; (2) check references; (3) for prospective trials, check ClinicalTrials.gov; (4) require an attestation of LLM or other AI involvement; and (5) require authors to provide open-source data upon request.1
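Of these steps, checking references is the one that would have exposed the fabricated ResearchBuddy citations described above, and it lends itself to automation. What follows is a minimal sketch, in Python, of such a check against the public CrossRef REST API (https://api.crossref.org), which returns metadata for registered DOIs and a 404 status for unregistered ones. The function names are illustrative, not part of any tool mentioned in this editorial, and a real screening workflow would also compare the registered title and author list against the citation as written.

import requests

def fetch_crossref_record(doi: str) -> dict | None:
    """Return CrossRef metadata for a DOI, or None if no record exists."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code != 200:  # CrossRef answers 404 for unknown DOIs
        return None
    return resp.json().get("message")

def check_citation(doi: str) -> None:
    """Print whether a cited DOI resolves and, if so, its registered title."""
    record = fetch_crossref_record(doi)
    if record is None:
        print(f"{doi}: no CrossRef record; the citation may be fabricated")
        return
    title = record.get("title", ["<untitled>"])[0]
    print(f"{doi}: found -> {title}")

if __name__ == "__main__":
    check_citation("10.5152/j.aott.2024.130224")  # this editorial's own DOI
    check_citation("10.0000/made.up.2024.999")    # an obviously invalid DOI

A similar check is possible for PMIDs through the NCBI E-utilities esearch endpoint; neither service requires an API key for light use.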

In a study conducted by Brameier et al., the authors probed ChatGPT and wrote two articles, showing that LLM algorithms can generate text of publishable quality. I shall quote their conclusion, which I find extremely important: “Like any tool, LLMs could be used to improve the quality of papers that are written based on real data, particularly for surgeon-scientists who experience challenges with writing, such as dyslexia or English as a second language. However, the use of LLMs in academic publishing also raises 2 considerable risks: the malicious use of AI to generate fake papers and the incorporation of hallucinations in otherwise valid papers written with the aid of AI.”4

References

