BJS Open
Editorial. 2023 Mar 24;7(2):zrad032. doi: 10.1093/bjsopen/zrad032

The use of ChatGPT and other large language models in surgical science

Boris V Janssen 1,2, Geert Kazemier 3,4, Marc G Besselink 5,6
PMCID: PMC10037421  PMID: 36960954

ChatGPT, a large language model (LLM), was released on November 30, 2022. This model can generate seemingly intelligent writing by comprehensively answering prompts after being trained on vast amounts of text data1. Its disruptive potential quickly caught the public’s attention, leading many to use it for writing tasks. As a result, media outlets have covered ChatGPT and other LLMs extensively, sparking discussions about their potential uses and controversies2,3. While some view LLMs as valuable tools that could revolutionize science, others express scepticism and concern. Their potential impact on surgical science, however, remains unexplored. This article examines the potential role of LLMs such as ChatGPT in surgical science, from the authors’ perspective as surgical scientists involved in both clinical trials and artificial intelligence-based surgical research.

ChatGPT and similar models are a type of artificial intelligence designed to understand and generate natural language text. They belong to the category of transformer-based neural networks and are trained on vast amounts of text data, including books, articles, and websites. During training, the model analyses the text to learn the structure and patterns of the language. Once trained, the model can generate new text that is comparable in style and content to its training data, in response to a given prompt.

There are several ways that LLMs can be applied in surgical science. One of the most noteworthy is for writing tasks, which can enhance the productivity and efficiency of surgeon-scientists and editors. Additionally, in theory, LLMs have the potential to be used for data extraction and clinical decision-making.

Currently, LLMs can be used to generate drafts for various types of written materials, including study ideas, research protocols, manuscripts, grant proposals, instructional materials, and patient education materials. For example, a prompt to an LLM could be: ‘Write an outline for a grant application on the role of operation (X) for patients with (Y)’ (Fig. 1). LLMs can also be used to enhance the quality of existing text and eliminate errors, particularly for non-native speakers. This can ensure that written materials produced in surgical science are of higher quality and more understandable for the intended audience. For example, a prompt to the LLM could be: ‘Improve the text in this manuscript draft on (X): (text)’. In future, this text improvement could also be applied to clinical notes, such as surgical reports, if LLMs are integrated into an electronic health record environment.

Fig. 1 Example of the use of ChatGPT for surgical science
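The prompt patterns quoted above can be captured as simple reusable templates. A minimal sketch follows; the template wording and function names are illustrative and not part of any specific LLM interface — the resulting string would be submitted to an LLM such as ChatGPT.

```python
# Minimal sketch of reusable prompt templates for the writing tasks
# described above. Templates and function names are hypothetical.

GRANT_TEMPLATE = (
    "Write an outline for a grant application on the role of "
    "operation {operation} for patients with {condition}."
)

EDIT_TEMPLATE = (
    "Improve the text in this manuscript draft on {topic}:\n{text}"
)

def grant_prompt(operation: str, condition: str) -> str:
    """Build a grant-outline prompt from the template."""
    return GRANT_TEMPLATE.format(operation=operation, condition=condition)

def edit_prompt(topic: str, text: str) -> str:
    """Build a text-improvement prompt for an existing draft."""
    return EDIT_TEMPLATE.format(topic=topic, text=text)

prompt = grant_prompt("pancreatoduodenectomy", "resectable pancreatic cancer")
```

Standardizing prompts in this way also makes it easier to review and reuse them across a research group.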

In theory, if LLMs were integrated into clinical environments, they could be used to extract information from electronic health records or other data repositories. This would automate data collection for research purposes and potentially mitigate the challenges associated with manual extraction, such as time consumption and human error. For example, a researcher might prompt the system to extract surgical outcome measures from a surgical report. Additionally, LLMs could be used to answer clinician queries directly by analysing the available patient data and providing a comprehensive answer. This could promote more efficient patient management, such as when a clinician needs a summary of a patient’s treatment course.
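Such an extraction workflow could be sketched as a prompt requesting structured output, followed by validation of the model's reply before it enters a research database. In the sketch below, the field names, prompt wording, and mocked model reply are all hypothetical; in practice the reply would come from an LLM integrated with the electronic health record.

```python
import json

# Sketch of extracting structured outcome measures from a surgical report
# via an LLM. Field names and the mocked reply are illustrative only.

FIELDS = ["blood_loss_ml", "operative_time_min", "complications"]

def extraction_prompt(report: str) -> str:
    """Ask the model to return the outcome measures as JSON."""
    return (
        "Extract the following outcome measures from the surgical report "
        f"below and answer with JSON containing exactly the keys {FIELDS}. "
        "Use null for any measure not mentioned.\n\n" + report
    )

def parse_reply(reply: str) -> dict:
    """Parse and validate the model's JSON reply before storing it."""
    data = json.loads(reply)
    missing = [f for f in FIELDS if f not in data]
    if missing:
        raise ValueError(f"model reply missing fields: {missing}")
    return data

# Mocked reply standing in for an actual LLM response
mock_reply = '{"blood_loss_ml": 350, "operative_time_min": 240, "complications": null}'
outcomes = parse_reply(mock_reply)
```

Requiring a fixed set of keys and rejecting incomplete replies is one simple way to keep automated extraction from silently producing gaps in a research dataset.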

To consider the potential of LLMs in clinical decision-making, it is important to recognize the significance of language models like PubMed GPT and BioGPT, which are trained on medical knowledge4,5. In a mature setting, LLMs could be utilized in clinical workflows, where they automatically evaluate patient information and produce a patient management plan that surgeons can use as a reference point or consider in their decision-making process. This has the potential to streamline the decision-making process, increase efficiency, and ensure that patients receive optimal care.

Although LLMs have the potential to be highly beneficial, there are several important considerations to keep in mind. For example, in terms of text generation and improvement, it is crucial for surgical scientists to understand the limitations of current LLMs. One of the most significant issues is known as ‘neural hallucinations,’ where the model generates text that is factually incorrect or nonsensical, despite appearing confident in its ability6. For instance, an LLM might suggest using churros, a type of fried pastry, as surgical instruments due to their size and flexibility7. Additionally, because LLMs are trained on specific datasets, there is a risk of introducing biases into the model’s output. Therefore, it is essential to thoroughly evaluate the output of an LLM before incorporating it into any work. When using these models for writing tasks, surgeon-scientists should be aware of these limitations and carefully check the validity of the model’s output.
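One crude safeguard against hallucinated output is a grounding check: any value the model reports must appear verbatim in the source document, or it is flagged for human review. The sketch below illustrates this generic pattern; it is not a method from the article and would not catch every error, only values with no support in the source text.

```python
# Sketch of a grounding check: values absent from the source text are
# flagged as possible hallucinations. Illustrative only; a real pipeline
# would need unit normalization and fuzzier matching.

def ungrounded_values(source: str, extracted: dict) -> list:
    """Return the keys whose values cannot be found in the source text."""
    flagged = []
    for key, value in extracted.items():
        if value is not None and str(value) not in source:
            flagged.append(key)
    return flagged

report = "Estimated blood loss was 350 ml; operative time 240 minutes."
ok = ungrounded_values(report, {"blood_loss_ml": 350, "operative_time_min": 240})
bad = ungrounded_values(report, {"blood_loss_ml": 900})
```

A flagged value does not prove the model was wrong, but it marks exactly the outputs a surgeon-scientist should verify by hand before use.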

To ensure the reliability and safety of LLMs, it is important to consider the potential issues that may arise when using them for theoretical applications. For example, when using LLMs for research data extraction, it is possible that the model may provide incorrect or biased conclusions, leading to low-quality data. Similarly, when LLMs are employed in patient management, it is necessary to approach their application with caution due to the ethical and legal implications. In the event that clinicians base their decisions on information provided by an LLM, it is unclear who bears the responsibility if something goes wrong. Therefore, once LLMs are implemented in clinical settings, it is critical to establish clear guidelines and rules for their use.

In conclusion, language models like ChatGPT have numerous potential applications in surgical science, ranging from text generation and improvement to data extraction and clinical decision-making. The use of these models can support surgeon-scientists in various areas, including writing, data collection, and even patient management. However, it is important to keep in mind that language models are not flawless and should be used in these areas with caution. As the field continues to advance, it will be essential to monitor developments and assess the impact of language models on the surgical science field. Ultimately, with responsible and thoughtful implementation, these models have the potential to be a valuable tool in surgical science and clinical care by augmenting, not replacing, human expertise.

Acknowledgements

ChatGPT was used during the writing of this article to provide a text outline and editing suggestions.

Contributor Information

Boris V Janssen, Department of Surgery, Amsterdam UMC, Location University of Amsterdam, Amsterdam, The Netherlands; Cancer Center Amsterdam, Amsterdam, The Netherlands.

Geert Kazemier, Cancer Center Amsterdam, Amsterdam, The Netherlands; Department of Surgery, Amsterdam UMC, Location Vrije Universiteit, Amsterdam, The Netherlands.

Marc G Besselink, Department of Surgery, Amsterdam UMC, Location University of Amsterdam, Amsterdam, The Netherlands; Cancer Center Amsterdam, Amsterdam, The Netherlands.

Disclosure

The authors declare no conflict of interest.

References

