Dear Editor,
Thank you for the opportunity to respond to the letter concerning our article, "Performance of Generative Artificial Intelligence in Dental Licensing Examinations".1 We are grateful to Hinpetch Daungsupawong and Viroj Wiwanitkit for their interest and comments, and we are pleased to address them and to discuss the findings further.
We agree with the suggestion to expand the evaluation to domains and question types beyond multiple-choice questions (MCQs) in order to obtain a more comprehensive understanding of the proficiency of Generative Artificial Intelligence (GenAI), as noted among the future directions in our discussion. In this particular study, we chose MCQs from selected dental licensing examinations as the measurement metric because they allowed an objective assessment of the correctness of GenAI responses. This approach helped avoid potential human error in analysing the content of answers.2,3 One relevant aspect to consider is artificial hallucination, a phenomenon in which GenAI generates information that appears realistic but is, in fact, entirely fabricated without any factual foundation.4,5 Nevertheless, future research could incorporate open-ended questions and practical scenarios to test GenAI in contexts that require higher-order cognitive skills.6
Regarding the degree of difficulty of the MCQs used in the testing, we used questions from two dental licensing examinations that covered a broad range of dental subjects. These examinations are considered benchmarks for evaluating the level of expertise expected after professional training. The passing rates of these examinations are determined by a panel of education and clinical experts and serve as an indicator of the expected performance of human test-takers. It is also important to acknowledge that human performance can be affected by variables such as test-taking strategies and other subjective elements, which may influence the outcomes.7,8 Future studies could integrate psychometric analyses to provide a more nuanced understanding of GenAI's performance in relation to question complexity.9
With advancements in neural network architecture, including the addition of multimodal functionality, GenAI's performance is expected to continue to improve, as demonstrated in the current study, in which the newer version of ChatGPT performed better.10,11 The quality and quantity of the dataset used to train GenAI also significantly affect its performance. The incorporation of a web-browsing function enables ChatGPT 4.0 to update its database. However, it is uncertain whether it can access subscription-based databases, which comprise the majority of dental journals and textbooks. Specialized medical and dental GenAI may be developed in the future, potentially performing better than the current general-purpose GenAI. Moreover, the ability of GenAI to interpret and provide feedback on graphical and non-text elements is an area that holds promise for future development.12,13
In line with the results of the current study, ChatGPT has demonstrated the ability to pass medical, legal, and business examinations.9,10 We believe that GenAI will be used by both dental professionals and the general public in clinical settings as well as in dental training and education. We agree that it is crucial to consider the ethical implications of integrating GenAI into clinical and educational settings.14 The present study serves as a prompt for discussion about responsible integration, and we advocate for the development of ethical guidelines to ensure that GenAI functions as an aid that complements human judgment rather than replacing it. In this regard, international organizations such as the World Health Organization (WHO)15 and the World Dental Federation (FDI)16 have made efforts to address the ethical issues associated with AI in healthcare.
Yours sincerely,
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- 1. Chau RCW, Thu KM, Yu OY, Hsung RT-C, Lo ECM, Lam WYH. Performance of generative artificial intelligence in dental licensing examinations. Int Dent J. 2023;73(5):724–730. doi: 10.1016/j.identj.2023.12.007.
- 2. Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, Chartash D. How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9(1):e45312. doi: 10.2196/45312.
- 3. Yanagita Y, Yokokawa D, Uchida S, Tawara J, Ikusaka M. Accuracy of ChatGPT on medical questions in the national medical licensing examination in Japan: evaluation study. JMIR Formative Res. 2023;7:e48023. doi: 10.2196/48023.
- 4. Salvagno M, Taccone FS, Gerli AG. Artificial intelligence hallucinations. Critical Care. 2023;27(1):1–2. doi: 10.1186/s13054-023-04473-y.
- 5. Alkaissi H, McFarlane SI. Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. 2023;15(2):e35179. doi: 10.7759/cureus.35179.
- 6. Oztermeli AD, Oztermeli A. ChatGPT performance in the medical specialty exam: an observational study. Medicine. 2023;102(32):e34673. doi: 10.1097/MD.0000000000034673.
- 7. Hulman A, Dollerup OL, Mortensen JF, et al. ChatGPT- versus human-generated answers to frequently asked questions about diabetes: a Turing test-inspired survey among employees of a Danish diabetes center. PLOS One. 2023;18(8):e0290773. doi: 10.1371/journal.pone.0290773.
- 8. Li SW, Kemp MW, Logan SJS, et al. ChatGPT outscored human candidates in a virtual objective structured clinical examination in obstetrics and gynecology. Am J Obstet Gynecol. 2023;229(2):172.e1-e12. doi: 10.1016/j.ajog.2023.04.020.
- 9. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLoS Digital Health. 2023;2(2):e0000198. doi: 10.1371/journal.pdig.0000198.
- 10. Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman FL, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774; 2023.
- 11. Mok A. OpenAI is rolling out a game-changing feature to ChatGPT this week that could revolutionize how we use the internet. Yahoo! Finance; 2023. Available from: https://finance.yahoo.com/news/openai-rolling-game-changing-feature-172512274.html?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAAKf9JxbdVyVointFQxCY85WQOkeL6eZNhZkW5L9tZio1cItBu_JTx-NUS4gPn8KZx-c39oTOrC0nkoirJYPIRcnzXVixOLFf6ixTzG1GY2GPk_GTtKopWALu0rCLbZqdfPqOFDOZCxXF7Ia-1zIbPaXkjOeLuFAch2Ya3kS-tyaS. Accessed 25 January 2024.
- 12. Chau RCW, Hsung RT-C, McGrath C, Pow EHN, Lam WYH. Accuracy of artificial intelligence-designed single-molar dental prostheses: a feasibility study. J Prosthet Dent. 2023;S0022-3913(22):00764–00768. doi: 10.1016/j.prosdent.2022.12.004.
- 13. Chau RCW, Li G-H, Tew IM, Thu KM, McGrath C, Lo W-L, et al. Accuracy of artificial intelligence-based photographic detection of gingivitis. Int Dent J. 2023;73(5):724–730. doi: 10.1016/j.identj.2023.03.007.
- 14. Thurzo A, Strunga M, Urban R, Surovková J, Afrashtehfar KI. Impact of artificial intelligence on dental education: a review and guide for curriculum update. Educ Sci. 2023;13(2):150.
- 15. World Health Organization. Ethics and governance of artificial intelligence for health: WHO guidance. Geneva, Switzerland: World Health Organization; 2021.
- 16. Falk Schwendicke MB, Uribe Sergio, Cheung William, Verma Mahesh, Linton Jina, Kim Young Jun. Artificial intelligence for dentistry. Geneva, Switzerland: World Dental Federation (FDI); 2023.