Abstract
Background
Large language models such as ChatGPT are increasingly used by patients seeking perioperative information, yet their reliability for anesthesia-related patient education remains insufficiently evaluated. This study assessed the quality of ChatGPT-4.0 responses to frequently asked anesthesia questions using a multi-rater evaluation framework.
Methods
Twenty-two common anesthesia-related patient questions were identified through online search. Each question was submitted once to ChatGPT-4.0 (GPT-4-turbo; chat.openai.com) without follow-up prompts. Five anesthesiology and reanimation specialists—each with more than 20 years of experience—independently evaluated each response using a validated 4-point Likert-type scale (1 = excellent; 4 = unsatisfactory). Inter-rater reliability was calculated using a two-way random-effects model (ICC[2,1]).
Results
A total of 110 ratings were collected. Among these, 61.8% were classified as excellent, 32.7% as satisfactory requiring minimal clarification, and 5.5% as satisfactory requiring moderate clarification. No responses were rated as unsatisfactory. Mean scores for individual questions ranged from 1.0 to 2.4. Reviewer-wise averages ranged from 1.27 to 1.73, indicating generally positive evaluations with modest variability in scoring strictness. The overall inter-rater reliability was poor to fair (ICC = 0.25).
Conclusions
ChatGPT-4.0 provided high-quality responses to frequently asked patient questions about anesthesia and may serve as a supportive digital health tool for patient education. However, limited agreement among evaluators highlights the need for expert oversight and contextual refinement when integrating large language models into clinical communication pathways.
Keywords: ChatGPT, large language models, digital patient education, anesthesia, artificial intelligence, digital health
Introduction
Anesthesia has been used successfully for many years to temporarily eliminate pain and consciousness during surgical and medical procedures. Approximately 313 million surgical procedures are performed worldwide every year, and a large proportion of these are performed under anesthesia. 1 In developed countries alone, approximately 40 million anesthetics are administered annually. 2 These data show that anesthesia is an indispensable component of modern medicine and that large numbers of people encounter it.
ChatGPT is an artificial intelligence (AI) chatbot developed by OpenAI that can respond to complex queries in an interactive, conversation-based format. Released to the public in November 2022, it became the fastest-growing consumer application in history, surpassing 100 million users by January 2023. 3 With a user-friendly interface, ChatGPT has wide-ranging implications for healthcare delivery and patient education. 4 The utilization of AI in healthcare systems is increasingly important because of its ability to enhance precision and accuracy while reducing the time required for many aspects of care. 4
Despite the growing body of literature evaluating ChatGPT performance across various medical specialties, anesthesia-related patient education represents a distinct and underexplored knowledge domain. Unlike specialties such as orthopedics, ophthalmology, or hepatology—where patient questions often focus on disease-specific management or procedural outcomes—anesthesia-related inquiries are closely linked to perioperative anxiety, safety concerns, and risk perception. Misinterpretation or inaccurate information in this context may have immediate implications for patient trust, informed consent, and perioperative decision-making. Therefore, assessing the quality of large language model–generated responses specifically within the field of anesthesia is particularly important and addresses a meaningful gap in the existing digital health literature.
Searching the internet for information about health and disease has become a natural part of modern life and of coping with illness. 5 It has therefore become common for patients to use the internet and artificial intelligence applications to obtain information about anesthesia, one of the most common medical interventions they encounter. In light of the existing literature, the authors planned this study based on the hypothesis that ChatGPT would provide satisfactory answers to questions frequently asked by patients about anesthesia.
Methods
The query “patients’ questions about anasthesia” was searched on Google (www.google.com). Common questions were identified through a review of the “frequently asked questions” sections of anesthesia-related clinic websites. Question selection was performed by the study authors, all of whom are anesthesiology and reanimation specialists with long-standing clinical experience and academic appointments. The same group subsequently evaluated the ChatGPT responses to ensure domain-specific consistency and expertise. To ensure reproducibility, websites were included if they provided publicly accessible, patient-facing educational content related to anesthesia or perioperative care. Questions addressing general anesthesia concepts, perioperative safety, preparation, and recovery were included, whereas highly procedure-specific or institution-specific questions were excluded. The final set of 22 questions was predefined prior to ChatGPT evaluation, and no questions were modified for wording or language after selection.
The questions were submitted to ChatGPT using the GPT-4-turbo model, OpenAI's optimized premium version of GPT-4, through the official interface (chat.openai.com) on May 1, 2025. Each question was entered in a separate, new chat session to minimize potential cross-contamination between responses, and no follow-up prompts, instructions, or contextual information were provided beyond the questions themselves. A modified version of the rating system originally introduced by Mika et al. was used to assess the quality of responses. 6 This system has previously been applied in studies evaluating ChatGPT's performance on topics such as hip arthroplasty, hip arthroscopy, and ulnar collateral ligament reconstruction. 6–8 Each response was independently assessed by the authors following an evidence-based review strategy.
The rating scale assigns scores of 1 to 4:
“Excellent response not requiring clarification”: The response did not include any information that contradicted current literature.
“Satisfactory requiring minimal clarification”: The response lacked additional information or detail.
“Satisfactory requiring moderate clarification”: The response contained outdated or irrelevant information.
“Unsatisfactory requiring substantial clarification”: The response contained incorrect or overly generalized information that could be misinterpreted.
In this study, each of the 22 questions posed to ChatGPT was evaluated by five independent anesthesiology and reanimation specialists, each with over 20 years of professional experience. To determine inter-observer agreement, the intraclass correlation coefficient (ICC) was calculated using a two-way random-effects model with absolute agreement (ICC[2,1]), which is suitable for assessing agreement among multiple raters evaluating the same set of items. This model was chosen because the raters were considered representative of a broader population of clinicians and because absolute agreement, rather than consistency, was of primary interest.
Descriptive statistics were reported as mean values with corresponding standard deviations; all ratings were complete, and no data were missing. Inter-rater reliability was interpreted according to commonly used thresholds, with ICC values <0.50 indicating poor reliability, 0.50–0.75 moderate reliability, and >0.75 good reliability. Statistical analyses were performed using SPSS software (version 29.0; IBM Corp., Armonk, NY, USA).
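For readers who wish to reproduce the reliability analysis outside SPSS, the ICC[2,1] estimate can be computed directly from the standard two-way ANOVA mean squares. The sketch below is a minimal Python implementation under the conventions described above (the study's 22 × 5 ratings matrix is not reproduced here; the matrix passed in the example is synthetic, chosen so that perfect agreement yields an ICC of 1.0):

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an n_subjects x n_raters matrix of scores.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject (per-question) means
    col_means = ratings.mean(axis=0)   # per-rater means

    # Mean squares from the two-way ANOVA decomposition
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects (rows)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters (columns)
    sse = (np.sum((ratings - grand) ** 2)
           - k * np.sum((row_means - grand) ** 2)
           - n * np.sum((col_means - grand) ** 2))
    mse = sse / ((n - 1) * (k - 1))                        # residual

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Synthetic example: three subjects, three raters in perfect agreement
perfect = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]], dtype=float)
print(round(icc_2_1(perfect), 3))  # 1.0
```

In the study itself, the input would be the 22 × 5 matrix of question-by-rater scores, and the function would return the reported value of 0.25.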
Results
The questions and the corresponding ChatGPT-4.0 answers are shown in Table 1.
Table 1.
Frequently asked questions about anesthesia and answers from ChatGPT.
| Question number | Frequently asked question | ChatGPT answer |
|---|---|---|
| 1 | What is the anaesthesia ? | Anesthesia is a medical treatment that prevents pain and awareness during surgery or other procedures. It works by blocking signals between your body and brain. There are three main types:
|
| 2 | What type of anesthesia will I receive? | That depends on the type of surgery you're having, your overall health, and your personal preferences. In general, there are three main types of anesthesia:
|
| 3 | Are there risks associated with anesthesia? | Yes, like any medical procedure, anesthesia carries some risks — but serious complications are rare, especially with modern techniques and careful monitoring. Common, mild side effects include:
|
| 4 | How will I feel after the anesthesia? | After anesthesia, most people feel drowsy, groggy, or a bit confused at first — this is completely normal and usually lasts a few hours. Here's what you can typically expect:
|
| 5 | How long will I be under anesthesia? | The duration of anesthesia depends entirely on the type and length of your surgery.
|
| 6 | Can I eat or drink before the surgery? | No — for your safety, you should not eat or drink before surgery. Typically, we follow these general guidelines (unless your doctor gives you different instructions):
Your surgical team will give you specific fasting instructions based on your case. It's very important to follow them closely. |
| 7 | What if I have allergies or medical conditions? | That's very important information — and it's something we always ask about before surgery. If you have allergies (to medications, latex, foods, etc.) or medical conditions like asthma, diabetes, heart disease, or previous reactions to anesthesia, your anesthesiologist will carefully review them before the procedure. These details help us:
|
| 8 | Who will be monitoring me during the procedure? | During your procedure, you’ll be continuously monitored by a board-certified anesthesiologist or a certified nurse anesthetist (CRNA) — often both. They will closely track your:
You’re never alone — your safety is our top priority from start to finish. |
| 9 | What happens if I wake up during the surgery? | That's a very common fear, but waking up during surgery is extremely rare — especially under general anesthesia. The condition is called anesthesia awareness, and it occurs in about 1 to 2 cases per 1000 surgeries with general anesthesia, often during emergency or high-risk procedures. To prevent this, we:
Your anesthesiologist's job is to keep you comfortable, safe, and unaware throughout the procedure. |
| 10 | How long will I stay in the recovery room? | Most patients stay in the recovery room — also called the Post-Anesthesia Care Unit (PACU) — for about 1 to 2 h, depending on:
|
| 11 | When will I be able to return to normal activities? | That depends on the type of anesthesia and the surgery you had. In general:
You should not drive, make legal decisions, or operate machinery for at least 24 h after receiving general anesthesia or sedation. Your care team will give you specific guidelines before you leave — always follow them closely to ensure a safe and smooth recovery |
| 12 | What should I do if I’m nervous about the anesthesia? | It's completely normal to feel nervous — and you’re not alone. Many people share the same concern. Here's what you can do:
|
| 13 | How Can I Prepare in the Days and Weeks Before My Surgery? | Good preparation can help reduce risks and support a smooth recovery. Here's how you can prepare in the days and weeks before your surgery: 1. Share your full medical history Tell your doctor and anesthesiologist about:
You may be asked to:
|
| 14 | Can I Take My Medications On Surgery Day? | It depends on the specific medications you take — some should be continued, while others may need to be paused. Common medications that are often continued:
Ask your anesthesiologist exactly which ones to take the morning of surgery (usually with a sip of water) Never guess — follow the medical team's specific instructions to stay safe |
| 15 | Can I Stay Awake During Surgery and Watch? | In some cases, yes — you can stay awake during surgery, but it depends on the type of procedure and type of anesthesia used. You may stay awake if:
However: For more complex or invasive surgeries, general anesthesia is required, and you will be completely unconscious. If you’re interested in staying awake, discuss it with your anesthesiologist ahead of time. Your safety and comfort are always the top priority. |
| 16 | Can I talk with my anesthesiologist before surgery? | Absolutely — yes, you will speak with your anesthesiologist before the surgery. This usually happens during a preoperative assessment, either a day before or on the day of surgery. During this conversation, the anesthesiologist will:
|
| 17 | Will I be in pain after my surgery? | Some discomfort after surgery is normal, but you should not be in severe pain — and we’ll do everything we can to keep you comfortable. What to expect:
|
| 18 | When can I see my family after surgery? | You’ll usually be able to see your family shortly after you wake up and are stable in the recovery room (PACU) — typically within 1 to 2 h after surgery. However, this can vary depending on:
Feel free to let your care team know if seeing your loved ones is important to you — we’ll do our best to make it happen as soon as it's safe. |
| 19 | Can I choose which type of anaesthetic I can Have? | In many cases, yes, you can have a say — but it depends on the type of surgery, your medical condition, and what's safest and most effective for you. You may have a choice if:
|
| 20 | Do I need to let the anaesthetist know that I may be pregnant? | Yes — absolutely. If there's any chance you might be pregnant, it's very important to tell your anesthetist before surgery. Here's why:
|
| 21 | Can I breastfeed after an anaesthetic? | Yes, in most cases, you can safely breastfeed after anesthesia. Modern anesthetic drugs are designed to leave the body quickly, and only very small amounts — if any — pass into breast milk. Studies from the Academy of Breastfeeding Medicine and American Society of Anesthesiologists support this. General Guidelines:
Still, always mention that you are breastfeeding during your pre-op visit. Your care team will choose medications that are safest for you and your baby. |
| 22 | When can I drive after an operation? | You should not drive for at least 24 h after receiving general anesthesia or sedation — even if you feel alert. Here's why:
In many cases, you'll need:
|
A total of 22 frequently asked questions were submitted to ChatGPT, and each response was independently evaluated by five anesthesiology and reanimation specialists using a 4-point Likert-type scale (1 = Excellent, 2 = Satisfactory with minimal clarification, 3 = Satisfactory with moderate clarification, 4 = Unsatisfactory). This resulted in 110 individual ratings.
Analysis of the rating distribution revealed that 61.8% of the assessments were classified as excellent (score = 1), 32.7% as satisfactory requiring minimal clarification (score = 2), and 5.5% as satisfactory requiring moderate clarification (score = 3). Notably, no responses were rated as unsatisfactory by any evaluator.
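The reported distribution can be reconstructed arithmetically. The per-score counts below (68, 36, 6, and 0 of 110 ratings) are inferred from the published percentages rather than taken from the raw data, but they are the only integer counts consistent with them:

```python
# Hypothetical rating counts inferred from the reported percentages
# (not taken directly from the study's raw data)
counts = {1: 68, 2: 36, 3: 6, 4: 0}  # score -> number of ratings
total = sum(counts.values())          # 22 questions x 5 raters = 110 ratings
pct = {s: round(100 * n / total, 1) for s, n in counts.items()}
print(total, pct)  # 110 {1: 61.8, 2: 32.7, 3: 5.5, 4: 0.0}
```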
The average score across all questions ranged between 1.00 and 2.40, indicating generally high-quality outputs. Reviewer-wise analysis demonstrated that the mean scores of individual evaluators varied between 1.27 and 1.73, suggesting modest variability in rating strictness among assessors.
To assess inter-observer agreement, the ICC[2,1] was calculated using a two-way random-effects model. The resulting ICC value was 0.25, indicating poor to fair reliability among raters. This suggests that while reviewers shared some consistency in their evaluations, there were notable individual differences in scoring across questions.
The average ratings across reviewers were as follows: Reviewer 1: 1.27, Reviewer 2: 1.64, Reviewer 3: 1.64, Reviewer 4: 1.41, and Reviewer 5: 1.73. These results indicate that while the majority of evaluators rated ChatGPT's responses positively, there was a modest level of variation in stringency among reviewers.
The evaluators' average scores for each question are shown in Table 2.
Table 2.
Average evaluation scores given by anesthesiology and reanimation specialists for each question answered by ChatGPT.
| Question no | Average score |
|---|---|
| 1 | 2.20 |
| 2 | 1.60 |
| 3 | 2.40 |
| 4 | 1.20 |
| 5 | 2.00 |
| 6 | 1.60 |
| 7 | 1.20 |
| 8 | 1.40 |
| 9 | 1.20 |
| 10 | 1.20 |
| 11 | 1.40 |
| 12 | 1.00 |
| 13 | 1.00 |
| 14 | 1.60 |
| 15 | 1.40 |
| 16 | 1.00 |
| 17 | 1.60 |
| 18 | 1.60 |
| 19 | 1.40 |
| 20 | 1.00 |
| 21 | 1.60 |
| 22 | 1.00 |
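The summary statistics reported above can be checked directly against Table 2. The sketch below transcribes the 22 per-question means from the table and recovers the reported range:

```python
# Per-question mean scores transcribed from Table 2 (questions 1-22)
table2 = [2.20, 1.60, 2.40, 1.20, 2.00, 1.60, 1.20, 1.40, 1.20, 1.20, 1.40,
          1.00, 1.00, 1.60, 1.40, 1.00, 1.60, 1.60, 1.40, 1.00, 1.60, 1.00]
assert len(table2) == 22
print(min(table2), max(table2))            # 1.0 2.4 (the range reported in the Results)
print(round(sum(table2) / len(table2), 2))  # 1.44 (overall mean across questions)
```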
Discussion
The most important finding of this study is that ChatGPT provides satisfactory and useful answers to questions frequently asked by patients about anesthesia, a very widely performed procedure, supporting other reports in the literature that it can be used for patient education.
Yeo et al. evaluated the effectiveness of ChatGPT in answering questions from patients with cirrhosis and hepatocellular carcinoma and showed that the application can be used as an additional information tool to improve outcomes for patients and physicians. 9 In another study investigating the role of ChatGPT in patient education on obstructive sleep apnea, a common disease affecting sleep, it was shown to provide generally appropriate answers. 10 Alqudah et al. evaluated the effectiveness of ChatGPT in ophthalmology; 11 in that study, it was emphasized that the application answered questions in this field with moderate accuracy and reproducibility and that certain improvements are needed before it can be used in patient education. Almagazzachi et al. evaluated the use of ChatGPT in patient education on hypertension, a global epidemic affecting almost one third of the adult population, and stated that artificial intelligence applications should remain under human supervision and control in order to provide patients with accurate and reliable information. 12 Frequently asked questions about hip arthroscopy, a surgical procedure about which patients commonly seek information, were evaluated by two high-volume hip arthroscopists, who reported that ChatGPT answered with satisfactory accuracy; however, the authors found incorrect information in some answers and emphasized that caution should be exercised when using ChatGPT for patient education about hip arthroscopy. 13 The performance of ChatGPT on frequently asked patient questions about other orthopedic surgeries has also been studied, and it has been shown to produce satisfactory answers. 14,15 Studies on ChatGPT across most medical disciplines are increasing day by day. Kuo et al. had ChatGPT's answers to common patient questions related to anesthesia evaluated by three board-certified anesthesiologists and emphasized that the artificial intelligence performed similarly to anesthesiologists in terms of general quality, but that it should not be used independently and may be useful with supervision. 16 In the current study, anesthesiology and reanimation specialists with more than 20 years of experience evaluated ChatGPT's responses to questions frequently asked by patients about anesthesia; ChatGPT provided generally satisfactory responses, but there was notable variation among the evaluators. In light of this information, the current study shows that, as in other reports in the literature, artificial intelligence applications in patient education can be most effective when used under the supervision of physicians and healthcare professionals.
Our findings can be directly compared with previous studies evaluating the quality of ChatGPT responses in medical contexts. For example, Kuo et al. evaluated the performance of ChatGPT-3.5 across more than 100 medical questions and reported that while most responses were generally acceptable, a substantial proportion required clarification or contained incomplete information. 16 In contrast, in the present study using GPT-4, over 60% of responses were rated as excellent, with no responses classified as unsatisfactory. This comparison suggests that newer iterations of large language models may demonstrate improved response quality, particularly for patient-focused educational content. Although differences in study design and rating frameworks limit direct numerical comparison, our findings support the notion of progressive performance improvement with model advancement from GPT-3.5 to GPT-4.
This study has several limitations. First, the evaluations were conducted exclusively by five anesthesiology and reanimation specialists. While their medical expertise is indisputable, their specialty may influence how they interpret and prioritize information in responses related to anesthesia-related patient education, potentially affecting the generalizability of the results. Second, the number of evaluated questions was limited to 22, which may not fully reflect the breadth of frequently asked patient questions across different subspecialties. Third, although the use of a 4-point Likert-type scale provided structured scoring, the inherently subjective nature of qualitative evaluations may contribute to variability in interpretation, as reflected in the low inter-rater agreement (ICC = 0.25). Lastly, only a single AI model (ChatGPT-4.0) was assessed without comparison to alternative platforms or versions, limiting the scope of inference.
Conclusion
ChatGPT-4.0 was able to provide generally high-quality answers to frequently asked patient questions, as evaluated by five anesthesiology and reanimation specialists. Most responses were rated as excellent or satisfactory with minimal clarification. These findings suggest that ChatGPT may serve as a supportive tool in patient education. However, the moderate variability in scoring among reviewers highlights the need for professional oversight when using AI in clinical communication.
Acknowledgements
None.
Footnotes
ORCID iDs: Yasemin Akçaalan https://orcid.org/0000-0003-0011-3489
Ezgi Erkilic https://orcid.org/0000-0002-1211-9354
Human ethics and consent to participate: Not applicable.
Consent for publication: All authors consent to publication.
Author contributions: YA conceptualized and designed the study. YA, EE, HG, TG, OK, and LO contributed to data collection and question selection. YA and EE performed data analysis and interpretation. YA drafted the manuscript. All authors critically reviewed the manuscript for important intellectual content and approved the final version.
Funding: The authors received no financial support for the research, authorship, and/or publication of this article.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Availability of data and materials: All data and materials are available. The datasets used during the current study are available from the corresponding author (YA) on reasonable request and after ethical permission.
Declarations: We confirm that all experiments were performed in accordance with the Declaration of Helsinki.
References
- 1.Meara JG, Leather AJM, Hagander L, et al. Global Surgery 2030: evidence and solutions for achieving health, welfare, and economic development. Lancet 2015; 386: 569–624.
- 2.The Joint Commission. Anesthesia and sedation. Speak Up campaigns. https://www.jointcommission.org/resources/for-consumers/speak-up-campaigns/anesthesia-and-sedation/?utm_source=ChatGPT.com. Accessed 4 May 2025.
- 3.ChatGPT reaches 100 million users two months after launch. The Guardian. https://www.theguardian.com/technology/2023/feb/02/ChatGPT-100-million-users-open-ai-fastest-growing-app. Accessed 4 May 2025.
- 4.Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell 2023; 6. doi: 10.3389/frai.2023.1169595
- 5.Mirzaei A, Aslani P, Luca EJ, et al. Erratum: Predictors of health information-seeking behavior: systematic literature review and network analysis. J Med Internet Res 2022; 24: e39705.
- 6.Mika AP, Martin JR, Engstrom SM, et al. Assessing ChatGPT responses to common patient questions regarding total hip arthroplasty. J Bone Joint Surg 2023; 105: 1519–1526.
- 7.Özbek EA, Ertan MB, Kından P, et al. ChatGPT can offer at least satisfactory responses to common patient questions regarding hip arthroscopy. Arthroscopy: J Arthroscopic Relat Surg 2024; 40. doi: 10.1016/j.arthro.2024.08.036
- 8.Johns WL, Kellish A, Farronato D, et al. ChatGPT can offer satisfactory responses to common patient questions regarding elbow ulnar collateral ligament reconstruction. Arthrosc Sports Med Rehabil 2024; 6: 100893.
- 9.Yeo YH, Samaan JS, Ng WH, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol 2023; 29: 721–732.
- 10.Campbell DJ, Estephan LE, Mastrolonardo EV, et al. Evaluating ChatGPT responses on obstructive sleep apnea for patient education. J Clin Sleep Med 2023; 19: 1989.
- 11.Alqudah AA, Aleshawi AJ, Baker M, et al. Evaluating accuracy and reproducibility of ChatGPT responses to patient-based questions in ophthalmology: an observational study. Medicine (Baltimore) 2024; 103: e39120.
- 12.Almagazzachi A, Mustafa A, Sedeh AE, et al. Generative artificial intelligence in patient education: ChatGPT takes on hypertension questions. Cureus 2024; 16: e53441.
- 13.AlShehri Y, McConkey M, Lodhia P. ChatGPT provides satisfactory but occasionally inaccurate answers to common patient hip arthroscopy questions. Arthroscopy: J Arthroscopic Relat Surg 2025; 41: 1337–1347.
- 14.Ghilzai U, Fiedler B, Ghali A, et al. ChatGPT provides acceptable responses to patient questions regarding common shoulder pathology. Shoulder Elbow 2024; 16: 17585732241283972.
- 15.Artioli E, Veronesi F, Mazzotti A, et al. Assessing ChatGPT responses to common patient questions regarding total ankle arthroplasty. J Exp Orthop 2024; 12: e70138.
- 16.Kuo FH, Fierstein JL, Tudor BH, et al. Comparing ChatGPT and a single anesthesiologist's responses to common patient questions: an exploratory cross-sectional survey of a panel of anesthesiologists. J Med Syst 2024; 48. doi: 10.1007/s10916-024-02100-z
