Abstract
Background
Large language models such as ChatGPT are increasingly used by patients seeking perioperative information, yet their reliability for anesthesia-related patient education remains insufficiently evaluated. This study assessed the quality of ChatGPT-4.0 responses to frequently asked anesthesia questions using a multi-rater evaluation framework.
Methods
Twenty-two common anesthesia-related patient questions were identified through online search. Each question was submitted once to ChatGPT-4.0 (GPT-4-turbo; chat.openai.com) without follow-up prompts. Five anesthesiology and reanimation specialists—each with more than 20 years of experience—independently evaluated each response using a validated 4-point Likert-type scale (1 = excellent; 4 = unsatisfactory). Inter-rater reliability was calculated using a two-way random-effects model (ICC[2,1]).
Results
A total of 110 ratings were collected. Among these, 61.8% were classified as excellent, 32.7% as satisfactory requiring minimal clarification, and 5.5% as satisfactory requiring moderate clarification. No responses were rated as unsatisfactory. Mean scores for individual questions ranged from 1.0 to 2.4. Reviewer-wise averages ranged from 1.27 to 1.73, indicating generally positive evaluations with modest variability in scoring strictness. The overall inter-rater reliability was poor to fair (ICC = 0.25).
Conclusions
ChatGPT-4.0 provided high-quality responses to frequently asked patient questions about anesthesia and may serve as a supportive digital health tool for patient education. However, limited agreement among evaluators highlights the need for expert oversight and contextual refinement when integrating large language models into clinical communication pathways.
Keywords: ChatGPT, large language models, digital patient education, anesthesia, artificial intelligence, digital health
Introduction
Anesthesia has been used successfully for many years to temporarily eliminate pain and consciousness during surgical and medical procedures. Approximately 313 million surgical procedures are performed worldwide every year, and a large proportion of these are performed under anesthesia. 1 In developed countries alone, approximately 40 million anesthetics are administered annually. 2 These data show that anesthesia is an indispensable component of modern medicine and that large numbers of people encounter it.
ChatGPT is an artificial intelligence (AI) chatbot developed by OpenAI that can respond to complex queries in an interactive, conversation-based format. Released to the public in November 2022, it became the fastest-growing consumer application in history, surpassing 100 million users by January 2023. 3 With a user-friendly interface, ChatGPT has wide-ranging implications for healthcare delivery and patient education. 4 The utilization of AI in healthcare systems is increasingly important because of its ability to enhance precision and accuracy while reducing the time required for many aspects of care. 4
Despite the growing body of literature evaluating ChatGPT performance across various medical specialties, anesthesia-related patient education represents a distinct and underexplored knowledge domain. Unlike specialties such as orthopedics, ophthalmology, or hepatology—where patient questions often focus on disease-specific management or procedural outcomes—anesthesia-related inquiries are closely linked to perioperative anxiety, safety concerns, and risk perception. Misinterpretation or inaccurate information in this context may have immediate implications for patient trust, informed consent, and perioperative decision-making. Therefore, assessing the quality of large language model–generated responses specifically within the field of anesthesia is particularly important and addresses a meaningful gap in the existing digital health literature.
Searching the internet for information about health and disease has become a natural part of modern life and of coping with illness. 5 It has therefore become common for patients to use the internet and artificial intelligence applications to obtain information about anesthesia, one of the most common medical interventions they encounter. In light of the existing literature, the authors planned this study based on the hypothesis that ChatGPT would provide satisfactory answers to questions frequently asked by patients about anesthesia.
Methods
The query “patients’ questions about anasthesia” was searched on Google (www.google.com). Common questions were identified through a review of the “frequently asked questions” sections of anesthesia-related clinic websites. Question selection was performed by the study authors, all of whom are anesthesiology and reanimation specialists with long-standing clinical experience and academic appointments. The same group subsequently evaluated the ChatGPT responses to ensure domain-specific consistency and expertise. To ensure reproducibility, websites were included if they provided publicly accessible, patient-facing educational content related to anesthesia or perioperative care. Questions addressing general anesthesia concepts, perioperative safety, preparation, and recovery were included, whereas highly procedure-specific or institution-specific questions were excluded. The final set of 22 questions was predefined prior to ChatGPT evaluation, and no questions were modified for wording or language after selection.
The questions were submitted to ChatGPT using the GPT-4-turbo model, OpenAI's optimized premium version of GPT-4, through the official interface (chat.openai.com) on May 1, 2025. Each question was entered in a separate, new chat session to minimize potential cross-contamination between responses, and no follow-up prompts, instructions, or contextual information were provided beyond the questions themselves. A modified version of the rating system originally introduced by Mika et al. was used to assess the quality of responses. 6 This system has previously been applied in studies evaluating ChatGPT's performance on topics such as hip arthroplasty, hip arthroscopy, and ulnar collateral ligament reconstruction. 6–8 Each response was independently assessed by the authors following an evidence-based review strategy.
The rating scale assigns scores of 1 to 4:
“Excellent response not requiring clarification”: The response did not include any information that contradicted current literature.
“Satisfactory requiring minimal clarification”: The response lacked additional information or detail.
“Satisfactory requiring moderate clarification”: The response contained outdated or irrelevant information.
“Unsatisfactory requiring substantial clarification”: The response contained incorrect or overly generalized information that could be misinterpreted.
In this study, each of the 22 questions posed to ChatGPT was evaluated by five independent anesthesiology and reanimation specialists, each with over 20 years of professional experience. To determine inter-observer agreement, the intraclass correlation coefficient (ICC) was calculated using a two-way random-effects model with absolute agreement (ICC[2,1]), which is suitable for assessing agreement among multiple raters evaluating the same set of items. This model was chosen because the raters were considered representative of a broader population of clinicians and because absolute agreement, rather than consistency, was of primary interest.
Descriptive statistics were reported as mean values with corresponding standard deviations; all ratings were complete, and no data were missing. Inter-rater reliability was interpreted according to commonly used thresholds, with ICC values <0.50 indicating poor reliability, 0.50–0.75 moderate reliability, and >0.75 good reliability. Statistical analyses were performed using SPSS software (version 29.0; IBM Corp., Armonk, NY, USA).
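For readers who wish to reproduce the reliability analysis outside SPSS, the ICC[2,1] estimate can be computed directly from the standard two-way ANOVA mean squares. The sketch below is a minimal Python implementation under the conventions described above (the study's 22 × 5 ratings matrix is not reproduced here; the matrix passed in the example is synthetic, chosen so that perfect agreement yields an ICC of 1.0):

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an n_subjects x n_raters matrix of scores.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject (per-question) means
    col_means = ratings.mean(axis=0)   # per-rater means

    # Mean squares from the two-way ANOVA decomposition
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects (rows)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters (columns)
    sse = (np.sum((ratings - grand) ** 2)
           - k * np.sum((row_means - grand) ** 2)
           - n * np.sum((col_means - grand) ** 2))
    mse = sse / ((n - 1) * (k - 1))                        # residual

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Synthetic example: three subjects, three raters in perfect agreement
perfect = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]], dtype=float)
print(round(icc_2_1(perfect), 3))  # 1.0
```

In the study itself, the input would be the 22 × 5 matrix of question-by-rater scores, and the function would return the reported value of 0.25.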
Results
The questions and the corresponding ChatGPT-4.0 answers are shown in Table 1.
Table 1.
Frequently asked questions about anesthesia and answers from ChatGPT.
| Question number | Frequently asked question | ChatGPT answer |
|---|---|---|
| 1 | What is the anaesthesia ? | Anesthesia is a medical treatment that prevents pain and awareness during surgery or other procedures. It works by blocking signals between your body and brain. There are three main types:
|
| 2 | What type of anesthesia will I receive? | That depends on the type of surgery you're having, your overall health, and your personal preferences. In general, there are three main types of anesthesia:
|
| 3 | Are there risks associated with anesthesia? | Yes, like any medical procedure, anesthesia carries some risks — but serious complications are rare, especially with modern techniques and careful monitoring. Common, mild side effects include:
|
| 4 | How will I feel after the anesthesia? | After anesthesia, most people feel drowsy, groggy, or a bit confused at first — this is completely normal and usually lasts a few hours. Here's what you can typically expect:
|
| 5 | How long will I be under anesthesia? | The duration of anesthesia depends entirely on the type and length of your surgery.
|
| 6 | Can I eat or drink before the surgery? | No — for your safety, you should not eat or drink before surgery. Typically, we follow these general guidelines (unless your doctor gives you different instructions):
Your surgical team will give you specific fasting instructions based on your case. It's very important to follow them closely. |
| 7 | What if I have allergies or medical conditions? | That's very important information — and it's something we always ask about before surgery. If you have allergies (to medications, latex, foods, etc.) or medical conditions like asthma, diabetes, heart disease, or previous reactions to anesthesia, your anesthesiologist will carefully review them before the procedure. These details help us:
|
| 8 | Who will be monitoring me during the procedure? | During your procedure, you’ll be continuously monitored by a board-certified anesthesiologist or a certified nurse anesthetist (CRNA) — often both. They will closely track your:
You’re never alone — your safety is our top priority from start to finish. |
| 9 | What happens if I wake up during the surgery? | That's a very common fear, but waking up during surgery is extremely rare — especially under general anesthesia. The condition is called anesthesia awareness, and it occurs in about 1 to 2 cases per 1000 surgeries with general anesthesia, often during emergency or high-risk procedures. To prevent this, we:
Your anesthesiologist's job is to keep you comfortable, safe, and unaware throughout the procedure. |
| 10 | How long will I stay in the recovery room? | Most patients stay in the recovery room — also called the Post-Anesthesia Care Unit (PACU) — for about 1 to 2 h, depending on:
|
| 11 | When will I be able to return to normal activities? | That depends on the type of anesthesia and the surgery you had. In general:
You should not drive, make legal decisions, or operate machinery for at least 24 h after receiving general anesthesia or sedation. Your care team will give you specific guidelines before you leave — always follow them closely to ensure a safe and smooth recovery |
| 12 | What should I do if I’m nervous about the anesthesia? | It's completely normal to feel nervous — and you’re not alone. Many people share the same concern. Here's what you can do:
|
| 13 | How Can I Prepare in the Days and Weeks Before My Surgery? | Good preparation can help reduce risks and support a smooth recovery. Here's how you can prepare in the days and weeks before your surgery: 1. Share your full medical history Tell your doctor and anesthesiologist about:
You may be asked to:
|
| 14 | Can I Take My Medications On Surgery Day? | It depends on the specific medications you take — some should be continued, while others may need to be paused. Common medications that are often continued:
Ask your anesthesiologist exactly which ones to take the morning of surgery (usually with a sip of water) Never guess — follow the medical team's specific instructions to stay safe |
| 15 | Can I Stay Awake During Surgery and Watch? | In some cases, yes — you can stay awake during surgery, but it depends on the type of procedure and type of anesthesia used. You may stay awake if:
However: For more complex or invasive surgeries, general anesthesia is required, and you will be completely unconscious. If you’re interested in staying awake, discuss it with your anesthesiologist ahead of time. Your safety and comfort are always the top priority. |
| 16 | Can I talk with my anesthesiologist before surgery? | Absolutely — yes, you will speak with your anesthesiologist before the surgery. This usually happens during a preoperative assessment, either a day before or on the day of surgery. During this conversation, the anesthesiologist will:
|
| 17 | Will I be in pain after my surgery? | Some discomfort after surgery is normal, but you should not be in severe pain — and we’ll do everything we can to keep you comfortable. What to expect:
|
| 18 | When can I see my family after surgery? | You’ll usually be able to see your family shortly after you wake up and are stable in the recovery room (PACU) — typically within 1 to 2 h after surgery. However, this can vary depending on:
Feel free to let your care team know if seeing your loved ones is important to you — we’ll do our best to make it happen as soon as it's safe. |
| 19 | Can I choose which type of anaesthetic I can Have? | In many cases, yes, you can have a say — but it depends on the type of surgery, your medical condition, and what's safest and most effective for you. You may have a choice if:
|
| 20 | Do I need to let the anaesthetist know that I may be pregnant? | Yes — absolutely. If there's any chance you might be pregnant, it's very important to tell your anesthetist before surgery. Here's why:
|
| 21 | Can I breastfeed after an anaesthetic? | Yes, in most cases, you can safely breastfeed after anesthesia. Modern anesthetic drugs are designed to leave the body quickly, and only very small amounts — if any — pass into breast milk. Studies from the Academy of Breastfeeding Medicine and American Society of Anesthesiologists support this. General Guidelines:
Still, always mention that you are breastfeeding during your pre-op visit. Your care team will choose medications that are safest for you and your baby. |
| 22 | When can I drive after an operation? | You should not drive for at least 24 h after receiving general anesthesia or sedation — even if you feel alert. Here's why:
In many cases, you'll need:
|
A total of 22 frequently asked questions were submitted to ChatGPT, and each response was independently evaluated by five anesthesiology and reanimation specialists using a 4-point Likert-type scale (1 = Excellent, 2 = Satisfactory with minimal clarification, 3 = Satisfactory with moderate clarification, 4 = Unsatisfactory). This resulted in 110 individual ratings.
Analysis of the rating distribution revealed that 61.8% of the assessments were classified as excellent (score = 1), 32.7% as satisfactory requiring minimal clarification (score = 2), and 5.5% as satisfactory requiring moderate clarification (score = 3). Notably, no responses were rated as unsatisfactory by any evaluator.
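The reported distribution can be reconstructed arithmetically. The per-score counts below (68, 36, 6, and 0 of 110 ratings) are inferred from the published percentages rather than taken from the raw data, but they are the only integer counts consistent with them:

```python
# Hypothetical rating counts inferred from the reported percentages
# (not taken directly from the study's raw data)
counts = {1: 68, 2: 36, 3: 6, 4: 0}  # score -> number of ratings
total = sum(counts.values())          # 22 questions x 5 raters = 110 ratings
pct = {s: round(100 * n / total, 1) for s, n in counts.items()}
print(total, pct)  # 110 {1: 61.8, 2: 32.7, 3: 5.5, 4: 0.0}
```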
The average score across all questions ranged between 1.00 and 2.40, indicating generally high-quality outputs. Reviewer-wise analysis demonstrated that the mean scores of individual evaluators varied between 1.27 and 1.73, suggesting modest variability in rating strictness among assessors.
To assess inter-observer agreement, the ICC[2,1] was calculated using a two-way random-effects model. The resulting ICC value was 0.25, indicating poor to fair reliability among raters. This suggests that while reviewers shared some consistency in their evaluations, there were notable individual differences in scoring across questions.
The average ratings across reviewers were as follows: Reviewer 1: 1.27, Reviewer 2: 1.64, Reviewer 3: 1.64, Reviewer 4: 1.41, and Reviewer 5: 1.73. These results indicate that while the majority of evaluators rated ChatGPT's responses positively, there was a modest level of variation in stringency among reviewers.
The evaluators' average scores for each question are shown in Table 2.
Table 2.
Average evaluation scores given by anesthesiology and reanimation specialists for each question answered by ChatGPT.
| Question no | Average score |
|---|---|
| 1 | 2.20 |
| 2 | 1.60 |
| 3 | 2.40 |
| 4 | 1.20 |
| 5 | 2.00 |
| 6 | 1.60 |
| 7 | 1.20 |
| 8 | 1.40 |
| 9 | 1.20 |
| 10 | 1.20 |
| 11 | 1.40 |
| 12 | 1.00 |
| 13 | 1.00 |
| 14 | 1.60 |
| 15 | 1.40 |
| 16 | 1.00 |
| 17 | 1.60 |
| 18 | 1.60 |
| 19 | 1.40 |
| 20 | 1.00 |
| 21 | 1.60 |
| 22 | 1.00 |
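The summary statistics reported above can be checked directly against Table 2. The sketch below transcribes the 22 per-question means from the table and recovers the reported range:

```python
# Per-question mean scores transcribed from Table 2 (questions 1-22)
table2 = [2.20, 1.60, 2.40, 1.20, 2.00, 1.60, 1.20, 1.40, 1.20, 1.20, 1.40,
          1.00, 1.00, 1.60, 1.40, 1.00, 1.60, 1.60, 1.40, 1.00, 1.60, 1.00]
assert len(table2) == 22
print(min(table2), max(table2))            # 1.0 2.4 (the range reported in the Results)
print(round(sum(table2) / len(table2), 2))  # 1.44 (overall mean across questions)
```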
Discussion
The most important finding of this study is that ChatGPT provides satisfactory and useful answers to questions frequently asked by patients about anesthesia, a very widely performed procedure, supporting other reports in the literature that it can be used for patient education.
Yeo et al. evaluated the effectiveness of ChatGPT in answering questions from patients with cirrhosis and hepatocellular carcinoma and showed that the application can be used as an additional information tool to improve outcomes for patients and physicians. 9 In another study investigating the role of ChatGPT in patient education on obstructive sleep apnea, a common disease affecting sleep, it was shown to provide generally appropriate answers. 10 Alqudah et al. evaluated the effectiveness of ChatGPT in ophthalmology; 11 in that study, it was emphasized that the application answered questions in this field with moderate accuracy and reproducibility and that certain improvements are needed before it can be used in patient education. Almagazzachi et al. evaluated the use of ChatGPT in patient education on hypertension, a global epidemic affecting almost one third of the adult population, and stated that artificial intelligence applications should remain under human supervision and control in order to provide patients with accurate and reliable information. 12 Frequently asked questions about hip arthroscopy, a surgical procedure about which patients commonly seek information, were evaluated by two high-volume hip arthroscopists, who reported that ChatGPT answered with satisfactory accuracy; however, the authors found incorrect information in some answers and emphasized that caution should be exercised when using ChatGPT for patient education about hip arthroscopy. 13 The performance of ChatGPT on frequently asked patient questions about other orthopedic surgeries has also been studied, and it has been shown to produce satisfactory answers. 14,15 Studies on ChatGPT across most medical disciplines are increasing day by day. Kuo et al. had ChatGPT's answers to common patient questions related to anesthesia evaluated by three board-certified anesthesiologists and emphasized that the artificial intelligence performed similarly to anesthesiologists in terms of general quality, but that it should not be used independently and may be useful with supervision. 16 In the current study, anesthesiology and reanimation specialists with more than 20 years of experience evaluated ChatGPT's responses to questions frequently asked by patients about anesthesia; ChatGPT provided generally satisfactory responses, but there was notable variation among the evaluators. In light of this information, the current study shows that, as in other reports in the literature, artificial intelligence applications in patient education can be most effective when used under the supervision of physicians and healthcare professionals.
Our findings can be directly compared with previous studies evaluating the quality of ChatGPT responses in medical contexts. For example, Kuo et al. evaluated the performance of ChatGPT-3.5 across more than 100 medical questions and reported that while most responses were generally acceptable, a substantial proportion required clarification or contained incomplete information. 16 In contrast, in the present study using GPT-4, over 60% of responses were rated as excellent, with no responses classified as unsatisfactory. This comparison suggests that newer iterations of large language models may demonstrate improved response quality, particularly for patient-focused educational content. Although differences in study design and rating frameworks limit direct numerical comparison, our findings support the notion of progressive performance improvement with model advancement from GPT-3.5 to GPT-4.
This study has several limitations. First, the evaluations were conducted exclusively by five anesthesiology and reanimation specialists. While their medical expertise is indisputable, their specialty may influence how they interpret and prioritize information in responses related to anesthesia-related patient education, potentially affecting the generalizability of the results. Second, the number of evaluated questions was limited to 22, which may not fully reflect the breadth of frequently asked patient questions across different subspecialties. Third, although the use of a 4-point Likert-type scale provided structured scoring, the inherently subjective nature of qualitative evaluations may contribute to variability in interpretation, as reflected in the low inter-rater agreement (ICC = 0.25). Lastly, only a single AI model (ChatGPT-4.0) was assessed without comparison to alternative platforms or versions, limiting the scope of inference.
Conclusion
ChatGPT-4.0 was able to provide generally high-quality answers to frequently asked patient questions, as evaluated by five anesthesiology and reanimation specialists. Most responses were rated as excellent or satisfactory with minimal clarification. These findings suggest that ChatGPT may serve as a supportive tool in patient education. However, the moderate variability in scoring among reviewers highlights the need for professional oversight when using AI in clinical communication.
Acknowledgements
None.
Footnotes
ORCID iDs: Yasemin Akçaalan https://orcid.org/0000-0003-0011-3489
Ezgi Erkilic https://orcid.org/0000-0002-1211-9354
Human ethics and consent to participate: Not applicable.
Consent for publication: All authors consent to publication.
Author contributions: YA conceptualized and designed the study. YA, EE, HG, TG, OK, and LO contributed to data collection and question selection. YA and EE performed data analysis and interpretation. YA drafted the manuscript. All authors critically reviewed the manuscript for important intellectual content and approved the final version.
Funding: The authors received no financial support for the research, authorship, and/or publication of this article.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Availability of data and materials: All data and materials are available. The datasets used during the current study are available from the corresponding author (YA) on reasonable request and after ethical permission.
Declarations: We confirm that all experiments were performed in accordance with the Declaration of Helsinki.
References
- 1.Meara JG, Leather AJM, Hagander L, et al. Global Surgery 2030: evidence and solutions for achieving health, welfare, and economic development. Lancet 2015; 386: 569–624.
- 2.The Joint Commission. Anesthesia and sedation. Speak Up campaigns. https://www.jointcommission.org/resources/for-consumers/speak-up-campaigns/anesthesia-and-sedation/?utm_source=ChatGPT.com. Accessed 4 May 2025.
- 3.ChatGPT reaches 100 million users two months after launch. The Guardian. https://www.theguardian.com/technology/2023/feb/02/ChatGPT-100-million-users-open-ai-fastest-growing-app. Accessed 4 May 2025.
- 4.Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell 2023; 6. doi: 10.3389/frai.2023.1169595
- 5.Mirzaei A, Aslani P, Luca EJ, et al. Erratum: Predictors of health information-seeking behavior: systematic literature review and network analysis. J Med Internet Res 2022; 24: e39705.
- 6.Mika AP, Martin JR, Engstrom SM, et al. Assessing ChatGPT responses to common patient questions regarding total hip arthroplasty. J Bone Joint Surg 2023; 105: 1519–1526.
- 7.Özbek EA, Ertan MB, Kından P, et al. ChatGPT can offer at least satisfactory responses to common patient questions regarding hip arthroscopy. Arthroscopy: J Arthroscopic Relat Surg 2024; 40. doi: 10.1016/j.arthro.2024.08.036
- 8.Johns WL, Kellish A, Farronato D, et al. ChatGPT can offer satisfactory responses to common patient questions regarding elbow ulnar collateral ligament reconstruction. Arthrosc Sports Med Rehabil 2024; 6: 100893.
- 9.Yeo YH, Samaan JS, Ng WH, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol 2023; 29: 721–732.
- 10.Campbell DJ, Estephan LE, Mastrolonardo EV, et al. Evaluating ChatGPT responses on obstructive sleep apnea for patient education. J Clin Sleep Med 2023; 19: 1989.
- 11.Alqudah AA, Aleshawi AJ, Baker M, et al. Evaluating accuracy and reproducibility of ChatGPT responses to patient-based questions in ophthalmology: an observational study. Medicine (Baltimore) 2024; 103: e39120.
- 12.Almagazzachi A, Mustafa A, Sedeh AE, et al. Generative artificial intelligence in patient education: ChatGPT takes on hypertension questions. Cureus 2024; 16: e53441.
- 13.AlShehri Y, McConkey M, Lodhia P. ChatGPT provides satisfactory but occasionally inaccurate answers to common patient hip arthroscopy questions. Arthroscopy: J Arthroscopic Relat Surg 2025; 41: 1337–1347.
- 14.Ghilzai U, Fiedler B, Ghali A, et al. ChatGPT provides acceptable responses to patient questions regarding common shoulder pathology. Shoulder Elbow 2024; 16: 17585732241283972.
- 15.Artioli E, Veronesi F, Mazzotti A, et al. Assessing ChatGPT responses to common patient questions regarding total ankle arthroplasty. J Exp Orthop 2024; 12: e70138.
- 16.Kuo FH, Fierstein JL, Tudor BH, et al. Comparing ChatGPT and a single anesthesiologist's responses to common patient questions: an exploratory cross-sectional survey of a panel of anesthesiologists. J Med Syst 2024; 48. doi: 10.1007/s10916-024-02100-z
