JAMA Otolaryngol Head Neck Surg. 2023 Apr 27;149(6):556–558. doi: 10.1001/jamaoto.2023.0704

Comparison Between ChatGPT and Google Search as Sources of Postoperative Patient Instructions

Noel F. Ayoub, Yu-Jin Lee, David Grimm, Karthik Balakrishnan
PMCID: PMC10141286  PMID: 37103921

Abstract

This qualitative study rates the understandability, actionability, and procedure-specific content of postoperative instructions generated by ChatGPT, Google Search, and Stanford University.


ChatGPT (generative pretrained transformer), an artificial intelligence–powered language model chatbot, has been described as an innovative resource for many industries, including health care.1 Lower health literacy and limited understanding of postoperative instructions have been associated with worse outcomes.2,3 While ChatGPT cannot currently supplant a human clinician, it can serve as a medical knowledge source. This qualitative study assessed the value of ChatGPT in augmenting patient knowledge and generating postoperative instructions for use in populations with low educational or health literacy levels.

Methods

We analyzed postoperative patient instructions for 8 common pediatric otolaryngologic procedures: tympanostomy tube placement, tonsillectomy and adenoidectomy, inferior turbinate reduction, tympanoplasty, cochlear implant, neck mass resection, microdirect laryngoscopy and bronchoscopy, and tongue-tie release. The Stanford University Institutional Review Board deemed this study exempt from review and waived the informed consent requirement given the study design. We followed the SRQR reporting guideline.

Postoperative instructions were obtained from ChatGPT, Google Search, and Stanford University (hereafter, institution). This prompt was entered into ChatGPT: "Please provide postoperative instructions for the family of a child who just underwent a [procedure]. Provide them at a 5th grade reading level." Similarly, this query was entered into Google Search: "My child just underwent [procedure]. What do I need to know and watch out for?" The first nonsponsored Google Search results were used for analysis. Results were extracted and blinded. To enable adequate blinding, we standardized all fonts and removed audiovisuals (eg, pictures). Two of us (N.F.A., Y.-J.L.) scored the instructions.
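
The study queried the ChatGPT web interface directly. For readers who want to script the same prompting step, a minimal sketch in R is shown below; it assumes the OpenAI chat completions API and the httr package, and the model name and response handling are illustrative assumptions rather than part of the study protocol.

```r
library(httr)

procedure <- "tympanostomy tube placement"
prompt <- paste0(
  "Please provide postoperative instructions for the family of a child ",
  "who just underwent a ", procedure,
  ". Provide them at a 5th grade reading level."
)

# Hypothetical API call; the study used the ChatGPT web interface, and
# the model name here is an assumption.
resp <- POST(
  "https://api.openai.com/v1/chat/completions",
  add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))),
  body = list(
    model = "gpt-3.5-turbo",
    messages = list(list(role = "user", content = prompt))
  ),
  encode = "json"
)

# Extract the generated instructions from the parsed JSON response
instructions <- content(resp)$choices[[1]]$message$content
cat(instructions)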

The primary outcome was the Patient Education Materials Assessment Tool–printable (PEMAT-P)4 score, which assessed the understandability and actionability of instructions for patients of different backgrounds and health literacy levels. As a secondary outcome, instructions were scored on whether they addressed procedure-specific items. We a priori generated a list of 4 items specific to each procedure that were deemed important for each instruction to mention; see the Table 1 footnote for these items.
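
For reference, PEMAT-P domain scores are computed as the percentage of applicable items rated "agree" (1) rather than "disagree" (0), with nonapplicable items excluded. A minimal sketch in R, using hypothetical item ratings (the actual per-item ratings are not reported in this letter):

```r
# PEMAT-P scoring: each item is rated 1 (agree), 0 (disagree), or NA (not
# applicable); the domain score is the share of applicable items rated agree.
pemat_score <- function(ratings) {
  applicable <- ratings[!is.na(ratings)]
  round(100 * sum(applicable) / length(applicable))
}

# Hypothetical item ratings for one set of instructions
understandability_items <- c(1, 1, 0, 1, 1, NA, 1, 1, 1, 0, 1)
actionability_items     <- c(1, 0, 1, 1, NA)

pemat_score(understandability_items)  # 80
pemat_score(actionability_items)      # 75
```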

Table 1. Understandability, Actionability, and Procedure-Specific Scores for Each Procedure.

Procedure and instructions source           PEMAT-P understandability score, %   PEMAT-P actionability score, %   Procedure-specific items score, %a

Tympanostomy tube placement
  Institutionb                              91                                   80                               100
  Google Search                             82                                   100                              75
  ChatGPT                                   82                                   80                               100
Tonsillectomy and adenoidectomy
  Institutionb                              91                                   80                               100
  Google Search                             82                                   100                              100
  ChatGPT                                   82                                   80                               100
Inferior turbinate reduction
  Institutionb                              91                                   100                              100
  Google Search                             82                                   80                               75
  ChatGPT                                   73                                   80                               100
Tympanoplasty
  Institutionb                              91                                   100                              100
  Google Search                             82                                   100                              100
  ChatGPT                                   82                                   80                               100
Cochlear implant
  Institutionb                              91                                   100                              100
  Google Search                             82                                   40                               0
  ChatGPT                                   82                                   20                               100
Neck mass resection
  Institutionb                              91                                   80                               75
  Google Search                             82                                   80                               100
  ChatGPT                                   82                                   80                               75
Microdirect laryngoscopy and bronchoscopy
  Institutionb                              91                                   80                               100
  Google Search                             82                                   80                               75
  ChatGPT                                   82                                   80                               100
Tongue-tie release
  Institutionb                              91                                   100                              100
  Google Search                             73                                   80                               50
  ChatGPT                                   82                                   80                               100

Abbreviation: PEMAT-P, Patient Education Materials Assessment Tool–printable.

a Reviewers analyzed whether each instruction discussed 4 items specific to each procedure. Tympanostomy tube placement: (1) follow-up/tube check, (2) otorrhea, (3) ear drops, (4) when to call a clinician. Tonsillectomy and adenoidectomy: (1) pain management, (2) what to do if there is bleeding, (3) oral hydration, (4) when to call a clinician. Inferior turbinate reduction: (1) what to do if there is bleeding, (2) nasal sprays, (3) pain management, (4) when to call a clinician. Tympanoplasty: (1) dry ear precautions, (2) ear drops, (3) pain management, (4) when to call a clinician. Cochlear implant: (1) what to do if there is fever, (2) what to do if there is swelling, (3) pain management, (4) wound care. Neck mass resection: (1) pain management, (2) difficulty breathing/swallowing, (3) swelling, (4) when to call a clinician. Microdirect laryngoscopy and bronchoscopy: (1) pain management, (2) difficulty breathing, (3) difficulty swallowing, (4) when to call a clinician. Tongue-tie release: (1) pain, (2) difficulty eating, (3) bleeding, (4) when to call a clinician.

b Standardized postoperative instructions from Stanford University School of Medicine.

Scores were compared using 1-way analysis of variance and Kruskal-Wallis tests, with η² (90% CI) as the appropriate effect size.5 Analysis was performed on February 6, 2023, using R, version 4 (R Core Team).
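
As an illustration, the comparison for the understandability domain could be run as follows, using the scores from Table 1. This is a sketch only: the letter does not name the package used for η² and its 90% CI, so the effectsize package call below is an assumption.

```r
library(effectsize)

# PEMAT-P understandability scores from Table 1, one value per procedure
scores <- data.frame(
  source = factor(rep(c("Institution", "Google Search", "ChatGPT"), each = 8)),
  understandability = c(
    91, 91, 91, 91, 91, 91, 91, 91,  # institution
    82, 82, 82, 82, 82, 82, 82, 73,  # Google Search
    82, 82, 73, 82, 82, 82, 82, 82   # ChatGPT
  )
)

# 1-way analysis of variance with eta-squared (90% CI) as the effect size
fit <- aov(understandability ~ source, data = scores)
eta_squared(fit, ci = 0.90)

# Kruskal-Wallis test as the nonparametric comparison
kruskal.test(understandability ~ source, data = scores)
```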

Results

Overall, understandability scores ranged from 73% to 91%; actionability scores, 20% to 100%; and procedure-specific items, 0% to 100% (Table 1). ChatGPT-generated instructions were scored from 73% to 82% for understandability, 20% to 80% for actionability, and 75% to 100% for procedure-specific items.

Institution-generated instructions consistently had the highest scores (Table 2). Understandability scores were highest for institution (91%) vs ChatGPT (81%) and Google Search (81%) instructions (η², 0.86; 90% CI, 0.67-1.00). Actionability scores were lowest for ChatGPT (73%), intermediate for Google Search (83%), and highest for institution (92%) instructions (η², 0.22; 90% CI, 0.04-0.55). For procedure-specific items, ChatGPT (97%) and institution (97%) instructions had the highest scores and Google Search had the lowest (72%) (η², 0.23; 90% CI, 0-0.64).

Table 2. Comparison of ChatGPT, Google Search, and Institution Instructions.

                             Scores, %
Measure                      ChatGPT   Google Search   Institutiona   η² (90% CI)
PEMAT-P total                78        81              91             0.52 (0.16-0.68)
PEMAT-P understandability    81        81              91             0.86 (0.67-1.00)
PEMAT-P actionability        73        83              92             0.22 (0.04-0.55)
Procedure-specific items     97        72              97             0.23 (0-0.64)

Abbreviation: PEMAT-P, Patient Education Materials Assessment Tool–printable.

a Standardized postoperative instructions from Stanford University School of Medicine.

Discussion

Findings suggest that ChatGPT provides instructions that are helpful for patients at a fifth-grade reading level and across health literacy levels. However, ChatGPT-generated instructions scored lower in actionability than both Google Search and institution instructions and lower in understandability than institution instructions, although they matched institution instructions (and exceeded Google Search) on procedure-specific content. Despite these findings, ChatGPT may be beneficial for patients and clinicians, especially when alternative resources are limited.

Online search engines are common sources of medical information for the public: 7% of Google searches are health-related.6 However, ChatGPT has advantages over search engines: it is free, can be tailored to different literacy levels, and provides succinct information. ChatGPT gives direct answers that are often well written, detailed, and presented in an if-then format, offering patients immediate information while they wait to reach a clinician.

Study limitations include the small number of procedures and resources analyzed and the restriction of the analysis to English. ChatGPT limitations include a lack of citations; users' inability to confirm the accuracy of the information or to explore topics further; and a knowledge base ending in 2021, which excludes the latest data, events, and practice.

Supplement.

Data Sharing Statement

