Abstract
Informed consent is a crucial requirement of a patient's surgical care but can be a burdensome task. Artificial intelligence (AI) and machine learning language models may provide an alternative approach to writing detailed, readable consent forms in an efficient manner. No studies have assessed the accuracy and completeness of AI-generated consents for aesthetic plastic surgeries. This study aims to compare the length, reading level, accuracy, and completeness of informed consent forms that are AI chatbot (ChatGPT-4; OpenAI, San Francisco, CA) generated vs plastic surgeon generated for the most commonly performed aesthetic plastic surgeries. This study is a cross-sectional design comparing informed consent forms created by the American Society of Plastic Surgeons (ASPS) with informed consent forms generated by ChatGPT-4 for the 5 most commonly performed plastic surgery procedures: liposuction, breast augmentation, abdominoplasty, breast lift, and blepharoplasty. The average word count of ChatGPT forms was lower than that of the ASPS-generated forms (1023 vs 2901, P = .01). The average reading level for ChatGPT forms was also lower than that of the ASPS forms (11.2 vs 12.5, P = .02). There was no difference between accuracy and completeness scores for general descriptions of the surgery, risks, benefits, or alternatives. The mean overall impression score for ChatGPT consents was 2.33, whereas it was 2.23 for ASPS consent forms (P = .18). In this study, the authors demonstrate that informed consent forms generated by ChatGPT were significantly shorter and more readable than ASPS forms, with no significant difference in completeness and accuracy.
Level of Evidence: 5 (Risk)
Informed consent is a crucial requirement of a patient's surgical care. Ethically, it is the foundation for shared decision making between a patient and provider.1 For an informed consent form to be valid, it must inform a patient of their treatment options along with all of the risks and alternatives for each option. Additionally, this information must be presented in a manner that the patient can fully comprehend in order to rationally reach a decision regarding a surgery.1 Writing a detailed, informed consent form suitable for patient understanding may be a tedious part of a surgeon's workflow, detract from face time with patients, and drive burnout.2 Physicians may spend up to twice as much time completing documentation and administrative tasks compared with time engaging with patients, contributing to high rates of surgeon dissatisfaction.3
Artificial intelligence (AI) and machine learning language models may provide an alternative approach to writing detailed, readable consent forms in an efficient manner. This is especially true with the advent of ChatGPT (OpenAI, San Francisco, CA), an AI chatbot capable of developing a response to a prompt in a matter of seconds.4 When judged by physician specialists, ChatGPT has been shown to generate “almost complete and completely correct” information regarding diverse medical questions.5 Using ChatGPT, surgeons may be able to develop sufficient informed consent forms more efficiently, easing their documentation load.
Decker et al demonstrated that AI chatbots have the potential to enhance informed consent documentation, showing that consent forms generated from ChatGPT were less complex and more accurate than those made by surgeons.6 However, no studies have assessed the potential of using AI chatbots to generate reliable, readable informed consent documentation for aesthetic plastic surgeries specifically. This study aims to compare the length, reading level, accuracy, and completeness of informed consent forms that are AI chatbot (ChatGPT-4) generated vs plastic surgeon generated for the most commonly performed aesthetic plastic surgeries.
METHODS
This study is a cross-sectional design comparing informed consent forms created by the American Society of Plastic Surgeons (ASPS) with informed consent forms generated by ChatGPT-4 in January 2024 for the 5 most commonly performed plastic surgery procedures, as listed by ASPS: liposuction, breast augmentation, abdominoplasty, breast lift, and blepharoplasty. The surgeon-generated consents were found online published by the ASPS and were accessible free of charge. The chatbot-generated informed consent forms were created using the following prompt “Generate an informed consent document for a patient for [procedure name] including a description of the surgery and an explanation of the risks, benefits, and alternatives of the surgery.” The above prompt did not include specific instructions on desired length or reading level. Further, the prompt for each surgery was input in separate ChatGPT sessions in an Incognito window to reduce any confounding effects from internet history or previous browser searches. Example documentation (ie, ASPS consent or other) was not uploaded to ChatGPT when the prompts were submitted.
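The standardized prompt above was entered manually in separate ChatGPT sessions; as an illustration only, the per-procedure prompts can be reproduced programmatically from the template quoted in this section (the variable names here are ours, not part of the study protocol):

```python
# The 5 procedures and prompt template are taken verbatim from the Methods section.
PROCEDURES = [
    "liposuction",
    "breast augmentation",
    "abdominoplasty",
    "breast lift",
    "blepharoplasty",
]

PROMPT_TEMPLATE = (
    "Generate an informed consent document for a patient for {procedure} "
    "including a description of the surgery and an explanation of the risks, "
    "benefits, and alternatives of the surgery."
)

# One prompt per procedure, each submitted in a fresh session in the study.
prompts = [PROMPT_TEMPLATE.format(procedure=p) for p in PROCEDURES]
```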
In order to assess the readability of the consent forms, the word count of each form was calculated and compared, and reading level was assessed using the Flesch–Kincaid grade level. This index calculates a reading level (comparable with US educational grade levels) using word count, sentence count, and syllable count.
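The Flesch–Kincaid grade level is a fixed linear combination of average sentence length and average syllables per word. A minimal sketch in Python; the syllable counter below is a rough vowel-group heuristic of our own (an assumption), whereas readability software typically uses pronunciation dictionaries:

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count runs of consecutive vowels, dropping a trailing
    # silent "e"; real tools use pronunciation dictionaries instead.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

The formula's output maps roughly to a US grade level, which is why values such as 11.2 and 12.5 in the Results can be read as 11th- and 12th-grade material.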
In order to assess completeness and accuracy of each consent form, 5 attending plastic surgeons from Emory University Hospital blindly assessed each informed consent form using a scoring rubric modeled after the system implemented by Decker et al.6 The evaluation form for each consent is included in Supplemental Table 1. ChatGPT and ASPS consents were altered to a standardized font and formatting and placed in a randomly generated order to better facilitate blinding. Surgeons used a 4-point scale (0 = inaccurate, 1 = absent, 2 = incomplete, 3 = complete and accurate) to assess the accuracy and completeness of the general description of the surgery, risks, benefits, and alternatives to surgery. Average scores across reviewers were calculated and independent t-tests compared mean scores between ChatGPT and ASPS forms.
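The between-group comparisons above can be sketched as an independent-samples t-test. The sketch below computes Welch's t statistic (which does not assume equal variances) with only the standard library; the score vectors are hypothetical placeholders, not the study's data, and a p-value would additionally require the t distribution (e.g., scipy.stats.ttest_ind):

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a: list[float], b: list[float]) -> float:
    # Welch's t statistic for two independent samples with unequal variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical reviewer scores (placeholders, not the study's data):
chatgpt_scores = [2.8, 1.9, 2.2, 1.8, 2.5]
asps_scores = [2.8, 1.8, 1.8, 1.4, 2.9]
t_stat = welch_t(chatgpt_scores, asps_scores)
```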
RESULTS
A total of 10 informed consent forms were assessed, with 1 form generated from ChatGPT and the ASPS for each of the 5 most common surgeries (Appendix). The average word count of ChatGPT forms was lower than the ASPS-generated forms (1023 vs 2901, P = .01). Word counts for ChatGPT forms ranged from 970 (blepharoplasty) to 1142 (breast augmentation), compared with a range of 2027 words (abdominoplasty) to 3913 words (liposuction) for ASPS forms. Average reading level for ChatGPT forms was also lower than the ASPS forms (11.2 vs 12.5, P = .02). Among ChatGPT forms, breast lift and blepharoplasty had the lowest reading level (10.8), while liposuction had the highest (12.1). Among ASPS forms, liposuction had the lowest reading level (11.3) and breast augmentation had the highest (13.2).
As seen in Table 1, there was no statistically significant difference between accuracy and completeness scores for general descriptions of the surgery, risks, benefits, or alternatives. The mean overall impression score for ChatGPT consents was 2.33, whereas it was 2.23 for ASPS consent forms (P = .18).
Table 1.
Comparison of Attending Physicians' Completeness and Accuracy Scores for ChatGPT- vs ASPS-Generated Consent Forms
| | ChatGPT scores Mean (SD) | ASPS scores Mean (SD) | P-value |
|---|---|---|---|
| General description of surgery | 2.80 (0.41) | 2.84 (0.37) | .72 |
| Risks | 1.94 (0.68) | 1.76 (0.75) | .05 |
| Expected postoperative course | 2.16 (0.55) | 1.84 (0.85) | .12 |
| Expected pain | 1.80 (0.50) | 1.40 (0.65) | .02 |
| Potential complications | 2.48 (0.51) | 2.88 (0.33) | .002 |
| Recovery time | 1.64 (0.64) | 1.24 (0.53) | .02 |
| Expected restrictions or residual effects | 1.64 (0.76) | 1.44 (0.77) | .36 |
| Benefits | 2.80 (0.50) | 2.76 (0.52) | .78 |
| Description of how procedure will improve survival and/or symptoms | 2.80 (0.50) | 2.76 (0.52) | .78 |
| Alternatives | 2.84 (0.51) | 2.84 (0.51) | 1.00 |
| Expected outcome without procedure | 2.68 (0.69) | 2.76 (0.60) | .66 |
| Alternative treatments | 3.00 (0.00) | 2.92 (0.40) | .33 |
| Total average score | 2.33 (0.74) | 2.23 (0.90) | .18 |
ASPS, American Society of Plastic Surgeons; SD, standard deviation.
Within the risks subcategory, ChatGPT forms had higher scores in descriptions of expected pain (1.80 vs 1.40, P = .02) and recovery time (1.64 vs 1.24, P = .02) than ASPS forms. ASPS forms scored higher in descriptions of potential complications (2.88 vs 2.48, P = .002).
DISCUSSION
AI and ChatGPT are novel technologies with high potential to impact the medical field.4 Approximately 58.1% of physicians report that the amount of time spent documenting is inappropriate and reduces time spent with patients.2 Currently, only a few studies assess ChatGPT as a tool to create informed consent forms for patients, and to our knowledge none assess ChatGPT as a tool for plastic surgeons. Our study demonstrates that informed consent forms generated by ChatGPT were significantly shorter and more readable than ASPS forms, with no significant difference in completeness and accuracy.
Regarding readability, both average word count (1023 vs 2901, P = .01) and average grade level (11.2 vs 12.5, P = .02) were significantly lower for the ChatGPT-generated consents than for the ASPS consents. The nearly 3-times-longer word count among ASPS forms translates to a significantly longer time to read and comprehend: a word count of 1023 corresponds to roughly 4 min 18 s of silent reading, whereas a word count of 2901 corresponds to roughly 12 min 11 s. Whereas a surgeon drafting a consent form of the ASPS forms' length must type or write the entire document, the ChatGPT forms in this study were generated in an average of 11 s.
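The read-time figures quoted above follow from an assumed average adult silent-reading speed of about 238 words per minute (an assumption that reproduces the stated values), as a quick check shows:

```python
def read_time(words: int, wpm: float = 238) -> str:
    # Convert a word count to "X min Y s" at an assumed silent-reading speed
    # of ~238 words per minute (average adult silent-reading estimate).
    total_seconds = round(words / wpm * 60)
    return f"{total_seconds // 60} min {total_seconds % 60} s"
```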
Although the difference in reading grade level was statistically significant in this study, these differences may not be clinically significant. The average adult in the United States reads at an eighth grade reading level, and the National Institutes of Health and the American Medical Association recommend that material intended for patients be written between a sixth and eighth grade reading level.7 All 10 forms in this study, whether generated by ChatGPT or by the ASPS, were written above an eighth grade reading level. Consent forms from both sources could be made more readable to improve accessibility and patient understanding.
There was no significant difference in average reviewer scores for any major category (general description, risks, benefits, and alternatives) between the ChatGPT and ASPS forms. However, among the risk subcategories, scores for expected pain, potential complications, and recovery time differed significantly between the 2 groups. ChatGPT forms often included additional information that set helpful expectations for surgery but may not be traditionally included in a consent, perhaps inflating their scores in these categories. For example, the blepharoplasty consent generated by ChatGPT includes a section on swelling, bruising, and discomfort that may be present after surgery and recommends cold compresses to manage these symptoms. It also includes a cautionary statement against wearing contact lenses after surgery, whereas the ASPS form does not. Although included in the Decker table and metric analysis of consent forms, these descriptions may not be universally included in traditional consent forms and thus may have undervalued ASPS-form performance in these categories.
The mean scores for the overall risk category were 1.94 for the ChatGPT forms and 1.76 for the ASPS forms. Although there was not a significant difference in these scores (P = .05), the lower scores for risks compared with other consent categories indicate that the independent reviewers felt that discussion of risks for both consents was relatively shallow. One of the key elements for an informed consent document is for patients to receive adequate information regarding risks and benefits of treatment in order to make a rational decision.1 Perhaps these results suggest that more care must be put into ensuring that risks of a surgical procedure are clearly communicated to patients, regardless of how an informed consent document is generated.
Several significant limitations of this study must be considered. Our study analyzed the performance of ChatGPT-4 as 1 widely used and high-powered language chatbot, and its findings thus may not be reproducible for users of earlier generations of the platform or of other AI models. The landscape of AI chatbots is ever changing, and although our research does not analyze the most recent models, it is the first that our authors are aware of that directly compares ASPS- vs AI-generated consents. ChatGPT was chosen as a representative example of AI language models and is arguably the most widely known and used AI chatbot. Future research efforts may be directed to include newer models. This study included only attending plastic surgeon reviewers at a single academic institution and thus may not generalize to surgeons at other institutions or in private practice settings. Perhaps most importantly, this study assesses only the written language of consent forms and does not include the verbal dialogue that occurs between patients and physicians to substantiate their understanding of the procedure prior to consenting. Future studies can expand upon the current experimental design by including attending plastic surgeon reviewers at multiple institutions and utilizing various publicly available AI models.
CONCLUSIONS
The results of our study suggest that ChatGPT-generated informed consent forms may provide shorter, more readable, and comparably reliable information regarding common plastic surgery procedures. ChatGPT may be a promising new avenue for surgeons to provide more efficient and effective documentation for patients. As data continue to support the efficacy of AI-generated documentation, surgeons may increasingly rely on AI-based language models to reduce their own documentation time while still reliably supporting patient understanding. Patients may also seek and utilize AI-generated information to supplement informed consent conversations with providers.
Supplemental Material
This article contains supplemental material located online at https://doi.org/10.1093/asjof/ojae092.
Disclosures
The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.
Funding
The authors received no financial support for the research, authorship, and publication of this article, including payment of the article processing charge.
REFERENCES
- 1. Bernat JL. Patient-centered informed consent in surgical practice. Arch Surg. 2006;141:86. doi: 10.1001/archsurg.141.1.86
- 2. Gaffney A, Woolhandler S, Cai C, et al. Medical documentation burden among US office-based physicians in 2019: a national study. JAMA Intern Med. 2022;182:564–566. doi: 10.1001/jamainternmed.2022.0372
- 3. Ball CG, McBeth PB. The impact of documentation burden on patient care and surgeon satisfaction. Can J Surg. 2021;64:E457–E458. doi: 10.1503/cjs.013921
- 4. Ray PP. ChatGPT: a comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet Things Cyber Phys Syst. 2023;3:121–154. doi: 10.1016/j.iotcps.2023.04.003
- 5. Johnson D, Goodman R, Patrinely J, et al. Assessing the accuracy and reliability of AI-generated medical responses: an evaluation of the Chat-GPT model. Res Sq. 2023. doi: 10.21203/rs.3.rs-2566942/v1
- 6. Decker H, Trang K, Ramirez J, et al. Large language model–based chatbot vs surgeon-generated informed consent documentation for common procedures. JAMA Netw Open. 2023;6:e2336997. doi: 10.1001/jamanetworkopen.2023.36997
- 7. Mertz K, Burn MB, Eppler SL, Kamal RN. The reading level of surgical consent forms in hand surgery. J Hand Surg Glob Online. 2019;1:149–153. doi: 10.1016/j.jhsg.2019.04.003