JAMA Netw Open. 2025 Aug 22;8(8):e2528794. doi: 10.1001/jamanetworkopen.2025.28794

Clinician Perspectives on AI-Generated Drafts of Patient Test Result Explanations

Shreya J Shah 1,2, Abishek Nair 3, Kirsten Murtagh 2, Stephen P Ma 1, Kyle Vogt 3, Danyelle Clutter 3, Liban Sheikh 3, Haley Schmidt 3, Margaret Smith 2, Arun Lakhotia 3, Lance Bullock 3, Aditya Bhasin 3, Michael A Pfeffer 1,3, Christopher Sharp 1, Steven Lin 1,2, Patricia Garcia 1
PMCID: PMC12374212  PMID: 40844780

Abstract

This quality improvement study evaluates clinician perspectives on the usability and utility of a generative artificial intelligence (AI)–based large language model tool to draft result comments for laboratory, imaging, and pathology results.

Introduction

The 21st Century Cures Act mandating immediate release of test results to patients enhanced transparency but also led to patient anxiety.1,2 Patients prefer results explained directly by clinicians.2 Generative artificial intelligence (AI) provides an opportunity to enhance patient understanding of test results3 while reducing clinician burden and burnout.4,5 Stanford Health Care developed a novel generative AI-based large language model (LLM) tool to draft result comments for laboratory, imaging, and pathology results. This study evaluated clinician perspectives from a pilot implementation guided by the RE-AIM/PRISM framework,5,6 assessing usability, utility, and suggestions for tool improvement.

Methods

This prospective quality improvement study was conducted from January 13 through March 31, 2025, and followed the SQUIRE reporting guideline. The institutional review board (IRB) at Stanford University determined that this study met the criteria for quality improvement and was exempt from the need for IRB–mandated consent. Stanford Health Care developed an electronic health record–integrated tool for drafting result comments similar in concept to a previously studied AI inbox message replies tool.5 Claude 3.5 Sonnet (Anthropic) was selected as the LLM for the pilot based on its superior response time, fidelity to prompt instructions, and ability to generate outputs resembling result comments from clinicians. Primary care clinicians across both faculty and community practice networks were invited to participate in 2 waves staggered by 4 weeks, with postsurveys administered after 4 or 8 weeks of tool use. Surveys, guided by RE-AIM/PRISM, assessed usability and utility (5-point Likert scale), and perceived time impact per result comment (eMethods in Supplement 1). Descriptive statistics were used to summarize Likert responses; “strongly agree” and “agree” responses were combined to report survey results. Free-text survey responses were systematically analyzed by 2 researchers (S.J.S., K.M.) using deductive thematic coding followed by inductive theme identification during consensus reconciliation. Comments were segmented into phrases; assigned a positive, negative, or neutral sentiment; and summarized. Phrases were allowed to have multiple codes. Microsoft Excel was used for analysis.
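To make the analytic steps concrete, the following is a minimal sketch in Python (standing in for the Microsoft Excel workflow actually used, with invented response values and theme codes rather than the study data) of combining “strongly agree” and “agree” responses for a Likert item and tallying coded free-text phrases by theme and sentiment:

```python
from collections import Counter

# Illustrative only: made-up Likert responses for one survey item (not the study's data).
# The study reported "strongly agree" and "agree" responses combined.
responses = [
    "Strongly agree", "Agree", "Neutral", "Agree", "Disagree",
    "Strongly agree", "Agree", "Strongly disagree", "Agree", "Neutral",
]

counts = Counter(responses)
n = len(responses)
agree_combined = counts["Strongly agree"] + counts["Agree"]
print(f"Agree/strongly agree: {agree_combined}/{n} ({100 * agree_combined / n:.0f}%)")

# Illustrative coded free-text phrases: each phrase carries one sentiment and
# may carry multiple theme codes, mirroring the coding approach described above.
coded_phrases = [
    {"themes": ["Overall tool utility"], "sentiment": "positive"},
    {"themes": ["Content accuracy", "Content completeness"], "sentiment": "negative"},
    {"themes": ["Impact on workflow"], "sentiment": "neutral"},
]

# Tally sentiment counts per theme, the same layout as the Table.
theme_tally: dict[str, Counter] = {}
for phrase in coded_phrases:
    for theme in phrase["themes"]:
        theme_tally.setdefault(theme, Counter())[phrase["sentiment"]] += 1

for theme, sentiments in theme_tally.items():
    print(theme, dict(sentiments), "total:", sum(sentiments.values()))
```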

Results

Of 244 clinicians who used the tool at least once, 93 (38.1%) completed postsurveys (62 [66.7%] female; 31 [33.3%] male; 46 [49.5%] with ≥15 years in practice after training). Clinicians reported favorable usability (79 [84.9%] found the tool easy to use) and utility, particularly for laboratory (67 [72.0%]) and imaging (59 [63.4%]) results, improved efficiency (66 [71.0%]), and higher-quality explanations (67 [72.0%]) (Figure). Over half reported using the tool frequently (54 [58.1%]) and that the tool was ready for broad implementation (50 [53.8%]). Most (77 [82.8%]) anticipated long-term use, and 39 (41.9%) felt motivated to send result comments to patients more frequently. Median perceived time savings per result was 1.1 minutes (range, 5.0 minutes saved to 3.3 minutes additional time spent).

Figure. Postsurvey Likert Scale Results From 93 Respondents.

Themes from free-text responses included positive sentiments regarding tool utility and patient engagement and negative sentiments related to content accuracy and completeness (Table). Clinicians offered specific suggestions for tool optimization (eg, including more patient context from recent visit notes) and improved workflow integration (eg, updating drafts sequentially for results released over time).

Table. Qualitative Encoding of Free-Text Comments From Surveys.

For each theme, comment counts are given by sentiment (negative, neutral, positive; total), with representative quotations.

Overall tool utility (2 negative, 3 neutral, 20 positive; 25 total)
  • Positive: “Very helpful in explaining results to patients.”
  • Negative: “The result comments of ‘this is common for your age’ is not helpful. The order of lab results seems random and would be better if customizable.”

Content accuracy (13 negative, 1 neutral, 2 positive; 16 total)
  • Positive: “It is getting better and more accurate in the area of lab results and imaging results.”
  • Negative: “Lab results needs more work—some labs not addressed or incorrectly addressed.”

Content completeness (6 negative, 1 neutral, 4 positive; 11 total)
  • Positive: “They touch on all results, which I often used to skip, but I imagine the patient appreciates.”
  • Negative: “Some result comments omit providing info to components, which is important with abnormal lab values.”

Impact on workflow (3 negative, 1 neutral, 6 positive; 10 total)
  • Positive: “I liked how the tool pulled up prior historical data easily—that makes it a good cross reference.”
  • Negative: “When new results arrive on a single order placement (and the Comment is recreated), it’s a clunky workflow since the other results already have comments attached.”

Draft result comment length (6 negative, 0 neutral, 1 positive; 7 total)
  • Positive: “Particularly useful when covering another MD’s inbox! Appreciate the ‘to the point’ brevity in human language.”
  • Negative: “Would also like it if it can be more concise sometimes it gives a little too much information.”

Future use and readiness for scale (2 negative, 0 neutral, 5 positive; 7 total)
  • Positive: “I am willing to continue to use, as I’m confident the tool will improve and ultimately be a time-saver.”
  • Negative: “May need more time and refinement before prime-time broader use of the tool.”

Impact on patient engagement (1 negative, 0 neutral, 6 positive; 7 total)
  • Positive: “I love the quality and content of the drafts. They are more complete and empathetic than I historically wrote.”
  • Negative: “When I have used the AI-drafted interpretation, the patient will message me with more questions so I’ve stopped using this feature as much as I would have liked.”

Impact on time and efficiency (1 negative, 1 neutral, 4 positive; 6 total)
  • Positive: “I think it really helps me release results quickly to the patients who are waiting and act on them accordingly.”
  • Negative: “I use a dotphrase for lab comments and the AI response takes longer.”

Voice and tone (1 negative, 1 neutral, 2 positive; 4 total)
  • Positive: “Great tool—I really like the tone of the comments as well!”
  • Negative: “It does not sound like ‘me.’ I still edit for clarity.”

Utility specific to imaging results (0 negative, 1 neutral, 2 positive; 3 total)
  • Positive: “I find the radiology comments to be the most helpful and the language used sounds the most like language that I would produce myself.”
  • Negative: NA

Utility specific to lab results (3 negative, 0 neutral, 0 positive; 3 total)
  • Positive: NA
  • Negative: “I like echo, radiology report. For lab results, it does not interpret all the labs were done, missing some important labs and I’m not sure why.”

Utility specific to pathology results (0 negative, 2 neutral, 0 positive; 2 total)
  • Neutral: “I’m not sure whether it works well for pathology—not enough experience!”

Total comments: 38 negative, 11 neutral, 53 positive; 102 total

Abbreviations: AI, artificial intelligence; NA, not applicable.

Discussion

This study demonstrated the utility of a generative AI tool for drafting test result explanations, highlighting ease of use, improved efficiency, and higher-quality explanations. Barriers to adoption included content accuracy and completeness. Limitations include selection bias, limited generalizability beyond primary care, and underrepresentation of certain test types (eg, pathology). While these early results suggest that AI-generated draft result comments could help reduce clinician burden and enhance patient experience, further optimization grounded in clinician feedback is needed to improve accuracy and completeness of draft explanations. Additional improvements should focus on optimizing prompts, updating the LLM, incorporating patient-specific context, and streamlining workflow integration. Future evaluations should quantify impacts on clinician inbox burden (time spent, message volume) and consider patient perspectives.

Supplement 1.

eMethods. Survey

Supplement 2.

Data Sharing Statement

References

  • 1. Steitz BD, Turer RW, Lin CT, et al. Perspectives of patients about immediate access to test results through an online patient portal. JAMA Netw Open. 2023;6(3):e233572. doi: 10.1001/jamanetworkopen.2023.3572
  • 2. Lustria MLA, Aliche O, Killian MO, He Z. Enhancing patient engagement and understanding: is providing direct access to laboratory results through patient portals adequate? JAMIA Open. 2025;8(2):ooaf009. doi: 10.1093/jamiaopen/ooaf009
  • 3. He Z, Bhasuran B, Jin Q, et al. Quality of answers of generative large language models versus peer users for interpreting laboratory test results for lay patients: evaluation study. J Med Internet Res. 2024;26:e56655. doi: 10.2196/56655
  • 4. Holmgren AJ, Downing NL, Tang M, Sharp C, Longhurst C, Huckman RS. Assessing the impact of the COVID-19 pandemic on clinician ambulatory electronic health record use. J Am Med Inform Assoc. 2022;29(3):453-460. doi: 10.1093/jamia/ocab268
  • 5. Garcia P, Ma SP, Shah S, et al. Artificial intelligence-generated draft replies to patient inbox messages. JAMA Netw Open. 2024;7(3):e243201. doi: 10.1001/jamanetworkopen.2024.3201
  • 6. Chan SL, Lee JW, Ong MEH, et al. Implementation of prediction models in the emergency department from an implementation science perspective-determinants, outcomes, and real-world impact: a scoping review. Ann Emerg Med. 2023;82(1):22-36. doi: 10.1016/j.annemergmed.2023.02.001
