Abstract
Background and Objective
Electronic health records (EHRs) have modernized care but increased documentation burden and clinician burnout. Ambient artificial intelligence (AI) scribes, combining automated speech recognition (ASR), natural language processing (NLP), and generative AI, aim to address this by capturing encounters and generating documentation. Related technologies, including virtual assistants and autonomous patient-facing systems, extend these capabilities beyond the clinician’s physical presence. This narrative review synthesizes current evidence on the real-world performance, implementation, and impact of these AI tools.
Methods
A narrative literature search was conducted using PubMed, supplemented by a manual review of reference lists from key articles. The search covered studies published between January 2019 and June 2025. After screening and full-text review, 18 studies met inclusion criteria and were incorporated into this review.
Key Content and Findings
AI scribes consistently reduce documentation burden and cognitive load, improve workflow efficiency, save time, and enhance patient–clinician interaction by allowing greater clinician focus. However, studies also report frequent documentation omissions and occasional clinically significant hallucinations. Implementation remains a sociotechnical challenge involving workflow redesign, medico-legal considerations, and preservation of the patient-clinician relationship. In cardiology, where documentation requires precise, time-sensitive detail, AI-related errors may carry greater risk, underscoring the need for specialty-specific validation.
Conclusions
Ambient AI scribes show promise in reducing workload, improving efficiency, and decreasing burnout, but current systems still generate high omission rates and intermittent factual inaccuracies that may affect clinical decision-making. Evidence remains limited by small cohorts and methodological variability, warranting cautious interpretation. More rigorous, standardized evaluations are needed before routine clinical adoption.
Keywords: Artificial intelligence (AI), ambient artificial intelligence scribe (ambient AI scribe), accuracy, doctor-patient relationship, safety
Introduction
Background
In the modern healthcare landscape, clinical documentation represents a paradox: it is indispensable for quality care, communication, and legal compliance, yet it is also a primary source of clinicians’ professional dissatisfaction and burnout (1). The widespread adoption of the electronic health record (EHR), intended to improve care through digital documentation, has not eased this burden. Instead, it has further bound clinicians to their screens. Time-and-motion studies show that for every hour of direct patient care, physicians spend about two hours on EHR and administrative tasks (1). This heavy clerical load has led to “pajama time”, the after-hours work clinicians complete at home to finish documentation (2).
This persistent burden is not a mere inconvenience; it is a primary driver of the escalating crisis of clinician burnout. Burnout, defined as a syndrome of emotional exhaustion, depersonalization, and a low sense of personal accomplishment, can lead to more medical errors, lower patient satisfaction, and higher physician turnover, threatening patient safety and workforce sustainability (3).
Healthcare systems have used in-person and virtual medical scribes to reduce documentation burdens, but studies found that, despite easing clerical work, the approach introduced significant challenges (4,5). These included substantial financial costs associated with hiring and training, high staff turnover, extensive training periods required to achieve proficiency, and frequent inconsistencies in note quality and style (4,5). Standard speech-recognition software, another alternative, often proved cumbersome: it required line-by-line dictation and frequent, tedious correction, and it failed to capture the unstructured, free-flowing, conversational nature of a genuine clinical encounter (6).
Into this challenging environment enters the ambient artificial intelligence (AI) scribe, a technology designed to redefine clinical documentation. Ambient AI scribes passively listen to clinical encounters and automatically generate documentation, helping reduce workload, documentation time, and burnout while improving patient-physician interaction and workflow efficiency (7). These advanced systems, such as Nuance’s Dragon ambient eXperience (DAX), are powered by a sophisticated combination of automated speech recognition (ASR), natural language processing (NLP), and large language models (LLMs) (2,8). Operating “ambiently”, they convert raw provider-patient conversation into structured notes, often in a standard format like SOAP (Subjective, Objective, Assessment, and Plan), with minimal clinician input (2,8).
Virtual assistants are another AI-driven tool, designed to simulate human conversation and provide personalized health responses based on that conversation. Capabilities range from simple menu- or multiple choice–based assistants to more sophisticated conversational AI virtual assistants that use NLP to recognize free speech or text (9). Patient-facing devices are agentic AI systems designed to interact autonomously with patients. They can independently initiate actions based on contextual understanding, manage workflows proactively, and gather detailed histories and symptoms (10).
Rationale and knowledge gap
The promise of AI technology is considerable: to untether the clinician from the keyboard, restore face-to-face interaction, reduce the cognitive load of documentation, and reclaim personal time, thereby directly combating one of the primary drivers of burnout. However, as with any transformative technology in the high-stakes environment of healthcare, this promise is shadowed by critical questions of safety, accuracy, reliability, and equity. As healthcare organizations rapidly begin to pilot and adopt these technologies, a critical body of evidence is emerging from early-stage evaluations. Previous literature has examined the role of ambient AI in healthcare documentation; however, important gaps remain. Existing studies have not fully addressed several essential aspects, including detailed insights into the various AI systems, their use across different subspecialties, and a broader overview of all published work related to ambient AI scribes (11). In our paper, we also discuss the different agentic AI systems, further contributing to a more complete understanding of this evolving field.
Objectives
This narrative review seeks to explore the impact of AI scribes by examining their performance across five key domains: (I) the accuracy and safety of the documentation they produce; (II) their effect on clinician efficiency and workflow; (III) their influence on the clinician experience, particularly regarding burnout and well-being; (IV) their perceived impact on the patient-clinician relationship; and (V) the practical challenges and considerations for their real-world implementation. A visual summary is provided in the Graphical Abstract (Figure 1). We present this article in accordance with the Narrative Review reporting checklist (available at https://cdt.amegroups.com/article/view/10.21037/cdt-2025-454/rc).
Figure 1.
Graphical abstract: the impact of ambient AI scribes. AI, artificial intelligence.
Methods
We conducted a narrative literature search to identify relevant studies on the implementation and impact of ambient AI scribes in healthcare, querying PubMed and manually reviewing references from key articles to capture additional sources. The search spanned publications from January 2019 to June 2025, without restrictions on publication status, and was limited to English-language studies. Eligible designs included pilot trials, cohort studies, qualitative research, simulation studies, and randomized controlled trials evaluating ambient AI scribe systems or comparable technologies in clinical practice. Exclusion criteria included review papers, conference abstracts, editorials, commentaries, preprints, and opinion pieces. Search terms combined keywords and Boolean operators, including “ambient scribe”, “digital scribe”, “artificial intelligence”, “clinical documentation”, and “speech recognition”.
The literature search, last conducted on June 1, 2025, yielded a total of 480 records, of which 310 were duplicates. After a single reviewer screened the remaining 170 articles by title and abstract (with any disagreements addressed through discussion with other authors), 20 articles were selected for full-text review. Following full-text evaluation, 17 articles met the inclusion criteria and were incorporated into this narrative review, and one additional article was identified through a review of the reference lists of the included studies. The complete search strategy and the PRISMA-style flow of study selection are presented in Table 1 and Figure 2, respectively.
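The selection counts reported above can be cross-checked with simple arithmetic. The short Python sketch below is illustrative only: the variable names are ours, and the per-stage exclusion counts are derived by subtraction rather than stated in the text.

```python
# Record counts as reported in the Methods text.
records_identified = 480
duplicates_removed = 310
full_text_reviewed = 20
included_from_search = 17
added_from_reference_lists = 1

# Records left for title/abstract screening once duplicates are removed.
screened = records_identified - duplicates_removed

# Exclusions at each stage, derived by subtraction (not stated in the text).
excluded_at_screening = screened - full_text_reviewed
excluded_at_full_text = full_text_reviewed - included_from_search

# Final number of studies incorporated into the review.
total_included = included_from_search + added_from_reference_lists

print(f"screened by title/abstract: {screened}")
print(f"excluded at title/abstract: {excluded_at_screening}")
print(f"excluded at full text: {excluded_at_full_text}")
print(f"included in review: {total_included}")
```

The totals agree with the 170 screened articles and the 18 included studies reported in this review.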
Table 1. Search strategy summary.
| Items | Specification |
|---|---|
| Date of search | 06/01/2025 |
| Database searched | PubMed |
| Search terms used | Search terms combined keywords and Boolean operators, including “ambient scribe”, “digital scribe”, “artificial intelligence”, “clinical documentation”, and “speech recognition” |
| Timeframe | January 2019 to June 2025 |
| Inclusion and exclusion criteria | No restrictions on publication status. Exclusion criteria included review papers, conference abstracts, editorials, non-English language studies, commentaries, preprints, and opinion pieces. Eligible designs included pilot trials, cohort studies, qualitative research, simulation studies, and randomized controlled trials evaluating ambient AI scribe systems or comparable technologies in clinical practice |
| Selection process | The selection process was conducted by the first author. For articles where there was uncertainty regarding eligibility, the first author consulted with a co-author, who independently reviewed the article, and consensus was reached through discussion |
Figure 2.
Flow diagram of the database search only.
Main studies summarizing the current AI scribe knowledge
Recent studies spanning simulated and real-world outpatient settings across the U.S., U.K., and Australia have examined tools such as Nuance DAX, Abridge, ChatGPT-4, Nabla, and TORTUS (2,8,12-22). These interventions, implemented primarily in primary care and multispecialty contexts, were assessed through a mix of pre-post studies, qualitative methods, simulations, and limited randomized designs. Most tools demonstrated modest to significant reductions in documentation time (ranging from ~1 to 2.1 minutes per note), alongside improvements in perceived usability and clinician satisfaction and reductions in cognitive load or disengagement (2,8,12-22). Simulated environments often revealed high omission and hallucination rates in AI-generated notes, underscoring concerns about documentation accuracy and variability between vendors. Real-world integrations with systems like Epic and Cerner showed promise in enhancing workflow efficiency and user experience, yet the evidence remains heterogeneous, with limitations such as small sample sizes, lack of randomization, and potential selection biases (2,8,12-22). Publication bias remains a key limitation, as unsuccessful or challenged AI scribe implementations are less likely to be reported, limiting insight into real-world feasibility. The unblinded design of most studies also introduces observer and performance bias, warranting cautious interpretation of benefits. Despite these issues, existing evidence suggests that well-designed AI tools have the potential to improve documentation and support clinician well-being. Table 2 summarizes the main studies discussing the role of AI scribes in health care.
Table 2. Overview of main studies.
| Study (year) | Setting/population | Intervention/AI tool | Study design | Sample size | Clinical context | Provider type | Comparator/control | Note/EHR integration | Outcomes measured | Time reduction | Burnout/satisfaction | Documentation error rate/type | Note quality | PX | Main findings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kernberg et al. (2024) | Simulated ambulatory, U.S., 14 cases (3×) | ChatGPT-4 (SOAP notes) | Comparative, simulation | 14 cases × 3 repetitions =42 | Multispecialty | Simulated/Doctor of Medicine (MD) | Gold standard | No | Note accuracy, omission/addition, PDQI-9 | N/A | N/A; simulated | 86% omission, 23.6/case | PDQI-9 (average score 29.7) | N/A | High omissions, accuracy varies by case, ↓ accuracy with ↑ transcript length |
| Kocaballi et al. (2020) | GP, Australia | Prototype AI assistant | Qualitative, co-design | 16 GPs | Primary care | MD (GP) | None | Video demonstration | Themes: AI role, workflow, trust | N/A | N/A | N/A | N/A | N/A | GPs prefer adaptive/human-in-loop, concern regarding medico-legal implications |
| Stults et al. (2025) | Outpatient, Sutter Health, U.S. | Abridge ambient scribe | Pre-post QI, survey + EHR | 100 clinicians, 57 paired | Multispecialty | MD, NP, APP | Baseline/pre | Integrated | Burnout, NASA-TLX, EHR time, after-hours | −0.9 min/pt | 7% burnout↓, satisfaction↑ | Not specified | Usability | ↑ | Changes in burnout are non-significant, but workflow is better, note time↓, NASA-TLX↓, primary care provider more positive |
| Liu et al. (2024) | Atrium Health/Wake Forest (PCP, academic, U.S.) | DAX Copilot (Nuance/Microsoft) | Longitudinal cohort study | 112 DAX, 103 control | Primary care, multispecialty | MD, DO and APP | Controls | Not available | EHR use metrics (EHR-Time, Note-Time, closure rates, note length), (wRVUs) | ~7% documentation Hours↓ | N/A | Not specified | Not specified | N/A | Objective gains for high DAX users only |
| Biro et al. (2025) | Simulated outpatient scripts, U.S. | 2 commercial ADS (unnamed) | Simulated experiment | 44 notes, 11 scripts | Multispecialty | Residents | None | No | Error frequency/type (omission, etc.) | N/A | N/A | 70% of notes had errors, B 54% A 83% omission | N/A | N/A | High errors (omissions most) varies by vendor/product |
| Owens et al. (2024) | UM Health West, U.S. | Nuance DAX | Observational, survey/time | 110 invited | Primary care | MD, NP, PA | Baseline | Integrated | OLBI, documentation time, PX | −1.8 min/note | Disengagement↓ | Not quantified | Not specified | ↑ | Documentation time↓, less after-hours, disengagement improved, PX better (open label) |
| Albrecht et al. (2025) | U Kansas Med, 30 ambulatories. U.S. | Abridge ambient AI | Pre + post anonymous surveys | 181 invited, ~93–99 pre/post | Multispecialty | MD, DO, APP | None | Some integration | Workflow ease, after-hours, completion, burnout | Not specified | 67% less burnout | Not specified | Survey | ↑ | Workflow ease; 77% less after hours; 64% more satisfaction |
| Duggan et al. (2025) | Academic. outpatient, Philadelphia, 17 specialties, U.S. | DAX Copilot (Nuance, in EHR) | Single-arm pre-post QI, feedback | 46 clinicians | Multispecialty | MD, NP, PA | Baseline | Epic | EHR time, note closure, after-hours | −2.1 min/note | Mental burden/engagement ↑ | Not reported | Not specified | ↑ | Time-in-notes↓20%, after-hours↓30%, same-day closure↑ |
| Shah et al. (2024) | Stanford Health, U.S. ambulatory | DAX Copilot (Nuance/Epic) | Prospective pre-post QI | 48 MDs (38 paired) | PCP + specialists | MD only | None | Integrated/smart sections | NASA-TLX, PFI (burnout), SUS, time | Median 20 min/half-day | PFI burnout↓, SUS↑ | Not reported | SUS | ↑ | Task load↓, usability+, median 20 min saved/half-day |
| Haberle et al. (2024) | Intermountain Health, 12 specialties, U.S. | Nuance DAX ambient scribe | Peer-matched cohort | 99 DAX, 76 controls | Multispecialty | MD, DO | Controls | Oracle Cerner | Document time/note, after-hours, safety, panel size | −0.76 min/note | Engagement↑, not full burnout | Not reported | Not reported | = | Documentation time per note↓, after-hours EHR slight↑, engagement↑, no PX change |
| Balloch et al. (2024) | U.K. simulated pediatric (mock clinics) | TORTUS ambient AI | Crossover, simulated | 8 clinicians, 47 consults | Pediatrics (simulated) | MD, consultant | Simulated charting/EHR | Not clinical EHR | SAIL (document quality), time, NASA-TLX, PX | −193 s/consult | Task load↓, attention/focus↑ | Minor/major hallucination (low%) | SAIL | Actor focus ↑ | SAIL documentation quality↑, consults shorter-26%, task load↓, attention/focus↑ |
| Kakaday et al. (2025) | Samaritan Health, operation room, U.S., PCP/urgent care | DAX Copilot (Nuance) | Randomized pilot | 45 (25 DAX, 20 control) | PCP + urgent care + multispecialty | MD, NP, PA | Control | Epic | Document efficiency: time, note characters, % done by DAX | −1.4 min/visit | Not assessed | 50% note characters by DAX (high users) | Not reported | N/A | Documentation time↓; high users: 50% characters by DAX |
| Misurac et al. (2025) | University of Iowa Health, U.S.; volunteers | Nabla ambient AI | Pre–post, 5wk | 38 (35 completed) | Multispecialty | MDs only | Baseline | Not fully integrated | Stanford PFI (burnout), fulfillment | Not reported | Burnout↓69→43%, fulfillment not significant | Not detailed | Not reported | not significant | Burnout↓; fulfillment↑ (not significant) |
| Nguyen et al. (2023) | Moffitt Cancer, U.S. | Nuance DAX ambient | Pre/post survey + interviews | 9 post-surveys, 8 interviews | Oncology/oncologists | MD | None | Not specified | Mini-Z, sleep, usability, edit burden | Not measured | Burnout perception improved (qualitative) | Not specified | Not detailed | Some ↑ | Perception: feasible, edit burden noted, not statistical burnout difference (scores) |
ADS, ambient digital scribe; AI, artificial intelligence; APP, advanced practice provider; DAX, Dragon ambient eXperience; DO, Doctor of Osteopathic Medicine; EHR, electronic health record; GP, general practitioner; MD, Medical Doctor; N/A, not applicable; NASA-TLX, NASA Task Load Index; NP, nurse practitioner; OLBI, Oldenburg Burnout Inventory; PA, physician assistant; PCP, primary care physician; PDQI-9, Physician Documentation Quality Instrument-9; PFI, Professional Fulfillment Index; PX, patient experience; QI, quality improvement; SAIL, Sheffield Assessment Instrument for Letters; SOAP, Subjective, Objective, Assessment, and Plan; SUS, System Usability Scale; U.K., United Kingdom; U.S., United States; wRVUs, work relative value units.
The quest for accuracy and safety
Clinical documentation tools must be accurate and safe, as errors in the medical record can cause diagnostic delays, treatment mistakes, and patient harm. Although AI scribes can capture more comprehensive encounters, their probabilistic design introduces new error risks, and evaluations show wide variability with potentially serious pitfalls.
Accuracy may matter even more in the cardiovascular field, where diagnosis and treatment often rely on subtle, time-sensitive clinical details, such as characterizing chest pain, documenting arrhythmia features, or recognizing early signs of heart failure. Documentation inaccuracies introduced by ambient AI scribes may therefore carry heightened risks in this specialty, where even small errors can have disproportionate consequences. Future studies should evaluate AI scribe accuracy specifically in cardiovascular settings.
A sobering picture: high error rates and inconsistency
Two of the most detailed studies on AI scribe accuracy, conducted in simulated settings, paint a particularly sobering picture. One study by Kernberg et al. evaluated the performance of ChatGPT-4 in writing SOAP-format notes using a standardized template (“generate a clinical note in SOAP format for the following”) from 14 transcribed clinical encounters. The analysis revealed a startling average of 23.6 errors per clinical case (22). The most accurate section of the note was consistently the “Objective” section, which includes structured data like vital signs (median accuracy 86.9%), while the more narrative “History and Physical” and “Assessment and Plan” sections were significantly less accurate.
Similarly, Biro et al. carried out a meticulous assessment of two popular commercial ambient digital scribe (ADS) products using 44 draft notes generated from 11 simulated outpatient scripts (20). They identified 127 errors [mean 2.9, standard deviation (SD) 2.7 errors per draft note], and 31 of the 44 draft notes (70%) contained at least one error. ADS product A resulted in 66 errors in 22 notes (mean 3.0, SD 2.7 per draft note) and product B resulted in 61 errors in 22 notes (mean 2.8, SD 2.7 per draft note). Product A had 55 omission errors (83%), 3 addition errors (4%), 4 wrong output errors (6%), and 4 irrelevant or misplaced text errors (6%). Product B had 33 omission errors (54%), 7 addition errors (11%), 6 wrong output errors (10%), and 15 irrelevant or misplaced text errors (25%). There was a statistically significant difference in error types between the two ADS products (Fisher exact test, P=0.002), underscoring that performance is not uniform across the market.
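The per-product figures from Biro et al. can be reproduced from the reported error counts. The following Python sketch is a reader's cross-check under our own naming (the `summarize` helper is hypothetical and not part of the original study); it recomputes totals, mean errors per draft note, and the share of each error type.

```python
# Error counts per category as reported for the two ADS products (Biro et al.).
errors = {
    "A": {"omission": 55, "addition": 3, "wrong output": 4, "irrelevant or misplaced": 4},
    "B": {"omission": 33, "addition": 7, "wrong output": 6, "irrelevant or misplaced": 15},
}
NOTES_PER_PRODUCT = 22  # draft notes evaluated per product, as reported


def summarize(counts, n_notes):
    """Return (total errors, mean errors per note, percentage share per error type)."""
    total = sum(counts.values())
    shares = {kind: 100 * n / total for kind, n in counts.items()}
    return total, total / n_notes, shares


summaries = {product: summarize(counts, NOTES_PER_PRODUCT) for product, counts in errors.items()}
combined_total = sum(total for total, _, _ in summaries.values())

for product, (total, mean_per_note, shares) in summaries.items():
    print(f"Product {product}: {total} errors, {mean_per_note:.1f} per draft note")
    for kind, pct in shares.items():
        print(f"  {kind}: {pct:.1f}%")
print(f"Combined: {combined_total} errors across {2 * NOTES_PER_PRODUCT} draft notes")
```

Shares are printed to one decimal place, so they may differ slightly from the whole-number percentages quoted in the text.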
The anatomy of errors: omissions, fabrications, and factual mistakes
The most concerning and consistently observed issue across the Kernberg et al. and Biro et al. studies was the nature of the errors produced by the AI systems. Kernberg et al. classified these errors into three main categories. The most prevalent were omissions, accounting for 86.3% of all errors, where the AI frequently failed to include critical clinical information present in the original transcript, for example, omitting a patient’s loss of appetite in a hernia case or neglecting to mention echocardiogram results in a patient with congestive heart failure (22). The second most common error type was additions or fabrications (10.5%), in which the AI generated content not present in the source, such as falsely stating that a patient’s weight loss was intentional or that they were noncompliant with medications. The third and least frequent category was incorrect facts (3.2%), where the AI recorded existing clinical data inaccurately, such as reporting a normal heart rate when the patient was tachycardic or misstating the timing of hospital admission. Biro et al. confirmed a similar distribution of error types, with omissions again being most frequent (20). Supporting qualitative evidence comes from Bundy et al., who interviewed 12 clinicians using DAX Copilot at Atrium Health, a multi-site academic health system. These clinicians reported issues such as the AI misgendering patients, generating inappropriate or unsolicited diagnoses, and confusing key clinical details, errors that ultimately required extensive manual correction and thereby diminished the intended efficiency gains of the technology (12).
The insidious nature of omission errors
The predominance of omission errors is particularly concerning for patient safety. As Biro et al. pointed out, there is a fundamental cognitive difference in reviewing for different error types (20). Errors of commission (e.g., adding incorrect information) are often easier for clinicians to detect because they involve noticing information that is clearly wrong. Omissions, however, require recalling specific details from conversations hours or days earlier, making them harder to catch and more dangerous as they can silently propagate through the record.
The challenge of “hallucinations” and non-determinism
Beyond omissions, “hallucinations”, fabricated details not present in the source, pose serious risks, such as incorrectly linking a patient to the wrong medical test (20). These distortions can alter the clinical narrative and mislead providers.
Furthermore, the study by Kernberg et al. highlighted the critical issue of non-determinism. When the same clinical transcript was fed into ChatGPT-4 three separate times, the model produced three different notes with varying errors. An alarmingly low 52.9% of the data elements were consistently and correctly reported across all three versions (22). This lack of reproducibility prevents clinicians from anticipating where errors will appear, undermining reliable oversight across different sections of the note.
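To make the reproducibility figure concrete, the sketch below shows one way a "consistently and correctly reported" share could be computed across replicate notes. It is an illustrative assumption, not the scoring procedure used by Kernberg et al.: the function name, the data elements, and the replicate contents are all invented for this example.

```python
def consistent_correct_share(reference, replicates):
    """Fraction of reference data elements reported correctly in every replicate.

    reference: set of data elements expected from the transcript.
    replicates: list of sets, each holding the elements one generated note got right.
    """
    if not reference:
        raise ValueError("reference must contain at least one data element")
    always_correct = set(reference)
    for correct_in_note in replicates:
        # Keep only elements that this replicate also reported correctly.
        always_correct &= set(correct_in_note)
    return len(always_correct) / len(reference)


# Hypothetical example: five expected elements, three replicate notes.
reference = {"chest pain onset", "heart rate", "echo result", "medication list", "appetite loss"}
replicates = [
    {"chest pain onset", "heart rate", "echo result", "medication list"},
    {"chest pain onset", "heart rate", "medication list", "appetite loss"},
    {"chest pain onset", "heart rate", "echo result"},
]

share = consistent_correct_share(reference, replicates)
print(f"Consistently correct across all replicates: {share:.1%}")
```

In this toy case each replicate captures most elements, yet only two of the five are correct in all three notes, which illustrates how a per-note accuracy figure can overstate reproducibility.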
A contrasting view: evidence of high-quality documentation
However, the narrative on accuracy is not entirely negative and contains important contradictions. In a simulated clinical encounter, Balloch et al. evaluated a GPT-4 powered ambient AI system created by TORTUS. The simulation consisted of a structured, controlled outpatient consultation involving real clinicians and professional medical actors following standardized patient scripts, with clinicians blinded to the scripted scenarios (18). Using the validated Sheffield Assessment Instrument for Letters (SAIL), researchers found that AI-generated notes scored higher in quality than those produced through standard EHR workflows, suggesting that, when functioning properly, AI scribes can create more complete and better-structured documentation than time-pressured clinicians.
Likewise, in a cohort study of 99 clinicians using Nuance DAX, Haberle et al. reported that no patient safety events associated with DAX were identified in their safety event tracking system, suggesting a strong safety profile in that specific real-world implementation (2). A more neutral view was presented by Duggan et al., who found that clinicians had varying perceptions of the accuracy and completeness of notes from DAX Copilot in a study including 46 providers, suggesting a mixed but not overtly negative experience (19).
The clinician’s role as final guarantor
These findings, taken together, illustrate a clear but complex pattern. While some tools and studies demonstrate promise in improving documentation quality, others reveal frequent and dangerous errors. The ultimate conclusion is that while these tools are powerful, they are not yet infallible. The responsibility for the final accuracy and safety of the medical record remains non-negotiably with the clinician, making the process of diligent, meticulous review an essential and unavoidable part of the AI-assisted workflow.
Reclaiming time: the impact on clinician efficiency and workflow
The primary value proposition of AI scribes is their potential to save clinicians time, thereby improving efficiency and alleviating the after-hours documentation burden (8,19). The evidence largely supports this claim, though the magnitude and nature of the time savings are more nuanced and complex than often portrayed, revealing several key paradoxes and differences in impact (8,19).
The core success: quantifiable reductions in documentation time
Multiple studies utilizing objective EHR data have consistently shown a statistically significant reduction in clinician documentation time following the implementation of AI-based tools, although the extent of this reduction varies. Duggan et al. conducted a quality improvement (QI) study with 46 clinicians using Nuance DAX Copilot and reported a 20.4% decrease in time spent on notes per visit, from 10.3 to 8.2 minutes (P<0.001) (19). Similarly, Owens et al. observed a 28.8% reduction (approximately 1.8 minutes per note) among high-frequency users (≥60% of encounters) in a community teaching health system (8). Stults et al., evaluating the Abridge AI tool with 57 clinicians, found a significant drop in documentation time per encounter from 6.2 to 5.3 minutes (P<0.001) (13). In a peer-matched cohort study, Haberle et al. reported that Nuance DAX reduced documentation time per visit from 5.3 to 4.54 minutes (P<0.001) (2). Balloch et al., in simulated pediatric consultations using TORTUS AI, found a 193-second (26.3%) reduction in average consultation duration (P=0.03) (18). Lastly, a pilot study by Cao et al. (2024), conducted in dermatology clinics with 12 clinicians evaluating DAX, demonstrated a decrease in total daily documentation time from 54.6 to 42.2 minutes (23). Collectively, these findings indicate that ambient AI documentation tools are associated with meaningful improvements in documentation efficiency across various settings.
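Because these studies report time savings in different forms (absolute minutes, percentages, or before-and-after values), a common scale helps comparison. The Python sketch below converts the before-and-after values quoted above into absolute and relative reductions. The `reduction` helper is our own illustration; relative reductions for studies other than Duggan et al. are derived here from the quoted values and are not figures reported by the original authors.

```python
def reduction(before, after):
    """Return (absolute reduction, relative reduction in percent)."""
    if before <= 0:
        raise ValueError("before must be positive")
    absolute = before - after
    return absolute, 100 * absolute / before


# Before/after documentation times in minutes, as quoted in the text.
studies = {
    "Duggan et al. (per visit)": (10.3, 8.2),
    "Stults et al. (per encounter)": (6.2, 5.3),
    "Haberle et al. (per visit)": (5.3, 4.54),
    "Cao et al. (per day)": (54.6, 42.2),
}

for name, (before, after) in studies.items():
    absolute, relative = reduction(before, after)
    print(f"{name}: -{absolute:.2f} min ({relative:.1f}% lower)")
```

The Duggan et al. row reproduces the reported 20.4% decrease; the remaining percentages are unadjusted arithmetic and should not be read as the studies' own effect estimates.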
Subjective perceptions of efficiency
These quantitative findings are further reinforced by qualitative data and clinician self-reports, which consistently highlight perceived improvements in documentation efficiency. In a survey by Albrecht et al. evaluating the Abridge tool, clinicians had nearly fivefold higher odds of agreeing that they could complete their notes before the next patient encounter [odds ratio (OR) 4.95; 95% confidence interval (CI): 2.87–8.69; P<0.001], with 77% reporting a reduced documentation burden (14). Similarly, Shah et al. found that 65% of respondents reported enhanced efficiency in documentation tasks, estimating a median time saving of 20 minutes per half-day clinic session (24). Supporting these findings, Nguyen et al. reported a significant increase in clinicians’ perception of having adequate time for documentation following the implementation of Nuance DAX, with relevant survey scores improving from 2.1 to 3.6 within one month (P=0.005) (15). These self-reported experiences align closely with objective metrics, underscoring the positive impact of AI-based documentation tools on clinician workflow.
The “pajama time” paradox: a deep dive into after-hours work
AI scribe tools reduce active daytime documentation, but their effect on after-hours EHR use, or “pajama time”, is more complex and reveals a key tension in current clinical workflows. Several studies report notable improvements: Duggan et al. reported a 30.0% decrease in after-hours charting, from 50.6 to 35.4 minutes daily (P=0.02), while Owens et al. reported an 11.8% decline, equating to 4 fewer minutes per day (95% CI: 1–7.2) (8,19). Similarly, Albrecht et al. found that 73% of clinicians using Abridge reported reduced documentation outside scheduled clinical hours (14). However, these benefits are not universal. A study by Haberle et al. including 99 providers across multiple outpatient clinics in Utah found a significant increase in after-hours EHR usage among AI scribe users, with a 4.69% rise (P<0.05), contrasting with a decline in the control group (2). The authors suggest that this may be due to the delayed availability of finalized notes in systems involving a human-in-the-loop for quality review, requiring clinicians to return to the EHR later to finalize documentation. Additionally, Stults et al. reported no statistically significant change in after-hours documentation (P=0.14), suggesting a neutral effect (13). This “pajama time paradox” highlights that while AI scribes may lessen the documentation time burden in clinics, delays in note availability can offset benefits, underscoring the need for near-instantaneous documentation.
“Note bloat”: the double-edged sword of comprehensive capture
Another paradox with AI-based documentation tools is their tendency to increase clinical note length even as they reduce creation time. Duggan et al. reported a 20.6% rise in total weekly note length from 202,637.5 to 244,427.1 characters (P<0.001) (19). Similarly, Owens et al. found that although documentation became more efficient, the average note length increased by 542 characters, even as the manual input from providers decreased by 33% (8). Stults et al. also observed a statistically significant increase in both total documentation and progress note length (P=0.01 and P<0.001, respectively) (13). Further complicating this trend, Kernberg et al. noted substantial variability in the length of AI-generated notes, even across replicates of the same clinical case (22). This phenomenon, often referred to as “note bloat”, can enhance billing and preserve detailed clinical context, but it may also obscure essential information, hinder rapid understanding of patient status and care plans, and increase clinicians’ overall cognitive burden.
The “dose-response” relationship: heterogeneity of impact
The most nuanced understanding of AI scribe efficiency comes from studies showing that benefits vary across users. Liu et al., in a longitudinal study involving 215 outpatient clinicians at Atrium Health, reported no statistically significant overall efficiency gains among all DAX users (25). However, high-frequency users (>60% of encounters) showed modest reductions in documentation time. Similarly, the STREAMLINE pilot by Kakaday et al. reported that clinicians using DAX Copilot in ≥70% of problem-focused visits had a 1.4-minute reduction per visit and a 35% decrease in note length, though this was not statistically significant (P=0.38), likely due to limited sample size (17). Together, these findings suggest a “dose-response” relationship in which the advantages of AI scribe technology become evident primarily with consistent, intensive integration into clinical workflows, indicating that infrequent or casual use is unlikely to produce meaningful time savings.
The clinician experience: alleviating burnout and enhancing well-being
Beyond the technical metrics of time and accuracy, a key measure of an AI scribe’s value lies in its impact on medical providers. The evidence strongly suggests that, despite performance limitations, AI scribes are having a positive effect in this domain by reducing cognitive burden and improving professional well-being.
Measuring the unseen: reductions in cognitive load
Clinical documentation imposes substantial cognitive burden as clinicians balance patient interaction, decision-making, and note-taking. AI scribes help alleviate this load by “offloading” documentation. Stults et al., using the NASA Task Load Index (NASA-TLX), reported significant reductions in mental demand (12.2 to 6.3), feelings of a rushed pace (13.2 to 6.4), and overall effort needed to complete notes (12.5 to 7.4) (13). Similarly, Balloch et al. reported improvements across five of six NASA-TLX workload dimensions in their simulation study (18). Complementing these quantitative findings, qualitative interviews conducted by Bundy et al. reveal the human experience behind the data: physicians expressed profound relief through “cognitive offloading”, no longer burdened by the anxiety of retaining critical clinical information until they had time to document it. One clinician described the alleviation of the “burned-out feeling” and the helpless anxiety associated with recalling clinical details after a demanding day (12). Together, these insights underscore AI scribes’ potential to mitigate the cognitive challenges inherent in clinical documentation.
A direct assault on burnout: evidence from validated instruments
The reduction in cognitive load associated with AI scribe use appears to support measurable improvements in clinician well-being and burnout, though results differ across studies. Shah et al. reported that among 38 physicians using DAX Copilot, work-related exhaustion decreased significantly, with scores on the Stanford Professional Fulfillment Index’s Work Exhaustion subscale (PFI-WE) dropping by 1.94 points (P<0.001) (24). Similarly, Misurac et al. found a significant decline in burnout with Nabla, with overall scores decreasing from 4.16 to 3.16 (P=0.005) and improvement in “interpersonal disengagement” (16). In contrast, pilot studies by Stults et al. and Albrecht et al. involving the Abridge AI tool with 57 clinicians showed a reduction in burnout rates from 42.1% to 35.1%, though this change did not reach statistical significance (P=0.12) (13,14). Owens et al., in an observational study of 110 primary care providers, found that those with high DAX usage (>60% of visits) exhibited significantly lower disengagement scores (P=0.03), despite no significant differences in exhaustion or overall burnout (8). Contradicting these positive trends, Nguyen et al. reported a slight, non-significant increase in burnout scores (Mini Z scale) from 3.6 to 3.9 among cancer care providers using DAX Copilot (P=0.08), though qualitative feedback remained favorable (15). Collectively, these findings suggest that while AI scribes may help reduce burnout and improve well-being, effects are heterogeneous and may depend on clinical context and user engagement.
The tangible feeling of improved job satisfaction
Even when composite burnout scores did not significantly change, reduced administrative burden consistently improved clinician job satisfaction. Albrecht et al. reported that 64% of clinicians agreed the AI tool improved their work satisfaction (14). Similarly, Stults et al. found that 71.9% of clinicians experienced increased job satisfaction, with primary care providers benefiting the most (86.8%) compared to medical (36.4%) and surgical (50.0%) subspecialists (P<0.001) (13). Galloway et al. observed a significant decline in the negative impact of documentation on clinician well-being, decreasing from 71% to 38.7% (P=0.01) among 31 surveyed clinicians (27). Collectively, these findings show that AI scribes reduce documentation burden and help restore fulfillment in clinical practice.
Perspectives on the patient-clinician encounter
Although AI scribe research often centers on clinicians, its effect on patients and the patient-clinician relationship is crucial. Introducing an ambient listening device raises concerns about privacy and trust, yet evidence indicates that thoughtful implementation can enhance engagement and support a more personable clinical encounter, fostering empathy.
Removing the “third party”: restoring the dyad
AI scribes may strengthen the patient-provider relationship by removing the computer as a physical and cognitive barrier. Stults et al. showed that Abridge use increased clinicians’ full attentiveness from 57.9% to 93.0% (P<0.001), with clinicians feeling less distracted and more engaged (13). Supporting these findings, Duggan et al. similarly reported improved engagement, with disengagement scores decreasing from 5.41 to 2.05 and charting distraction decreasing from 5.67 to 2.27 (P<0.001) (19). Qualitative findings from Bundy et al. emphasized enhanced presence and eye contact (12). Similarly, Nguyen et al. reported that the ambient AI scribe “removes the computer as the third party”, improving patient-physician connection (15). Together, these findings underscore AI scribes’ significant role in enhancing the quality and intimacy of clinical encounters.
The patient’s perspective: a favorable, if understudied, view
The impact of AI scribes on the patient experience remains a crucial yet underexplored area, with the existing evidence generally positive but underscoring the importance of transparency. Owens et al. conducted a notable two-phase study that highlighted this dynamic: during the open-label phase, when patients were aware that AI technology was being used, over 75% reported that providers appeared more focused, typed less, and made the encounter feel more personal, all statistically significant improvements (26). Conversely, in the masked phase, where patients were unaware of the technology, no significant difference in Patient-Doctor Relationship Questionnaire-9 (PDRQ-9) scores was observed (PDRQ-9 median of 45 in both groups; P=0.31), illustrating the importance of patient perception. Supporting these findings, Balloch et al. reported that 87% of patients felt fully attended to when AI tools were utilized, compared to 75% with the EHR alone (18), while Galloway et al. found that 91.4% of patients agreed providers spent less time on the computer (27). Haberle et al. (2024) documented a remarkably low patient opt-out rate of just 0.014% (5 patients), indicating strong patient acceptance when the technology is transparently presented, though they observed no significant change in patient satisfaction as measured by Likelihood to Recommend scores (P=0.49) (2). Earlier, Kocaballi et al. raised concerns that patients might devalue physician recommendations if perceived as computer-generated, highlighting the delicate trust essential to the therapeutic relationship (21). While current research has not confirmed this fear, it emphasizes that AI scribes must be positioned clearly as tools designed to augment rather than replace human connection and clinical judgment to maintain patient trust and satisfaction.
Specialty-specific variations in ambient AI impact
The impact of ambient AI documentation systems varies markedly across clinical specialties. Studies show significant differences in clinician satisfaction, documentation time, and perceived workflow efficiency (13,14,25), likely reflecting specialty-specific documentation demands.
Primary care reported the highest satisfaction, with 86.8% (33/38) noting improved work experience, compared with only 36.4% (4/11) in medical subspecialties such as oncology, cardiology, and dermatology, and 50% (4/8) in surgical subspecialties (P<0.001) (13). Although ambient AI greatly improved workflow ease overall (OR 6.91; 95% CI: 3.90–12.56; P<0.001) (14), specialists were far less likely than primary care clinicians to report increased work satisfaction (OR 0.02; 95% CI: <0.01–0.16) (13). Subspecialists frequently attributed this gap to poor specialty-specific customization, especially in physical examination sections that lacked neurologic or cardiologic detail and required manual revision.
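The subgroup percentages can be recomputed directly from the counts quoted above. The following is a minimal sketch for checking that arithmetic, not part of the cited analysis; the `counts` dictionary and `pct` helper are illustrative names, and the pooled figure assumes the three subgroups together make up all survey respondents.

```python
# (clinicians reporting increased satisfaction, clinicians surveyed), as quoted in the text
counts = {
    "primary care": (33, 38),
    "medical subspecialties": (4, 11),
    "surgical subspecialties": (4, 8),
}


def pct(numerator: int, denominator: int) -> float:
    """Proportion expressed as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


for group, (satisfied, surveyed) in counts.items():
    print(f"{group}: {satisfied}/{surveyed} = {pct(satisfied, surveyed)}%")

# Pooled proportion, assuming the subgroups partition the full sample.
total_satisfied = sum(s for s, _ in counts.values())
total_surveyed = sum(n for _, n in counts.values())
print(f"overall: {total_satisfied}/{total_surveyed} = {pct(total_satisfied, total_surveyed)}%")
```

Under that assumption, the pooled proportion (41/57, or 71.9%) matches the overall job satisfaction figure reported by Stults et al. (13).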
Efficiency gains also varied. Primary care achieved the largest reduction in documentation time (6.3 to 5.2 minutes per appointment; P<0.001) (13). In contrast, medical subspecialists spent 3.75 minutes more per appointment than primary care (P<0.001) (13). Family medicine showed modest improvements [means ratio (MR) 0.91; 95% CI: 0.85–0.98], whereas internal medicine (MR 1.03; 95% CI: 0.93–1.15) and pediatrics (MR 0.98; 95% CI: 0.90–1.08) showed none. Low-volume DAX users had small reductions (MR 0.91; 95% CI: 0.83–0.99) (25).
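A means ratio is read against a null value of 1: an MR of 0.91 corresponds to roughly 9% less documentation time than the comparison group, and a 95% CI that excludes 1 indicates a difference at the conventional 5% level. The sketch below illustrates that reading for the specialty estimates quoted above; the `RatioEstimate` class and its method names are hypothetical helpers, not anything taken from Liu et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatioEstimate:
    """A ratio-type effect estimate (e.g., a means ratio) with its 95% CI."""
    label: str
    point: float
    lower: float
    upper: float

    def pct_change(self) -> float:
        """Relative difference versus the comparison group, in percent."""
        return (self.point - 1.0) * 100.0

    def excludes_null(self) -> bool:
        """True when the interval lies entirely below or entirely above 1."""
        return self.upper < 1.0 or self.lower > 1.0


# Values as quoted in the text (25).
estimates = [
    RatioEstimate("family medicine", 0.91, 0.85, 0.98),
    RatioEstimate("internal medicine", 1.03, 0.93, 1.15),
    RatioEstimate("pediatrics", 0.98, 0.90, 1.08),
]

for e in estimates:
    verdict = "CI excludes 1" if e.excludes_null() else "CI includes 1"
    print(f"{e.label}: {e.pct_change():+.0f}% ({verdict})")
```

Only the family medicine interval lies wholly below 1, which is consistent with the text describing a modest improvement there and none in internal medicine or pediatrics.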
Ambient AI also increased note length across specialties, with medical subspecialties producing the longest notes and adding 2,323.72 characters compared with primary care (P=0.001) (13). This additional documentation burden again reflected limited customization and the need for specialists to manually adjust autogenerated examination content.
Overall, ambient AI yields greater satisfaction and efficiency in primary care, while subspecialties experience reduced benefit due to poor alignment with their documentation requirements. No studies have evaluated ambient AI scribes specifically in cardiology, highlighting a clear research gap and the need for specialty-focused assessments of performance, workflow integration, and clinician experience in cardiovascular practice. A summary is provided in Table 3.
Table 3. Impact of ambient AI across clinical specialties.
| Metric category | Specialty subgroup finding | AI platform | Estimate (95% CI) or P value | First author |
|---|---|---|---|---|
| Efficiency: documentation time | Primary care experienced a significant decrease in mean time in notes per appointment | Abridge | Mean decreased from 6.3 to 5.2 minutes, P<0.001 | Stults et al. |
| Efficiency: documentation time | Family medicine showed an exploratory decrease in documentation hours (Note-Time) compared to the control group | DAX Copilot | MR 0.91 (0.85 to 0.98) | Liu et al. |
| Efficiency: documentation time | Internal medicine showed no statistically notable difference in documentation hours (Note-Time) compared to the control group | DAX Copilot | MR 1.03 (0.93 to 1.15) | Liu et al. |
| Efficiency: documentation time | Medical subspecialties spent significantly longer on notes per appointment compared to primary care | Abridge | Mean increase of 3.75 minutes (2.09 to 5.41 minutes), P<0.001 | Stults et al. |
| Efficiency: documentation time | Surgical subspecialties spent significantly less time on notes per appointment than primary care | Abridge | Mean reduction of 2.45 minutes (−4.10 to −0.81 minutes), P=0.004 | Stults et al. |
| Efficiency: workflow metrics | Pediatrics showed a significant increase in same day note closure rate | DAX Copilot | Mean difference 2.92% (1.07 to 4.76) | Liu et al. |
| Efficiency: workflow metrics | Family medicine showed a small, significant increase in completed appointment rate | DAX Copilot | Mean difference 1.02% (0.20 to 1.83) | Liu et al. |
| Documentation detail: note length | Medical subspecialties experienced the largest increase in document length post-AI implementation compared to primary care | Abridge | Mean increase of 2,323.72 characters (994.70 to 3,652.73 characters), P=0.001 | Stults et al. |
| Well-being: work satisfaction | Primary care participants reported significantly higher increased work satisfaction post-AI compared to subspecialties | Abridge | OR (medical/surgical vs. primary care): 0.02 (<0.01 to 0.16), P<0.001 | Stults et al. |
| Well-being: subjective impact | Clinicians’ perception of improved documentation workflow ease and reduced burnout risk did not differ significantly by specialty type | Abridge | Analysis showed no significant effect of specialty type on survey responses | Albrecht et al. |
AI, artificial intelligence; CI, confidence interval; DAX, Dragon Ambient eXperience; MR, means ratio; OR, odds ratio.
From pilot to practice: real-world challenges and future directions
Translating a promising technology from a controlled pilot study to widespread, effective practice is a formidable challenge. Implementation hurdles are deeply sociotechnical, rooted in workflow, user behavior, and organizational context. Current performance likely reflects early technological maturity in this domain. Continued advancements in workflow adaptability, seamless health record integration, and model accuracy are anticipated as these tools evolve. As clinicians gain familiarity and confidence, both performance and satisfaction should improve.
Sociotechnical hurdles to widespread adoption
Clinician experience with AI scribes shows that adoption is not uniform. Bundy et al. reported wide variability in preferences, with some clinicians favoring AI scribes for simple, structured tasks and others for complex, narrative-heavy encounters (12). In oncology, Nguyen et al. found that AI scribes supported history-taking but performed poorly for patient education and required extensive editing, emphasizing the need for specialty-specific customization (15). Early implementations also created workflow friction due to multi-step recording and manual EHR transfer. While some users experienced improvement, verbose or disorganized outputs increased editing workload, reinforcing the necessity of seamless EHR integration to minimize clicks and context switching (12,14).
Bundy et al. also identified a “productivity paradox”, where clinicians feared that time saved by AI scribes would be repurposed to increase patient volume rather than reduce workload, potentially worsening burnout (12). Medico-legal concerns further complicated adoption. Kocaballi et al. reported clinician anxiety over permanent, word-for-word AI-generated records that could be used in litigation, especially if statements were taken out of context (21). This underscores a broader tension between comprehensive documentation and protecting clinical judgment and professional autonomy.
The health system return on investment (ROI) remains uncertain. Although 58.1% of clinicians in Galloway et al. felt productivity improved (27) and 48% in Albrecht et al. believed they could see additional patients (14), objective data do not confirm these perceptions. Liu et al. found no significant change in work relative value units (wRVUs) or revenue per visit (25), and Haberle et al. similarly noted stable panel sizes with only minimal, non-significant wRVU increases (2). Thus, while AI scribes may meaningfully improve clinician wellness and reduce burnout-related turnover, direct financial gains remain unclear. High implementation costs further complicate the business case, prompting exploration of secondary benefits such as enhanced documentation quality for coding and reimbursement, although these have yet to be quantified (28).
Critical gaps and future research imperatives
Several critical gaps and future directions define the evolving landscape of AI scribe technology. First, there is an urgent need for standardized metrics across studies. Currently, research employs a heterogeneous mix of burnout inventories, such as the Stanford Professional Fulfillment Index (PFI), Oldenburg Burnout Inventory (OLBI), and Mini Z, alongside various task load scales like NASA-TLX, usability assessments, and custom EHR analytics, complicating direct comparisons and meta-analyses.
Health equity remains underexplored, with no rigorous evaluations of how AI scribes perform for diverse patients, including those with accents, dialects, or interpreters (29). This raises substantial concerns that models trained predominantly on majority-group data may underperform for marginalized populations, potentially exacerbating existing healthcare disparities. Within the cardiovascular field specifically, it is essential to conduct studies that validate these tools across diverse patient populations and clinical settings. Such efforts will help ensure that AI-driven documentation systems are both generalizable and equitable, supporting accurate, efficient, and patient-centered care across the full spectrum of cardiovascular practice.
Looking ahead, the maturation of AI scribes is envisioned to progress beyond current capabilities. Seth et al. characterize existing tools as operating primarily at Stage 1 or 2, focused on automating clinical and administrative documentation (30). The future, however, lies in Stages 3 and 4, where AI evolves into a reactive and eventually proactive clinical decision support system. Kapadia et al. describe this advanced model as the “AI-augmented clinician” or “co-pilot”, where the AI not only documents but also actively supports care by alerting clinicians to missed screening questions, suggesting differential diagnoses based on dialogue, and assisting with ordering and referral workflows (31). Agentic AI in patient-facing devices offers a proactive, structured alternative to ambient listening by engaging patients directly and asynchronously. These agents can gather histories and symptoms before visits, giving clinicians more comprehensive context and helping ensure that no critical questions are missed, which in turn improves documentation and billing justification. By offloading routine data collection, agentic AI allows clinicians to focus on reasoning and human connection, strengthening patient interactions. Rather than simple automation, this approach aims to elevate care quality and efficiency. This shift could further transform clinical practice by enhancing workflow efficiency and supporting more informed decision-making.
Strengths and limitations
In our study, we sought to comprehensively review all available literature and contextualize previously reported findings to better define the potential benefits and limitations of available software, thereby informing how this technology could be optimally implemented in clinical practice. Additionally, our review performed a detailed evaluation of the various AI systems and their use across different subspecialties.
However, as a narrative review dependent on primary sources, our analysis is inherently limited by the scope and quality of those studies, including the potential for observer bias and the generally small sample sizes reported in several of them. Additionally, the scarcity of literature addressing specific subspecialty implications restricts our ability to draw robust conclusions for these areas. Furthermore, the use of different AI models across studies, each trained on distinct datasets, makes direct, head-to-head comparisons challenging.
Conclusions
The rise of ambient AI scribes has laid important groundwork in reducing documentation burdens and clinician burnout. However, important challenges persist, including inconsistent performance, omission errors, note bloat, and variability related to differences in training datasets, study designs, and small sample sizes. These issues underscore the continued need for human oversight and cautious clinical integration. Current systems also remain largely passive, relying on clinician-directed encounters and limiting their broader transformative potential. Progress will require combining agentic AI with ambient technologies to enable proactive, structured data capture and real-time contextual reasoning.
Acknowledgments
None.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Footnotes
Reporting Checklist: The authors have completed the Narrative Review reporting checklist. Available at https://cdt.amegroups.com/article/view/10.21037/cdt-2025-454/rc
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://cdt.amegroups.com/article/view/10.21037/cdt-2025-454/coif). C.A. serves as an unpaid editorial board member of Cardiovascular Diagnosis and Therapy from April 2025 to March 2027. C.A. also serves as Section Editor with Stipend for Digital Medial, Circulation: Cardiovascular Imaging. The other authors have no conflicts of interest to declare.
References
- 1. Sinsky C, Colligan L, Li L, et al. Allocation of Physician Time in Ambulatory Practice: A Time and Motion Study in 4 Specialties. Ann Intern Med 2016;165:753-60. 10.7326/M16-0961
- 2. Haberle T, Cleveland C, Snow GL, et al. The impact of nuance DAX ambient listening AI documentation: a cohort study. J Am Med Inform Assoc 2024;31:975-9. 10.1093/jamia/ocae022
- 3. West CP, Dyrbye LN, Shanafelt TD. Physician burnout: contributors, consequences and solutions. J Intern Med 2018;283:516-29. 10.1111/joim.12752
- 4. Florig ST, Corby S, Rosson NT, et al. Chart Completion Time of Attending Physicians While Using Medical Scribes. AMIA Annu Symp Proc 2021;2021:457-65.
- 5. Ghatnekar S, Faletsky A, Nambudiri VE. Digital scribe utility and barriers to implementation in clinical practice: a scoping review. Health Technol (Berl) 2021;11:803-9. 10.1007/s12553-021-00568-0
- 6. Goss FR, Blackley SV, Ortega CA, et al. A clinician survey of using speech recognition for clinical documentation in the electronic health record. Int J Med Inform 2019;130:103938. 10.1016/j.ijmedinf.2019.07.017
- 7. Curtis RG, Bartel B, Ferguson T, et al. Improving User Experience of Virtual Health Assistants: Scoping Review. J Med Internet Res 2021;23:e31737. 10.2196/31737
- 8. Owens LM, Wilda JJ, Grifka R, et al. Effect of Ambient Voice Technology, Natural Language Processing, and Artificial Intelligence on the Patient-Physician Relationship. Appl Clin Inform 2024;15:660-7. 10.1055/a-2337-4739
- 9. Fitzpatrick KK, Darcy A, Vierhile M. Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial. JMIR Ment Health 2017;4:e19. 10.2196/mental.7785
- 10. Dietrich N. Agentic AI in radiology: emerging potential and unresolved challenges. Br J Radiol 2025;98:1582-4. 10.1093/bjr/tqaf173
- 11. Sasseville M, Yousefi F, Ouellet S, et al. The Impact of AI Scribes on Streamlining Clinical Documentation: A Systematic Review. Healthcare (Basel) 2025;13:1447. 10.3390/healthcare13121447
- 12. Bundy H, Gerhart J, Baek S, et al. Can the Administrative Loads of Physicians be Alleviated by AI-Facilitated Clinical Documentation? J Gen Intern Med 2024;39:2995-3000. 10.1007/s11606-024-08870-z
- 13. Stults CD, Deng S, Martinez MC, et al. Evaluation of an Ambient Artificial Intelligence Documentation Platform for Clinicians. JAMA Netw Open 2025;8:e258614. 10.1001/jamanetworkopen.2025.8614
- 14. Albrecht M, Shanks D, Shah T, et al. Enhancing clinical documentation with ambient artificial intelligence: a quality improvement survey assessing clinician perspectives on work burden, burnout, and job satisfaction. JAMIA Open 2025;8:ooaf013. 10.1093/jamiaopen/ooaf013
- 15. Nguyen OT, Turner K, Charles D, et al. Implementing Digital Scribes to Reduce Electronic Health Record Documentation Burden Among Cancer Care Clinicians: A Mixed-Methods Pilot Study. JCO Clin Cancer Inform 2023;7:e2200166. 10.1200/CCI.22.00166
- 16. Misurac J, Knake LA, Blum JM. The Effect of Ambient Artificial Intelligence Notes on Provider Burnout. Appl Clin Inform 2025;16:252-8. 10.1055/a-2461-4576
- 17. Kakaday R, Herrera EZ, Coskey O, et al. The STREAMLINE Pilot Study on Time Reduction and Efficiency in AI-Mediated Logging for Improved Note-Taking Experience. Appl Clin Inform 2025;16:614-21. 10.1055/a-2559-5791
- 18. Balloch J, Sridharan S, Oldham G, et al. Use of an ambient artificial intelligence tool to improve quality of clinical documentation. Future Healthc J 2024;11:100157. 10.1016/j.fhj.2024.100157
- 19. Duggan MJ, Gervase J, Schoenbaum A, et al. Clinician Experiences With Ambient Scribe Technology to Assist With Documentation Burden and Efficiency. JAMA Netw Open 2025;8:e2460637. 10.1001/jamanetworkopen.2024.60637
- 20. Biro J, Handley JL, Cobb NK, et al. Accuracy and Safety of AI-Enabled Scribe Technology: Instrument Validation Study. J Med Internet Res 2025;27:e64993. 10.2196/64993
- 21. Kocaballi AB, Ijaz K, Laranjo L, et al. Envisioning an artificial intelligence documentation assistant for future primary care consultations: A co-design study with general practitioners. J Am Med Inform Assoc 2020;27:1695-704. 10.1093/jamia/ocaa131
- 22. Kernberg A, Gold JA, Mohan V. Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study. J Med Internet Res 2024;26:e54419. 10.2196/54419
- 23. Cao DY, Silkey JR, Decker MC, et al. Artificial intelligence-driven digital scribes in clinical documentation: Pilot study assessing the impact on dermatologist workflow and patient encounters. JAAD Int 2024;15:149-51. 10.1016/j.jdin.2024.02.009
- 24. Shah SJ, Devon-Sand A, Ma SP, et al. Ambient artificial intelligence scribes: physician burnout and perspectives on usability and documentation burden. J Am Med Inform Assoc 2025;32:375-80. 10.1093/jamia/ocae295
- 25. Liu TL, Hetherington TC, Dharod A, et al. Does AI-Powered Clinical Documentation Enhance Clinician Efficiency? A Longitudinal Study. NEJM AI 2024;1:AIoa2400659.
- 26. Owens LM, Wilda JJ, Hahn PY, et al. The association between use of ambient voice technology documentation during primary care patient encounters, documentation burden, and provider burnout. Fam Pract 2024;41:86-91. 10.1093/fampra/cmad092
- 27. Galloway JL, Munroe D, Vohra-Khullar PD, et al. Impact of an Artificial Intelligence-Based Solution on Clinicians' Clinical Documentation Experience: Initial Findings Using Ambient Listening Technology. J Gen Intern Med 2024;39:2625-7. 10.1007/s11606-024-08924-2
- 28. Peterson Health Technology Institute. AI Taskforce. 2025. Available online: https://phti.org/collaboration/ai-taskforce/
- 29. Hassan H, Zipursky AR, Rabbani N, et al. Clinical Implementation of Artificial Intelligence Scribes in Health Care: A Systematic Review. Appl Clin Inform 2025;16:1121-35. 10.1055/a-2597-2017
- 30. Seth P, Carretas R, Rudzicz F. The Utility and Implications of Ambient Scribes in Primary Care. JMIR AI 2024;3:e57673. 10.2196/57673
- 31. Kapadia K, Ruwali S, Malav T, et al. Enhancing Efficiency with an AI-Augmented Clinician in Neurology. Aging Dis 2024;16:2498-503. 10.14336/AD.2024.1249


