Abstract
Objective
To assess the impact of the use of an ambient listening/digital scribing solution (Nuance Dragon Ambient eXperience (DAX)) on caregiver engagement, time spent in the Electronic Health Record (EHR), including after-hours time, productivity, attributed panel size for value-based care providers, documentation timeliness, and Current Procedural Terminology (CPT) submissions.
Materials and Methods
We performed a peer-matched controlled cohort study from March to September 2022 to evaluate the impact of DAX in outpatient clinics in an integrated healthcare system. Primary outcome measurements included provider engagement survey results, reported patient safety events related to DAX use, patients’ Likelihood to Recommend score, number of patients opting out of ambient listening, change in work relative value units, attributed value-based primary care panel size, documentation completion and CPT code submission deficiency rates, and note turnaround time.
Results
A total of 99 providers representing 12 specialties enrolled in the study; 76 matched control group providers were included for analysis. Median utilization of DAX was 47% among active participants. We found positive trends in provider engagement among participants, while non-participants saw worsening engagement; there was no practical change in productivity. There was a statistically significant worsening of after-hours EHR use. There was no quantifiable effect on patient safety.
Discussion
Nuance DAX use showed positive trends in provider engagement at no risk to patient safety, experience, or clinical documentation. There were no significant benefits to patient experience, documentation, or measures of provider productivity.
Conclusion
Our results highlight the potential of ambient dictation as a tool for improving the provider experience. Head-to-head comparisons of EHR documentation efficiency training are needed.
Keywords: Dragon Ambient eXperience, AI documentation, generative AI, ambient listening technology, provider engagement
Background and significance
The burden of EHR use, especially documentation, has been repeatedly cited as a top contributor to physician burnout, career dissatisfaction, and attrition from the practice of medicine.1 EHR inefficiency reduces physician productivity, interferes with maintenance of top-of-license work, and results in impaired work-life balance; after-hours and evening documentation has been implicated as a source of burnout and lower engagement.2 In 2021, Intermountain increased expectations for provider care volumes. Aside from EHR personalization or mastery coaching and efficiency training,3 few enhancements to provider efficiency had been deployed to directly support these greater care volume expectations.4 To address this need, Intermountain performed a market assessment of available ambient listening/digital scribing services. At the time, few services with longstanding business maturity existed that translated patient encounters into medical documents by leveraging both human effort and artificial intelligence; most vendors offering comparable services relied exclusively on human scribes. Nuance's Dragon Ambient eXperience (DAX) solution was selected to reduce provider documentation time, based on the maturity of the product offering, Nuance's acquisition by Microsoft, and the company's historical relationship with Intermountain Health.
Objective
The purpose of this study was to evaluate the impact of the implementation of an ambient listening/digital scribing solution (Nuance DAX) on provider engagement, productivity, panel size, documentation (including time spent in the EHR after hours), and coding timeliness. We also aimed to measure the impact to patient safety and likelihood to recommend (LTR) by patients.
Materials and methods
We performed a cohort study to measure the impact of the implementation of Nuance DAX on provider engagement, productivity, time spent in the EHR, time spent in the EHR after hours, and documentation and procedure coding timeliness, as well as patient LTR. The study was approved by the Intermountain Health Institutional Review Board (IRB No. 1052062). We adhered to best practices for reporting cohort studies.5
Product description and workflow overview
Nuance DAX leverages ambient listening, conversational AI, and generative AI technology to create outpatient provider clinical documentation. A workflow description of the product follows: the Nuance DAX iOS app runs on the provider's iPhone. As a patient is being seen in clinic, the provider selects the patient from an interfaced clinic schedule within the app and starts a recording. The iPhone's voice recording hardware, together with the Nuance DAX app software, captures the ambient natural conversation as a voice recording. When the recording is complete, it is sent to the Microsoft/Nuance cloud for conversational AI parsing. From the output of the conversational AI, a generative AI-derived draft note is made available to Nuance and their human quality reviewers. These reviewers edit the rough AI-generated draft using both the parsed text from the conversational AI output and, as needed, the voice recording. The reviewer then enters the provider's EHR and pastes a copy of the edited draft into an EHR note, which is routed to the provider for final editing and signature.
Setting
Intermountain Health (Intermountain) is an integrated health system comprising 33 hospitals, ranging from rural facilities to quaternary care centers, a children's hospital, and 385 ambulatory practices across seven states. This study was completed from March 2022 to September 2022 in a subset of clinics in the Intermountain Medical Group in Utah, which used Oracle Cerner as their EHR.
Participants
The objective of this study was to recruit 100 study providers to detect a 5% difference in work relative value unit (wRVU)-based productivity. These were randomly selected from adult and pediatric primary care, orthopedics, sports medicine, allergy, endocrinology, rheumatology, cardiology, neurology, neurosurgery, OB/GYN, oncology, urology, otolaryngology, and psychiatry. Unused licenses from those initially selected were then made available through self-referral. A peer-matched control cohort was identified using a methodology created via expert opinion by the authors: control providers were categorized by specialty into 5%-increment productivity groups and then randomly matched 1:1 to a study participant in the same productivity category. Where control providers of matching specialty and productivity were scarce, two study participants were matched to a single control.
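The bucketing-and-matching procedure above can be sketched as follows. This is a hypothetical illustration of the described logic, not the study's actual matching code; all identifiers, specialty labels, and data are invented.

```python
import random
from collections import defaultdict

# Hypothetical sketch of the matching procedure described above: control
# providers are bucketed by (specialty, 5%-wide productivity band) and a
# random control is drawn from the matching bucket for each participant.
def productivity_band(pct):
    """Map a productivity percentage to its 5% band, eg 87.3 -> 85."""
    return int(pct // 5) * 5

def match_controls(participants, controls, seed=0):
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for c in controls:
        buckets[(c["specialty"], productivity_band(c["productivity"]))].append(c)
    matches = {}
    for p in participants:
        key = (p["specialty"], productivity_band(p["productivity"]))
        candidates = buckets.get(key)
        if candidates:
            # A scarce bucket may be drawn from twice, mirroring the study's
            # fallback of matching two participants to one control.
            matches[p["id"]] = rng.choice(candidates)["id"]
    return matches

participants = [{"id": "P1", "specialty": "FM", "productivity": 92.0},
                {"id": "P2", "specialty": "ORTHO", "productivity": 88.4}]
controls = [{"id": "C1", "specialty": "FM", "productivity": 93.5},
            {"id": "C2", "specialty": "ORTHO", "productivity": 86.0}]
print(match_controls(participants, controls))  # {'P1': 'C1', 'P2': 'C2'}
```

Bucketing both cohorts by the same (specialty, band) key guarantees each match shares both attributes with its participant.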
To be included in the study, participants needed to attend a scheduled group virtual training, procure an iPhone, and complete onboarding activities. Providers who currently used a human scribe or did not see patients in a clinic setting were excluded. For providers who were self-referring for participation, preference for inclusion was given to providers who had higher volumes of patient encounters.
Patient engagement and communication
At patient intake, medical assistants used a standard script to explain the purpose of the study to patients and to provide clear, easy-to-understand information about their right to opt out at each visit. The number of patients who opted out each business day was reported by clinical staff using an electronic data submission tool.
EHR configuration and Nuance Quality Documentation Specialist provisioning
Before implementing DAX, we prepared the EHR by designing and implementing new note types, templates, and workflows for providers. To simplify the process and ensure future scalability, we created a single standard note type and template to be used across all specialties for notes created with DAX.
A new role also had to be created in the EHR for the Nuance Quality Documentation Specialists who prepared and sent each document to the provider for review and final signature. We ensured that this role had the appropriate authorship privileges and patient access restrictions configured.
Device procurement, connectivity, and interfacing
To use Nuance DAX, providers needed an iOS device running iOS 12 or later and access to the internet. The project team recommended that participants use an iPhone 6 or newer. Participants used the guest Wi-Fi network or cellular service.
Nuance DAX requires patient scheduling information from our practice management application. Intermountain IT teams designed and built an integration solution that would transfer a provider's daily schedule to the Nuance DAX app.
Provider and staff training, onboarding, and support
Intermountain and Nuance jointly managed onboarding, training, and support for participants, including introductory townhalls, an eLearning module, an online survey to understand workflow and documentation preferences, and one-on-one virtual meetings for pre-, go-live, and post-go-live support, training, and feedback. Intermountain provided training for clinical staff and practice management, including general virtual training for medical assistant workflow and specific training for opt-out conversations with patients who were not interested in using DAX. Durable training materials, including videos, were created and shared.
Outcome measurement and data sources
Ten outcome measures were selected. Data were collected during the measurement period of March to July 2022, with some historical data provided for comparison.
To measure impact on provider engagement, we utilized Press Ganey Workforce Engagement surveys sent in September 2021 and May 2022. Press Ganey measures engagement via a six-item index, with results reported on a 5-point Likert scale (1 = Disengaged, 5 = Highly engaged).6 Values were provided for all employed physician respondents during the two surveys, with a breakout of both primary care provider scores and DAX participant scores for the May 2022 survey. The report contained a count of respondents and a score for each of the following categories: engagement, safety, resilience (decompression), and work-life balance.
Provider productivity was measured in wRVUs using a standard operational report containing monthly wRVU values, 2022 year-to-date (YTD) wRVUs, YTD annualized wRVUs, target wRVUs, and a calculated percentage of 2022 target wRVUs annualized. To understand the impact on attributed panel size for value-based care (VBC) providers, we used data from a monthly report produced by Castell (the Intermountain Health population health services organization) that contained a count of risk lives for each VBC provider.
To assess risk to documentation timeliness and Current Procedural Terminology (CPT) submission associated with DAX utilization, a monthly report was created containing total encounters for each provider (cohort and DAX participant) and a count of encounters with documentation (or charges) submitted within 24 hours of patient check-out time.
To measure patient experience, we gathered a monthly LTR report containing monthly LTR percentages from patient surveys administered by Press Ganey for each provider from April to July 2022. We also tracked the number of patients who opted out of having DAX used for their encounter, with clinical staff reporting via an electronic data submission tool.
We used Oracle Cerner Advance standard reports to obtain data necessary to measure impact on electronic health record usage (percentage of time spent after hours and average documentation time per patient). After-hours time was defined as any meaningful EHR activity between the hours of 6 pm and 6 am. We received a daily report from Nuance detailing the average turnaround time for notes, as well as monthly participant DAX utilization.
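The after-hours definition above can be expressed as a simple predicate. This is an illustrative sketch of the stated 6 pm to 6 am rule only, not the actual Oracle Cerner Advance report logic.

```python
from datetime import datetime

# Illustrative predicate mirroring the study's after-hours definition:
# any meaningful EHR activity between 6 pm and 6 am counts as after-hours.
def is_after_hours(ts: datetime) -> bool:
    return ts.hour >= 18 or ts.hour < 6

print(is_after_hours(datetime(2022, 3, 1, 19, 30)))  # True (7:30 pm)
print(is_after_hours(datetime(2022, 3, 1, 9, 0)))    # False (9:00 am)
```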
Statistical methods
We performed a power analysis to determine a sample size of 98 test participants needed to detect a 4% increase in provider productivity (increase in wRVU values) with 80% power (α = 0.05, β = 0.20).
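A calculation of this kind can be sketched with a standard two-sample normal approximation. The paper does not report the underlying wRVU standard deviation, so the standardized effect size below (Cohen's d of about 0.40) is an assumption chosen purely for illustration; it happens to yield a per-group sample size near the study's 98.

```python
import math

# Hedged sketch of a two-sample sample-size calculation.
z_alpha = 1.959964  # two-sided critical value for alpha = .05
z_beta = 0.841621   # corresponds to power = 0.80 (beta = .20)
d = 0.40            # assumed standardized effect size (hypothetical)

# Normal-approximation formula: n per group = 2 * (z_alpha + z_beta)^2 / d^2
n_per_group = math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
print(n_per_group)  # 99 per group under the assumed effect size
```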
Mixed-effects regression models were used to compare outcomes between groups (DAX participants vs cohort). Where outcomes were binary, logistic models were used; where outcomes were counts, Poisson regression models were used. For after-hours time, a linear mixed-effects regression was used. The random effects for the mixed models corresponded to individual providers, allowing for differences between individual providers and accounting for correlation within providers. All regressions compared January-February (pre-DAX) to March-July (DAX) across groups (participant vs cohort). For mean baseline-to-conclusion comparisons, paired t-tests were used. All tests were two-tailed, using P ≤ .05 for significance.
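The model family described above can be sketched with a provider-level random intercept and a period-by-group interaction. The data below are simulated and all column names are hypothetical; this is not the study's dataset or analysis code, and it assumes the statsmodels library.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate repeated measurements: 40 providers, pre/post periods, 5 obs each.
# A true period effect of +1.0 is applied to both groups, with no interaction.
rng = np.random.default_rng(42)
rows = []
for pid in range(40):
    group = "dax" if pid < 20 else "control"
    provider_effect = rng.normal(0, 1.0)  # provider-level random intercept
    for post in (0, 1):
        for _ in range(5):
            y = 10 + provider_effect + 1.0 * post + rng.normal(0, 0.5)
            rows.append({"provider": pid, "group": group, "post": post, "y": y})
df = pd.DataFrame(rows)

# Period-by-group interaction with a random intercept per provider, matching
# the structure described in the text (random effects = individual providers).
model = smf.mixedlm("y ~ post * group", df, groups=df["provider"])
result = model.fit()
print(result.params[["post", "post:group[T.dax]"]])
```

The random intercept absorbs stable between-provider differences, so the `post` coefficient recovers the period effect and the interaction term tests whether the DAX group changed differently from control.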
Results
General characteristics of DAX participants and DAX utilization
A total of 99 providers enrolled in the study and were included in our measurements; 99 control group providers were included for analysis. Of the initial 190 providers targeted for participation, only 55 (28.9%) met inclusion criteria and 5 (2.6%) were excluded. The most common reason for non-inclusion was a lack of general interest or willingness to attend an initial virtual training. All excluded participants were already using a scribe. The remaining 44 licenses were then offered to self-nominated providers on a first-come, first-served basis, with specific leadership sponsorship in some circumstances. Of 87 self-nominated providers, all 44 available licenses were allocated to the first providers who met the inclusion and exclusion criteria, with preference given to those with higher volumes of clinic appointments.
Of the providers enrolled, 42 were family and internal medicine providers, 15 were orthopedic and sports medicine providers, 14 were pediatric primary care providers, 11 were from other surgical, interventional, cardiology, neuroscience, OB/GYN, and oncology specialties, 5 were from medical specialties (allergy, endocrine, rheumatology), and 1 was a psychiatrist focused on treating common behavioral health conditions in the adult and pediatric primary care setting (Table 1).
Table 1.
Participants by specialty.
Clinical specialty | Number of participants (%) |
---|---|
Adult and pediatric primary care | 56 (56.6) |
Orthopedics and sports medicine | 15 (15.2) |
Surgical, cardiac, interventional, neuroscience, OB/GYN, oncologic, and women’s health | 11 (11.1) |
Medical specialties (endocrine, rheumatology, etc.) | 5 (5.1) |
Psychiatry | 1 (1.0) |
Over the course of the study, 9 providers (9.1%) stopped using DAX completely and 32 providers (32.3%) had their licenses removed due to low utilization (defined as less than 20% average utilization over the final 4 weeks of the study). Median utilization of DAX for clinical encounters at conclusion was 47% among active participants. All providers who stopped using DAX or had low utilization were included in the result analysis.
Provider experience
In a Press Ganey survey sent in September 2021, the provider engagement score for all employed providers was 3.90 (on a 1.00-5.00 scale). Engagement scores for all employed providers decreased modestly to 3.83 in May 2022. Comparing May 2022 responses of DAX study participants to all employed adult primary care providers, we saw higher scores in engagement (3.62 vs 3.37), safety (4.16 vs 3.92), resilience/decompression (2.83 vs 2.81), and work-life balance (3.14 vs 2.90).
The mean percentage of time spent on after-hours (6 pm to 6 am) EHR work was 14.2% for the study group compared to 14.9% for the control group. At study conclusion, after-hours time had increased by 4.69% for the study group, a significant change (P < .05), while decreasing by 0.945% for the control group (Table 2). Time spent in documentation per patient within the EHR at baseline was 5.3 min per patient for the study group and 5.5 min per patient for the control group. At study conclusion, only the DAX participants saw a statistically significant decrease in this time (Table 2).
Table 2.
Outcomes of DAX vs control groups.
Outcome description | DAX providers: baseline | DAX providers: conclusion | DAX providers: significance | Control providers: baseline | Control providers: conclusion | Control providers: significance |
---|---|---|---|---|---|---|
Productivity (% annualized wRVU) | 90.6 | 94.2 | <0.001 | 91.6 | 95.3 | 0.051 |
Documentation EHR time (min/patient) | 5.3 | 4.54 | <0.001 | 5.5 | 5.35 | 0.20 |
24-h documentation deficiency rate (%) | 9.9 | 6.3 | <0.001 | 8.1 | 5.9 | <0.001 |
24-h CPT submission deficiency rate (%) | 24.0 | 30.3 | <0.001 | 23.5 | 29.3 | 0.003 |
Patient experience
The baseline mean patients' LTR score for the DAX group was 86.3% compared to the control group baseline mean of 86.1%, with no statistically significant change detected between baseline and conclusion (P = .91 and P = .57, respectively) (Table 2). A mixed-effects logistic regression was fit, with no significant difference found across months or across DAX participation (P = .4985).
The total number of patients opting out of having DAX used for their clinic encounter was 5, which represented 0.014% of total DAX encounters.
Productivity and panel sizes
The study had 80% power to detect a 4% increase in physician productivity after deployment with a sample of 98 study participants. The productivity assessment demonstrated a baseline mean of 90.6% of projected annualized wRVUs for the study group vs 91.6% for the control group. At the conclusion of the study, DAX participants had a mean projected annualized wRVU percentage of 94.2% vs 95.3% for the control group (a 1% difference). There was a statistically significant increase in the wRVU productivity of the study group (P < .001); the change measured in the control group was not significant (P = .05) (Table 2).
For VBC-specific primary care providers, the mean panel size of the study vs control cohorts at baseline was 776 and 752, respectively. At the end of the study, these means were 773 and 742. A model fit comparing baseline to study conclusion did not show a significant difference (P = .17) (Table 2).
Clinical and financial risk
There were no patient safety events reported in our safety event tracking system related to DAX.
At baseline, our study group completed their clinical documentation with a 24-h deficiency rate of 8.6%, compared to 7.7% for the control cohort. At the conclusion of the study, the 24-h documentation deficiency rate was 6.3% for the study group (P < .001) vs 5.9% for the control (P < .001). Results of a mixed-effects regression model showed no statistically significant interaction or effect of DAX participation on this 24-h deficiency rate (P = .322 and P = .066), despite statistically significant improvements in both groups over the course of the study.
The CPT submission deficiency rate for the study group was 27.9% at baseline and 30.0% at study conclusion (P < .001), while the rate for the control group was 27.5% at baseline and 29.34% at conclusion (P = .003). Results of a mixed-effects regression model showed no statistically significant interaction or effect of DAX participation on CPT submission rates (P = .57 and P = .51).
An expert internal audit by the Intermountain Medical Group Professional Coding & Reimbursement team and the Castell Clinical Documentation Integrity teams did not find any meaningful evidence of over-coding, missed risk-adjustment opportunity, or insufficient supporting documentation.
Discussion
The scientific literature describes the technical challenges of ambient listening AI clinical documentation and proposes scoping guidelines and ethical use of such tools.7–11 However, we believe this to be the first published study describing the use and practical outcomes of ambient listening AI documentation in real-world outpatient clinical care at modest scale.
The use of Nuance DAX in the Intermountain Medical Group had no quantifiable effect on patient safety, LTR, VBC-attributed panel sizes for primary care providers, or coding and documentation risk or benefit. We found positive trends in provider engagement among participants, while non-participants saw worsening engagement. There was a statistically significant increase in productivity, although the change was small and of no practical significance; this may be due, in part, to a lack of incentive or compulsion to increase productivity beyond the expectations set in 2021.
Our study included the results of low utilizers of DAX and those who removed themselves from this study; this “intention to treat” approach may be dampening the positive effects seen in outcomes related to engagement. It may also have impacted observed effects on outcomes related to productivity, and 24-h documentation and CPT deficiency rates.
As none of the provider notes utilized the direct results of generative AI as their primary source for their drafted document at the conclusion of the study (which included a delay in receiving the note due to the human editor review), it is possible that more benefit in engagement will be seen as the quality and delivery time of the notes improves and becomes more consistent over longer periods of time. Advances in direct use of generative AI that reduce or eliminate the need for non-clinician human editors may also improve engagement by increasing overall satisfaction with the workflow and product, but more studies are needed as those technologies rapidly advance.
Median DAX utilization was nearly 50% of all potential encounters. While this appears on its surface to be problematic utilization, caution is warranted. Some highly scripted, templated, or complex visits (eg, Medicare Annual Wellness Visits, well-child visits, and visits wherein highly templated documentation can be used) may not be ideal candidates for DAX use. Judgment about how and when to use DAX for a given encounter should be left to the individual provider, supported by general guidance on how, if desired, to approach these visits using DAX.
Our study showed a statistically significant worsening of after-hours EHR use, albeit by a small amount (2.6% more time spent after hours than at baseline, approximately 4% more than control). This amount may not be pragmatically relevant but is an important finding. We speculate that DAX may facilitate less time spent on documentation during business hours while increasing after-hours time (eg, logging into the EHR after hours to edit and sign the returned note). While providers were educated about the delay in receiving their draft note from Nuance (a maximum of 4 business hours after submission), we allowed providers to choose how best to integrate the delayed note into their workflow, whether after hours or on the next business day. We speculate that this after-hours editing and signature workflow is the underlying cause of the observed increase in after-hours time.
Time spent in documentation for DAX providers was also statistically significantly lower, but the time saved per patient was less than 1 min on average.
Limitations
Our study had several important limitations. First, our study did not aim to define the qualitative differences in the provider who would benefit from DAX. Further studies that measure and match control providers by engagement, rather than by productivity, may help differentiate qualitative factors that allow health systems to focus this investment to the providers who will benefit the most.
Our study was designed to analyze the impact of DAX by comparing outcome measures before and immediately after implementation of DAX. During implementation, factors such as note turnaround time, provider attrition, and scaling adoption could have introduced bias in these results. Further study would be needed to understand the impact during early adoption vs late adoption phases. Additionally, our study measured the effectiveness of a single intervention; however, further studies would be needed to understand how this compares with lower-cost interventions, such as workflow optimization and personalization training.
Our patient experience assessment was limited to LTR scores and opt-out rates.
As mentioned, we had difficulty doing statistically robust assessments of provider engagement. Survey data were unavailable for our control group. Therefore, a direct comparison between study participant group and control was not feasible. A significant portion of the DAX participants were self-nominated and therefore might represent a group with tendencies toward higher engagement that were not controlled for in this study.
Our study was weighted with adult primary care and orthopedic/sports medicine physicians, which limits extrapolation of results outside of those specialties. While there are EHR efficiency data to suggest some benefit in documentation time for specific specialties, no conclusions can be drawn due to low sample and effect size.
Lastly, although we saw statistical significance in the productivity comparisons of DAX providers to the control group, the difference was small enough to question the power of the study to suggest a strong conclusion and the practical implications of the effect.
Conclusion
Nuance DAX appears to have no benefit in productivity for fee-for-service providers and no improvement in total panel size for those delivering value-based primary care. We saw modest positive trends in engagement between DAX users and non-DAX users, but the limitations in our available control data challenge any strong conclusion. Nuance DAX posed no risk or benefit to patient experience, safety, or clinical documentation. Further studies to evaluate the impact of more advanced technology (DAX Co-Pilot) and head-to-head comparisons of impact of personalization and efficiency training vs Ambient AI documentation are needed. If clinical or financial benefits are seen in these studies, additional qualitative research on the benefiting provider would be valuable.
Contributor Information
Tyler Haberle, Digital Technology Services, Intermountain Health, Salt Lake City, UT 84120, United States.
Courtney Cleveland, Digital Technology Services, Intermountain Health, Salt Lake City, UT 84120, United States.
Greg L Snow, Digital Technology Services, Intermountain Health, Salt Lake City, UT 84120, United States.
Chris Barber, Digital Technology Services, Intermountain Health, Salt Lake City, UT 84120, United States.
Nikki Stookey, Digital Technology Services, Intermountain Health, Salt Lake City, UT 84120, United States.
Cari Thornock, Digital Technology Services, Intermountain Health, Salt Lake City, UT 84120, United States.
Laurie Younger, Digital Technology Services, Intermountain Health, Salt Lake City, UT 84120, United States.
Buzzy Mullahkhel, Digital Technology Services, Intermountain Health, Salt Lake City, UT 84120, United States.
Diego Ize-Ludlow, Digital Technology Services, Intermountain Health, Salt Lake City, UT 84120, United States.
Author contributions
TH, CC, GS, CB, BM, and DI contributed to the conception and design of this study. All authors participated in the initial drafting and revisions of the work; approved the final version; and accept accountability for the overall integrity of the research process and the article.
Funding
None declared.
Conflicts of interest
None declared.
Data availability
The data underlying this article will be shared on reasonable request to the corresponding author.
References
- 1. Gesner E, Gazarian P, Dykes P. The burden and burnout in documenting patient care: an integrative literature review. Stud Health Technol Inform. 2019;264:1194-1198.
- 2. Saag HS, Shah K, Jones SA, et al. Pajama time: working after work in the electronic health record. J Gen Intern Med. 2019;34(9):1695-1696.
- 3. Next-Level EHR Mastery Coaching—Arch Case Study. Accessed July 22, 2023. https://klasresearch.com/archcollaborative/casestudy/next-level-ehr-mastery-coaching/378
- 4. Hilliard RW, Haskell J, Gardner RL. Are specific elements of electronic health record use associated with clinician burnout more than others? J Am Med Inform Assoc. 2020;27(9):1401-1410.
- 5. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Ann Intern Med. 2007;147(8):573-577.
- 6. Making Data Simple: Introducing an Easier Way to Understand Employee Engagement. Press Ganey. https://info.pressganey.com/press-ganey-blog-healthcare-experience-insights/making-data-simple
- 7. Lin SY, Shanafelt TD, Asch SM. Reimagining clinical documentation with artificial intelligence. Mayo Clin Proc. 2018;93(5):563-565.
- 8. Quiroz JC, Laranjo L, Kocaballi AB, Berkovsky S, Rezazadegan D, Coiera E. Challenges of developing a digital scribe to reduce clinical documentation burden. NPJ Digit Med. 2019;2:114.
- 9. Bravo J, Cook D, Riva G. Ambient intelligence for health environments. J Biomed Inform. 2016;64:207-210.
- 10. Ng ZQP, Ling LYJ, Chew HSJ, Lau Y. The role of artificial intelligence in enhancing clinical nursing care: a scoping review. J Nurs Manag. 2022;30(8):3654-3674.
- 11. Martinez-Martin N, Luo Z, Kaushal A, et al. Ethical issues in using ambient intelligence in health-care settings. Lancet Digit Health. 2021;3(2):e115-e123.