Abstract
Patient portals are widely available to facilitate self-service interactions, including appointment booking, offering convenience to users. Promoting clinically appropriate care pathways, however, is complex; portals must recognize patient intent while striving for an intuitive experience. In October 2024, the Southern California Permanente Medical Group, which serves 4.9 M patients, deployed the Kaiser Permanente Intelligent Navigator (KPIN), a system that augments the patient portal to enhance care navigation and the patient experience. It applies natural language processing to generate alerts for high-acuity cases, and it recommends suitable care offerings. Early findings are promising, with the AUC for clinical alerts and clinical navigation at 0.977 and 0.889, respectively. KPIN’s adjusted successful booking rate was 53.68%, with an abandonment rate of 2.94% (IQR: 2.77–3.11%), aligning with patient survey results showing an 8.63 percentage point increase for positive sentiment. These results highlight the success of KPIN in an integrated value-based care delivery model.
Subject terms: Health services, Technology
Introduction
Healthcare organizations continue to adapt to an increasingly consumer-centric environment, one in which patients prefer convenience and personalization, including when engaging the digital front door1–3. Online patient portals are now a mainstay for care navigation, offering patients the ability to access services, communicate with providers, and self-schedule appointments. Patients expect a digital experience that is characterized by the user-friendliness they already enjoy in other contexts, such as booking flights and reserving hotel rooms4,5. However, because patient safety is paramount, patient portals must ensure that ease of use in the interface design does not compromise user well-being.
In 2021, the Southern California Permanente Medical Group (hereinafter “SCPMG” or “the organization”), a care delivery arm of Kaiser Permanente, created a multifaceted Virtual Safety Net (VSN) ecosystem to address safety concerns associated with patients self-scheduling appointments via the online patient portal and through the telephonic interactive voice response (IVR) system6. Driven in part by natural language processing (NLP) technology, the VSN ecosystem demonstrated success as a pilot, including multiple occasions where a patient’s life was saved, but the prospect of scaling it was met with prohibitive barriers. Limited accuracy in detecting high-acuity cases stood in the way of expansion at the digital front door. Additionally, for low-acuity online engagements, the VSN ecosystem left unaddressed the need to better align care offerings with patients’ intents and clinical appropriateness. Bridging this gap would require a more sophisticated tool.
The pursuit of a solution to these challenges led to the Kaiser Permanente Intelligent Navigator (KPIN) platform, a significant advancement in the organization’s care navigation infrastructure, which adopts an omnichannel approach7. KPIN takes patient-submitted free-text input and detects high-acuity cases as part of its clinical alert system (CAS); it then recommends booking options for timely and appropriate care as part of its clinical navigation system (CNS). It thus leverages emerging technologies to support improved patient outcomes, by reducing delays in care delivery, and to help optimize the allocation of healthcare resources8–10.
This manuscript presents the design, deployment, and evaluation of KPIN as an enhanced NLP-driven care navigation system, one that supplants the more limited technology previously supporting the online channel. We demonstrate the potential of KPIN to enhance care coordination, which promotes safety and efficiency, and to provide a positive patient experience at the same time.
These facets of care delivery reflect the mission of Kaiser Permanente as an integrated value-based healthcare system11. Operating 16 hospitals and 197 medical office buildings in Southern California, it provides a comprehensive, all-inclusive bundle of health services to its 4.9 M patients. With over 8000 physicians who prioritize care quality and the long-term well-being of patients, Kaiser Permanente is different from most U.S. healthcare organizations, as the latter typically adopt a volume-driven, fee-for-service model.
Results
KPIN replaces Microsoft’s Language Understanding Intelligence System (LUIS) technology and the VSN Query Report system for the online component of SCPMG’s omnichannel solution for care navigation. See Fig. 1 for an overview of the omnichannel strategy; it illustrates how the organization manages multiple patient-facing channels, all of which are guided by common databases to ensure operational consistency. KPIN’s roll-out to 4.9 M SCPMG patients began October 1, 2024, supporting the patient portal’s appointment booking workflow. KPIN matches the patient’s reason for visit with appropriate care offerings, while further streamlining the user-friendly interface. The IVR channel, another mechanism for self-booking, continues to be supported by the VSN Query Report system; and the Agent channel remains open, where a robust customer relationship management tool supports the organization’s live agents.
Fig. 1. Overview of the Omnichannel strategy.
KPIN supports the organization’s online channel, which is complemented by the Agent-assisted channel; while online self-service functionality encompasses an expansive set of options, live Agents are available telephonically (24 h a day, 7 days a week) to assist with care navigation. Integration with common databases ensures all channels adhere to consistent guidelines.
The patient portal explicitly advises patients not to seek emergency care through the online booking platform. It instructs them to call 911 or visit the nearest hospital for medical or mental health emergencies. These advisory statements appear on the interface before patients interact with KPIN. We recognize that patients may not adhere to such instructions, a point which underscores the importance of KPIN’s CAS.
Patient demographics
Table 1 summarizes the demographics of the 1,045,904 unique patients who interacted with KPIN from October 1, 2024, to March 1, 2025. The 30–49 age bracket was the largest by volume, with 35.47% of patients. The 18–29 and 50–64 age brackets captured 17.08% and 16.73% of patients, respectively; 18.20% were younger than 18; and 12.52% were 65 or older, which suggests a narrowing of the digital divide affecting the elderly12. By gender, 59.93% of patients were female, 40.01% were male, and 0.06% identified as non-binary or preferred not to specify their gender. Among ethnic groups, the Hispanic segment was the largest at 36.93%, followed by the Caucasian population at 30.04%; other ethnic groups made up the remaining 33.03%.
Table 1.
Demographic Information for unique KPIN users (n = 1,045,904)
| Characteristic | Count | Patients (%) |
|---|---|---|
| Age (years) | | |
| <18 | 190,400 | 18.20% |
| 18–29 | 178,576 | 17.08% |
| 30–49 | 370,937 | 35.47% |
| 50–64 | 174,989 | 16.73% |
| 65+ | 131,002 | 12.52% |
| Gender | | |
| Male | 418,355 | 40.01% |
| Female | 626,961 | 59.93% |
| Other | 588 | 0.06% |
| Ethnic background | | |
| Hispanic/Latino | 386,198 | 36.93% |
| White/Caucasian | 314,201 | 30.04% |
| Asian/Pacific Islander | 123,564 | 11.81% |
| Black/African American | 78,465 | 7.50% |
| Other/Decline to state | 143,476 | 13.72% |
Model Performance
The performance of KPIN’s models, including those supporting CAS and CNS, is robust across all pathways. Table 2 summarizes the performance metrics for the CAS models. For detecting high-acuity symptoms, such as chest pain, the accuracy is 96.0% (95% CI: 93.7–98.0%), with precision and recall of 97.5% (95% CI: 95.8–99.0%) and 96.0% (95% CI: 93.8–97.9%), respectively. The F1-Score is 96.7% (95% CI: 94.5–98.5%). Table 3 presents the performance metrics for the CNS models, which achieve an accuracy of 81.9% (95% CI: 80.0–83.6%), a precision of 85.6% (95% CI: 84.0–87.2%), a recall of 81.9% (95% CI: 80.1–83.7%), and an F1-Score of 82.8% (95% CI: 81.1–84.5%). Figure 2 illustrates the AUC values for the CAS and CNS models, at 0.977 and 0.889, respectively.
Table 2.
Model metrics for the CAS
| Metric | Value | 95% CI (Lower, Upper) |
|---|---|---|
| Accuracy | 0.960 | (0.937, 0.980) |
| Precision | 0.975 | (0.958, 0.990) |
| Recall | 0.960 | (0.938, 0.979) |
| F1-Score | 0.967 | (0.945, 0.985) |
Table 3.
Model metrics for the CNS
| Metric | Value | 95% CI (Lower, Upper) |
|---|---|---|
| Accuracy | 0.819 | (0.800, 0.836) |
| Precision | 0.856 | (0.840, 0.872) |
| Recall | 0.819 | (0.801, 0.837) |
| F1-Score | 0.828 | (0.811, 0.845) |
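Table 3 reports identical values for accuracy and recall (0.819), and its F1-Score (0.828) is not the harmonic mean of the reported precision and recall (which would be about 0.837). Both properties are consistent with support-weighted averaging across multiple navigation pathways, although the averaging scheme is not stated in the text. The sketch below illustrates that assumption only; the labels are made up and `weighted_metrics` is a hypothetical helper, not part of KPIN.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1
    for a single-label, multi-class classifier."""
    n = len(y_true)
    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical pathway labels, for illustration only (not KPIN data).
y_true = ["uti", "uti", "rash", "rash", "rash", "vaccine", "default", "default"]
y_pred = ["uti", "rash", "rash", "rash", "default", "vaccine", "default", "uti"]
metrics = weighted_metrics(y_true, y_pred)
```

Because each class recall is weighted by that class's share of the data, the weighted recall reduces to the fraction of correct predictions, which is the accuracy. The weighted F1, by contrast, averages per-class F1 values and need not equal the harmonic mean of the weighted precision and recall.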
Fig. 2. Receiver Operating Characteristic (ROC) Curves for the Clinical Alert System (CAS) and the Clinical Navigation System (CNS).
The ROC curves illustrate the performance of CAS and CNS in distinguishing between relevant clinical outcomes. The area under the curve (AUC), a measure of overall model performance, is 0.977 (red solid line) for CAS and 0.889 (blue solid line) for CNS. The black dashed line represents the performance of a random classifier.
The performance of the CAS models highlights their effectiveness in identifying potential critical cases and prompting patients to respond to questions before completing their booking. This high accuracy stems from rigorously defined critical case criteria, developed in collaboration with physician subject matter experts. This targeted approach aligns with literature emphasizing the importance of early identification of specific critical cases in such systems13.
In contrast, the CNS models are designed for broader generalization, trained to handle more diverse patient inputs and clinical pathways within the booking experience. This adaptability is crucial for maintaining real-world performance in care navigation on the digital channel14,15.
Utilization
From October 1, 2024, to March 1, 2025, KPIN facilitated 2,969,945 encounters. An encounter is defined as an interaction with KPIN, which begins when a patient enters and submits a “Reason for Visit,” as depicted in Supplementary Fig. 1. On average, KPIN processed 19,154 encounters per day, with the highest daily volume reaching 39,364 encounters. After submitting a “Reason for Visit,” a user may be presented with additional questions, to refine KPIN’s identification of the most appropriate care path, as illustrated in Supplementary Fig. 2 and Supplementary Fig. 3. An encounter is complete when KPIN produces its output, which is a set of recommended offerings.
The observed abandonment rate, which quantifies the extent of incomplete encounters, was 2.94% (IQR: 2.77–3.11%); this indicates that 97.06% of encounters resulted in recommendations for care. Under the LUIS system, each set of clarifying questions typically included 4–6 items, each displayed on a separate page. KPIN enhanced this workflow by limiting each page to three items or fewer and introducing a second page only when necessary to guide patients toward the appropriate clinical navigation pathway. This streamlined user interface was designed to reduce user abandonment rates.
KPIN integrates data (e.g., user age and gender) from multiple databases with the “Reason for Visit” input to generate tailored clinical care offerings that are safe and appropriate to book. A booking can be a scheduled appointment or another clinically appropriate option (e.g., redirecting to the pharmacy page for medication refills, sending a secure message to a provider, going to a walk-in nurse clinic for a vaccination). Unlike most fee-for-service entities, our integrated value-based organization enables many reasons for visit to be addressed via options beyond scheduling an appointment; this conserves access, yields efficiencies, and satisfies patients’ preference for convenience. We thus count the presentation of non-appointment offerings toward the successful booking rate, which is a novel metric arising from KPIN’s deployment. A successful booking is defined as a patient completing the KPIN process and either (1) being offered at least one appointment option and selecting one, or (2) being presented with an alternative, appropriate care option (such as sending a message) through which the patient’s reason for visit can be addressed. A non-successful booking is one where the patient is offered at least one appointment and does not book it.
The KPIN interface allows users to modify or re-enter their “Reason for Visit” at any stage, a behavior reflected in the data. Consequently, a single patient can engage in multiple duplicate encounters but ultimately settle on only one booking. We thus had to account for this type of behavior, which is typical of users in navigating web and app interfaces, in our assessment of the KPIN successful booking rate16. The adjusted successful booking rate, defined as the ratio of bookings within 10 min (numerator) to initial encounters (denominator), was 53.68%. The 10 min timeframe we applied to exclude duplicate encounters, which is the adjustment modifier, was determined through the “knee point” identification technique17, as detailed in the Methods section. See Fig. 3 for examples of how this method applies in various scenarios. Based on limited data from LUIS, the pre-KPIN successful booking rate was 34%.
Fig. 3. Examples of applying timestamp criteria with 10 min cut-off.
Patient A engaged in three encounters, none of which is a duplicate, because all of the encounters occurred outside of the 10 min cut-off. Patient B engaged in three encounters, one of which is a duplicate. She booked two appointments, neither of which is a duplicate, as they were distinct appointment time slots. The third encounter is considered a duplicate because it occurred within 10 min of the second and was not associated with a booked appointment. Duplicates are excluded from the calculation of the adjusted successful booking rate.
While the adjusted successful booking rate of 53.68% may appear unremarkable, it underscores the value in maintaining the robust integrated booking ecosystem in which the organization has invested. In instances where digital booking does not address the needs of the patient, he or she can be directed to the agent-assisted channel where an agent can provide the additional needed support (24 h a day, 7 days a week). Supported by consistent guidelines applied across all channels, the channels can thus operate complementarily.
Patient experience
KPIN users with a registered email address or phone number were eligible to participate in a satisfaction survey, accessible within one day of their encounter. Between July 2024 and February 2025, spanning three full months before and after KPIN’s go-live, 114,396 surveys were distributed, with response rates of 2.61% and 2.77% for the pre- and post-go-live periods, respectively. Users were asked to indicate their level of agreement with specific statements on a 5-point scale, as illustrated in Supplementary Table 1, with responses subsequently converted to percentage scores.
Two-sample z-tests showed statistically significant improvements in post-KPIN mean scores for overall satisfaction, level of effort, and ease of navigation, which were higher by 2.56% (two-tailed p = 0.0021), 1.94% (two-tailed p = 0.0162), and 3.21% (two-tailed p = 0.000196), respectively. Additionally, the survey included a free-text feedback prompt: “Please tell us in your own words what we did well and if there are any areas where we could improve.” A total of 1531 (50.746%) surveys included patient comments. A sentiment analysis of these comments was conducted, classifying feedback as positive, negative, or neutral. Pre-KPIN comments were classified as 58.798% positive, 39.771% negative, and 1.431% neutral, while post-KPIN comments were classified as 67.428% positive, 30.409% negative, and 2.163% neutral. Thus, we observed an 8.63 percentage point increase in positive sentiment after KPIN deployment (two-proportion z-test; two-tailed p = 0.0005). The independence assumption for the z-test is met, as no participants were surveyed in both the pre- and post-KPIN groups. The 8.63 percentage point increase for positive sentiment highlights KPIN’s beneficial impact on users’ experience for the online channel.
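The positive-sentiment comparison can be approximately reproduced with a pooled two-proportion z-test. The counts used below are not reported in the text: they are inferred here as the integer counts that match the reported percentages (411 of 699 pre-KPIN comments and 561 of 832 post-KPIN comments, totaling 1531) and should be read as an assumption. The function name is likewise illustrative.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (difference, z, two-tailed p)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-tailed p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Inferred counts (assumption): positive comments / all comments per period.
diff, z, p_value = two_proportion_z_test(411, 699, 561, 832)
print(f"difference = {diff * 100:.2f} points, z = {z:.2f}, p = {p_value:.4f}")
```

Under these inferred counts, the result is consistent with the reported 8.63 percentage point increase and two-tailed p = 0.0005.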
Discussion
While our findings highlight the significant impact and capabilities of the KPIN system, a limitation of our study is the lack of reliable pre-KPIN metrics for direct performance comparison. Metrics such as adjusted successful booking rates and abandonment rates prior to KPIN implementation were either unavailable or limited due to the constraints of the legacy system. In fact, there is a scarcity of research covering such metrics as they pertain to online patient portals in the healthcare industry. For SCPMG, the challenges associated with reliable data extraction were among the characteristics marking the legacy LUIS system as insufficient for meeting the organization’s evolving business objectives. The organization recognized the need to transition from a vendor-provided solution to a custom-built platform like KPIN, which better aligns with our integrated value-based model for care delivery. After deploying KPIN we learned of Microsoft’s plan to retire LUIS in October 2025, which means that KPIN would prove a particularly timely advancement18.
For a framework to better evaluate KPIN’s performance, we may look to external contexts, including cross-industry metrics. For instance, it is estimated that only three out of ten consumers who attempt to book healthcare appointments online are successful, because patients typically get redirected to call centers to complete the scheduling process19. KPIN compares favorably on this front, with a 53.68% adjusted successful booking rate. High online abandonment rates have also been noted in other industries: e-shopping cart abandonment averages 70.19%20, while hotel website abandonment reaches 84.63%21. By comparison, KPIN’s abandonment rate was much lower, at 2.94%. A key point is that KPIN is complemented by the organization’s Agent-assisted channel, which runs 24/7; as not all needs can be met online, it is vitally important for other channels to deliver support. Additionally, by introducing the adjusted successful booking rate metric, coupled with the KPIN abandonment rate, KPIN sets a standard for the methods by which online care navigation systems may be assessed.
A technical limitation of KPIN today is its handling of vague entries in the “Reason for Visit” input (e.g., “feel bad”); without more detail the system will by default direct the patient to a generic pathway, which can potentially consume appointment slots that may not be appropriate. We are working to address this by enhancing the specificity of follow-up questions through additional model fine-tuning, which involves collaboration with physicians and operations stakeholders. Another limitation of KPIN is that it does not yet offer substantial support for departments outside of Primary Care. There are plans to expand to include Specialties (e.g., Urology, Oncology), but execution will require the dedication and bandwidth of physician subject matter experts from the corresponding clinical specialty fields. Their guidance is important for ensuring clinically safe model improvements, monitoring performance, and staying aligned with evolving patient needs. Also, expansion of KPIN must account for potential infrastructure constraints at the organizational level. As an internally built innovation and not a third-party vendor solution, KPIN places demands on the organization’s server infrastructure. Difficult decisions will be required to address competing technology priorities amid finite computing resources. Such challenges can hinder plans for KPIN expansion.
The above limitations notwithstanding, the robust performance of CAS highlights how the strategic application of emerging technologies can positively impact patient safety. Previous studies on applying NLP in support of assessing critical cases have primarily focused on analyzing unstructured notes in emergency department settings; few have implemented these systems for routine clinical practice22,23. Our findings support the integration of advanced NLP tools with patient self-booking workflows, to promote care that is safe and timely.
A critical success factor for KPIN has been the role of physician subject matter experts, without whose domain knowledge the training of the models could not have occurred. Also, these physicians are joined by nurses and other clinical staff, along with experts on the organization’s booking guidelines and clinical protocols, to conduct ongoing monitoring of KPIN performance. Additionally, teams of relationship managers solicit feedback on KPIN outputs from frontline physicians who experience firsthand how KPIN influences appointment booking practices; that feedback informs efforts to improve KPIN. This multidisciplinary approach, led by physicians, ensures KPIN outputs remain clinically safe and relevant.
KPIN facilitates the evolution of personal health records toward greater customization24. More importantly, the system empowers patients to take an active role in their healthcare, a behavior closely linked to improved clinical outcomes25,26. Moreover, as a patient-centric care navigation tool, KPIN provides the organization with actionable insights into patient behaviors, such as communication patterns and responsiveness to prompts. For instance, analyzing how patients articulate their symptoms on the patient portal provides knowledge for driving model refinement and for improving the online booking experience. KPIN thus enables an extensive understanding of patient interactions at the digital front door to fuel innovation, which was not possible with the previous vendor-provided solution. It contributes to SCPMG’s evolution as a “Learning Health System”27, one which is committed to continuously enhancing care through data integration, experiential learning, and evidence generation.
Although KPIN is a technological advancement, it is important to emphasize that its function is fundamentally rooted in enhancing patient-centered care navigation, safety, and patients’ well-being. While the integration of KPIN into the patient portal streamlines the user experience, which patients recognized as helpful during the prototype development process, care quality and coordination are the core drivers of the system. The Supplementary Case Vignette exemplifies this commitment; it underscores the system’s proven ability to significantly impact patient outcomes. Such cases serve as sources of inspiration for clinical and technology leaders to collaboratively design and refine cutting-edge technology that prioritizes patients’ needs. Looking ahead, there is the potential to expand KPIN’s capabilities into more conversational experiences, fostering deeper interactions as part of a first-rate patient experience available within an integrated value-based care delivery system.
Methods
Architecture design
KPIN’s architecture integrates multiple models, including large language models and custom transformer models, as illustrated in Fig. 4. Its processes begin when a patient submits information through the “Reason for Visit” free-text field on the appointment booking page, which serves as the primary input to the pipeline. We define the action of submitting this information as the start of a KPIN encounter. The middle layers involve multiple processing steps, including translation from Spanish to English, if necessary, execution through CAS and CNS, and integration with various organizational databases to retrieve information such as patient age and gender, and clinical booking guidelines. The final output is a set of clinically appropriate offerings (e.g., video visits, phone visits, provider messaging) from which the patient can select; its presentation marks the completion of an encounter. An incomplete encounter is an instance of user abandonment from KPIN.
Fig. 4. Architectural overview of the KPIN system.
KPIN processes patient-entered “Reason for Visit” information through a sequential pipeline. It begins with the Clinical Alert System, which identifies high-priority symptoms. Where appropriate, Language Identification and (Spanish to English) Translation steps are applied. The Clinical Navigation System then determines the most appropriate pathway, incorporating additional inputs from internal databases to refine recommendations. Utilizing this structured pipeline, KPIN generates safe, clinically appropriate offerings, which include but are not limited to In-Person Visits, E-Visits, and Messaging a Provider.
In the initial step of KPIN, the “Reason for Visit” input is processed by CAS, which classifies cases into one of many Clinical Alert Pathways to distinguish those requiring urgent attention (Table 4). When necessary, patient inputs are translated from Spanish to English using a combination of Meta’s FastText language identification model and a fine-tuned version of the Opus-MT es-en translation model, developed by the Language Technology research group at the University of Helsinki. The latter has been fine-tuned on a curated dataset of 500 Spanish-English “Reason for Visit” pairs from 2024. Afterwards, 150 additional translation pairs were verified by a clinically trained Spanish-speaking physician to ensure accuracy and contextual relevance in supporting Spanish-speaking patients. This translation step ensures seamless processing within the KPIN system, whose functionality relies on English language inputs.
Table 4.
Examples Highlighting Clinical Alert and Clinical Navigation Pathways
| Pathway | Examples |
|---|---|
| Clinical alert pathways | |
| High-priority symptom pathways | Chest pain, abdominal pain |
| Clinical navigation pathways | |
| Symptom-based pathways | UTI, rash, cough, cold, flu |
| Maintenance pathways | Vaccines, medication request |
| General pathways | Physical exams, routine follow-ups |
| Specialty pathways | Referrals, X-rays, mammograms |
| Default pathways | Simple health concerns (Adult), general appointing needs (Pediatrics) |
Within CAS, inputs undergo a series of NLP steps, including custom regular expression matching, entity recognition, and text classification, to evaluate the severity of any reported symptoms. Inputs with high-acuity characteristics trigger additional symptom verification questions designed to assess acuity. To promote ease of use, these questions are presented with yes-or-no answer choices and a maximum of three questions per page. If the patient affirms the presence of high-acuity characteristics, he or she is directed away from CNS and promptly advised to seek immediate attention through a nurse triage line for further evaluation.
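As a rough illustration of the first CAS stage, the sketch below covers regular expression screening, paging of yes/no verification questions at three per page, and escalation when any answer is affirmative. The patterns, question handling, and function names are hypothetical; the production criteria were defined with physician subject matter experts, and the entity recognition and text classification steps are not shown.

```python
import re

# Hypothetical patterns for illustration only; not the production criteria.
HIGH_ACUITY_PATTERNS = {
    "chest_pain": re.compile(r"\bchest\s+(pain|pressure|tightness)\b", re.IGNORECASE),
    "abdominal_pain": re.compile(r"\b(abdominal|stomach)\s+pain\b", re.IGNORECASE),
}


def flag_high_acuity(reason_for_visit):
    """Return the names of the alert patterns found in the free text."""
    return [name for name, pattern in HIGH_ACUITY_PATTERNS.items()
            if pattern.search(reason_for_visit)]


def paginate(questions, per_page=3):
    """Split verification questions into pages of at most `per_page` items."""
    return [questions[i:i + per_page] for i in range(0, len(questions), per_page)]


def should_escalate(answers):
    """True if any yes/no verification question was answered 'yes' (True)."""
    return any(answers)


flags = flag_high_acuity("I have had chest pain since last night")
pages = paginate(["Q1", "Q2", "Q3", "Q4"])
```

In this sketch, a flagged input would be shown the paged questions, and an affirmative answer would route the patient to the nurse triage line instead of CNS.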
Upon clearing CAS, the inputs are then passed through CNS. Here, the “Reason for Visit” and other inputs, including responses to occasional follow-up questions, are further classified into one of many Clinical Navigation Pathways, each designed to address varying clinical scenarios (Table 4). Furthermore, these pathways are dynamically filtered to account for specific patient demographic dimensions (e.g., age and gender) and other clinical information from internal databases, to generate personalized offerings while ensuring adherence to established clinical booking guidelines (Fig. 4). The resulting set of recommended offerings, or KPIN’s terminal output, corresponds with the pathway determined by the CNS to be best suited for the individual case. Patients are directed to the Default Pathways when deemed the most suitable option by the CNS.
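The demographic filtering step can be pictured as applying eligibility rules to the offerings attached to a pathway. The following is a minimal sketch under stated assumptions: the `Offering` structure, the offering names, and every age or gender rule are invented for illustration and do not represent KPIN’s booking guidelines or any clinical guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    name: str
    min_age: int = 0
    max_age: int = 200
    genders: frozenset = frozenset({"F", "M", "Other"})


# Illustrative values only; not clinical rules.
OFFERINGS = [
    Offering("pediatric_appointment", max_age=17),
    Offering("adult_video_visit", min_age=18),
    Offering("mammogram_scheduling", min_age=40, genders=frozenset({"F"})),
    Offering("message_provider"),
]


def eligible_offerings(offerings, age, gender):
    """Keep only the offerings whose age and gender rules the patient meets."""
    return [o.name for o in offerings
            if o.min_age <= age <= o.max_age and gender in o.genders]


adult_options = eligible_offerings(OFFERINGS, age=35, gender="M")
child_options = eligible_offerings(OFFERINGS, age=9, gender="F")
```

Keeping the rules as data rather than code is one way such guidelines could be reviewed and updated by clinical experts without changing the filtering logic.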
Model training and validation
To optimize performance, custom few-shot transformer models were implemented as the foundation for both CAS and CNS. Few-shot learning enables strong generalization with less labeled data, which is particularly valuable in clinical applications where expert annotation is time-intensive28,29. Unlike traditional few-shot approaches that rely on prompt engineering, our method eliminates the need for explicit prompting. This design choice reduces inference latency, making it more suitable for real-time clinical applications where rapid decision-making is critical30–33. Additionally, prompt-free architectures improve model robustness by mitigating the variability introduced by prompt tuning34,35. Given KPIN’s requirement to process patient inputs within seconds, this approach was selected to ensure both rapid inference and high accuracy.
The models driving CAS and CNS were trained on datasets of 800 and 2000 clinically verified patient inputs, respectively, with an additional 200 and 500 messages reserved for validation, following an 80/20 split. The dataset, consisting of historical “Reason for Visit” entries from 2023–2024, was reviewed and labeled by a multidisciplinary team of physicians, nurses, and subject matter experts in clinical booking guidelines. This process ensured accuracy, diversity, and a comprehensive representation of relevant clinical scenarios. During development, the models were checked for potential biases along dimensions such as age, sex, ethnic background, and language.
Model performance for CAS and CNS was further assessed using a stratified independent dataset of 1000 additional “Reason for Visit” inputs. This dataset was stratified based on the resulting sets of clinically appropriate offerings, ensuring balanced representation across combinations. These encounters were collected post-KPIN launch and reviewed by physicians and subject matter experts to ensure accuracy and reliability. The results appear in Tables 2 and 3. Such continuous monitoring of model performance in production supports periodic retraining to maintain accuracy while aligning with the organization’s business objectives.
Data analytics
To calculate the adjusted successful booking rate, we accounted for the impact of human behavior, which can add to the total number of encounters but also concurrently introduce repeated identical encounters over a short period of time. Such duplication, while indicative of user engagement, can obscure the cognitive intent behind encounters with the digital platform. Adjusting the raw total volume of encounters required reviewing the timestamp of each encounter recorded in KPIN’s analytics platform. Criteria were established to differentiate between index encounters and duplicate encounters, with the latter excluded from the calculation. The classification criteria were as follows: (1) all encounters resulting in a booked appointment were classified as index encounters; (2) encounters not preceded by another encounter with an identical pathway in the prior 10 min and not followed by an appointment booked from an identical pathway within the subsequent 10 min were also considered index encounters; (3) all other encounters were identified as duplicates. Subject matter experts were given random samples to validate these scenarios.
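The three classification criteria can be sketched as follows. This is an illustrative reading, not the production analytics code: the `Encounter` fields are hypothetical, the `booked` flag stands in for any successful booking, and the booking time is assumed to equal the encounter timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)


@dataclass
class Encounter:
    patient_id: str
    timestamp: datetime
    pathway: str
    booked: bool


def classify(encounters, window=WINDOW):
    """Label each encounter 'index' or 'duplicate' per criteria (1) to (3)."""
    labels = []
    for e in encounters:
        if e.booked:                       # criterion 1
            labels.append("index")
            continue
        same = [o for o in encounters
                if o is not e and o.patient_id == e.patient_id
                and o.pathway == e.pathway]
        zero = timedelta(0)
        preceded = any(zero < e.timestamp - o.timestamp <= window for o in same)
        followed_by_booking = any(
            o.booked and zero < o.timestamp - e.timestamp <= window for o in same)
        # Criterion 2 keeps isolated encounters; criterion 3 marks the rest.
        labels.append("duplicate" if preceded or followed_by_booking else "index")
    return labels


def adjusted_successful_booking_rate(encounters, window=WINDOW):
    labels = classify(encounters, window)
    index = [e for e, label in zip(encounters, labels) if label == "index"]
    return sum(e.booked for e in index) / len(index)


# Hypothetical data loosely mirroring Fig. 3 (Patient A's bookings are assumed).
day = datetime(2024, 11, 4)
encounters = [
    Encounter("A", day.replace(hour=8, minute=0), "cough", False),
    Encounter("A", day.replace(hour=8, minute=20), "cough", False),
    Encounter("A", day.replace(hour=8, minute=45), "cough", False),
    Encounter("B", day.replace(hour=9, minute=0), "rash", True),
    Encounter("B", day.replace(hour=9, minute=30), "rash", True),
    Encounter("B", day.replace(hour=9, minute=35), "rash", False),
]
labels = classify(encounters)
rate = adjusted_successful_booking_rate(encounters)
```

In this toy data, Patient B's third encounter is labeled a duplicate because it falls within 10 min of the second and has no booking, so it is dropped from the denominator.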
The ten-minute cutoff was determined using the “knee point” detection methodology described by Satopaa et al.17. By plotting the curve of duplicate encounters as a function of increasing time deltas (Fig. 5), the knee point was identified based on the maximum of the difference curve (Fig. 6). This difference curve captures the rate of change in duplicate detection over time, with its peak indicating the point where the marginal gain in identifying duplicates starts to decline rapidly. This knee point was crucial in establishing the optimal balance between retaining meaningful user encounters and excluding redundancies. The slope calculation, which was central to this analysis, ensured a precise and reproducible determination of the cutoff point, ultimately leading to a more accurate adjusted successful booking rate metric.
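The core of the knee-point idea can be sketched in a few lines: normalize both axes to [0, 1], subtract the normalized time delta from the normalized duplicate count, and take the maximum of that difference curve. This is a simplified version for a concave, increasing curve; the full Kneedle procedure also smooths the data and applies a threshold to local maxima. The data below are synthetic, not KPIN measurements, and `knee_point` is a hypothetical helper.

```python
def knee_point(x, y):
    """Simplified Kneedle for a concave, increasing curve.

    Returns the x value at the maximum of (normalized y - normalized x),
    together with the difference curve itself.
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    x_norm = [(v - x_min) / (x_max - x_min) for v in x]
    y_norm = [(v - y_min) / (y_max - y_min) for v in y]
    difference = [yn - xn for xn, yn in zip(x_norm, y_norm)]
    best = max(range(len(x)), key=difference.__getitem__)
    return x[best], difference


# Synthetic cumulative duplicate counts: steep growth up to 10 min, then slow.
time_deltas = list(range(0, 61))
duplicates_removed = [30 * t if t <= 10 else 300 + 2 * (t - 10)
                      for t in time_deltas]
cutoff, difference_curve = knee_point(time_deltas, duplicates_removed)
```

On this synthetic curve, the difference curve rises while duplicates accumulate faster than the diagonal and falls afterwards, so its maximum marks the point of diminishing returns.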
Fig. 5. Visualization of duplicated encounters during patient encounters with KPIN.
The graph illustrates how the number of duplicate encounters removed changes across varying time deltas. The blue solid line shows the cumulative volume of duplicate encounters removed as the allowable time delta increases. The vertical black dashed line marks the 10 min indicator, and the red dot marks the 364,440 duplicate encounters removed at 10 min.
Fig. 6. “Knee Point” identification analysis for encounters with a 10 min cut-off.
This figure illustrates the technique from Satopaa et al. for precise detection of inflection points in system behavior; the method applies the Kneedle algorithm for determining the optimal cutoff. The blue curve represents the normalized duplicates removed, while the red curve depicts the difference from an idealized linear trend. The dashed black line marks the knee point, corresponding to the maximum of the difference curve, indicating the time delta at which the marginal gain in duplicate detection begins to decline most rapidly. This inflection point defines the optimal cutoff for balancing meaningful encounters and redundancy reduction.
For patient experience metrics, inherent biases in survey responses persisted as a limitation. Additionally, the number of surveys administered was constrained by organizational efforts to mitigate survey fatigue. Specifically, KPIN users who had completed a patient experience survey in the previous 6 months, whether for KPIN or another SCPMG program, were excluded. Moreover, based on feedback from patient representatives, the organization kept the survey brief (no more than five items) to encourage responses; this explains why a lengthier, more robust survey instrument was not adopted. This strategy sought to balance the collection of meaningful feedback against the burden of survey participation on patients.
Acknowledgements
This research did not receive any external funding. We extend our gratitude to the KPIT partners for their support in integrating our KPIN system into the digital platform. We also sincerely acknowledge Ramin Davidoff, Executive Medical Director of SCPMG, for his executive support throughout this project.
Author contributions
D.N. and K.N. conceived the study. D.N., S.L., K.N., R.S., D.G., K.O., and M.M. ran the study. R.S., S.L., C.W., M.K., J.C., K.O., D.G., and O.O. led the design and construction of the models. D.N., S.L., and L.C. led testing and validation of the system in production. M.K., B.A., T.S., and F.J. performed data analysis. K.L., R.S., S.L., and D.N. wrote and edited the manuscript. All authors read and approved the final manuscript.
Data Availability
The data generated and analyzed in this study are not publicly available due to privacy reasons but are available from the corresponding author on reasonable request.
Code availability
The underlying code for this study is not publicly available for proprietary reasons.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41746-025-01838-1.
References
- 1. Reynolds, T. L., Ali, N. & Zheng, K. What do patients and caregivers want? A systematic review of user suggestions to improve patient portals. AMIA Annu. Symp. Proc. 2020, 1070–1079 (2021).
- 2. Rodriguez, J. A. & Lyles, C. R. Strengthening digital health equity by balancing techno-optimism and techno-skepticism through implementation science. NPJ Digit. Med. 6, 203 (2023).
- 3. Zhang, J., Prunty, J. E., Charles, A. C. & Forder, J. Association between digital front doors and social care use for community-dwelling adults in England: cross-sectional study. J. Med. Internet Res. 27, e53205 (2025).
- 4. Halton, R. Industry voices – generation Z is a game changer for healthcare. Fierce Healthcare https://www.fiercehealthcare.com/practices/industry-voices-generation-z-a-game-changer-for-healthcare (Fierce Healthcare, 2020).
- 5. Prasad, A. Digital patient experience in healthcare: a necessary game changer. Forbes Agency Council https://www.forbes.com/councils/forbesagencycouncil/2022/11/17/digital-patient-experience-in-healthcare-a-necessary-game-changer/ (Forbes, 2022).
- 6. Nguyen, K. et al. A safety catch system for patient self-service appointment booking. NEJM Catal Innov Care Deliv. 6; 10.1056/CAT.24.0162 (NEJM, 2025).
- 7. Moreira, A., Alves, C., Machado, J. & Santos, M. F. An overview of omnichannel interaction in health care services. Mayo Clin. Proc. Digit. Health 1, 77–93 (2023).
- 8. Betancor, P. K. et al. Efficient patient care in the digital age: online appointment scheduling in an ophthalmology practice. Digit Health 10, 20552076241287083 (2024).
- 9. Esmaeilzadeh, P. Challenges and strategies for wide-scale artificial intelligence (AI) deployment in healthcare practices: A perspective for healthcare organizations. Artif. Intell. Med. 151, 102861 (2024).
- 10. Bhagat, S. V. & Kanyal, D. Navigating the future: the transformative impact of artificial intelligence on hospital management—a comprehensive review. Cureus 16, e54518 (2024).
- 11. van Hoorn, E. S., Ye, L., van Leeuwen, N., Raat, H. & Lingsma, H. F. Value-based care: a systematic literature review. Int J. Health Policy Manag 13, 8038 (2024).
- 12. Lopez de Coca, T., Moreno, L., Alacreu, M. & Sebastian-Morello, M. Bridging the generational digital divide in the healthcare environment. J. Pers. Med. 12, 1214 (2022).
- 13. Määttä, J. et al. Diagnostic performance, triage safety, and usability of a clinical decision support system within a university hospital emergency department: algorithm performance and usability study. JMIR Med. Inform. 11, e46760 (2023).
- 14. Rahman, S. et al. Generalization in healthcare ai: evaluation of a clinical large language model. Preprint at https://arxiv.org/abs/2402.10965 (2024).
- 15. Wilimitis, D. & Walsh, C. G. Practical considerations and applied examples of cross-validation for model development and evaluation in health care: tutorial. JMIR AI 2, e49023 (2023).
- 16. Brex. How to identify and prevent duplicate payments in accounts payable. Brex https://www.brex.com/spend-trends/accounting/prevent-duplicate-payments-in-accounts-payable (BREX, 2025).
- 17. Satopaa, V., Albrecht, J., Irwin, D. & Raghavan, B. Finding a “Kneedle” in a haystack: detecting knee points in system behavior. In: Proc. 31st Int. Conf. Distrib. Comput. Syst. Workshops, 166–171 (ICDCSW, 2011).
- 18. Microsoft. What is language understanding (LUIS)? Microsoft https://learn.microsoft.com/en-us/azure/ai-services/luis/what-is-luis (Microsoft, 2024).
- 19. Business Wire. Notable survey: 61% of patients skip medical appointments due to scheduling hassles. Business Wire https://www.businesswire.com/news/home/20221114005097/en/Notable-Survey-61-of-Patients-Skip-Medical-Appointments-Due-to-Scheduling-Hassles (Business Wire, 2022).
- 20. Baymard Institute. 49 Cart Abandonment Rate Statistics 2025. Baymard Institute https://baymard.com/lists/cart-abandonment-rate (Baymard Institute, 2025).
- 21. Travel Outlook. Online booking abandonment rate. Travel Outlook https://traveloutlook.com/capture-abandoned-bookings/ (Travel Outlook, 2025).
- 22. Masanneck, L. et al. Triage performance across large language models, ChatGPT, and untrained doctors in emergency medicine: Comparative study. J. Med. Internet Res. 26, e53297 (2024).
- 23. Stewart, J. et al. Applications of natural language processing at emergency department triage: A narrative review. PLoS ONE 18, e0279953 (2023).
- 24. Tang, P. C., Ash, J. S., Bates, D. W., Overhage, J. M. & Sands, D. Z. Personal health records: definitions, benefits, and strategies for overcoming barriers to adoption. J. Am. Med. Inform. Assoc. JAMIA 13, 121–126 (2006).
- 25. Hibbard, J. H. & Greene, J. What the evidence shows about patient activation: better health outcomes and care experiences; fewer data on costs. Health Aff. 32, 207–214 (2013).
- 26. Ezeamii, V. C. et al. Revolutionizing healthcare: how telemedicine is improving patient outcomes and expanding access to care. Cureus 16, e63881 (2024).
- 27. Institute of Medicine (US) Roundtable on evidence-based medicine. The Learning Healthcare System: Workshop Summary (eds. Olsen, L. A., Aisner, D. & McGinnis, J. M.) (National Academies Press, 2007).
- 28. Oniani, D., Chandrasekar, P., Sivarajkumar, S. & Wang, Y. Few-shot learning for clinical natural language processing using siamese neural networks: algorithm development and validation study. JMIR AI 2, e44293 (2023).
- 29. Ge, Y., Guo, Y., Das, S., Al-Garadi, M. A. & Sarker, A. Few-shot learning for medical text: a review of advances, trends, and opportunities. J. Biomed. Inf. 144, 104458 (2023).
- 30. Liu, H. et al. Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS 2022) 142:1950–1965 (Curran Associates Inc., 2022).
- 31. Logan IV, R. L. et al. Cutting down on prompts and parameters: simple few-shot learning with language models. Preprint at https://arxiv.org/pdf/2106.13353 (2021).
- 32. Munkhbat, T. et al. Self-training elicits concise reasoning in large language models. Preprint at https://arxiv.org/html/2502.20122v2#:~:text=,training (2025).
- 33. Zou, J., Zhou, M., Li, T., Han, S. & Zhang, D. PromptIntern: saving inference costs by internalizing recurrent prompt during large language model fine-tuning. Preprint at https://arxiv.org/abs/2407.02211 (2024).
- 34. Zhao, T. Z., Wallace, E., Feng, S., Klein, D. & Singh, S. Calibrate Before Use: Improving Few-Shot Performance of Language Models. Preprint at https://arxiv.org/pdf/2102.09690 (2021).
- 35. Min, S., Lewis, M., Hajishirzi, H. & Zettlemoyer, L. Noisy Channel Language Model Prompting for Few-Shot Text Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022) 1:5316–5330 (Association for Computational Linguistics, 2022).