Abstract
Introduction:
Unnecessary laboratory testing contributes to patient morbidity and healthcare waste. Despite prior attempts at curbing such overutilization, there remains opportunity for improvement using novel data-driven approaches. This study presents the development and early evaluation of a clinical decision support tool that uses a predictive model to help providers reduce low-yield, repetitive laboratory testing in hospitalized patients.
Methods:
We developed an EHR-embedded SMART on FHIR application that utilizes a laboratory test result prediction model based on historical laboratory data. A combination of semi-structured physician interviews, usability testing, and quantitative analysis on retrospective laboratory data was used to inform the tool’s development and evaluate its acceptability and potential clinical impact.
Key Results:
Physicians identified culture and lack of awareness of repeat orders as key drivers for overuse of inpatient blood testing. Users expressed an openness to a lab prediction model and 13/15 physicians believed the tool would alter their ordering practices. The application received a median System Usability Scale score of 75, corresponding to the 75th percentile of software tools. On average, physicians desired a prediction certainty of 85% before discontinuing a routine recurring laboratory order and a higher certainty of 90% before being alerted. Simulation on historical lab data indicates that filtering based on accepted thresholds could have reduced ~22% of repeat chemistry panels.
Conclusions:
The use of a predictive algorithm as a means to calculate the utility of a diagnostic test is a promising paradigm for curbing laboratory test overutilization. An EHR-embedded clinical decision support tool employing such a model is a novel and acceptable intervention with the potential to reduce low-yield, repetitive laboratory testing.
Keywords: Clinical Decision Support, Healthcare Utilization, Laboratory Testing, Prediction Algorithms, Laboratory Information Systems
1. Introduction
An estimated 10% to 50% of inpatient laboratory testing is medically unnecessary [1–5]. Unnecessary laboratory testing contributes to medical complications including iatrogenic anemia [6,7], infection, sleep disruption, worse patient experience, and increased healthcare costs [8]. Overtesting can also lead to a cascade of further downstream medical utilization in the case of false positives or incorrectly interpreted results [9–11]. More urgently, laboratory test overutilization also exacerbates the critical shortage of blood specimen collection tubes that has resulted from pandemic-related disruptions in the global supply chain [12].
In response to this need, there have been many efforts aimed at reducing laboratory testing overutilization [1,13]. Interventions include targeted education [14–17], creation of medical guidelines [18,19], checklists [20], and implementation of electronic health records (EHR)-based clinical decision support tools such as cost displays, order sets, and best practice advisory alerts [21–26]. While many of these interventions have yielded modestly positive results, there remain opportunities for improvement especially in hospitalized patients [19,27].
In this article, we present the development and early evaluation of a novel EHR-embedded clinical decision support tool that uses a predictive model to identify low-yield laboratory testing in hospitalized patients. The use of a predictive algorithm as a means to calculate the utility of a diagnostic test which is then presented to the ordering clinician is a new paradigm for addressing laboratory test overutilization. We use a combination of qualitative and quantitative methods to inform the tool’s development and determine its acceptability and potential clinical impact.
2. Materials and Methods
2.1. Overview
The following study was performed at an academic medical center. As part of a larger quality improvement effort to reduce redundant laboratory testing, a clinical decision support tool titled the “Recurring Orders Dashboard” was prototyped employing a simple model for predicting the result of repeat blood tests in hospitalized patients. We used a combination of qualitative and quantitative research methods to inform and evaluate the development of the tool and to better understand physician attitudes and behaviors surrounding laboratory test utilization and clinical prediction models. For this initial pilot, we chose to focus on the perspective of internal medicine physicians caring for patients on the general medicine wards, as this is our largest and most generally representative inpatient care setting.
2.2. Development of the Dashboard and Predictive Model
The “Recurring Orders Dashboard” is a clinical decision support tool embedded within an iframe in the Epic EHR (Epic Systems, Verona, WI). The application is programmed in JavaScript and is launched from the EHR using the Substitutable Medical Applications and Reusable Technologies (SMART) on Fast Healthcare Interoperability Resources (FHIR) protocol. The application pulls real-time clinical data (including information about the patient, encounter, and prior laboratory test results) from the EHR using FHIR resources. A SQL query to the underlying EHR database is used to identify the “recurring” status of laboratory orders (e.g. “daily” or “every 12 hours”) as this information is currently unavailable via FHIR. An initial functional prototype of the application (Figure 1) was developed in close collaboration with clinical sponsors and was iteratively improved through a combination of qualitative user testing and quantitative simulations described below.
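As a rough illustration of the FHIR retrieval step described above, the sketch below builds an Observation search URL for one laboratory test over a 7-day lookback window. It is written in Python for brevity, although the application itself is written in JavaScript. The base URL, patient identifier, and function name are hypothetical, and the LOINC code is only an example. The search parameters (`patient`, `category`, `code`, `date`, `_sort`) are standard FHIR Observation search parameters, but whether a given EHR endpoint supports each of them is an assumption.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def build_lab_search_url(base_url, patient_id, loinc_code, as_of, lookback_days=7):
    """Build a FHIR Observation search URL for one lab test (hypothetical helper)."""
    since = as_of - timedelta(days=lookback_days)
    params = [
        ("patient", patient_id),                      # FHIR patient reference
        ("category", "laboratory"),                   # restrict to lab observations
        ("code", f"http://loinc.org|{loinc_code}"),   # system|code token
        ("date", f"ge{since.isoformat()}"),           # results on or after `since`
        ("_sort", "-date"),                           # newest first
    ]
    return f"{base_url.rstrip('/')}/Observation?{urlencode(params)}"


# Example with a made-up server and patient; 2951-2 is a LOINC code for serum sodium.
url = build_lab_search_url(
    "https://fhir.example.org/api/FHIR/R4/", "abc123", "2951-2", date(2022, 3, 15)
)
```

An actual SMART on FHIR launch would also need an OAuth 2.0 access token from the EHR, which is omitted here.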
Figure 1.

Screenshot of the first and second versions of the “Recurring Orders Dashboard.” Top (A): the first version of the dashboard shows pertinent data regarding the patient’s hospitalization including length of stay and number of blood draws. Prior blood test results are displayed graphically and numerically. A third column includes a prediction that the next blood draw will fall within the reference range. Bottom (B): the second iteration implements feedback from user interviews to include user interface refinements, additional information around the prediction model, and a recommended action (flagging predictably normal orders for review). Dates in these figures are redacted to preserve patient privacy.
The tool features a simple prediction model, which we call the Labogram, which estimates the probability that a repeat blood test will be within the reference range based solely on the number of preceding normal results observed in the prior 7 days [28,29]. For example, if a patient had three consecutive serum sodium measurements in the normal range within the past week, the system would predict an 89% probability that the following sodium would be in the normal range based on historical laboratory data patterns.
Probability estimates were calculated for the 27 most common venous blood tests (those components that comprise the comprehensive metabolic panel, complete blood count, and coagulation panel) using de-identified laboratory data from all inpatient encounters between 1 January 2020 and 31 December 2020 (Supplemental Table S1). This included a total of 6,592,275 lab results across 159,110 encounters and 68,974 unique patients. 78% of these orders were entered by physicians and 22% were entered by advanced practice providers (nurse practitioner or physician assistant). Blood tests being drawn for the first time in an encounter or those being repeated for the first time in 7 or more days were grouped together and similarly treated as “first-time” blood tests.
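The estimation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it interprets "number of preceding normal results" as the run of consecutive normal results immediately before the test, it ignores encounter boundaries, and the function names and data layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)


def prior_normal_streak(history, now):
    """Count consecutive normal results immediately preceding `now` within 7 days.

    `history` is a time-sorted list of (timestamp, is_normal) tuples.
    Returns None when no result falls in the window (a "first-time" test).
    """
    recent = [(t, ok) for t, ok in history if t < now and now - t < WINDOW]
    if not recent:
        return None
    streak = 0
    for _, ok in reversed(recent):
        if not ok:
            break  # an abnormal result ends the run (assumption)
        streak += 1
    return streak


def fit_labogram(results_by_series):
    """Estimate P(next result normal | streak) for one lab component.

    `results_by_series` maps a (patient, test) key to a time-sorted list of
    (timestamp, is_normal) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # streak -> [n_normal, n_total]
    for series in results_by_series.values():
        for i, (t, ok) in enumerate(series):
            key = prior_normal_streak(series[:i], t)
            key = "first-time" if key is None else key
            counts[key][0] += int(ok)
            counts[key][1] += 1
    return {k: normal / total for k, (normal, total) in counts.items()}
```

The result is a lookup table per component, so a prediction at the point of care is a single dictionary lookup keyed by the patient's current streak.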
2.3. Qualitative Data Collection and Analysis
Between March 2022 and May 2022, 30-minute semi-structured interviews with 15 internal medicine physicians (6 resident and 9 attending physicians) were conducted via videoconferencing software by two authors (NR and SM). Physicians were recruited through a combination of convenience and snowball sampling methods [30] until thematic saturation was achieved [31].
The interview guide (Supplemental Material: Interview Guide) covered topics informing the development of the tool and assessing its usability and acceptability. To identify key drivers, physicians were asked about their attitudes and behaviors around laboratory test utilization. To identify the tool’s acceptability, physicians were asked about attitudes and behaviors around medical decision making, clinical prediction models, and perceived challenges and benefits to using the proposed application. To inform the implementation of the predictive model, physicians were interviewed about their inherent decision-making and alerting thresholds. Finally, to assess the usability of the tool, interviewees participated in a short think-aloud session [32] in which they demoed a functional prototype and subsequently completed the 10-item System Usability Scale (SUS) survey [33,34]. In these demonstrations, users remotely operated a live prototype of the application in the EHR on real patient data and their interactions with the application were observed synchronously via videoconferencing screen sharing.
Interviews were recorded and transcribed. Thematic analysis was performed using a deductive approach with the content guide as an initial codebook. Excerpts and notations were reduced to a matrix format following the rapid analytic procedure methodology [35,36].
2.4. Quantitative Data Collection and Analysis
To quantify the tool’s potential clinical impact, we applied the median decision threshold from physician interviews to 3 months of laboratory data and calculated what proportion of commonly recurring laboratory tests and order panels would have been flagged for review. We performed these simulations on de-identified laboratory data from inpatient encounters between 1 June 2021 and 31 August 2021. This dataset is distinct from the 2020 laboratory data used to calculate the initial model, and includes a total of 1,071,079 lab results across 46,609 encounters and 27,381 unique patients.
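The component-level simulation described above amounts to counting how many repeat tests have a predicted probability at or above the decision threshold. A minimal sketch, assuming predictions are available as (test name, probability) pairs; the function name and input layout are hypothetical:

```python
from collections import Counter


def simulate_flags(predictions, threshold=0.85):
    """Return {test_name: (n_flagged, n_total, proportion_flagged)}.

    `predictions` is an iterable of (test_name, probability_normal) pairs,
    one per repeat test in the evaluation window.
    """
    total = Counter()
    flagged = Counter()
    for test_name, prob in predictions:
        total[test_name] += 1
        if prob >= threshold:  # "85% or greater" counts as flagged
            flagged[test_name] += 1
    return {
        name: (flagged[name], n, flagged[name] / n)
        for name, n in total.items()
    }
```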
2.5. Research Ethics
Verbal informed consent was obtained from all interviewed participants, including consent to record the interview and consent to publish anonymized excerpts. The Stanford University Institutional Review Board approved the secondary use of de-identified electronic health records laboratory data for this project and granted a non-research determination for the qualitative research interviews.
3. Results
3.1. Dashboard Prototype and Predictive Model
We developed an EHR-embedded application targeting repeat laboratory testing and released an initial functional prototype for user testing and iterative development in March 2022. The application is launched as a standalone “activity” within the EHR. Once opened, it is displayed in its own tab (alongside other activities such as “Orders” or “Results”), and typically takes about 5 to 10 seconds to load. The dashboard (Figure 1A) highlights elements of the patient’s current hospitalization including length of stay and number of blood draws, and displays all currently ordered repeat laboratory tests. Order panels (e.g. metabolic panel) can be expanded to view all laboratory test components (e.g. sodium, potassium, creatinine, etc.). A graphical and numerical summary of recent laboratory test results is provided as well as an estimate of the probability that the next test will be in the normal range. While much of this information is available throughout the patient record, we present it together with the results of our probability model to give context and assist the clinician with its interpretation. The probability estimate is provided by the Labogram model described earlier in the Methods section. Supplemental Table S1 shows model estimates for common laboratory components at our institution.
Based on insights derived from our user interviews (described below), iterative changes were made to the tool including an interpretation of the prediction (flagging orders above the prediction threshold for review), more information on the methodology of the predictive model, and improvements to the user interface (Figure 1B).
3.2. Qualitative Research Results
3.2.1. Themes around laboratory test utilization
Key themes around physician attitudes and behaviors towards laboratory test utilization are summarized in Table 1. The concepts most commonly reported as contributing to laboratory test overutilization included culture, convenience, a desire to not “miss anything,” and lack of awareness around recurring orders.
Table 1.
Summary of key qualitative research findings around clinical prediction models and decision-making and alerting thresholds.
Perceived factors contributing to overuse of repeat laboratory testing

| Theme | Respondents | Key Quotes |
|---|---|---|
| Culture | 9/15 (60%) | “There is this culture of ‘what are the morning labs?’” “Feels as though you are expected to report something on rounds and often the only objective data that you have are a CBC and a BMP” |
| Convenience | 8/15 (53%) | “Don’t have to think about ordering labs everyday” “I put in a CBC daily… for 99 times” “You have to be kind of deliberate about de-escalating orders… oftentimes the admitting resident or person has already set in motion this train of ordering 99 AM labs…and it’s actually more work for you…to stop that train as opposed to letting it go.” “Part of it [is that] having all those standing orders is convenient so we don’t have to think about ordering labs every day” |
| Don’t want to miss anything | 8/15 (53%) | “We don’t miss anything” “Part of it is [so that] we don’t miss anything” “Really useful to have the morning labs before rounds… if they’re not ready… you could really end up delaying care or just interrupting the workflow for patients” |
| Forgetfulness/Lack of awareness | 5/15 (33%) | “Just order daily standing labs and then forget about it” “When things are pretty stable we tend to forget about [repeating labs]” |

Necessary information to trust a clinical decision support tool

| Theme | Respondents | Key Quotes |
|---|---|---|
| Evidence of validation | 10/15 (67%) | “Link to a high impact paper” |
| Real-world examples (e.g. observe performance in realtime) | 4/15 (27%) | “See consistently accurate predictions [in practice]” |
| Description of how the tool works | 4/15 (27%) | “Explanation of how the algorithm ends up with this prediction” |
| Endorsement by trusted source | 3/15 (20%) | “Like a grand rounds [presentation]… from [someone] trusted in the field” |

Preference for simple model versus machine learning model

| Theme | Respondents | Key Quotes |
|---|---|---|
| Prefers machine learning model | 5/15 (33%) | “I would prefer a more complicated model… that looks at all historical values, medications…” “It’s more dynamic and it allows you to incorporate more parameters” “I would have a more favorable attitude towards something that incorporates more patient specific factors” |
| Prefers simpler model | 3/15 (20%) | “I like that I can understand it” “Distrust… related to more complex algorithms” |
| No preference | 7/15 (47%) | “As long as the model is validated” “I don’t have any particular hesitation around [a model]” |
3.2.2. Themes around prediction models, decision-making, and alert thresholds
Additional key qualitative research themes around physician behaviors and attitudes towards predictive algorithms, medical decision making, and alerting thresholds are summarized in Table 1. With regard to predictive models, physicians frequently expressed a desire for evidence of validation, such as through published scientific articles or prospectively through their own practice and experience with the tool. A few physicians expressed a desire to know the model’s inner workings. Physicians were also open to the use of machine learning (ML) models in daily practice. Only 3 of 15 respondents preferred a non-ML model. The remainder either had no preference for model methodology (7/15) or preferred an ML model (5/15).
To inform the decision threshold for the predictive model portion of the dashboard, physicians were also surveyed about their own medical decision-making practices. Participants were asked to estimate at what probability threshold they would likely discontinue routine laboratory testing in a stable patient. Apart from one outlier, responses ranged between 80% and 99% with a median of 85% (Figure 2A). Similarly, physicians were asked at what probability threshold the predictive model should alert them of a potentially low-yield repeat blood test. Most responses clustered around 90% to 95%, with a median value of 90% (Figure 2B).
Figure 2.

Decision-making and alert thresholds derived from user interviews. Physicians were asked (A) at what probability of falling within the normal reference range they would discontinue routine daily blood tests for a stable patient and (B) at what probability they would like to be alerted by the tool. The dotted, light-blue line indicates the median value (85% and 90%, respectively).
3.2.3. Themes around usability
Emerging themes from the think-aloud product demonstration sessions are summarized in Table 2. Users approved of the graphical display of recent laboratory results. Users also expressed a desire for additional information around the prediction and a recommended action or interpretation stemming from the prediction, which were incorporated into the second version of the application.
Table 2.
Summary of key qualitative findings around usability, acceptability and perceived benefits and challenges to the implementation and use of the proposed tool.
Themes around usability

| Theme | Respondents | Key Quotes |
|---|---|---|
| Like the graphical display | 6/15 (40%) | “I like the graphical representation” “I like the presentation of the information in a graphical format especially in addition to the numeric values” |
| Confusion around what “predicted normal” means | 5/15 (33%) | “What does predicted normal mean? Is it this lab or the next one?” |
| Want to know more information about the model | 4/15 (27%) | “What I would want as a user is a little info tab… so I can know what [the prediction] means and then know how it’s being calculated.” |
| Desire interpretation or recommended action | 2/15 (13%) | “I don’t actually know how I would use this, …it would be hard for me to interpret… like would I necessarily trust the 97% more than the 93%?” “Interpret the number in a more useful way as opposed to just giving a number…” “Suggest the person to [discontinue the order]… like ‘red’, ‘yellow’, ‘green’ grades” |

Themes around tool acceptability, benefits, and challenges

| Theme | Respondents | Key Quotes |
|---|---|---|
| This tool will alter my ordering practices | 13/15 (87%) | “I do think it would actually influence lab ordering practices… mainly for those population of stable patients who are in prolonged hospitalization” “I would order probably 2 to 3 less [lab tests per day]…” |
| Other benefits | | |
| More confident discontinuing labs | 3/15 (20%) | “Having that predictive model … would actually just be like a lot more data to back up your assertion that it’s okay not to monitor” |
| Educational tool | 2/15 (13%) | “I do try to bring up with my trainees on rounds when it seems like labs are not necessary, as frequently anymore… and so [the tool] could just kind of more systematically flag people to bring up on rounds.” |
| Challenges | | |
| Alert fatigue | 5/15 (33%) | “Another red exclamation point” |
| Workflow integration | 4/15 (27%) | “I think it can be very impactful if if it’s done in a way where it can just you know, integrate with what I’m doing already… that’s always the challenge” |
| Cognitive/time burden | 4/15 (27%) | “The challenge is always… provider awareness of having people learn how to incorporate this into their workflow and rounding” “Lengthens the workflow” “We’re already inundated with numbers” |
| Missing necessary blood test results | 3/15 (20%) | “[Might] dissuade me or a trainee from ordering a lab that actually might be helpful, delaying care by not having the labs at a certain time in the morning because they haven’t been ordered.” |
| Limitations | | |
| Not nuanced enough | 6/15 (40%) | “Not including the patient-specific factors could make this misleading and falsely reassuring” “A good model should look at all historical values and medications” |
| Normal versus stable | 3/15 (20%) | “I’d prefer if it was a predicted stable measure” “There’s a lot of people that have a low level anemia that… may show up as abnormal per the reference ranges, but [are] clinically stable.” |
The median System Usability Scale score amongst interviewed physicians was 75, with a mean score of 74.5 and a standard deviation of 15. Figure 3 illustrates the distribution of SUS scores and their interpretation based on meta-analyses of thousands of software tools performed by Bangor et al. [37,38]. Based on these prior studies, our median score of 75 corresponds to the user adjective of “Good” and falls at the 75th percentile of software tools.
Figure 3.

Top: Distribution of scores derived from the System Usability Scale survey issued to participants after completing the product demonstration. Bottom: A guide to interpreting the SUS score based on thousands of scores from industry software. Our tool’s median SUS score (75) is represented by the dotted black line, corresponds to an adjective of “Good,” and falls at the 75th percentile of software tools.
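For readers unfamiliar with how a SUS score on the 0 to 100 scale is obtained, the sketch below applies the standard SUS scoring rule: each of the 10 items is rated 1 to 5, odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the sum is multiplied by 2.5. The study does not describe its scoring procedure, so applying the standard rule here is an assumption, and the function name is hypothetical.

```python
def sus_score(responses):
    """Score one SUS questionnaire: 10 items, each rated 1 to 5."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS requires exactly 10 ratings between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # positively worded (odd) items
        else:
            total += 5 - rating   # negatively worded (even) items
    return total * 2.5            # rescale 0-40 to 0-100
```

Under this rule, a respondent who answers 4 on every odd item and 2 on every even item scores 75.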
3.2.4. Themes around acceptability
Elicited themes around the tool’s acceptability are summarized in Table 2. Of the 15 interviewed physicians, 13 believed the tool would alter their ordering practices. A few physicians proposed the dashboard would be a useful educational tool to foster discussion on rounds regarding healthcare utilization. However, challenges and limitations of the tool were identified, including alert fatigue, concern for additional cognitive and time burden, workflow integration challenges, and the possibility of missing clinically significant abnormal blood test results. Furthermore, several participants voiced concern around the simple nature of the Labogram model, and a few physicians noted that in chronically ill patients the key distinction is often not whether a laboratory test will be normal or abnormal but whether it is stable from baseline.
3.3. Lab Utilization Simulation
Applying the median decision threshold of 85% derived from the physician interviews, we performed a series of analyses on 3 months of laboratory data from 2021 to simulate the potential effect of the tool on laboratory utilization. Simulations were performed at the test component level (e.g. “serum sodium”) for components of the basic metabolic panel. The main findings of the retrospective lab simulation study are summarized in Supplemental Table S2. Nearly 118,000 serum sodium tests were run over the 3-month period, of which nearly 74,000 were repeated within one week. At a threshold of 85%, the tool flags 36,634 sodium tests (31% of all sodium tests run) for consideration of discontinuation.
Although the model provides predictions at the level of the laboratory test component, much of routine repeat blood testing is done via order panels. We therefore had to consider how to apply these predictions to an order panel to provide a recommendation. For this purpose, we propose a two-part threshold: a probability threshold applied to each component (here, 85%) and a minimum number of panel components that must meet it. Below is a demonstration of that concept applied to the most commonly repeated laboratory test at our institution, the basic metabolic panel (BMP). In the 3-month period of lab data considered, there were 46,197 basic metabolic panel orders, of which nearly 40,000 were repeated within 7 days of a prior one. We calculated how many of these panels would be flagged for discontinuation as a function of the number of individual components that are above the decision threshold of 85% (Figure 4). For this order panel, we chose a threshold of 6 or more out of 8 components predicted to be normal with a probability of 85% or greater. At this threshold, 8,624 (22%) of the repeat BMPs would be flagged for discontinuation.
Figure 4.

The proportion of repeat basic metabolic panel (BMP) orders flagged for discontinuation as a function of the number of components above the prediction threshold of 85%.
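The panel-level rule and the sweep shown in Figure 4 can be sketched as follows. This is an illustrative sketch only: the function names and the representation of a panel as a mapping from component name to predicted probability are assumptions, not the authors' code.

```python
def flag_panel(component_probs, prob_threshold=0.85, min_components=6):
    """Flag a panel when enough components are predicted normal.

    `component_probs` maps component name -> predicted probability of a
    normal result. The panel is flagged when at least `min_components`
    components have probability >= `prob_threshold`.
    """
    n_confident = sum(1 for p in component_probs.values() if p >= prob_threshold)
    return n_confident >= min_components


def flag_curve(panels, prob_threshold=0.85):
    """Proportion of panels flagged for each possible minimum component count."""
    if not panels:
        return {}
    n_components = max(len(p) for p in panels)
    return {
        k: sum(flag_panel(p, prob_threshold, k) for p in panels) / len(panels)
        for k in range(n_components + 1)
    }
```

Raising `min_components` makes the rule more conservative, so the flagged proportion can only stay the same or fall as the count increases.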
4. Discussion
We demonstrate the user-centered development and evaluation of a decision support tool to reduce low-yield repeat blood testing that employs a laboratory test result prediction model. Our user study indicates that the use of a predictive model is a feasible and acceptable paradigm for reducing low-yield repetitive laboratory testing in hospitalized patients. The model used in our tool, which we call the Labogram, is akin to an antibiogram. Much like an antibiogram reports pre-test probability of antibiotic sensitivity based on historical laboratory data, the Labogram model provides the pre-test probability a repeat blood test will be in the reference range, again based on an institution’s historical laboratory data.
Part education and part EHR-based clinical decision support, the tool builds on prior efforts to address key drivers of laboratory over-utilization including culture, lack of awareness of recurring orders, and a desire to “not miss something” [16,17,22,24,25]. By employing a prediction algorithm, the tool provides more context for the clinician to determine the utility of a repeat laboratory test.
This is a new perspective for tackling the issue of laboratory test overutilization that differs from prior interventions. Contrast this, for example, with implementations of Choosing Wisely or other value-based care initiatives, which often rely on a rules-based algorithm for guiding healthcare resource utilization [27], or with interventions that highlight the cost of a diagnostic test [26]. This framework can also be used to employ arbitrarily complex and personalized algorithms in the future, including machine learning laboratory test prediction models that can drive even more personalized and dynamic guidance [39].
4.1. Attitudes and Behaviors around Laboratory Testing
Our qualitative research efforts contribute to the literature surrounding physician attitudes and behaviors towards laboratory testing utilization and clinical prediction models. Physicians commonly indicated a sense of “culture” and “convenience” as the biggest contributors to overutilization of repeat laboratory testing as well as “a desire not to miss anything,” consistent with prior studies [14,40,41]. Physicians pointed to a “lack of awareness” of recurring blood test orders as a major contributor, which has been targeted previously with attempts to limit the number of future repeat blood tests that can be ordered [23,42]. Our dashboard addresses many of these key drivers: listing all recurring laboratory orders and relevant patient encounter information increases awareness of recurring blood tests, while the predictive model component challenges the cultural norm of daily laboratory testing.
4.2. User-Derived Thresholds for Medical Decision Making and Alerting
The threshold at which a repeat blood test should be flagged for physician review has not been well defined. We addressed this question based on insights from user interviews coupled with simulations on retrospective laboratory data, adding to the growing body of literature around threshold implementation of clinical prediction models [44–48].
4.3. Usability and Acceptability
The tool’s median SUS score of 75 places it in the 75th percentile for software tools, which is considerably higher than other health IT systems, such as the EHR, which averages a usability score of 46 [43]. Findings from user interviews indicate high acceptability of the tool, with 13 of 15 interviewed users believing it will alter their ordering practices. User feedback, such as the desire for more information about the prediction model and for a recommended action, informed the subsequent iteration of the application.
4.4. Laboratory Test Components versus Order Panels
One important challenge in the implementation of this tool was consideration of panel orders (e.g. basic metabolic panel). The Labogram provides a prediction at the level of the individual laboratory test component (e.g. serum sodium). However, the unit of action in clinical practice is at the level of the laboratory testing order, which includes laboratory test panels that are composed of several test components. At our institution, laboratory panels account for the majority of repeat routine testing in hospitalized patients. As a result, careful consideration had to be made with regard to the alerting threshold. For example, in a panel with 8 components such as the basic metabolic panel, even if every component has a probability of 95% of falling within the reference range (i.e., a healthy patient), the naive probability that the entire panel will fall within the reference range, treating the components as independent, is only 0.95^8 or 66%.
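The arithmetic behind the 66% figure can be checked with a short sketch. It assumes the components are statistically independent, which is the "naive" simplification referred to above and is unlikely to hold exactly in practice; the function name is hypothetical.

```python
from math import prod


def naive_panel_probability(component_probs):
    """P(all components normal), assuming independent components."""
    return prod(component_probs)


# Eight components, each with a 95% chance of being normal.
p_panel = naive_panel_probability([0.95] * 8)  # approximately 0.663
```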
A potential approach may be to consider the panel in its totality as a single test (“normal” if each of its components falls in the reference range) and apply the same Labogram methodology for calculating panel-level probability estimates. However, we found that order panels were too infrequently normal in their totality to make meaningful predictions using this approach. Furthermore, many order panels contain redundant information, and not all test components contribute equally to the information gained from performing the test [49,50]. Therefore, we propose an alert threshold for the order panel that is a function of the number of individual test components above a certain probability, in this case 85%, which provides a balance between sensitivity and specificity comparable with the decision-making thresholds elicited from physician interviews.
4.5. Limitations and Future Work
While there are many generalizable findings from this study, including the mixed methodology framework for developing and evaluating a clinical decision support tool and the general concept behind the Labogram algorithm, not all qualitative research insights will be transferable to other settings. The Labogram provides a practical and intuitive estimate of the pre-test probability that a repeat test result will be normal, but may not be as accurate as more complex models. For example, in its current form, the algorithm does not incorporate clinical information aside from historical laboratory test results in its prediction. This is an important limitation to highlight in the change management campaign ahead of the tool’s broader, future implementation. We envision future versions of this tool that employ machine learning models which integrate additional clinical context to provide further personalized predictions. In the meantime, the Labogram model may also be tailored to specific practice settings by including only laboratory tests from those settings. For example, separate models could be calculated for surgical and general medicine patients, or for acute care and critical care patients (Supplemental Material: Additional Analyses).
While the proposed tool estimates whether repeat test results will fall within a normal range, the clinically relevant finding in repeat testing is often the trend rather than the “normalcy” of a laboratory test value [51]. This is best illustrated in patients with chronically abnormal test results in whom even an abnormal test result may be of low clinical utility as long as it demonstrates stability or a favorable trend.
Additionally, this pilot study focuses on the perspective of internal medicine physicians, which was the initial target user group. However, as we plan for wider implementation of the tool, a variety of clinical perspectives will be elicited to ensure that the application is broadly effective.
Future work beyond the pre-implementation stage will need to include mature workflow integration and a prospective evaluation of the tool’s effectiveness. In its current form, the tool is a standalone “activity” within the EHR. As such, it can be missed or bypassed by users. We envision incorporating the tool’s predictions into the user’s “Patient List” view, where an additional column displays the number of active recurring orders for each patient, and how many of them are identified by the algorithm as potentially low-yield. In this proposed integration, clicking on the column launches the dashboard in a separate tab (Supplemental Figure S1).
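The proposed “Patient List” column can be sketched as a simple per-patient summary. This is an illustrative sketch under stated assumptions, not the application's code: the `RecurringOrder` type, its fields, and the function `patient_list_column` are hypothetical, and the default threshold reuses the 90% certainty that physicians in this study reported wanting before being alerted.

```python
from dataclasses import dataclass

# Certainty physicians wanted before being alerted (reported in this study).
ALERT_THRESHOLD = 0.90


@dataclass
class RecurringOrder:
    """Hypothetical record for one active recurring laboratory order."""
    patient_id: str
    test_name: str
    predicted_normal: float  # assumed model output in [0, 1]


def patient_list_column(orders, threshold=ALERT_THRESHOLD):
    """Return {patient_id: (active_recurring_orders, flagged_low_yield)}."""
    summary = {}
    for order in orders:
        active, flagged = summary.get(order.patient_id, (0, 0))
        is_low_yield = order.predicted_normal >= threshold
        summary[order.patient_id] = (active + 1, flagged + int(is_low_yield))
    return summary


# Toy orders, fabricated for the example only.
orders = [
    RecurringOrder("A", "BMP", 0.95),
    RecurringOrder("A", "CBC", 0.70),
    RecurringOrder("B", "BMP", 0.90),
]
column = patient_list_column(orders)
```

Each entry pairs the number of active recurring orders with the number flagged as potentially low-yield, which is the information the proposed column would display before the user opens the dashboard.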
5. Conclusion
Repeat laboratory testing in hospitalized patients is pervasive and a significant source of low-value care with consequences for both the patient and the health system. We present a quantitative and qualitative approach to the development and evaluation of an EHR-embedded application utilizing a prediction algorithm for reducing repeat low-yield laboratory testing in hospitalized patients. Our study demonstrates that the use of a prediction algorithm is an acceptable and feasible paradigm for addressing laboratory test overutilization. Qualitative research revealed that physicians desired a median certainty of 85% before discontinuing recurring blood tests on a stable patient, and 90% before being alerted by a decision support tool. Furthermore, culture, convenience, a desire to “not miss anything,” and lack of awareness of recurring orders are key drivers of laboratory test overutilization.
Supplementary Material
Acknowledgments
The authors acknowledge Laura Holdsworth for her initial consultation on qualitative analysis methodology.
Funding
This study was partly supported through the Stanford Health Care Cost Savings Reinvestment Program and the Office of the Chief Medical Information Officer at Stanford Health Care. Naveed Rabbani was supported by the Stanford Department of Pediatrics’ Fellowship in Clinical Informatics. Jonathan H Chen was supported in part by the NIH/National Library of Medicine Award R56LM013365, the Stanford Artificial Intelligence in Medicine and Imaging and Human-Centered Artificial Intelligence (AIMI-HAI) Partnership Grant, the Stanford Aging and Ethnogeriatrics Research Center (under NIH/National Institute on Aging grant P30AG059307), the Stanford Clinical Excellence Research Center (CERC), the Doris Duke Charitable Foundation “Covid-19 Fund to Retain Clinical Scientists”, and a Google, Inc. research collaboration to leverage EHR data for predicting clinical outcomes.
Abbreviations:
- EHR: Electronic Health Records
- FHIR: Fast Healthcare Interoperability Resources
- SMART: Substitutable Medical Applications and Reusable Technologies
- BMP: Basic Metabolic Panel
- CBC: Complete Blood Count
- SUS: System Usability Scale
Footnotes
Competing Interests
Jonathan H Chen is the co-founder of Reaction Explorer LLC, which develops and licenses organic chemistry software. He has received consulting fees from Sutton Pierce and Younker Hyde MacFarlane PLLC.
Ethics Statement
The Stanford University Institutional Review Board granted exempt status for this work on the grounds of quality improvement, protocol number 65067. Verbal informed consent was obtained from all interviewed participants, including consent to record the interview and consent to the use of anonymized excerpts in any resulting publications. De-identified electronic medical record laboratory data were used to create the predictive model and perform the presented retrospective analyses. Informed consent for these data was not required because the study used only secondary analysis of existing clinical data. Patient data were extracted and de-identified by the Stanford Medicine Research Data Repository. Use of these data was approved by the Institutional Review Board at Stanford University under protocol number 47618.
Consent for publication
Verbal informed consent was obtained from all interview participants to use anonymized excerpts and extracted themes in publications resulting from this work.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Availability of data and materials
The interview transcripts generated and analyzed during the study are not publicly available in order to protect the privacy of study participants. However, our interview guide is included in the supplemental materials. The de-identified EHR data analyzed during the study are provided by the Stanford Medicine Research Data Repository and are not publicly available due to compliance restrictions from our institution around the dissemination of high-risk de-identified medical data. Aggregated calculations used to derive our prediction model are available in the supplemental material. Study code is publicly viewable at our research group’s GitHub page: https://github.com/HealthRex/CDSS/tree/master/scripts/Labogram.
References
- 1. Zhi M, Ding EL, Theisen-Toupal J, Whelan J, Arnaout R. The landscape of inappropriate laboratory testing: a 15-year meta-analysis. PLoS One. 2013 Nov 15;8(11):e78962.
- 2. Huck A, Lewandrowski K. Utilization management in the clinical laboratory: an introduction and overview of the literature. Clin Chim Acta. 2014 Jan 1;427:111–7.
- 3. van Walraven C, Raymond M. Population-based study of repeat laboratory testing. Clin Chem. 2003 Dec;49(12):1997–2005.
- 4. Morgen EK, Naugler C. Inappropriate repeats of six common tests in a Canadian city: a population cohort study within a laboratory informatics framework. Am J Clin Pathol. 2015 Nov;144(5):704–12.
- 5. Kandalam V, Lau CK, Guo M, Ma I, Naugler C. Inappropriate repeat testing of complete blood count (CBC) and electrolyte panels in inpatients from Alberta, Canada. Clin Biochem. 2020 Mar;77:32–5.
- 6. Thavendiranathan P, Bagai A, Ebidia A, Detsky AS, Choudhry NK. Do blood tests cause anemia in hospitalized patients? The effect of diagnostic phlebotomy on hemoglobin and hematocrit levels. J Gen Intern Med. 2005 Jun;20(6):520–4.
- 7. Koch CG, Li L, Sun Z, Hixson ED, Tang AS, Phillips SC, et al. From Bad to Worse: Anemia on Admission and Hospital-Acquired Anemia. J Patient Saf. 2017 Dec;13(4):211–6.
- 8. Shrank WH, Rogstad TL, Parekh N. Waste in the US Health Care System: Estimated Costs and Potential for Savings. JAMA. 2019 Oct 15;322(15):1501–9.
- 9. Werner M. Appropriate utilization and cost control of the hospital laboratory: panel testing and repeat orders. Clin Chim Acta. 1995 Jan 16;233(1–2):1–17.
- 10. Deyo RA. Cascade effects of medical technology. Annu Rev Public Health. 2002;23:23–44.
- 11. Bruce CR, Fetter JE, Blumenthal-Barby JS. Cascade effects in critical care medicine: a call for practice changes. Am J Respir Crit Care Med. 2013 Dec 15;188(12):1384–5.
- 12. Blood Specimen Collection Tube Shortage: FAQs [Internet]. U.S. Food and Drug Administration. 2022 [cited 2022 May 17]. Available from: https://www.fda.gov/medical-devices/coronavirus-covid-19-and-medical-devices/blood-specimen-collection-tube-shortage-frequently-asked-questions
- 13. Hiscock H, Neely RJ, Warren H, Soon J, Georgiou A. Reducing Unnecessary Imaging and Pathology Tests: A Systematic Review. Pediatrics. 2018 Feb;141(2).
- 14. Miyakis S, Karamanof G, Liontos M, Mountokalakis TD. Factors contributing to inappropriate ordering of tests in an academic medical department and the effect of an educational feedback strategy. Postgrad Med J. 2006 Dec;82(974):823–9.
- 15. Delgado-Corcoran C, Bodily S, Frank DU, Witte MK, Castillo R, Bratton SL. Reducing blood testing in pediatric patients after heart surgery: a quality improvement project. Pediatr Crit Care Med. 2014 Oct;15(8):756–61.
- 16. Stammen LA, Stalmeijer RE, Paternotte E, Oudkerk Pool A, Driessen EW, Scheele F, et al. Training Physicians to Provide High-Value, Cost-Conscious Care: A Systematic Review. JAMA. 2015 Dec 8;314(22):2384–400.
- 17. Tchou MJ, Tang Girdwood S, Wormser B, Poole M, Davis-Rodriguez S, Caldwell JT, et al. Reducing Electrolyte Testing in Hospitalized Children by Using Quality Improvement Methods. Pediatrics. 2018 May;141(5).
- 18. Levinson W, Kallewaard M, Bhatia RS, Wolfson D, Shortt S, Kerr EA, et al. “Choosing Wisely”: a growing international campaign. BMJ Qual Saf. 2015 Feb;24(2):167–74.
- 19. Eaton KP, Levy K, Soong C, Pahwa AK, Petrilli C, Ziemba JB, et al. Evidence-Based Guidelines to Eliminate Repetitive Laboratory Testing. JAMA Intern Med. 2017 Dec 1;177(12):1833–9.
- 20. Algaze CA, Wood M, Pageler NM, Sharek PJ, Longhurst CA, Shin AY. Use of a Checklist and Clinical Decision Support Tool Reduces Laboratory Use and Improves Cost. Pediatrics. 2016 Jan;137(1).
- 21. Feldman LS, Shihab HM, Thiemann D, Yeh HC, Ardolino M, Mandell S, et al. Impact of providing fee data on laboratory test ordering: a controlled clinical trial. JAMA Intern Med. 2013 May 27;173(10):903–8.
- 22. Felcher AH, Gold R, Mosen DM, Stoneburner AB. Decrease in unnecessary vitamin D testing using clinical decision support tools: making it harder to do the wrong thing. J Am Med Inform Assoc. 2017 Jul 1;24(4):776–80.
- 23. Pageler NM, Franzon D, Longhurst CA, Wood M, Shin AY, Adams ES, et al. Embedding time-limited laboratory orders within computerized provider order entry reduces laboratory utilization. Pediatr Crit Care Med. 2013 May;14(4):413–9.
- 24. Jun T, Kwang H, Mou E, Berube C, Bentley J, Shieh L, et al. An Electronic Best Practice Alert Based on Choosing Wisely Guidelines Reduces Thrombophilia Testing in the Outpatient Setting. J Gen Intern Med. 2019 Jan;34(1):29–30.
- 25. Klunk CJ, Barrett RE, Peterec SM, Blythe E, Brockett R, Kenney M, et al. An Initiative to Decrease Laboratory Testing in a NICU. Pediatrics. 2021 Jul;148(1).
- 26. Silvestri MT, Bongiovanni TR, Glover JG, Gross CP. Impact of price display on provider ordering: A systematic review. J Hosp Med. 2016 Jan;11(1):65–76.
- 27. Cliff BQ, Avanceña ALV, Hirth RA, Lee SYD. The Impact of Choosing Wisely Interventions on Low-Value Medical Services: A Systematic Review. Milbank Q. 2021 Dec;99(4):1024–58.
- 28. Rabbani N. The Labogram: A data analysis framework for characterizing and reducing redundant laboratory testing in pediatric hospital medicine. American Medical Informatics Association Clinical Informatics Conference; 2022 May 26; Houston, Texas.
- 29. Xu S, Hom J, Balasubramanian S, Schroeder LF, Najafi N, Roy S, et al. Prevalence and Predictability of Low-Yield Inpatient Laboratory Diagnostic Tests. JAMA Netw Open. 2019 Sep 4;2(9):e1910967.
- 30. Palinkas LA, Horwitz SM, Green CA, Wisdom JP, Duan N, Hoagwood K. Purposeful Sampling for Qualitative Data Collection and Analysis in Mixed Method Implementation Research. Adm Policy Ment Health. 2015 Sep;42(5):533–44.
- 31. Morse JM. The Significance of Saturation. Qual Health Res. 1995 May 1;5(2):147–9.
- 32. Lewis C. Using the “thinking-aloud” method in cognitive interface design [Internet]. Yorktown Heights: IBM TJ Watson Research Center; 1982. Available from: https://dominoweb.draco.res.ibm.com/reports/RC9265.pdf
- 33. Brooke J. SUS: A “quick and dirty” usability scale. In: Usability Evaluation In Industry. Taylor & Francis; 1996. p. 189–94.
- 34. Brooke J. SUS: a retrospective. J Usability Studies. 2013 Feb 1;8(2):29–40.
- 35. Qualitative Methods in Rapid Turn-Around Health Services Research [Internet]. 2022 [cited 2022 Feb 18]. Available from: https://www.hsrd.research.va.gov/for_researchers/cyber_seminars/archives/video_archive.cfm?SessionID=780
- 36. Hamilton AB, Finley EP. Qualitative methods in implementation research: An introduction. Psychiatry Res. 2019 Oct;280:112516.
- 37. Bangor A, Kortum PT, Miller JT. An Empirical Evaluation of the System Usability Scale. International Journal of Human–Computer Interaction. 2008 Jul 29;24(6):574–94.
- 38. Kortum PT, Bangor A. Usability Ratings for Everyday Products Measured With the System Usability Scale. International Journal of Human–Computer Interaction. 2013 Jan 1;29(2):67–76.
- 39. Rabbani N, Kim GYE, Suarez CJ, Chen JH. Applications of machine learning in routine laboratory medicine: Current state and future directions. Clin Biochem. 2022 May;103:1–7.
- 40. Houben PHH, Winkens RAG, van der Weijden T, Vossen RCRM, Naus AJM, Grol RPTM. Reasons for ordering laboratory tests and relationship with frequency of abnormal results. Scand J Prim Health Care. 2010 Mar;28(1):18–23.
- 41. Sedrak MS, Patel MS, Ziemba JB, Murray D, Kim EJ, Dine CJ, et al. Residents’ self-report on why they order perceived unnecessary inpatient laboratory tests. J Hosp Med. 2016 Dec;11(12):869–72.
- 42. Krasowski MD, Chudzik D, Dolezal A, Steussy B, Gailey MP, Koch B, et al. Promoting improved utilization of laboratory testing through changes in an electronic medical record: experience at an academic medical center. BMC Med Inform Decis Mak. 2015 Feb 22;15:11.
- 43. Melnick ER, Dyrbye LN, Sinsky CA, Trockel M, West CP, Nedelec L, et al. The Association Between Perceived Electronic Health Record Usability and Professional Burnout Among US Physicians. Mayo Clin Proc. 2020 Mar;95(3):476–87.
- 44. Boland MV, Lehmann HP. A new method for determining physician decision thresholds using empiric, uncertain recommendations. BMC Med Inform Decis Mak. 2010 Apr 8;10:20.
- 45. Ebell MH, Locatelli I, Senn N. A novel approach to the determination of clinical decision thresholds. Evid Based Med. 2015 Apr;20(2):41–7.
- 46. Wynants L, van Smeden M, McLernon DJ, Timmerman D, Steyerberg EW, Van Calster B, et al. Three myths about risk thresholds for prediction models. BMC Med. 2019 Oct 25;17(1):192.
- 47. Olakotan OO, Yusof MM. Evaluating the alert appropriateness of clinical decision support systems in supporting clinical workflow. J Biomed Inform. 2020 Jun;106:103453.
- 48. Patel BS, Steinberg E, Pfohl SR, Shah NH. Learning decision thresholds for risk stratification models from aggregate clinician behavior. J Am Med Inform Assoc. 2021 Sep 18;28(10):2258–64.
- 49. Lidbury BA, Richardson AM, Badrick T. Assessment of machine-learning techniques on large pathology data sets to address assay redundancy in routine liver function test profiles. Diagnosis (Berl). 2015 Feb 1;2(1):41–51.
- 50. Luo Y, Szolovits P, Dighe AS, Baron JM. Using Machine Learning to Predict Laboratory Test Results. Am J Clin Pathol. 2016 Jun 21;145(6):778–88.
- 51. Aikens RC, Balasubramanian S, Chen JH. A Machine Learning Approach to Predicting the Stability of Inpatient Lab Test Results. AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:515–23.
