Abstract
Objective
Thyroid function tests are frequently overused. This systematic review aims to summarise the effectiveness of behaviour change interventions to reduce low-value thyroid testing and to identify theoretical foundations and contextual factors associated with their success.
Design
We conducted a comprehensive search of Medline, Embase, Scopus, and the Cochrane Library for randomised and non-randomised controlled trials as well as before-and-after studies. We followed PRISMA guidelines, critically appraised study quality, and applied the GRADE approach to assess certainty of evidence. We categorised interventions as soft (education, reminders, feedback, guidelines) or structural (change in funding, clinical decision support systems).
Results
We included 47 studies (54 interventions) including five randomised trials. Structural interventions, particularly clinical decision support systems, were the most common (n = 28). Most interventions reported a reduction in low-value thyroid testing (n = 52), with 40 of them having effects ≥ 20%. However, the certainty of evidence was very low to moderate. Among 49 interventions assessing volume reduction (test rates, expenditure), only two reported increased test rates. All 24 studies that measured improvement of care (appropriateness, shift in ordering pattern, coefficient of variation among physicians) indicated positive developments. Only four interventions referenced theoretical foundations or contextual factors.
Conclusions
Structural interventions, especially clinical decision support systems, were most effective in reducing thyroid testing. While most interventions showed positive effects, the certainty of evidence remains limited, highlighting the need for more high-quality studies to support robust clinical practice changes. Our results may inform targeted interventions to reduce low-value thyroid testing at national, regional, and local levels.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13643-026-03119-8.
Keywords: Systematic review, Thyroid function tests, Low-value care, De-implementation, Clinical decision support systems
Introduction
Thyroid function tests (TFTs) rank among the most frequently ordered laboratory tests worldwide [1], particularly thyroid-stimulating hormone (TSH) tests for diagnosing and managing thyroid disorders [2]. Alongside TSH, free hormone tests (free T3, free T4) are often included in thyroid panels, further increasing the number of TFTs conducted [3]. For instance, approximately 10 million TFTs are performed annually in the UK at an estimated cost of £30 million [4, 5]. In Germany, about 30% of the adult population undergo thyroid function testing each year [6].
Despite the high volume of TFTs, the prevalence of thyroid dysfunction in Europe is only 3.8%, with 259.1 new cases per 100,000 people per year [7]. This discrepancy has raised concerns about the potential overdiagnosis and unnecessary treatment of conditions such as subclinical, asymptomatic hypothyroidism, which often resolves without intervention [8]. Unnecessary testing for suspected thyroid diseases appears to be widespread, frequently leading to additional and potentially avoidable medical procedures [9].
Clinical practice guidelines recommend against routine TSH screening for asymptomatic adults without known thyroid disease (consensus-based evidence, [10, 11]). Additionally, free T3 or free T4 tests are not advised for screening hypothyroidism unless pituitary or hypothalamic disease is suspected or known (consensus-based evidence, [12, 13]). Instead, TSH measurement is the most reliable method for detecting common forms of hypothyroidism and hyperthyroidism [14–16]. Although the Choosing Wisely (CW) initiative aims to implement these guidelines, significant challenges remain in translating them into practice.
Recent research identified a considerable amount of low-value care in German claims data, with 24.8% to 35.5% of TFTs deemed inappropriate [17, 18]. A bi-national study from Canada and the UK showed that approximately one-third of adult patients without a clear indication for testing underwent at least one TSH test within a 2-year period [19]. Similarly, a study from France suggests that inappropriate TFT ordering remains common [20]. TFTs therefore represent an emblematic example of low-value diagnostic testing, characterised by high baseline use, limited alignment with disease prevalence, and the potential to trigger downstream testing cascades. De-implementation of such unnecessary tests is frequently discussed in international literature [21]. Several systematic reviews have shown that targeted de-implementation strategies can effectively reduce low-value care [6]. Various interventions, including educational programmes for healthcare providers [22, 23], reminders [24, 25], decision support systems [26, 27], and stricter regulatory policies [28, 29], have been proposed to reduce unnecessary TFTs in different healthcare settings. These interventions can optimise clinical laboratory stewardship, contribute to cost savings, and improve healthcare resource allocation [3, 30].
Theoretical frameworks are essential for informing interventions aimed at de-implementing low-value care by identifying key elements that need to be addressed [31]. For instance, the Theoretical Domains Framework (TDF) explores barriers and facilitators for behaviour change [32], while the Choosing Wisely De-Implementation Framework offers a systematic approach to reducing low-value care [33]. A systematic review published by Zhelev et al. in 2016 provided an overview of behaviour change interventions to reduce the volume of TFTs ordered [34]. However, due to the studies’ poor methodological quality and reporting, strong conclusions could not be drawn, nor could specific intervention types be recommended. Furthermore, the theoretical foundations and contextual factors that contribute to the successful implementation of the interventions have not yet been thoroughly investigated. Since the publication of that review, a substantial body of new research on interventions aimed at reducing the ordering of TFTs has emerged. In particular, the latest studies included in the earlier review were published in 2014, before the widespread adoption of digital ordering systems and the increasing use of digital interventions, so technological developments also need to be considered. In light of both the limitations of the earlier review and the growing body of relevant research, we conducted a new systematic review, building on the work of Zhelev et al. [34].
Our review aims to identify effective strategies and their contexts by examining a series of interventions targeting the reduction of TFT orders. We address the following research questions (RQ):
RQ1: What is the effectiveness of behaviour change interventions in reducing the ordering of TFTs?
RQ2: Which theoretical foundations are used to explain the mechanisms underlying the interventions, and which contextual factors are associated with the success of interventions aimed at improving evidence-based thyroid diagnostics?
In doing so, we build on the scope of the previous review: RQ1 was retained for continuity, while RQ2 was newly introduced to enhance the understanding of how and why interventions may work in different settings, drawing on information available in the included studies.
Methods
We first manually searched for well-conducted systematic reviews on the same topic and identified one published in 2016 [34]. The review was assessed for methodological quality and risk of bias, showing moderate quality according to the AMSTAR 2 tool [35, 36] and a low risk of bias (RoB) evaluated by the ROBIS tool (RoB In systematic reviews, [37]). Assessments are available in Additional files 1 and 2.
Our review was guided by the Cochrane methodology and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement [38–40]. The PRISMA checklist is provided in Additional file 3.
Registration
We prospectively registered the review in PROSPERO (CRD42023492441). Changes to the information provided at registration are reported in Additional file 4.
Data sources, searches, and selection
We conducted a systematic literature search in line with the previous review by Zhelev et al. [34], using the same databases: MEDLINE (Ovid), Embase (Ovid), and the Cochrane Central Register of Controlled Trials (CENTRAL, the Cochrane Library). Searches were performed on the 21st of November 2023. To increase comprehensiveness, we extended the methodology by also searching Scopus, screening the first 300 references from Google Scholar, hand-searching the reference lists of included articles, and contacting experts in the field to identify any additional relevant studies. We limited the search period for MEDLINE, Embase, and CENTRAL to articles published between the 1st of January 2014 and the 21st of November 2023, as articles published before 2014 were already screened for eligibility in the earlier review [34]. For Scopus and Google Scholar, no publication date limits were applied, as these databases were not included in the earlier review. We performed an update of the search on the 7th of July 2024.
We applied the same search strategy as Zhelev et al. [34], consisting of search terms related to (1) thyroid function tests and (2) inappropriate testing (Additional file 5). We included studies involving adults receiving TFTs in inpatient, outpatient, or emergency care settings. Eligible interventions comprised behaviour change interventions targeting physicians. Studies were required to compare these interventions to usual care and report outcomes related to test volume, appropriateness, costs, or patient health. We included randomised controlled trials (RCTs), non-randomised controlled studies, and before-and-after studies providing comparative data. We excluded studies that lacked a comparator, did not provide relevant outcome data, focused only on thyroid tests as part of a broader panel without disaggregated results, or were cross-sectional studies, opinion pieces, dissertations, or meeting abstracts. Only studies published in English or German were considered. All inclusion and exclusion criteria for assessing the effectiveness of the interventions (RQ1) followed those of the prior review and are outlined in Table 1.
Table 1.
Inclusion and exclusion criteria of studies based on the PICO framework (Population, Intervention, Control, Outcome)
| Attribute | Inclusion criteria | Exclusion criteria |
|---|---|---|
| Population | Adults receiving thyroid function tests | |
| Intervention | Behaviour change intervention types [41, 42]: educational interventions; guideline and protocol development and implementation; changes to funding policy; reminders of existing guidelines and protocols; clinical decision support systems, including test request forms and computer-based decision support; audit and feedback | |
| Control | Usual care | |
| Outcome | Change in the total number of thyroid function tests; number of inappropriately ordered thyroid function tests; test-related expenditure; health benefits to individual patients | Studies encompassing ordered thyroid function tests along with other laboratory tests but reporting only the average effect (across all tests) |
| Study type | Randomised controlled trials; non-randomised controlled studies; before-and-after studies providing comparative data on at least one of the outcome measures | Cross-sectional studies; editorials and opinions; studies without comparative data; dissertations and meeting abstracts |
| Setting | Inpatient care, outpatient care, emergency department | |
| Language | English or German | All other languages |
To address our additional RQ2, we supplemented the prior methodology by searching grey literature and conducting targeted searches for additional publications by the authors of the included interventions. This was done to determine whether potential contextual factors and theoretical backgrounds were described in the development and implementation of the interventions under review. We imported all search results into EndNote and removed any duplicates before proceeding to the screening phase [43]. Based on the pre-defined eligibility criteria, two review authors (CP, MH) independently conducted title-abstract and full-text screening. Discrepancies were resolved by consensus and by consulting a third researcher (VV). No automation tools were used in the process.
Data extraction and analysis
Two researchers (CP, MH) independently extracted data from the included studies using piloted data-extraction tables. A list of all data items and detailed descriptions can be found in Additional file 6. Any discrepancies between the researchers were discussed until consensus was reached. For all identified RCTs with missing study protocols, we contacted the study authors via email.
We used relative change (improvement/deterioration) as the standardised outcome metric, chosen due to the heterogeneity in outcomes and reporting. Following the approach outlined by Zhelev et al. [34], we set a threshold of ± 20% to interpret a relative change as large, indicating a substantial change in testing behaviour rather than minor variation. The outcomes included (1) changes in the total number of thyroid function tests, (2) test-related expenditure, (3) the number of inappropriately ordered thyroid function tests (appropriateness), (4) the pattern of ordering, and (5) the coefficient of variation (CoV) among physicians. We extracted the direction of the effect (positive/negative subject to the desired outcome measure). Confidence intervals and differences in means were extracted when reported. All calculated values are indicated as such.
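The relative-change metric and the ± 20% threshold described above can be illustrated with a minimal sketch. The function and field names are hypothetical, and the sign convention (a fall counts as an improvement when lower values are desired, as for test volumes) is an assumption made for illustration rather than the exact computation used in the review.

```python
from dataclasses import dataclass

# The ±20% cut-off used to call a relative change "large" (from the text).
LARGE_EFFECT_THRESHOLD_PCT = 20.0


@dataclass
class Effect:
    improvement_pct: float  # positive = change in the desired direction
    direction: str          # "positive", "negative" or "none"
    large: bool             # True if |improvement| >= 20%


def relative_effect(pre: float, post: float, lower_is_better: bool = True) -> Effect:
    """Relative change between a pre- and post-intervention value.

    Hypothetical helper: for volume outcomes (test counts, expenditure) a
    decrease is the desired direction; for appropriateness an increase is.
    """
    if pre == 0:
        raise ValueError("pre-intervention value must be non-zero")
    change_pct = (post - pre) * 100.0 / pre
    improvement = -change_pct if lower_is_better else change_pct
    if improvement > 0:
        direction = "positive"
    elif improvement < 0:
        direction = "negative"
    else:
        direction = "none"
    return Effect(improvement, direction, abs(improvement) >= LARGE_EFFECT_THRESHOLD_PCT)


# Illustrative numbers only: 1000 tests before, 750 after the intervention.
example = relative_effect(pre=1000, post=750)
```

With these illustrative inputs the helper reports a 25% improvement in the positive direction, which would count as large under the threshold.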
We refrained from conducting a meta-analysis due to anticipated clinical heterogeneity among the studies. Specifically, we expected variability in intervention types, timing, and outcome measures, such as shifts towards TSH ordering, test volumes per population unit, or counts of laboratory tests per provider. Given these anticipated differences, a quantitative synthesis was not planned. Instead, we employed a narrative synthesis method. The analysis framework relied on an existing typology of behaviour change intervention types [41, 42], encompassing the following categories: (1) clinical decision support systems (CDSS), (2) changes to funding policy, (3) educational interventions, (4) reminders of existing guidelines and protocols, (5) guideline and protocol development and implementation, and (6) audit and feedback [34]. Furthermore, we introduced a grouping for the interventions and outcomes to support descriptive analysis and improve clarity of reporting (visualised in Additional file 7). We grouped the six types of interventions into structural and soft interventions based on their influence on physician behaviour. While we were guided by terminologies used in previous literature (active and soft [44], strict and soft [45], structural [46]), we developed the final categorisation to best reflect the characteristics of the interventions identified. The five outcome measures were divided into two categories: volume reduction (test rates, expenditure) and improvement of care (appropriateness, pattern, CoV).
Structural interventions directly influence physicians at the point of care and are integrated into their routine workflow, making them difficult to bypass. These interventions include changes in funding, where modifications to financial structures or incentives directly affect decision-making, and CDSS, which are embedded tools that directly guide or restrict physicians' choices during patient care. CDSS include alerts, changes to the existing order form, cost displays, and reflex testing, that is, the automatic discharge of tests. Soft interventions provide physicians with informational resources or guidance outside of the immediate care setting, i.e. reminders, education, guidelines/protocols, and audit/feedback. These interventions are less directly tied to the point of care and may require active engagement from physicians to be used effectively, such as through educational meetings or reminder messages. They may also include tools that are not necessarily part of the standard working routine, like memorandum pocket cards or guidelines.
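The grouping of the six intervention types and five outcome measures can be written down as a simple lookup. The sketch below is illustrative only: the identifiers are hypothetical labels, and the three-way classification (soft only, structural only, combined) mirrors how multi-component interventions are counted in the Results.

```python
from enum import Enum
from typing import Iterable


class Group(Enum):
    SOFT = "soft"
    STRUCTURAL = "structural"


# Six intervention types mapped to the two groups described in the text.
INTERVENTION_GROUP = {
    "audit_feedback": Group.SOFT,
    "education": Group.SOFT,
    "guidelines_protocols": Group.SOFT,
    "reminders": Group.SOFT,
    "funding_change": Group.STRUCTURAL,
    "cdss": Group.STRUCTURAL,
}

# Five outcome measures mapped to the two outcome categories.
OUTCOME_CATEGORY = {
    "test_rates": "volume reduction",
    "expenditure": "volume reduction",
    "appropriateness": "improvement of care",
    "ordering_pattern": "improvement of care",
    "cov_among_physicians": "improvement of care",
}


def classify_intervention(components: Iterable[str]) -> str:
    """Label a (possibly multi-component) intervention by its component groups."""
    groups = {INTERVENTION_GROUP[c] for c in components}
    if not groups:
        raise ValueError("an intervention needs at least one component")
    if groups == {Group.SOFT}:
        return "soft only"
    if groups == {Group.STRUCTURAL}:
        return "structural only"
    return "combined"
```

For example, an intervention pairing an educational session with a CDSS alert would be labelled "combined", whereas a stand-alone alert would be "structural only".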
For RQ2, we extracted information on theoretical foundations and contextual factors when explicitly reported in the included studies or related publications. However, reporting was sparse and inconsistent, leaving little scope for synthesis or categorisation. Therefore, findings are presented narratively, with reference to specific theoretical models or quality improvement frameworks where available.
Bias assessment and certainty of evidence
We used the Cochrane RoB 2.0 tool (RoB 2) for (cluster) RCTs and the ROBINS-I tool (RoB In Non-randomised Studies - of Interventions) for non-randomised studies of interventions (NRSI) [47, 48]. RoB figures were created for each outcome domain separately using the robvis application [49].
The interventions in our review were evaluated using the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) approach, following Murad et al.’s guide for complex interventions [50, 51]. We graded the evidence both by outcome—assessing the overall effectiveness of intervention bundles from a policymaker/payer perspective—and by grouped interventions and outcomes. This dual grading approach aimed to inform policymakers on the efficacy of intervention bundles and to guide practitioners on the most effective components for patient care. To minimise potential publication bias, we expanded the databases included in the initial review, performed a grey literature search, and looked into registered trials (EU Clinical Trials Register clinicaltrialsregister.eu/ctr-search/search, WHO trialsearch.who.int/Default.aspx). All assessments were performed independently by two reviewers (CP and MH). Any disagreements were resolved by consensus involving a third reviewer (GG).
Results
Study selection
From 2782 records screened, we identified 21 additional studies beyond those reported by Zhelev et al. [34], leading to a total of 47 unique studies included in this review. The updated search revealed no additional studies (213 new records screened). All identified studies were completed. Most of the identified studies used a before-and-after design (n = 34, including time series analysis), followed by nine non-randomised controlled studies and five (cluster) RCTs, two of which had a registration record. We contacted the authors of the remaining three studies and received a response from one of them. The study selection process is presented in Fig. 1. A list of all articles excluded after full-text screening and reasons for exclusion can be found in Additional file 8.
Fig. 1.
PRISMA flowchart of the systematic literature search. TFTs, thyroid function tests
Study characteristics
Most studies were conducted in the USA (n = 16), Canada (n = 8), and the UK (n = 6). They were published between 1979 and 2022. The majority of the studies were conducted in outpatient (n = 21) and inpatient care (n = 14). The remainder were conducted in both or in emergency departments (n = 12, Tables 2 and 3). Most of the studies were performed at a single medical site (n = 29, Table 2).
Table 2.
Characteristics of the included studies
| Study and country | Year | Study design | Setting | Target tests | Thyroid tests |
|---|---|---|---|---|---|
| Studies identified in present review (n = 21) | | | | | |
| Bateman et al., Canada [52] | 2019 | Before and after, single site | Inpatient rehabilitation centre | TSH + Vitamin D | TSH |
| Bejjanki et al., USA [53] | 2018 | Before and after, single site | Academic medical centre | 17 laboratory tests | TSH, FT4 |
| Bellodi et al., Italy [54] | 2017 | Controlled study, multiple sites | 3 hospitals (10 wards) | 8 laboratory tests | FT4 |
| Bradshaw et al., USA [55] | 2021 | Before and after, single site | Academic medical centre | TFTs only | TSH, FT3, FT4 |
| Caldarelli et al., Italy [56] | 2017 | Before and after, single site | Hospital (clinical laboratory) | TFTs only | TSH, FT3, FT4 |
| Chami et al., Canada [45] | 2021 | Time series analysis with control group, multiple sites | Outpatient laboratories | 8 laboratory tests | TSH |
| Dalal et al., USA [57] | 2017 | Before and after, single site | Urban teaching hospital | TFTs only | TSH, FT3, FT4 |
| Delvaux et al., Belgium [58] | 2020 | Cluster RCT | 72 primary care centres | 17 laboratory tests | TSH |
| Elrewini et al., Saudi Arabia [59] | 2022 | Before and after, single site | Armed Forces Hospital | TFTs only | TSH |
| Gilmour et al., Canada [30] | 2017 | Before and after, single site | Academic ambulatory hospital | TFTs only | TSH, FT3, FT4 |
| Janssens et al., Netherlands [44] | 2015 | Before and after, single site | General care and teaching hospital | 82 laboratory tests | TSH, FT4 |
| Krouss et al., USA [60] | 2022 | Interrupted time series, multiple sites | 11 hospitals and over 70 ambulatory centres | TFTs only | T3, FT3 |
| Leis et al., Canada [61] | 2019 | Before and after, single site | Coronary Care Unit (teaching hospital) | TFTs only | TSH |
| Leung et al., USA [62] | 2017 | Before and after, single site | Resident clinic in an outpatient clinic | 70 laboratory tests | TSH, FT3, FT4 |
| MacPherson et al., Australia [63] | 2005 | Before and after, single site | Pre-admission clinic | 8 pathology tests + various investigations | TFTs (unspecified) |
| Muris et al., Netherlands [64] | 2021 | Before and after, multiple sites | 57 general practices | 22 laboratory tests | TSH, FT4 |
| Notas et al., Greece [65] | 2018 | Before and after, single site | Tertiary teaching hospital | TFTs only | TSH, FT3, FT4, TGAb, TPOAb |
| Salinas et al., Spain [66] | 2016 | Before and after, multiple sites | Public University Hospital + 9 primary care centres | 13 laboratory tests | TSH, FT4 |
| Sue et al., USA [67] | 2019 | Before and after, multiple sites | All outpatients within urban tertiary/quaternary care academic health system | TFTs only | T3 |
| Taher et al., Canada [68] | 2020 | Before and after, single site | Tertiary hospital | TFTs only | FT3, FT4 |
| Wintemute et al., Canada [69] | 2019 | Controlled study, multiple sites | 6 family health teams | TFTs only | TSH |
| Studies included in previous review (n = 27) | | | | | |
| Adlan et al., UK [70] | 2011 | Before and after, single site | Medical Assessment Unit (hospital, acutely ill patients) | TFTs only | TFTs (TSH, FT4, TPOAb, TRAb) |
| Baker et al., UK [71] | 2003 | Cluster RCT | 33 general practices | 5 laboratory test groups | TFTs (TSH, FT4) |
| Berwick and Coltin, USA [23] | 1986 | Controlled cross-over, multiple sites | 3 ambulatory centres (same health maintenance organization) | 13 laboratory and imaging tests | T4 |
| Chu et al., Australia [72] | 2013 | Before and after, single site | Tertiary teaching hospital emergency department | 23 laboratory tests | TFTs (unspecified) |
| Cipullo and Mostoufizadeh, USA [73] | 1996 | Before and after, single site | Community hospital | 20 laboratory tests and preoperative testing (unspecified) | TFTs (T3RU) |
| Daucourt et al., France [25] | 2000 | Cluster RCT | General and psychiatric hospitals | TFTs only | TFTs (TSH, FT3, FT4, TRH) |
| Dowling et al., USA [74] | 1989 | Before and after, single site | Inner city community health centre | TSH + CBC | TSH |
| Emerson and Emerson, USA [26] | 2001 | Before and after, single site | University medical centre | All laboratory tests | TSH, T3, FT4, T4, FTI/T3RU |
| Feldkamp and Carey, USA [75] | 1996 | Before and after, single site | Metropolitan hospital and 22 satellite clinics | TFTs only | TSH, T3, T4, FTI/T3RU |
| Gama et al., UK [76] | 1991 | Controlled study, single site | District general hospital | 5 laboratory test groups + others | TFTs (TSH, FT4) |
| Grivell et al., Australia [77] | 1981 | Before and after, single site | Tertiary-care community hospital | 55 laboratory tests | T4 |
| Hardwick et al., Canada [29] | 1982 | Before and after, multiple sites | Outpatient laboratories | TFTs only | T3, T4, ETR |
| Horn et al., USA [78] | 2013 | Interrupted time series with control group, multiple sites | Alliance of 5 multispecialty group practices | 27 laboratory tests | TSH |
| Larsson et al., Sweden [22] | 1999 | Before and after, multiple sites | 19 primary care centres | 14 laboratory test groups | TSH, T3, FT4 |
| Mindemark and Larsson, Sweden (follow up) [79] | 2009 | Before and after, multiple sites | 16 primary healthcare centres | 12 laboratory test groups | TSH, T3, T4, FT4 |
| Nightingale et al., UK [80] | 1994 | Before and after, single site | Supra-regional liver unit (teaching hospital) | Various laboratory tests | TSH |
| Rhyne and Gehlbach, USA [81] | 1979 | Before and after, single site | Family Medicine group practice | TFTs only | TFTs (T3RU and T4) |
| Schectman et al., USA [82] | 1991 | Controlled study, single site | Primary care health maintenance organization practice | TFTs only | TFTs (TSH, T3-RIA, T3RU, T4) |
| Stuart et al., Australia [83] | 2002 | Before and after, single site | Public hospital emergency department | 14 laboratory and 10 imaging tests | TFTs (unspecified) |
| Thomas et al., UK [24] | 2006 | Cluster RCT | 85 primary-care practices | 9 laboratory tests | TSH |
| Tierney et al., USA [27] | 1988 | RCT, single site | Academic internal medicine practice | 8 laboratory tests | TSH |
| Tomlin et al., New Zealand [84] | 2011 | Controlled study, multiple sites | New Zealand primary care | 8 laboratory tests | TSH, FT3, FT4 |
| Toubert et al., France [85] | 2000 | Before and after, single site | Teaching hospital | TFTs only | TSH, FT3, FT4, TPOAb, TGAb, TRAb |
| Van Walraven et al., Canada [28] | 1998 | Interrupted time series, multiple sites | All clinical laboratories (not based in hospitals) | 7 laboratory tests | TSH, T3RU, T4 |
| Vidal-Trecan et al., France [86] | 2003 | Before and after, multiple sites | 50 university hospitals | TFTs only | TSH, T3, FT3, T4, FT4 |
| Willis and Datta, UK [87] | 2013 | Before and after, single site | Medical admissions unit in a district general hospital | 3 laboratory test groups | TFTs (unspecified) |
| Wong et al., USA [88] | 1983 | Controlled study, single site | University teaching hospital | 6 laboratory tests | TSH, T3RU, T3-RIA, T4-RIA |
CBC complete blood count, ETR effective thyroxine ratio, FT4 free thyroxine, FT3 free triiodothyronine, FTI free T4 index or free thyroxine index, RCT randomised controlled trial, RIA radioimmunoassay, TFTs thyroid function tests, TGAb thyroglobulin antibodies, TPOAb thyroid peroxidase antibodies, TSH thyroid stimulating hormone (thyrotropin), T3 triiodothyronine, T3RU triiodothyronine resin uptake, TRH TSH-releasing hormone, TRAb TSH-receptor antibodies
Table 3.
Interventions of the included studies
| Study and country | Setting | Soft interventions | Structural interventions | |||||
|---|---|---|---|---|---|---|---|---|
| Audit and feedback | Educational programmes | Guidelines and protocols | Reminders | Changes to funding | CDSS | Description of decision tool | ||
| Studies identified in present review (n = 21) | ||||||||
| Bateman et al., Canada [52] | Inpatient | X | X | - | ||||
| Bejjanki et al., USA [53] | Inpatient | X | Alert | |||||
| Bellodi et al., Italy [54] | Inpatient | X | Alert | |||||
| Bradshaw et al., USA [55] | Inpatient + ED | X | X | X | Alert | |||
| Caldarelli et al., Italy [56] | Outpatient | X | Reflex/Discharge | |||||
| Chami et al., Canada [45] | Outpatient | X | Change of order form | |||||
| Dalal et al., USA [57] | Inpatient | X | Reflex/Discharge | |||||
| Delvaux et al., Belgium [58] | Outpatient | X | Alert | |||||
| Elrewini et al., Saudi Arabia [59] | Unspecified | X | X | X | Alert | |||
| Gilmour et al., Canada [30] | Outpatient | X | X | Reflex/Discharge | ||||
| Janssens et al., Netherlands [44] | Inpatient + outpatient | X | - | |||||
| Krouss et al., USA [60] | Inpatient + outpatient | X | Alert | |||||
| Leis et al., Canada [61] | Inpatient | X | Change of order form | |||||
| Leung et al., USA [62] | Inpatient | X | X | - | ||||
| MacPherson et al., Australia [63] | Inpatient | X | X | Change of order form | ||||
| Muris et al., Netherlands [64] | Outpatient | X | Cost display | |||||
| Notas et al., Greece [65] | Inpatient + outpatient | X | Reflex/Automatic discharge | |||||
| Salinas et al., Spain [66] | Inpatient + outpatient | X | Reflex/Automatic discharge | |||||
| Sue et al., USA [67] | Outpatient | X | Alert | |||||
| Taher et al., Canada [68] | Inpatient + outpatient | X | Change of order form + Reflex/Automatic discharge | |||||
| Wintemute et al., Canada [69] | Outpatient | X | X | X | - | |||
| Studies included in previous review (n = 27) | ||||||||
| Adlan et al., UK [70] | Inpatient | X | - | |||||
| Baker et al., UK [71] | Outpatient | X | X | - | ||||
| Berwick and Coltin, USA PCF$ [23] | Outpatient | X | - | |||||
| Berwick and Coltin, USA PCFY [23] | Outpatient | X | - | |||||
| Berwick and Coltin, USA TSE [23] | Outpatient | X | - | |||||
| Chu et al., Australia [72] | ED | X | Change of order form | |||||
| Cipullo and Mostoufizadeh, USA [73] | Inpatient | X | - | |||||
| Daucourt et al., France MPC [25] | Inpatient | X | - | |||||
| Daucourt et al., France TRF [25] | Inpatient | X | Change of order form | |||||
| Daucourt et al., France Both [25] | Inpatient | X | X | Change of order form | ||||
| Dowling et al., USA [74] | Outpatient | X | X | - | ||||
| Emerson and Emerson, USA [26] | Outpatient | X | Change of order form | |||||
| Feldkamp and Carey, USA [75] | Inpatient + outpatient | X | Reflex/Automatic discharge | |||||
| Gama et al., UK [76] | Inpatient + outpatient | X | - | |||||
| Grivell et al., Australia [77] | Inpatient | X | - | |||||
| Hardwick et al., Canada [29] | Outpatient | X | X | - | ||||
| Horn et al., USA [78] | Outpatient | X | Cost display | |||||
| Larsson et al., Sweden [22] | Outpatient | X | - | |||||
| Mindemark and Larsson, Sweden (follow up) [79] | Outpatient | X | - | |||||
| Nightingale et al., UK [80] | Inpatient | X | X | X | Change of order form | |||
| Rhyne and Gehlbach, USA [81] | Outpatient | X | X | - | ||||
| Schectman et al., USA Reminder + Feedback [82] | Outpatient | X | X | X | - | |||
| Schectman et al., USA Reminder [82] | Outpatient | X | X | - | ||||
| Stuart et al., Australia [83] | ED | X | X | X | - | |||
| Thomas et al., UK Feedback [24] | Outpatient | X | - | |||||
| Thomas et al., UK Reminder [24] | Outpatient | X | - | |||||
| Thomas et al., UK Both [24] | Outpatient | X | X | - | ||||
| Tierney et al., USA [27] | Outpatient | X | Alert | |||||
| Tomlin et al., New Zealand [84] | Outpatient | X | X | X | - | |||
| Toubert et al., France [85] | Inpatient + outpatient | X | X | - | ||||
| Van Walraven et al., Canada [28] | Outpatient | X | X | X | Change of order form | |||
| Vidal-Trecan et al., France [86] | Inpatient + outpatient | X | X | X | Change of order form | |||
| Willis and Datta, UK [87] | Inpatient | X | X | - | ||||
| Wong et al., USA [88] | Inpatient | X | X | Change of order form | ||||
Multiple interventions per study (indicated in study column) or multiple types of intervention per intervention (multi-component interventions) possible. Alerts including best practice pop-ups, test suggestions or information (e.g. test rates, disease probabilities)
CDSS clinical decision support system, ED emergency department, MPC memorandum pocket card, TRF test request form, TSE test-specific education, PCFY peer comparison feedback on yield of tests, PCF$ peer comparison feedback on cost of test use
The average total observation period of the studies was 25 months, with an average 12-month pre-intervention phase and a 14-month intervention/post-intervention period (additional information on study characteristics is provided in Additional file 9). In total, 19 studies targeted TFTs only, while the remaining 28 studies aimed to reduce a broader range of laboratory and imaging tests. A total of 54 distinct interventions were performed within the 47 studies (Tables 2 and 3).
While the previous review identified soft interventions (education, guidelines/protocols, reminders, and audit/feedback) as the most common types [34], our review revealed that most of the studies introduced structural interventions (CDSS and changes in funding, n = 30). Of these, only two studies (no newly identified studies) assessed changes to funding (Table 3). Therefore, CDSS is the most widely used type (n = 28), with 17 of the 21 newly identified studies employing CDSS. The most common CDSS involved changes of the order form (n = 12), alerts (n = 8), or reflex with automatic discharge (n = 7). Alerts included information about best practice, suggestions for testing and discharge, or probabilities of thyroid disease. Additionally, two studies displayed the costs of the tests being ordered. Among the soft interventions, guidelines/protocols (n = 18) and education (n = 16) were most commonly employed, followed by audit/feedback (n = 14) and reminders (n = 9). In total, 19 interventions consisted solely of soft components, 25 involved structural changes, and ten combined both components (Table 3).
Bias assessment
The majority of the studies did not have a control group (n = 33). Following an approach outlined by HTACG, we refrained from conducting a formal RoB assessment for these studies, as their inherent lack of internal validity is unlikely to be changed by a RoB assessment [89].
Two RCTs were of low RoB; the remaining three had some concerns due to potential selection bias. Primarily due to the risk of confounding, the RoB in controlled studies (n = 9) was critical in three studies (n = 4 outcomes), serious in six studies (n = 8 outcomes), and moderate in one study. The outcomes within a study showed the same RoB for each domain due to the relation of the outcome measures. The full RoB assessment can be found in Additional file 10.
Effectiveness of the interventions
Of the 54 interventions, the majority showed a positive direction of change (n = 52), a considerable number had effects ≥ 20% (n = 40), and many reported significant changes (n = 29, with 19 not reported; Table 4) (RQ1). Further synthesis was impracticable due to the heterogeneity of measures, tests, and reporting standards. Only 12 interventions reported confidence intervals, while the difference in means was even less common (n = 7). Relative change ranged from a 32% decrease to a 172% improvement (additional information on study results and reported effect measures is provided in Additional file 11). We refrained from computing means due to the variability of measures within the outcomes. The most frequently assessed outcome was the number or rates of tests (n = 43), with 41 showing a positive effect. A large proportion of these had effects of ≥ 20% (n = 30). Significant changes were reported in 22 of the 43 interventions, while 17 did not report significance. Expenditure outcomes were less common (n = 9), but all showed positive changes, with significant changes reported in two interventions (six not reported). Relative improvements ≥ 20% were shown by five interventions assessing the expenditure. A total of 49 interventions assessed volume-related outcomes (test numbers/rates, expenditure; Table 4).
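The direction of effect and the ≥ 20% threshold used above can be illustrated with a minimal Python sketch. It assumes that relative change is computed as (post − pre) / pre and that, for volume outcomes, a decrease counts as a positive direction. The function names and the example values are hypothetical and are not taken from the included studies or from Additional file 11.

```python
def relative_change(pre: float, post: float) -> float:
    """Relative change from pre to post in percent (assumed definition)."""
    if pre == 0:
        raise ValueError("preintervention value must be non-zero")
    return (post - pre) * 100.0 / pre


def classify(pre: float, post: float, lower_is_better: bool = True,
             threshold: float = 20.0) -> dict:
    """Derive direction of effect and the threshold flag for one outcome."""
    change = relative_change(pre, post)
    # For volume outcomes (test rates, costs) a decrease is an improvement;
    # for appropriateness outcomes an increase is an improvement.
    improvement = -change if lower_is_better else change
    if improvement > 0:
        direction = "+"
    elif improvement < 0:
        direction = "-"
    else:
        direction = "0"
    return {
        "relative_change_pct": change,
        "direction": direction,
        "effect_ge_threshold": improvement >= threshold,
    }


# Hypothetical example: tests per 1,000 patients fall from 50 to 35.
result = classify(pre=50.0, post=35.0)
print(result)
```

For an appropriateness outcome, where higher values are better, the same function would be called with `lower_is_better=False`.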
Table 4.
Results of the interventions
| Study and country | Type of Study | Intervention | Direction of effect | Relative change | Significant effect | Notes on outcome measure |
|---|---|---|---|---|---|---|
| Appropriateness | ||||||
| Daucourt et al., France MPC [25] | RCT | Reminder + Decision tool | + | −* | − | Proportion of TFTs ordered in accordance with guidelines |
| Daucourt et al., France TRF [25] | RCT | Reminder + Decision tool | + | + * | + | See above |
| Daucourt et al., France Both [25] | RCT | Reminder + Decision tool | + | + * | NR | See above |
| Delvaux et al., Belgium [58] | RCT | Decision tool | + | − | + | Appropriate tests/total orders |
| Schectman et al., USA Reminder + Feedback [82] | Controlled | Reminder + Feedback | + | + * | NR | Compliance rate with TFT protocol |
| Schectman et al., USA Reminder [82] | Controlled | Reminder | + | + * | + | See above |
| Caldarelli et al., Italy [56] | Uncontrolled | Decision tool | + | −* | NR | Prescriptive appropriateness through the ratios TSH/FT4, TSH/FT3 and the ratio “TSH Reflex”/TSH |
| Dowling et al., USA [74] | Uncontrolled | Education + Feedback | + | + * | − | Indicated TSH/visit |
| Elrewini et al., Saudi Arabia [59] | Uncontrolled | Education + Guidelines | + | − | + | Unnecessary requests/total TSH requests |
| Feldkamp and Carey, USA [75] | Uncontrolled | Decision tool | + | + * | NR | Shift towards TSH |
| Leis et al., Canada [61] | Uncontrolled | Decision tool | + | + * | + | Proportion of not indicated physician TSH orders |
| Nightingale et al., UK [80] | Uncontrolled | Guidelines + Decision tool + Feedback | + | + * | NR | Patients requiring an investigation/patients tested |
| Rhyne and Gehlbach, USA [81] | Uncontrolled | Education + Guidelines | + | + * | − | High/low indication test proportion |
| Toubert et al., France [85] | Uncontrolled | Guidelines + Reminders | + | + * | + | Frequency of appropriate use of thyroid function tests |
| Coefficient of variation (CoV) | ||||||
| Berwick and Coltin, USA PCF$ [23] | Controlled | Feedback | + | + | NR | CoV of rate of test use among physicians within centres |
| Berwick and Coltin, USA PCFY [23] | Controlled | Feedback | + | + | NR | See above |
| Berwick and Coltin, USA TSE [23] | Controlled | Education | + | − | NR | See above |
| Expenditure | ||||||
| Tierney et al., USA [27] | RCT | Decision tool | + | − | − | Charges per visit |
| Tomlin et al., New Zealand [84] | Controlled | Education + Guidelines + Feedback | + | − | NR | |
| Bejjanki et al., USA [53] | Uncontrolled | Decision tool | + | + * | NR | Cost savings from reducing duplicates |
| Caldarelli et al., Italy [56] | Uncontrolled | Decision tool | + | −* | NR | |
| Elrewini et al., Saudi Arabia [59] | Uncontrolled | Education + Guidelines | + | + * | NR | Cost spent on the unnecessary requests of TSH tests |
| Hardwick et al., Canada [29] | Uncontrolled | Guidelines + Change in Funding | + | + | NR | Change in total expected costs |
| Janssens et al., Netherlands [44] | Uncontrolled | Guidelines | + | + * | NR | |
| Leung et al., USA [62] | Uncontrolled | Education + Reminder | + | NR | + | Change of laboratory costs |
| Stuart et al., Australia [83] | Uncontrolled | Education + Guidelines + Feedback | + | + | + | Mean costs per patient |
| Pattern | ||||||
| Tomlin et al., New Zealand [84] | Controlled | Education + Guidelines + Feedback | + | + | + | Shift towards TSH |
| Wong et al., USA [88] | Controlled | Guidelines + Decision tool | + | + * | + | Sought to reduce complete thyroid panels |
| Emerson and Emerson, USA [26] | Uncontrolled | Decision tool | + | + * | + | Shift towards FT4 and thyroid cascade |
| Hardwick et al., Canada [29] | Uncontrolled | Guidelines + Change in Funding | + | + * | NR | Sought to reduce proportion of T3 tests |
| Larsson et al., Sweden [22] | Uncontrolled | Education | + | −* | TSH/TFTs: +; T3/TSH: +; T4/TSH: − | Shift towards TSH (primary care centres) |
| Larsson et al., Sweden [22] | Uncontrolled | Education | + | + * | NR | Shift towards TSH (individual physicians) |
| Mindemark and Larsson, Sweden (follow up) [79] | Uncontrolled | Education | + | T3/TSH + * T4 + FT4/TSH −* | − | Median of physicians; shift towards TSH |
| Toubert et al., France [85] | Uncontrolled | Guidelines + Reminders | + | + * | NR | Shift towards TSH |
| Van Walraven et al., Canada [28] | Uncontrolled | Guidelines + Change in Funding + Decision tool | + | + | + | Shift towards TSH |
| Vidal-Trecan et al., France [86] | Uncontrolled | Education + Guidelines + Reminders + Decision tool | + | − | NR | Shift towards TSH |
| Test numbers or rates | ||||||
| Baker et al., UK [71] | RCT | Guidelines + Feedback | + | −* | − | Tests per 1,000 patients |
| Thomas et al., UK Feedback [24] | RCT | Feedback | + | −* | + | Tests per 10,000 patients |
| Thomas et al., UK Reminder [24] | RCT | Reminder | + | −* | + | See above |
| Thomas et al., UK Both [24] | RCT | Reminder + Feedback | + | NR | NR | See above |
| Bellodi et al., Italy [54] | Controlled | Decision tool | + | Delta: −; Cento: +; Ferrara: − | NR | Number of laboratory tests requested by wards |
| Berwick and Coltin, USA PCF$ [23] | Controlled | Feedback | + | − | NR | Tests per 1,000 encounters per physician |
| Berwick and Coltin, USA PCFY [23] | Controlled | Feedback | − | + | NR | See above |
| Berwick and Coltin, USA TSE [23] | Controlled | Education | + | − | NR | See above |
| Chami et al., Canada [45] | Controlled | Decision tool | + | − | − | Number of thyroid tests |
| Gama et al., UK [76] | Controlled | Feedback | + | + | I: +; C: − | Tests per outpatient visit |
| Horn et al., USA [78] | Controlled | Decision tool | + | NA | − | Monthly orders per 1,000 patients |
| Schectman et al., USA [82] | Controlled | Reminder + Feedback | + | −* | + | Number of TFTs per patient; Feedback and Non-Feedback group combined |
| Tomlin et al., New Zealand [84] | Controlled | Education + Guidelines + Feedback | + | TSH: −; FT3/FT4: + | + | Tests per year per GP |
| Wintemute et al., Canada [69] | Controlled | Guidelines + Feedback | + | − | + | |
| Wong et al., USA [88] | Controlled | Guidelines + Decision tool | + | TSH and T3RIA: +; T3RU and T4RIA: − | NR | Tests per month |
| Adlan et al., UK [70] | Uncontrolled | Guidelines | + | + * | + | Proportion of admitted patients offered TFTs |
| Bateman et al., Canada [52] | Uncontrolled | Education + Feedback | + | + | NR | Proportion of admitted patients offered TFTs |
| Bejjanki et al., USA [53] | Uncontrolled | Decision tool | + | FT4: −; TSH: + | FT4: −; TSH: + | Percentage change in the number of inpatient duplicate orders |
| Bejjanki et al., USA [53] | Uncontrolled | Decision tool | + | FT4: −; TSH: + | FT4: −; TSH: + | Odds of percentage duplicate |
| Bradshaw et al., USA [55] | Uncontrolled | Decision tool | + | + | TSH: −; FT4: + | Number of inappropriate TSH tests ordered; FT3 excluded due to low baseline numbers |
| Caldarelli et al., Italy [56] | Uncontrolled | Decision tool | + | TSH: −; FT4: −; FT3: + | NR | Number of thyroid tests |
| Chu et al., Australia [72] | Uncontrolled | Decision tool | + | + * | + | Number of tests ordered per 100 ED presentations |
| Cipullo and Mostoufizadeh, USA [73] | Uncontrolled | Guidelines | + | − | NR | Tests/discharge |
| Dalal et al., USA [57] | Uncontrolled | Decision tool | + | + | + | Number of tests of fT3 and fT4 orders per total TSH orders |
| Dowling et al., USA [74] | Uncontrolled | Education + Feedback | + | + * | − | Rates of ordering TSH tests per visit |
| Emerson and Emerson, USA [26] | Uncontrolled | Decision tool | + | + * | + | Test sets ordered (significance for total TFTs) |
| Feldkamp and Carey, USA [75] | Uncontrolled | Decision tool | + | TSH: −*; T4: + *; T3RU: + * | NR | Tests per 1,000 patients (T3 not reported) |
| Gilmour et al., Canada [30] | Uncontrolled | Education + Decision tool | + | + | + | Median number of tests performed (FT3 and FT4; TSH used for initial appropriateness) |
| Grivell et al., Australia [77] | Uncontrolled | Feedback | − | + * | NR | Tests per 1,000 patients |
| Hardwick et al., Canada [29] | Uncontrolled | Guidelines + Change in Funding | + | + * | NR | |
| Janssens et al., Netherlands [44] | Uncontrolled | Guidelines | + | + * | NR | |
| Krouss et al., USA [60] | Uncontrolled | Decision tool | + | + * | + | Orders per 1,000 patient days (inpatient)/per 1,000 encounters (outpatient) |
| Leis et al., Canada [61] | Uncontrolled | Decision tool | + | + * | + | Patients with any TSH assay request/patients with physician-signed order |
| MacPherson et al., Australia [63] | Uncontrolled | Guidelines + Decision tool | + | + * | + | |
| Muris et al., Netherlands [64] | Uncontrolled | Decision tool | + | + * | + | Mean test ordering rate per 1,000 patients per month per general practice |
| Notas et al., Greece [65] | Uncontrolled | Decision tool | + | + | + | Number of TFTs per TSH ordered (FT4 and FT3) and per cent patients with TFT order, inpatients |
| Notas et al., Greece [65] | Uncontrolled | Decision tool | + | + * | NR | Number of TFTs per TSH ordered (FT4 and FT3), outpatients |
| Rhyne and Gehlbach, USA [81] | Uncontrolled | Education + Guidelines | + | −* | + | TFTs per 100 patients |
| Salinas et al., Spain [66] | Uncontrolled | Decision tool | + | + * | NR | Ratio of FT4/TSH |
| Sue et al., USA [67] | Uncontrolled | Decision tool | + | + | + | T3 laboratory tests/10,000 patients per week |
| Taher et al., Canada [68] | Uncontrolled | Decision tool | + | + | NR | Total number of fT4 and fT3 tests per month |
| Toubert et al., France [85] | Uncontrolled | Guidelines + Reminders | + | + * | NR | |
| Van Walraven et al., Canada [28] | Uncontrolled | Guidelines + Change in Funding + Decision tool | + | TSH: −; T4: + | + | Tests per 100,000 patients per month; comparison with expected values (T3RU not reported) |
| Vidal-Trecan et al., France [86] | Uncontrolled | Education + Guidelines + Reminders + Decision tool | + | − | NR | |
| Willis and Datta, UK [87] | Uncontrolled | Education + Guidelines | + | + * | + | Tests per admission |
Interventions sorted by outcome and type of study. Effects based on numerical results that can be found in Additional file 11. Deviations from standard outcome measure listed in last column
*Based on authors’ calculations (for values pre/postintervention see Additional file 11)
ED emergency department, FT4 free thyroxine, FT3 free triiodothyronine, GP general practitioner, MPC memorandum pocket card, NR not reported, TFTs thyroid function tests, TRF test request form, TSE test-specific education, TSH thyroid stimulating hormone (thyrotropin), T3 triiodothyronine, T3RU triiodothyronine resin uptake, PCFY peer comparison feedback on yield of tests, PCF$ peer comparison feedback on cost of test use, RCT randomised controlled trial, RIA radioimmunoassay
Improvement-related outcomes (appropriateness, pattern, and CoV) were assessed in 24 interventions. Appropriateness was frequently studied (n = 14), with all showing positive direction, many with effects ≥ 20% (n = 10), and significant changes in six studies (five not reported). Pattern changes were less common (n = 8), but all showed positive effects, with significant changes in some (n = 4, with three not reported). Effects of ≥ 20% were shown in three interventions. CoV outcomes were least assessed (n = 3), with no significant changes reported (two interventions showed relative improvements ≥ 20%). Over all outcomes and interventions, the results of structural interventions were slightly more positive, with 100% showing positive effects (61% significant) and 74% with large effect sizes, compared to combined and soft interventions (combined 100% positive (40% significant), 60% large effect; soft, 94% positive (44% significant), 47% large effects; Additional file 11).
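The shares by intervention type reported above (positive direction, significant, large effect) amount to simple tallies over intervention-level records. The following Python sketch shows one way such a tally could be computed. The field names are hypothetical, the records are illustrative only and are not data from the included studies, and the choice to keep interventions with unreported significance in the denominator is an assumption rather than the documented approach of this review.

```python
from collections import defaultdict


def summarise(records):
    """Percentage of positive, significant and large effects per category."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["category"]].append(rec)

    summary = {}
    for category, recs in groups.items():
        n = len(recs)
        summary[category] = {
            "n": n,
            "positive_pct": round(100 * sum(r["positive"] for r in recs) / n),
            # significant is None when not reported; such records stay in
            # the denominator and count as not significant (assumption).
            "significant_pct": round(
                100 * sum(r["significant"] is True for r in recs) / n
            ),
            "large_effect_pct": round(
                100 * sum(r["large_effect"] for r in recs) / n
            ),
        }
    return summary


# Illustrative records only (not extracted from the included studies).
records = [
    {"category": "structural", "positive": True, "significant": True, "large_effect": True},
    {"category": "structural", "positive": True, "significant": None, "large_effect": False},
    {"category": "soft", "positive": True, "significant": False, "large_effect": True},
    {"category": "soft", "positive": False, "significant": None, "large_effect": False},
]
summary = summarise(records)
print(summary)
```

Excluding interventions with unreported significance from the denominator instead would raise the share of significant effects, so the handling of "NR" entries should be stated alongside any such percentages.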
To contextualise these findings, we next evaluated the certainty of evidence using GRADE. For structural interventions (CDSS, changes in funding), we found a significantly positive effect on the outcomes that measure improvement of care based on two cluster RCTs (n = 1 effect ≥ 20%, moderate certainty of evidence (CoE)). Four uncontrolled studies supported these findings (positive direction, n = 3 effects ≥ 20%, two significant). For volume reduction, one RCT indicated a trend towards reducing test rates (positive, not significant (NS), low CoE), with 16 non-randomised studies pointing in the same direction (positive direction, n = 12 effects ≥ 20%, n = 9 significant). For soft interventions, one cluster RCT indicated positive improvement of care (NS, moderate CoE), as well as ten non-randomised interventions (n = 9 effects ≥ 20%, four significant). Two cluster RCTs featuring four soft interventions indicated they could achieve volume reduction (two significant, moderate CoE). Of 19 non-randomised interventions, 17 showed a positive direction as well (n = 9 effects ≥ 20%, nine significant). Similarly, combined interventions showed positive effects on the improvement of care based on one cluster RCT (positive, effect ≥ 20%, NS, moderate CoE) and six non-randomised interventions (n = 3 effects ≥ 20%, n = 3 significant). No RCT assessed volume reduction, but eight non-randomised interventions showed a positive trend (n = 5 effects ≥ 20%, n = 3 significant). The full GRADE evidence profiles can be found in Additional file 7, organised by intervention type (Table 7.1) and outcome category (Table 7.2).
The largest group of studies did not report funding (n = 21); others had non-profit funding (n = 14), received no specific grant (n = 12), or had unclear funding (n = 1). Twenty-two studies either reported ethics committee approval (n = 18) or stated that approval was not required (n = 4). Twenty-five studies did not report on ethics approval (Additional file 12 includes information on ethics approval, funding, and conflict of interest). Although we cannot entirely rule out an overestimation of the predominantly positive findings, there is no indication of significant publication bias. No relevant trials were found in the extended registry search. All evidence from non-randomised trials was rated to be of very low certainty (Additional file 7).
Theoretical foundations and contextual factors
Information on theoretical foundations and contextual factors provided by the included studies was sparse (RQ2). Four interventions reported on the theoretical foundations of their interventions, going further than conventional references to systematic reviews or guidelines (relevant text passages included in Additional file 9). We did not find additional literature reporting on theoretical foundations or contextual factors. Consideration of contextual factors beyond theoretical models was not explicitly mentioned in any included study.
Elrewini et al. (education + guidelines + retest alert) performed a root cause analysis and developed a corresponding action plan that addresses the identified root causes [59]. Leis et al. (change of order form) used a simulated setting to assess unnecessary test ordering through a hypothetical patient scenario with a quasi-randomised sample of participants. They concluded that the presence of a checkbox influences ordering behaviour [61, 90]. Stuart et al. (feedback + education + guidelines) based the components of their intervention on the core elements of the PRECEDE framework (Predisposing, Reinforcing, and Enabling Causes in Educational Diagnosis and Evaluation, [41, 83]). Wintemute et al. (guidelines + feedback + reminder) based their choice of intervention on Rogers’ theory of diffusion of innovations, supplementing evidence-based recommendations with active reminders and local feedback ([69, 91], Additional file 9).
Additionally, three interventions performed Plan-Do-Study-Act (PDSA) cycles based on different approaches. Bateman et al. (feedback + education) applied a systematic approach using process measures and evaluation based on quality improvement literature [52, 92, 93]. Gilmour et al. (education + reflex testing) and Taher et al. (reflex testing) developed their interventions based on PDSA cycles using the model for improvement framework for continuous quality improvement by Provost et al. ([30, 68, 94, 95], Additional file 9).
Discussion
Our review sought to evaluate interventions aimed at reducing unnecessary TFTs by reviewing and synthesising recent studies, building on the review by Zhelev et al. from 2016 [34]. We identified 21 new studies, contributing to a total of 47 unique studies included in our review. The interventions comprised soft (education, guidelines/protocols, reminders, and audit/feedback) and structural (CDSS and changes in funding) interventions. The synthesis of 54 interventions across the included studies revealed predominantly positive outcomes, with 52 interventions associated with reductions or improvements in at least one outcome. Most studies reported relative reductions of at least 20%. Restricting the evidence to the five included (cluster) RCTs reaffirmed this pattern, as all five trials reported some degree of beneficial impact. In this review, we applied the Cochrane methodology and the GRADE approach, enhancing the methodological rigour [38, 50]. Nevertheless, the overall certainty of evidence was rated as low, indicating that the observed effects should be interpreted with caution and primarily viewed as indicative of promising trends rather than definitive evidence of effectiveness.
We observed a shift towards structural interventions, particularly CDSS. Of the 21 newly identified studies, 17 employed some form of CDSS, with alerts being the most commonly reported. The clustered GRADE assessment suggested potential effectiveness of de-implementation interventions, where structural interventions showed slightly more compelling results (RQ1). RCT evidence for structural interventions indicated improvement of care (moderate CoE, n = 2 RCTs) and volume reduction (low CoE, n = 1 RCT). Though RCT evidence for soft interventions shows similar results (volume reduction n = 2 RCTs, improvement of care n = 1 RCT, both moderate CoE), observational evidence suggests a higher success rate for structural interventions (structural interventions: 100% positive, 61% significant; soft interventions: 94% positive, 44% significant). The increased use of structural interventions aligns with the expectation that direct approaches at the point of care may be more likely to yield a reduction in test orders compared to soft interventions [45]. A systematic review performed by Cliff et al. on the effectiveness of CW interventions concluded that structural interventions are more effective than soft approaches [21]. Similarly, a CDSS was found to be the most effective intervention in de-implementing low-value cancer care [96]. Further research is needed to evaluate the effectiveness of structural interventions, under various conditions such as system usability, integration with existing practices, and user engagement, which are often underreported [97]. In particular, it remains unclear how differences in CDSS design, implementation context, and integration into clinical workflows shape both the magnitude and durability of observed effects.
In addition to the potentially promising structural interventions, our study also identified soft interventions that, while less impactful, showed compelling effects in some settings. These findings suggest that both structural and soft interventions may be suitable options for reducing TFTs. Similarly, research by Kobewka et al., which examined interventions aimed at reducing laboratory test utilisation in general, found positive results across all types of interventions [98]. In particular, soft interventions can be considered in settings where profound structural interventions are not feasible. Regardless of the specific intervention type, a recent overview of reviews by Kien et al. indicates that de-implementation strategies are effective across various low-value services [6]. This overview complements our review and can inform decision-makers about the range of interventions available for reducing low-value care, as well as the potential for de-implementation strategies beyond individual indications.
Further, our review sought to identify theoretical foundations considered during implementation and contextual factors that are associated with the effectiveness of the interventions (RQ2). However, reporting on these aspects was sparse in the identified studies and grey literature. Only a few studies reported using frameworks to guide their interventions. This limits our understanding of the mechanisms driving the observed effects and reduces the ability to replicate successful interventions in different settings, as the lack of theory-driven design makes it difficult to explain how contextual and behavioural factors influence effectiveness. For example, Stuart et al. based the components of their intervention on the core elements of the PRECEDE framework, thereby aligning components with identified determinants of behaviour ([83], significant positive effect with RD ≥ 20%). Similarly, Wintemute et al. drew on Rogers’ theory of diffusion of innovations to supplement evidence-based recommendations with active reminders and local feedback ([69], significant positive effect). These examples illustrate how theoretical models can strengthen intervention design by making explicit the mechanisms through which behaviour change is expected to occur. However, despite the availability of several de-implementation frameworks, they appear to be rarely applied in practice [99]. A more consistent use of such frameworks across interventions would not only facilitate comparability but also strengthen the theoretical grounding of de-implementation strategies. Integrating contextual factors such as digital readiness or organisational culture into such frameworks could help clarify how these factors interact with intervention mechanisms. Policymakers should support evidence-based interventions built on robust theoretical foundations and evaluation frameworks to ensure effectiveness and lasting impact by avoiding inefficient components.
Once solid evidence has been generated through primary studies, a realist review may help to fully understand how and why different components of the intervention(s) work in what contexts and for whom [100]. The success of system-level changes such as CDSS depends heavily on contextual conditions, such as the availability and interoperability of local infrastructure, prevailing funding mechanisms, and regulatory frameworks. These factors determine whether an intervention can be feasibly implemented and sustained, and they should be carefully considered when transferring findings to other settings. Although some studies reported larger effects for changes in the clinical ordering system compared to soft interventions, it cannot be assumed that such approaches are simultaneously less expensive, despite the absence of recurring training sessions and evaluations [98]. While CDSS may prove cost-efficient over time through automation and scalability, they often demand substantial initial investments in digital infrastructure. In contrast, soft interventions are typically less costly to implement initially but may require ongoing efforts to sustain their effects. This trade-off should be carefully considered when designing de-implementation strategies. Cost-effectiveness analyses are necessary to evaluate the costs of interventions relative to their savings, as interventions aimed at reducing low-value care can themselves be resource-intensive.
Limitations
There are several limitations to this review that should be acknowledged. First, we included observational studies due to the complexity of the interventions. Observational studies are generally more prone to internal validity concerns compared to RCTs [38]. Second, we did not pool the effects of the interventions quantitatively because of the heterogeneity of the interventions and outcome measures. Instead, we focused on the evaluation of clustered results in order to give a concise overview. While the classification into structural or soft interventions is partly based on literature, the distinction is not always clear-cut, as some interventions span both categories or include borderline elements, and several interventions in this review explicitly combined soft and structural components. Alternative ways of grouping interventions and outcomes may therefore lead to different interpretations of the findings. In addition, conducting an extended mixed-methods synthesis integrating qualitative and quantitative evidence was beyond the scope of this review, given the sparse and inconsistent reporting of theoretical foundations and contextual factors. Third, we frequently observed small effects, which may limit the strength of the conclusions, though the overall trends were generally positive. Still, only 12 of the included interventions reported confidence intervals, limiting the ability to assess the reliability of effect estimates. The lack of confidence intervals makes it more difficult to determine the statistical robustness of reported changes, and increases the uncertainty surrounding the true effectiveness of the interventions. Furthermore, the majority of studies reported outcomes over relatively short timeframes, with few studies providing extensive follow-up. 
Consequently, it remains unclear whether reductions in TFT ordering were sustained once interventions ended or whether rebound effects occurred, for example due to alert fatigue, CDSS-related workflow integration issues, or system changes. Future research should address the long-term sustainability of de-implementation efforts. Fourth, most of the included studies were conducted in North America and Europe, particularly the USA, Canada, and the UK. This may limit the generalisability to health systems in other regions, especially those with low resources or limited digital infrastructure. In such settings, soft interventions may be more readily feasible than CDSS-based approaches that require specific digital and regulatory prerequisites. In addition, the interventions under study may not be readily transferable to other settings in the countries of interest. For example, in the German healthcare system, there are various electronic health data management systems and providers of practice management software, each with varying levels of interoperability. The context in which interventions are deployed could produce different outcomes based on the technical and regulatory landscape. Regulatory efforts are necessary in order to incorporate customised solutions across multiple institutions simultaneously. Fifth, the literature search was restricted to English and German publications. While no German-language studies were identified and several included studies originated from non-English-speaking countries, the exclusion of other languages may have resulted in the omission of a small number of relevant studies.
Regarding publication bias, research on TFT reduction is predominantly publicly funded, with fewer incentives for researchers to withhold information in comparison to pharmaceutical trials and related reviews. Any intervention aiming to reduce low-value TFTs is likely to lead to positive results compared to usual care. Meanwhile, significant outcomes (31 interventions, 57%) were not reported disproportionately often relative to non-significant outcomes or outcomes with no reported level of significance. Thus, we concluded that publication bias is not a major concern. However, the predominance of non-RCTs led to a generally low CoE, limiting generalisability. Most controlled studies had a serious or critical RoB due to potential confounding, while observational studies were categorically classified as having a critical RoB. Last, we cannot rule out the possibility that our findings are influenced by selective reporting of outcomes. However, similar to the issue of potential publication bias, selective reporting is unlikely to pose a significant problem, as the body of evidence includes enough compelling and consistently positive results. The identified limitations coincide with those found in similar reviews and research on related low-value care topics [21, 96, 101]. Thus, the implications for policy should be interpreted with the appropriate caution, taking into account the GRADE results (Additional file 7).
Conclusion
The evidence on de-implementation strategies for TFT ordering suggests that behaviour change interventions have the potential to significantly reduce excessive thyroid function testing. CDSS in particular appear to be associated with promising results, though most studies have a serious or critical risk of bias. If these findings hold in more rigorous trials, this would strengthen the evidence base for feasible workflow modifications to improve care. Policy and practice could then consider implementing controlled TFT reduction as a means to enhance appropriateness and possibly reduce costs. Continued research on cost-effectiveness will be essential to inform large-scale implementation.
Future research should focus on developing well-designed interventions that rest on a solid theoretical foundation and on evaluating them with greater methodological rigour. In particular, RCTs and hybrid effectiveness-implementation designs would allow more reliable evaluation and improve applicability. By systematically reporting contextual factors and mechanisms of change, future studies can strengthen the evidence base and support the replication of potentially effective de-implementation strategies across settings. In the short term, standardised cluster RCTs across diverse contexts could test the effectiveness of CDSS, while medium-term studies should assess the sustainability of their effects. In the longer term, theory-based realist syntheses may help clarify what works, for whom, and why. This review can help inform the design of such interventions by identifying which specific interventions or components may be associated with greater effectiveness.
Supplementary Information
Additional file 1. Additional file 1 includes the AMSTAR assessment of the review by Zhelev et al. [34].
Additional file 2. Additional file 2 includes the ROBIS assessment of the review by Zhelev et al. [34].
Additional file 3. Additional file 3 includes the PRISMA 2020 Checklist.
Additional file 4. Additional file 4 includes changes made to the information provided at registration.
Additional file 5. Additional file 5 includes the search strategies in Embase, Medline, Scopus, Cochrane, and Google Scholar.
Additional file 6. Additional file 6 includes the list of data items extracted in the review.
Additional file 7. Additional file 7 includes the full GRADE assessment of the interventions.
Additional file 8. Additional file 8 includes the information on all articles excluded after full-text screening, including reason for exclusion.
Additional file 9. Additional file 9 includes additional information on study characteristics, i.e. reported outcomes, reporting on theoretical foundations, and study period.
Additional file 10. Additional file 10 includes the visualisation of the Risk of Bias assessment for the (cluster) RCTs and controlled studies.
Additional file 11. Additional file 11 includes additional information on study results, i.e. the outcome values (pre/postintervention), notes on outcome measures and statistical indicators (confidence interval, p-value, relative reduction, difference in means).
Additional file 12. Additional file 12 includes additional information on study characteristics, i.e. funding and reported conflict of interest.
Acknowledgements
We are grateful to Zhivko Zhelev for his invaluable insights and critical analyses in the prior review, which served as a foundation for the results presented in this article. We thank him for his guidance and continued support throughout the process.
Abbreviations
- CDSS: Clinical decision support systems
- CoE: Certainty of evidence
- CoV: Coefficient of variation
- CW: Choosing Wisely
- GRADE: Grading of Recommendations, Assessment, Development, and Evaluations
- NRSI: Non-randomised studies of interventions
- NS: Not significant
- PDSA: Plan-Do-Study-Act
- PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
- RCT: Randomised controlled trial
- RQ: Research question
- RoB: Risk of bias
- RoB 2: RoB 2.0 tool
- ROBINS-I: RoB In Non-randomised Studies - of Interventions
- ROBIS: RoB In Systematic reviews tool
- T3: Triiodothyronine
- T4: Thyroxine
- TDF: Theoretical Domains Framework
- TFT: Thyroid function test
- TSH: Thyroid-stimulating hormone
Authors’ contributions
CP and MH contributed to the design, analysis, and interpretation and drafted the manuscript. GG contributed to the design, analysis, and interpretation. VV contributed to the design and interpretation and finalised the manuscript. All authors read and approved the final manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL. The systematic review was funded by the Federal Joint Committee (GBA), the highest body of self-administration in the German healthcare system (funding code 01VSF19038) [102]. The funders were not involved in the development of the review.
Data availability
All data generated or analysed during this study are included in this published article and its supplementary information files.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Carolina Pioch and Meik Hildebrandt should be considered joint first authors.
References
- 1. Horton S, Fleming KA, Kuti M, Looi L-M, Pai SA, Sayed S, et al. The top 25 laboratory tests by volume and revenue in five different countries. Am J Clin Pathol. 2019;151(5):446–51.
- 2. Dufour DR. Laboratory tests of thyroid function: uses and limitations. Endocrinol Metab Clin North Am. 2007;36(3):579–94.
- 3. Kluesner JK, Beckman DJ, Tate JM, Beauvais AA, Kravchenko MI, Wardian JL, et al. Analysis of current thyroid function test ordering practices. J Eval Clin Pract. 2018;24(2):347–52.
- 4. Beckett GJ, Toft AD. First-line thyroid function tests - TSH alone is not enough. Clin Endocrinol. 2003;58(1):20–1.
- 5. Premawardhana LD. Thyroid testing in acutely ill patients may be an expensive distraction. Biochemia Medica. 2017;27(2):300–7.
- 6. Kien C, Daxenbichler J, Titscher V, Baenziger J, Klingenstein P, Naef R, et al. Effectiveness of de-implementation of low-value healthcare practices: an overview of systematic reviews. Implement Sci. 2024;19(1):56.
- 7. Garmendia Madariaga A, Santos Palacios S, Guillén-Grima F, Galofré JC. The incidence and prevalence of thyroid dysfunction in Europe: a meta-analysis. J Clin Endocrinol Metab. 2014;99(3):923–31.
- 8. El Kawkgi OM, Brito JP. Screening for thyroid dysfunction: prevention of overdiagnosis and overtreatment. CMAJ. 2019;191(46):E1260–1.
- 9. Hueber S, Biermann V, Tomandl J, Warkentin L, Schedlbauer A, Tauchmann H, et al. Consequences of early thyroid ultrasound on subsequent tests, morbidity and costs: an explorative analysis of routine health data from German ambulatory care. BMJ Open. 2023;13(3):e059016.
- 10. Schübel J, Voigt K, Uebel T. Elevated TSH values in primary care: DEGAM-Guideline. AWMF-Register-Nr. 053–046; 2023 [cited 2024 Jul 24]. Available from: https://register.awmf.org/assets/guidelines/053_D_Ges_fuer_Allgemeinmedizin_und_Familienmedizin/053-046k-eng_S2k_Erhoehter-TSH-Wert-in-der-Hausarztpraxis_2023-07.pdf.
- 11. Cure C. Screening for thyroid dysfunction: do not routinely order TSH in all patients: Canadian Task Force on Preventive Health Care; 2024 [cited 2024 Jul 24]. Available from: https://canadiantaskforce.ca/screening-for-thyroid-dysfunction-do-not-routinely-order-tsh-in-all-patients/.
- 12. Canadian Society of Endocrinology and Metabolism. Five things patients and physicians should question: endocrinology and metabolism - Choosing wisely Canada; 2020 [cited 2024 Jul 24]. Available from: https://choosingwiselycanada.org/recommendation/endocrinology-and-metabolism/.
- 13. Choosing Wisely Australia. The Endocrine Society of Australia: recommendations; 2024 [cited 2024 Jul 1]. Available from: https://www.choosingwisely.org.au/recommendations/esa5.
- 14. Canadian Society of Endocrinology and Metabolism. CSEM review and response: thyroid testing and management; 2024 [cited 2024 Jul 24]. Available from: https://www.endo-metab.ca/cpgs-qi/thyroid-testing.
- 15. Gupta S, Verma M, Gupta AK, Kaur A, Kaur V, Singh K. Are we using thyroid function tests appropriately? Indian J Clin Biochem. 2011;26(2):178–81.
- 16. Baranek H, Lee J. Less is more with T3 and T4: Choosing Wisely Canada; 2018 [cited 2024 Jul 24]. Available from: https://choosingwiselycanada.org/less-t3-t4/.
- 17. Hildebrandt M, Pioch C, Dammertz L, Ihle P, Nothacker M, Schneider U, et al. Quantifying low-value care in Germany: an observational study using statutory health insurance data from 2018 to 2021. Value Health. 2024;28(6):884–93.
- 18. Pioch C, Neubert A, Dammertz L, Ermann H, Hildebrandt M, Ihle P, et al. Selecting indicators for the measurement of low-value care using German claims data: a three-round modified Delphi panel. PLoS One. 2025;20(2):e0314864.
- 19. Crampton N, Kalia S, Del Giudice ME, Wintemute K, Sullivan F, Aliarzadeh B, et al. Over-use of thyroid testing in Canadian and UK primary care in frequent attenders: a cross-sectional study. Int J Clin Pract. 2021;75(6):e14144.
- 20. Berthe E, Bencheqroun S, Mentaverri R. Recommendations for improved clinical practices for total thyroxine (T4) assay. J Appl Lab Med. 2025;10(3):764–7.
- 21. Cliff BQ, Avanceña ALV, Hirth RA, Lee S-Y. The impact of choosing wisely interventions on low-value medical services: a systematic review. Milbank Q. 2021;99(4):1024–58.
- 22. Larsson A, Biom S, Wernroth ML, Hultén G, Tryding N. Effects of an education programme to change clinical laboratory testing habits in primary care. Scand J Prim Health Care. 1999;17(4):238–43.
- 23. Berwick DM, Coltin KL. Feedback reduces test use in a health maintenance organization. J Am Med Assoc. 1986;255(11):1450–4.
- 24. Thomas RE, Croal BL, Ramsay C, Eccles M, Grimshaw J. Effect of enhanced feedback and brief educational reminder messages on laboratory test requesting in primary care: a cluster randomised trial. Lancet. 2006;367(9527):1990–6.
- 25. Daucourt V, Saillour-Glénisson F, Michel P, Jutand MA, Abouelfath A. A multicenter cluster randomized controlled trial of strategies to improve thyroid function testing. Med Care. 2003;41(3):432–41.
- 26. Emerson JF, Emerson SS. The impact of requisition design on laboratory utilization. Am J Clin Pathol. 2001;116(6):879–84.
- 27. Tierney WM, McDonald CJ, Hui SL, Martin DK. Computer predictions of abnormal test results. Effects on outpatient testing. J Am Med Assoc. 1988;259(8):1194–8.
- 28. van Walraven C, Goel V, Chan B. Effect of population-based interventions on laboratory utilization: a time-series analysis. J Am Med Assoc. 1998;280(23):2028–33.
- 29. Hardwick DF, Morrison JI, Tydeman J, Cassidy PA, Chase WH. Structuring complexity of testing: a process oriented approach to limiting unnecessary laboratory use. Am J Med Technol. 1982;48(7):605–8.
- 30. Gilmour JA, Weisman A, Orlov S, Goldberg RJ, Goldberg A, Baranek H, et al. Promoting resource stewardship: reducing inappropriate free thyroid hormone testing. J Eval Clin Pract. 2017;23(3):670–5.
- 31. French SD, Green SE, O'Connor DA, McKenzie JE, Francis JJ, Michie S, et al. Developing theory-informed behaviour change interventions to implement evidence into practice: a systematic approach using the Theoretical Domains Framework. Implement Sci. 2012;7(1):38.
- 32. Gangathimmaiah V, Drever N, Evans R, Moodley N, Sen Gupta T, Cardona M, et al. What works for and what hinders deimplementation of low-value care in emergency medicine practice? A scoping review. BMJ Open. 2023;13(11):e072762.
- 33. Grimshaw JM, Patey AM, Kirkham KR, Hall A, Dowling SK, Rodondi N, et al. De-implementing wisely: developing the evidence base to reduce low-value care. BMJ Qual Saf. 2020;29(5):409–17.
- 34. Zhelev Z, Abbott R, Rogers M, Fleming S, Patterson A, Hamilton WT, et al. Effectiveness of interventions to reduce ordering of thyroid function tests: a systematic review. BMJ Open. 2016;6(6):e010065.
- 35. Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.
- 36. Wittich L, Tsatsaronis C, Kuklinski D, Schöner L, Steinbeck V, Busse R, et al. Patient-reported outcome measures as an intervention: a comprehensive overview of systematic reviews on the effects of feedback. Value Health. 2024;27(10):1436–53.
- 37. Whiting P, Savović J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34.
- 38. Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023): Cochrane; 2023. Available from: www.training.cochrane.org/handbook.
- 39. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
- 40. Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372:n160.
- 41. Solomon DH, Hashimoto H, Daltroy L, Liang MH. Techniques to improve physicians' use of diagnostic tests: a new conceptual framework. J Am Med Assoc. 1998;280(23):2020–7.
- 42. Oxman AD, Thomson MA, Davis DA, Haynes RB. No magic bullets: a systematic review of 102 trials of interventions to improve professional practice. CMAJ. 1995;153(10):1423–31.
- 43. The EndNote Team. EndNote X9 (64-bit). Philadelphia (PA): Clarivate; 2013. Available from: https://endnote.com.
- 44. Janssens PMW, Staring W, Winkelman K, Krist G. Active intervention in hospital test request panels pays. Clin Chem Lab Med. 2015;53(5):731–42.
- 45. Chami N, Li Y, Weir S, Wright JG, Kantarevic J. Effect of strict and soft policy interventions on laboratory diagnostic testing in Ontario, Canada: a Bayesian structural time series analysis. Health Policy. 2021;125(2):254–60.
- 46. Brown AF, Ma GX, Miranda J, Eng E, Castille D, Brockie T, et al. Structural interventions to reduce and eliminate health disparities. Am J Pub Health. 2019;109(S1):S72–8.
- 47. Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.
- 48. Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.
- 49. McGuinness LA, Higgins JPT. Risk-of-bias VISualization (robvis): an R package and Shiny web app for visualizing risk-of-bias assessments. Res Synth Methods. 2021;12(1):55–61.
- 50. Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383–94.
- 51. Murad MH, Almasri J, Alsawas M, Farah W. Grading the quality of evidence in complex interventions: a guide for evidence-based practitioners. Evid Based Med. 2017;22(1):20–2.
- 52. Bateman EA, Gob A, Chin-Yee I, MacKenzie HM. Reducing waste: a guidelines-based approach to reducing inappropriate vitamin D and TSH testing in the inpatient rehabilitation setting. BMJ Open Qual. 2019;8(4):e000674.
- 53. Bejjanki H, Mramba LK, Beal SG, Radhakrishnan N, Bishnoi R, Shah C, et al. The role of a best practice alert in the electronic medical record in reducing repetitive lab tests. ClinicoEconomics and Outcomes Research. 2018;10:611–8.
- 54. Bellodi E, Vagnoni E, Bonvento B, Lamma E. Economic and organizational impact of a clinical decision support system on laboratory test ordering. BMC Med Inform Decis Mak. 2017;17(1):179.
- 55. Bradshaw AB, Bonnecaze AK, Burns CA, Beardsley JR. Impact of an interprofessional collaborative quality improvement initiative to decrease inappropriate thyroid function testing. Hosp Pharm. 2021;56(5):481–5.
- 56. Caldarelli G, Troiano G, Rosadini D, Nante N. Adoption of TSH reflex algorithm in an Italian clinical laboratory. Annali di igiene: medicina preventiva e di comunita. 2017;29(4):317–22.
- 57. Dalal S, Bhesania S, Silber S, Mehta P. Use of electronic clinical decision support and hard stops to decrease unnecessary thyroid function testing. BMJ Quality Improvement Reports. 2017;6(1):u223041.w8346.
- 58. Delvaux N, Piessens V, Burghgraeve T, Mamouris P, Vaes B, Stichele RV, et al. Clinical decision support improves the appropriateness of laboratory test ordering in primary care without increasing diagnostic error: the ELMO cluster randomized trial. Implement Sci. 2020;15(1):100.
- 59. Elrewini AM, Zubair M, Afridi NK, Dildar MT, Javed H, Alwalah SM. To determine the effectiveness of different interventions to reduce unnecessary requests of serum thyroid stimulating hormone levels in a hospital. The Professional Medical Journal. 2022;29(05):686–92.
- 60. Krouss M, Israilov S, Alaiev D, Hupart K, Da Shin W, Mestari N, et al. Free the T3: implementation of best practice advisory to reduce unnecessary orders. Am J Med. 2022;135(12):1437–42.
- 61. Leis B, Frost A, Bryce R, Lyon AW, Coverett K. Altering standard admission order sets to promote clinical laboratory stewardship: a cohort quality improvement study. BMJ Quality & Safety. 2019;28(10):846–52.
- 62. Leung E, Song S, Al-Abboud O, Shams S, English J, Naji W, et al. An educational intervention to increase awareness reduces unnecessary laboratory testing in an internal medicine resident-run clinic. J Community Hosp Intern Med Perspect. 2017;7(3):168–72.
- 63. MacPherson RD, Reeve SA, Stewart TV, Cunningham AES, Craven ML, Fox G, et al. Effective strategy to guide pathology test ordering in surgical patients. ANZ J Surg. 2005;75(3):138–43.
- 64. Muris DMJ, Molenaers M, Nguyen T, Bergmans PWMP, van Acker BAC, Krekels MME, et al. Effect of a price display intervention on laboratory test ordering behavior of general practitioners. BMC Fam Pract. 2021;22(1):242.
- 65. Notas G, Kampa M, Malliaraki N, Petrodaskalaki M, Papavasileiou S, Castanas E. Implementation of thyroid function tests algorithms by clinical laboratories: a four-year experience of good clinical and diagnostic practice in a tertiary hospital in Greece. Eur J Intern Med. 2018;54:81–6.
- 66. Salinas M, López-Garrigós M, Flores E, Leiva-Salinas M, Asencio A, Lugo J, et al. Managing inappropriate requests of laboratory tests: from detection to monitoring. Am J Manag Care. 2016;22(9):e311–6.
- 67. Sue LY, Kim JE, Oza H, Chong T, Woo HE, Cheng EM, et al. Reducing inappropriate serum T3 laboratory test ordering in patients with treated hypothyroidism. Endocr Pract. 2019;25(12):1312–6.
- 68. Taher J, Beriault DR, Yip D, Tahir S, Hicks LK, Gilmour JA. Reducing free thyroid hormone testing through multiple plan-do-study-act cycles. Clin Biochem. 2020;81:41–6.
- 69. Wintemute K, Greiver M, McIsaac W, Del Giudice ME, Sullivan F, Aliarzadeh B, et al. Choosing Wisely Canada campaign associated with less overuse of thyroid testing: retrospective parallel cohort study. Can Fam Phys. 2019;65(11):E487–96.
- 70. Adlan MA, Neel V, Lakra SS, Bondugulapati LNR, Premawardhana LDKE. Targeted thyroid testing in acute illness: achieving success through audit. J Endocrinol Invest. 2011;34(8 SUPPL.):e210–3.
- 71. Baker R, Smith JF, Lambert PC. Randomised controlled trial of the effectiveness of feedback in improving test ordering in general practice. Scand J Prim Health Care. 2003;21(4):219–23.
- 72. Chu KH, Wagholikar AS, Greenslade JH, O’Dwyer JA, Brown AF. Sustained reductions in emergency department laboratory test orders: impact of a simple intervention. Postgrad Med J. 2013;89(1056):566–71.
- 73. Cipullo JA, Mostoufizadeh M. Bringing order to test orders: one lab’s story. CAP Today. 1996;10(1):20–2.
- 74. Dowling PT, Alfonsi G, Brown MI, Culpepper L. An education program to reduce unnecessary laboratory tests by residents. J Med Educ. 1989;64(7):410–2.
- 75. Feldkamp CS, Carey JL. An algorithmic approach to thyroid function testing in a managed care setting: 3-year experience. Am J Clin Pathol. 1996;105(1):11–6.
- 76. Gama R, Nightingale PG, Broughton PM, Peters M, Bradby GV, Berg J, et al. Feedback of laboratory usage and cost data to clinicians: does it alter requesting behaviour? Ann Clin Biochem. 1991;28(Pt 2):143–9.
- 77. Grivell AR, Forgie HJ, Fraser CG, Berry MN. Effect of feedback to clinical staff of information on clinical biochemistry requesting patterns. Clin Chem. 1981;27(10):1717–20.
- 78. Horn DM, Koplan KE, Senese MD, Orav EJ, Sequist TD. The impact of cost displays on primary care physician laboratory test ordering. J Gen Int Med. 2014;29(5):708–14.
- 79. Mindemark M, Larsson A. Long-term effects of an education programme on the optimal use of clinical chemistry testing in primary health care. Scand J Clin Lab Invest. 2009;69(4):481–6.
- 80. Nightingale PG, Peters M, Mutimer D, Neuberger JM. Effects of a computerised protocol management system on ordering of clinical tests. Quality in Health Care. 1994;3(1):23–8.
- 81. Rhyne RL, Gehlbach SH. Effects of an educational feedback strategy on physician utilization of thyroid function panels. J Fam Pract. 1979;8(5):1003–7.
- 82. Schectman JM, Elinsky EG, Pawlson LG. Effect of education and feedback on thyroid function testing strategies of primary care clinicians. Arch Intern Med. 1991;151(11):2163–6.
- 83. Stuart PJ, Crooks S, Porton M. An interventional program for diagnostic testing in the emergency department. Med J Aust. 2002;177(3):131–4.
- 84. Tomlin A, Dovey S, Gauld R, Tilyard M. Better use of primary care laboratory services following interventions to ‘market’ clinical guidelines in New Zealand: a controlled before-and-after study. BMJ Qual Saf. 2011;20(3):282–90.
- 85. Toubert ME, Chevret S, Cassinat B, Schlageter MH, Beressi JP, Rain JD. From guidelines to hospital practice: reducing inappropriate ordering of thyroid hormone and antibody tests. Eur J Endocrinol. 2000;142(6):605–10.
- 86. Vidal-Trécan G, Toubert ME, Coste J, Paycha F, Durand-Zaleski I, Fulla Y, et al. Reducing the number of T3 orders in the Paris hospital network: towards better appropriatness of thyroid function test prescription. Ann Endocrinol. 2003;64(3):210–5.
- 87. Willis EA, Datta BN. Effect of an educational intervention on requesting behaviour by a medical admission unit. Ann Clin Biochem. 2013;50(2):166–8.
- 88. Wong ET, McCarron MM, Shaw ST. Ordering of laboratory tests in a teaching hospital: can it be improved? JAMA. 1983;249(22):3076–80.
- 89. European Commission. Member State Coordination Group on HTA (HTACG). Guidance on the validity of clinical studies for joint clinical assessments. Directorate General for Health and Food Safety. 2024. Available from: https://health.ec.europa.eu/publications/guidance-validity-clinical-studies-joint-clinical-assessments_en.
- 90. Leis B, Frost A, Bryce R, Coverett K. Standard admission order sets promote ordering of unnecessary investigations: a quasi-randomised evaluation in a simulated setting. BMJ Quality & Safety. 2017;26(11):938–40.
- 91. Rogers E. Diffusion of innovations. New York: The Free Press; 1995.
- 92. van Walraven C, Naylor CD. Do we know what inappropriate laboratory utilization is? A systematic review of laboratory clinical audits. JAMA. 1998;280(6):550–8.
- 93. Hulscher MEJL, Laurant MGH, Grol RPTM. Process evaluation on quality improvement interventions. Qual Saf Healthc. 2003;12(1):40–6.
- 94. Provost LP, Murray SK. The health care data guide: learning from data for improvement. Second edition. Hoboken, NJ: John Wiley & Sons; 2022.
- 95. Institute for Healthcare Improvement. How to improve: model for improvement; 2024 [cited 2024 Jul 25]. Available from: https://www.ihi.org/resources/how-improve-model-improvement.
- 96. Alishahi Tabriz A, Turner K, Clary A, Hong Y-R, Nguyen OT, Wei G, et al. De-implementing low-value care in cancer care delivery: a systematic review. Implement Sci. 2022;17(1):24.
- 97. Mucha H, Robert S, Breitschwerdt R, Fellmann M. Usability of clinical decision support systems. Z Arb Wiss. 2023;77(1):92–101.
- 98. Kobewka DM, Ronksley PE, McKay JA, Forster AJ, van Walraven C. Influence of educational, audit and feedback, system based, and incentive and penalty interventions to reduce laboratory test utilization: a systematic review. Clin Chem Lab Med. 2015;53(2):157–83.
- 99. Nilsen P, Ingvarsson S, Hasson H, von Thiele Schwarz U, Augustsson H. Theories, models, and frameworks for de-implementation of low-value care: a scoping review of the literature. Implement Res Pract. 2020;1:2633489520953762.
- 100. Pawson R, Greenhalgh T, Harvey G, Walshe K. Realist review - a new method of systematic review designed for complex policy interventions. J Health Serv Res Policy. 2005;10 Suppl 1:21–34.
- 101. Augustsson H, Casales Morici B, Hasson H, von Thiele Schwarz U, Schalling SK, Ingvarsson S, et al. National governance of de-implementation of low-value care: a qualitative study in Sweden. Health Res Policy Syst. 2022;20(1):92.
- 102. Gemeinsamer Bundesausschuss (GBA). IndiQ – Entwicklung eines Tools zur Messung von Indikationsqualität in Routinedaten und Identifikation von Handlungsbedarfen und -strategien - G-BA Innovationsfonds; 2024 [cited 2024 Aug 2]. Available from: https://innovationsfonds.g-ba.de/projekte/versorgungsforschung/indiq-entwicklung-eines-tools-zur-messung-von-indikationsqualitaet-in-routinedaten-und-identifikation-von-handlungsbedarfen-und-strategien.325.

