Effectiveness and mechanisms of interventions to reduce low-value thyroid function tests: a systematic review

Carolina Pioch; Meik Hildebrandt; Gregor Goetz; Verena Vogt

2026 Feb 25;15:111. doi: 10.1186/s13643-026-03119-8

PMCID: PMC13040701 PMID: 41742315

Abstract

Objective

Thyroid function tests are frequently overused. This systematic review aims to summarise the effectiveness of behaviour change interventions to reduce low-value thyroid testing and to identify theoretical foundations and contextual factors associated with their success.

Design

We conducted a comprehensive search of Medline, Embase, Scopus, and the Cochrane Library for randomised and non-randomised controlled trials as well as before-and-after studies. We followed PRISMA guidelines, critically appraised study quality, and applied the GRADE approach to assess certainty of evidence. We categorised interventions as soft (education, reminders, feedback, guidelines) or structural (change in funding, clinical decision support systems).

Results

We included 47 studies (54 interventions), five of which were randomised trials. Structural interventions, particularly clinical decision support systems, were the most common (n = 28). Most interventions reported a reduction in low-value thyroid testing (n = 52), with 40 of them having effects ≥ 20%. However, the certainty of evidence was very low to moderate. Among 49 interventions assessing volume reduction (test rates, expenditure), only two reported increased test rates. All 24 studies that measured improvement of care (appropriateness, shift in ordering pattern, coefficient of variation among physicians) indicated positive developments. Only four interventions referenced theoretical foundations or contextual factors.

Conclusions

Structural interventions, especially clinical decision support systems, were most effective in reducing thyroid testing. While most interventions showed positive effects, the certainty of evidence remains limited, highlighting the need for more high-quality studies to support robust clinical practice changes. Our results may inform targeted interventions to reduce low-value thyroid testing at national, regional, and local levels.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-026-03119-8.

Keywords: Systematic review, Thyroid function tests, Low-value care, De-implementation, Clinical decision support systems

Introduction

Thyroid function tests (TFTs) rank among the most frequently ordered laboratory tests worldwide [1], particularly thyroid-stimulating hormone (TSH) tests for diagnosing and managing thyroid disorders [2]. Alongside TSH, free hormone tests (free T3, T4) are often included in thyroid panels, further increasing the number of TFTs conducted [3]. For instance, approximately 10 million TFTs are performed annually in the UK at an estimated cost of £30 million [4, 5]. In Germany, about 30% of the adult population undergo thyroid function testing each year [6].

Despite the high volume of TFTs, the prevalence of thyroid dysfunction in Europe is only 3.8%, with 259.1 new cases per 100,000 people per year [7]. This discrepancy has raised concerns about the potential overdiagnosis and unnecessary treatment of conditions such as subclinical, asymptomatic hypothyroidism, which often resolves without intervention [8]. Unnecessary testing for suspected thyroid diseases appears to be widespread, frequently leading to additional and potentially avoidable medical procedures [9].

Clinical practice guidelines recommend against routine TSH screening for asymptomatic adults without known thyroid disease (consensus-based evidence [10, 11]). Additionally, free T3 or T4 tests are not advised when screening for hypothyroidism unless pituitary or hypothalamic disease is suspected or known (consensus-based evidence [12, 13]). Instead, TSH measurement is the most reliable method for detecting common forms of hypothyroidism and hyperthyroidism [14–16]. Although the Choosing Wisely (CW) initiative aims to implement these guidelines, significant challenges remain in translating them into practice.

Recent research identified a considerable amount of low-value care in German claims data, with 24.8% to 35.5% of TFTs deemed inappropriate [17, 18]. A bi-national study from Canada and the UK showed that approximately one-third of adult patients without a clear indication for testing underwent at least one TSH test within a 2-year period [19]. Similarly, a study from France suggests that inappropriate TFT ordering remains common [20]. TFTs therefore represent an emblematic example of low-value diagnostic testing, characterised by high baseline use, limited alignment with disease prevalence, and the potential to trigger downstream testing cascades. De-implementation of such unnecessary tests is frequently discussed in international literature [21]. Several systematic reviews have shown that targeted de-implementation strategies can effectively reduce low-value care [6]. Various interventions, including educational programmes for healthcare providers [22, 23], reminders [24, 25], decision support systems [26, 27], and stricter regulatory policies [28, 29], have been proposed to reduce unnecessary TFTs in different healthcare settings. These interventions can optimise clinical laboratory stewardship, contribute to cost savings, and improve healthcare resource allocation [3, 30].

Theoretical frameworks are essential for informing interventions aimed at de-implementing low-value care by identifying key elements that need to be addressed [31]. For instance, the Theoretical Domains Framework (TDF) explores barriers and facilitators for behaviour change [32], while the Choosing Wisely De-Implementation Framework offers a systematic approach to reducing low-value care [33]. A systematic review published by Zhelev et al. in 2016 provided an overview of behaviour change interventions to reduce the volume of TFTs ordered [34]. However, due to the studies’ poor methodological quality and reporting, strong conclusions could not be drawn, nor could specific intervention types be recommended. Furthermore, the theoretical foundations and contextual factors that contribute to the successful implementation of the interventions have not yet been thoroughly investigated. Since the publication of that review, a substantial body of new research on interventions aimed at reducing the ordering of TFTs has emerged. In particular, the latest studies included in the earlier review were published in 2014, before the widespread adoption of digital ordering systems and the increasing use of digital interventions, so technological developments also need to be considered. In light of both the limitations of the earlier review and the growing body of relevant research, we conducted a new systematic review, building on the work of Zhelev et al. [34].

Our review aims to identify effective strategies and their contexts by examining a series of interventions targeting the reduction of TFT orders. We address the following research questions (RQ):

RQ1: What is the effectiveness of behaviour change interventions in reducing the ordering of TFTs?
RQ2: Which theoretical foundations are used to explain the mechanisms underlying the interventions, and which contextual factors are associated with the success of interventions aimed at improving evidence-based thyroid diagnostics?

In doing so, we build on the scope of the previous review: RQ1 was retained for continuity, while RQ2 was newly introduced to enhance the understanding of how and why interventions may work in different settings, drawing on information available in the included studies.

Methods

We first manually searched for well-conducted systematic reviews on the same topic and identified one published in 2016 [34]. The review was assessed for methodological quality and risk of bias, showing moderate quality according to the AMSTAR 2 tool [35, 36] and a low risk of bias (RoB) evaluated by the ROBIS tool (RoB In systematic reviews, [37]). Assessments are available in Additional files 1 and 2.

Our review was guided by the Cochrane methodology and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement [38–40]. The PRISMA checklist is provided in Additional file 3.

Registration

We prospectively registered the review in PROSPERO (CRD42023492441). Changes to the information provided at registration are reported in Additional file 4.

Data sources, searches, and selection

We conducted a systematic literature search in line with the previous review by Zhelev et al. [34], using the same databases: MEDLINE (Ovid), Embase (Ovid), and the Cochrane Central Register of Controlled Trials (CENTRAL, the Cochrane Library). Searches were performed on the 21st of November 2023. To increase comprehensiveness, we extended the methodology by also searching Scopus, screening the first 300 references from Google Scholar, hand-searching the reference lists of included articles, and contacting experts in the field to identify any additional relevant studies. We limited the search period for MEDLINE, Embase, and CENTRAL to articles published between the 1st of January 2014 and the 21st of November 2023, as articles published before 2014 were already screened for eligibility in the earlier review [34]. For Scopus and Google Scholar, no publication date limits were applied, as these databases were not included in the earlier review. We performed an update of the search on the 7th of July 2024.

We applied the same search strategy as Zhelev et al. [34], consisting of search terms related to (1) thyroid function tests and (2) inappropriate testing (Additional file 5). We included studies involving adults receiving TFTs in inpatient, outpatient, or emergency care settings. Eligible interventions comprised behaviour change interventions targeting physicians. Studies were required to compare these interventions to usual care and report outcomes related to test volume, appropriateness, costs, or patient health. We included randomised controlled trials (RCTs), non-randomised controlled studies, and before-and-after studies providing comparative data. We excluded studies that lacked a comparator, did not provide relevant outcome data, focused only on thyroid tests as part of a broader panel without disaggregated results, or were cross-sectional studies, opinion pieces, dissertations, or meeting abstracts. Only studies published in English or German were considered. All inclusion and exclusion criteria for assessing the effectiveness of the interventions (RQ1) followed those of the prior review and are outlined in Table 1.

Table 1.

Inclusion and exclusion criteria of studies based on the PICO framework (Population, Intervention, Control, Outcome)

Attribute	Inclusion criteria	Exclusion criteria
Population	Adults receiving thyroid function tests
Intervention	Behaviour change intervention types [41, 42]: educational interventions; guideline and protocol development and implementation; changes to funding policy; reminders of existing guidelines and protocols; clinical decision support systems, including test request forms and computer-based decision support; audit and feedback
Control	Usual care
Outcome	Change in the total number of thyroid function tests; number of inappropriately ordered thyroid function tests; test-related expenditure; health benefits to individual patients	Studies encompassing ordered thyroid function tests along with other laboratory tests but reporting only the average effect (across all tests)
Study type	Randomised controlled trials; non-randomised controlled studies; before-and-after studies providing comparative data on at least one of the outcome measures	Cross-sectional studies; editorials and opinions; studies without comparative data; dissertations and meeting abstracts
Setting	Inpatient care, outpatient care, emergency department
Language	English or German	All other languages

To address our additional RQ2, we supplemented the prior methodology by searching grey literature and conducting targeted searches for additional publications by the authors of the included interventions. This was done to determine whether potential contextual factors and theoretical backgrounds were described in the development and implementation of the interventions under review. We imported all search results into Endnote software and removed any duplicates before proceeding to the screening phase [43]. Based on the pre-defined eligibility criteria, two review authors (CP, MH) independently conducted title-abstract and full-text screening. Discrepancies were resolved by consensus and by consulting a third researcher (VV). No automation tools were used in the process.
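
The eligibility criteria in Table 1 can be read as a checklist applied to each record. The following is a minimal sketch of that logic, not the authors' procedure: screening in the review was done manually by two reviewers, and the `Record` fields, the label strings, and the `exclusion_reasons` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration; the review applied Table 1 by hand.
ELIGIBLE_DESIGNS = {"rct", "non_randomised_controlled", "before_after"}
ELIGIBLE_SETTINGS = {"inpatient", "outpatient", "emergency"}
ELIGIBLE_LANGUAGES = {"english", "german"}


@dataclass
class Record:
    adults: bool                           # population: adults receiving TFTs
    behaviour_change_intervention: bool    # one of the six intervention types
    design: str                            # study type label
    settings: frozenset                    # a study may span several settings
    language: str
    has_comparative_data: bool             # comparison with usual care
    tft_results_reported_separately: bool  # not only an average across all tests


def exclusion_reasons(record: Record) -> list[str]:
    """Return the Table 1 criteria a record fails; an empty list means eligible."""
    checks = [
        (record.adults, "population"),
        (record.behaviour_change_intervention, "intervention"),
        (record.design in ELIGIBLE_DESIGNS, "study type"),
        (bool(record.settings & ELIGIBLE_SETTINGS), "setting"),
        (record.language in ELIGIBLE_LANGUAGES, "language"),
        (record.has_comparative_data, "no comparative data"),
        (record.tft_results_reported_separately,
         "only average effect across all tests"),
    ]
    return [reason for passed, reason in checks if not passed]


# Made-up record, not one of the included studies.
example = Record(True, True, "before_after", frozenset({"inpatient"}),
                 "english", True, True)
print(exclusion_reasons(example))  # []
```

Returning the list of failed criteria, rather than a single yes/no, mirrors how reasons for exclusion are typically documented after full-text screening.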

Data extraction and analysis

Data from the included studies were extracted by means of piloted data-extraction tables by two independent researchers (CP, MH). A list of all data items and detailed descriptions can be found in Additional file 6. Any discrepancies between the researchers were discussed until consensus was reached. For all identified RCTs, we contacted study authors via email in case of missing study protocols.

We used relative change (improvement/deterioration) as the standardised outcome metric, chosen due to the heterogeneity in outcomes and reporting. Following the approach outlined by Zhelev et al. [34], we set a threshold of ± 20% to interpret a relative change as large, indicating a substantial change in testing behaviour rather than minor variation. The outcomes included (1) changes in the total number of thyroid function tests, (2) test-related expenditure, (3) the number of inappropriately ordered thyroid function tests (appropriateness), (4) the pattern of ordering, and (5) the coefficient of variation (CoV) among physicians. We extracted the direction of the effect (positive/negative subject to the desired outcome measure). Confidence intervals and differences in means were extracted when reported. All calculated values are indicated as such.
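
As an illustration of the outcome metric described above, the sketch below computes a relative change and applies the ± 20% threshold. It assumes the common definition of relative change as (after − before) / before; the function names, the `lower_is_better` flag, and the example figures are hypothetical and are not taken from the included studies.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change in percent: (after - before) / before * 100."""
    if before == 0:
        raise ValueError("baseline value must be non-zero")
    return (after - before) / before * 100.0


def classify(before: float, after: float,
             lower_is_better: bool = True,
             threshold: float = 20.0) -> tuple[str, bool]:
    """Return (direction, is_large) for one outcome.

    direction is 'positive' when the change goes in the desired direction
    (e.g. fewer tests, or higher appropriateness), 'negative' when it goes
    the other way, and 'none' when nothing changed.
    is_large is True when the absolute relative change reaches the threshold.
    """
    change = relative_change(before, after)
    improvement = -change if lower_is_better else change
    if improvement > 0:
        direction = "positive"
    elif improvement < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(change) >= threshold


# Hypothetical example: monthly test volume falls from 1000 to 750.
print(classify(1000, 750))                      # ('positive', True)
# Hypothetical example: appropriateness rises from 50% to 55%.
print(classify(50, 55, lower_is_better=False))  # ('positive', False)
```

The `lower_is_better` flag reflects that the direction of effect is judged against the desired outcome: a falling test volume and a rising appropriateness rate both count as positive.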

We refrained from conducting a meta-analysis due to anticipated clinical heterogeneity among the studies. Specifically, we expected variability in intervention types, timing, and outcome measures, such as shifts towards TSH ordering, test volumes per population unit, or counts of laboratory tests per provider. Given these anticipated differences, a quantitative synthesis was not planned. Instead, we employed a narrative synthesis method. The analysis framework relied on an existing typology of behaviour change intervention types [41, 42], encompassing the following categories: (1) clinical decision support systems (CDSS), (2) changes to funding policy, (3) educational interventions, (4) reminders of existing guidelines and protocols, (5) guideline and protocol development and implementation, and (6) audit and feedback [34]. Furthermore, we introduced a grouping for the interventions and outcomes to support descriptive analysis and improve clarity of reporting (visualised in Additional file 7). We grouped the six types of interventions into structural and soft interventions based on their influence on physician behaviour. While we were guided by terminologies used in previous literature (active and soft [44], strict and soft [45], structural [46]), we developed the final categorisation to best reflect the characteristics of the interventions identified. The five outcome measures were divided into two categories: volume reduction (test rates, expenditure) and improvement of care (appropriateness, pattern, CoV).

Structural interventions directly influence physicians at the point of care and are integrated into their routine workflow, making them difficult to bypass. These interventions include changes in funding, where modifications to financial structures or incentives directly affect decision-making, and CDSS, which are embedded tools that directly guide or restrict physicians' choices during patient care. CDSS include alerts, changes to the existing order form, cost displays, and reflex testing with automatic discharge of tests. Soft interventions are those that provide physicians with informational resources or guidance outside of the immediate care setting, i.e. reminders, education, guidelines/protocols, and audit/feedback. These interventions are less directly tied to the point of care and may require active engagement from physicians to be used effectively, such as through educational meetings or reminder messages. They may also include tools that are not necessarily part of the standard working routine, like memorandum pocket cards or guidelines.
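
The grouping of intervention types and outcome measures described above can be written down as two lookup tables. This is a minimal sketch under stated assumptions: the dictionary keys and the `classify_bundle` helper are hypothetical names, and the review itself performed this grouping descriptively rather than in code.

```python
# Six intervention types mapped to the two groups used in this review.
INTERVENTION_GROUP = {
    "cdss": "structural",
    "funding": "structural",
    "education": "soft",
    "reminders": "soft",
    "guidelines_protocols": "soft",
    "audit_feedback": "soft",
}

# Five outcome measures mapped to the two outcome categories.
OUTCOME_GROUP = {
    "test_rates": "volume_reduction",
    "expenditure": "volume_reduction",
    "appropriateness": "improvement_of_care",
    "ordering_pattern": "improvement_of_care",
    "cov": "improvement_of_care",
}


def classify_bundle(components: set[str]) -> str:
    """Label an intervention by the groups its components belong to."""
    if not components:
        raise ValueError("an intervention needs at least one component")
    groups = {INTERVENTION_GROUP[c] for c in components}
    if groups == {"soft"}:
        return "soft only"
    if groups == {"structural"}:
        return "structural only"
    return "combined"


print(classify_bundle({"cdss"}))                                       # structural only
print(classify_bundle({"audit_feedback", "education"}))                # soft only
print(classify_bundle({"education", "guidelines_protocols", "cdss"}))  # combined
```

A multi-component intervention is labelled "combined" as soon as it contains at least one structural and one soft component, regardless of how many of each it includes.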

For RQ2, we extracted information on theoretical foundations and contextual factors when explicitly reported in the included studies or related publications. However, reporting was sparse and inconsistent, leaving little scope for synthesis or categorisation. Therefore, findings are presented narratively, with reference to specific theoretical models or quality improvement frameworks where available.

Bias assessment and certainty of evidence

We used the Cochrane RoB 2.0 tool (RoB 2) for (cluster) RCTs and the ROBINS-I tool (RoB In Non-randomised Studies—of Interventions) for non-randomised studies of interventions (NRSI [47, 48]). RoB figures were created for each outcome domain separately using the robvis application [49].

The interventions in our review were evaluated using the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) approach, following Murad et al.’s guide for complex interventions [50, 51]. We graded the evidence both by outcome—assessing the overall effectiveness of intervention bundles from a policymaker/payer perspective—and by grouped interventions and outcomes. This dual grading approach aimed to inform policymakers on the efficacy of intervention bundles and to guide practitioners on the most effective components for patient care. To minimise potential publication bias, we expanded the databases included in the initial review, performed a grey literature search, and looked into registered trials (EU Clinical Trials Register clinicaltrialsregister.eu/ctr-search/search, WHO trialsearch.who.int/Default.aspx). All assessments were performed independently by two reviewers (CP and MH). Any disagreements were resolved by consensus involving a third reviewer (GG).

Results

Study selection

From 2782 records screened, we identified 21 additional studies beyond those reported by Zhelev et al. [34], leading to a total of 47 unique studies included in this review. The updated search revealed no additional studies (213 new records screened). All identified studies were completed. Most of the identified studies used a before-and-after design (n = 34, including time series analysis), followed by nine non-randomised controlled studies and five (cluster) RCTs, two of which had a registration record. We contacted the authors of the remaining three RCTs and received a response from one of them. The study selection process is presented in Fig. 1. A list of all articles excluded after full-text screening and reasons for exclusion can be found in Additional file 8.

Fig. 1 — PRISMA flowchart of the systematic literature search. TFTs, thyroid function tests

Study characteristics

Most studies were conducted in the USA (n = 16), Canada (n = 8), and the UK (n = 6). They were published between 1979 and 2022. The majority of the studies were conducted in outpatient (n = 21) and inpatient care (n = 14). The remainder were conducted in both settings or in emergency departments (n = 12, Tables 2 and 3). Most of the studies were performed at a single medical site (n = 29, Table 2).

Table 2.

Characteristics of the included studies

Study and country	Year	Study design	Setting	Target tests	Thyroid tests
Studies identified in present review (n = 21)
Bateman et al., Canada [52]	2019	Before and after, single site	Inpatient rehabilitation centre	TSH + Vitamin D	TSH
Bejjanki et al., USA [53]	2018	Before and after, single site	Academic medical centre	17 laboratory tests	TSH, FT4
Bellodi et al., Italy [54]	2017	Controlled study, multiple sites	3 hospitals (10 wards)	8 laboratory tests	FT4
Bradshaw et al., USA [55]	2021	Before and after, single site	Academic Medical Centre	TFTs only	TSH, FT3, FT4
Caldarelli et al., Italy [56]	2017	Before and after, single site	Hospital (clinical laboratory)	TFTs only	TSH, FT3, FT4
Chami et al., Canada [45]	2021	Time series analysis with control group, multiple sites	Outpatient laboratories	8 laboratory tests	TSH
Dalal et al., USA [57]	2017	Before and after, single site	Urban teaching hospital	TFTs only	TSH, FT3, FT4
Delvaux et al., Belgium [58]	2020	Cluster RCT	72 primary care centres	17 laboratory tests	TSH
Elrewini et al., Saudi Arabia [59]	2022	Before and after, single site	Armed Forces Hospital	TFTs only	TSH
Gilmour et al., Canada [30]	2017	Before and after, single site	Academic ambulatory hospital	TFTs only	TSH, FT3, FT4
Janssens et al., Netherlands [44]	2015	Before and after, single site	General care and teaching hospital	82 laboratory tests	TSH, FT4
Krouss et al., USA [60]	2022	Interrupted time series, multiple sites	11 hospitals and over 70 ambulatory centres	TFTs only	T3, FT3
Leis et al., Canada [61]	2019	Before and after, single site	Coronary Care Unit (teaching hospital)	TFTs only	TSH
Leung et al., USA [62]	2017	Before and after, single site	Resident clinic in an outpatient clinic	70 laboratory tests	TSH, FT3, FT4
MacPherson et al., Australia [63]	2005	Before and after, single site	Pre-admission clinic	8 pathology tests + various investigations	TFTs (unspecified)
Muris et al., Netherlands [64]	2021	Before and after, multiple sites	57 general practices	22 laboratory tests	TSH, FT4
Notas et al., Greece [65]	2018	Before and after, single site	Tertiary teaching hospital	TFTs only	TSH, FT3, FT4, TGAb, TPOAb
Salinas et al., Spain [66]	2016	Before and after, multiple sites	Public University Hospital + 9 primary care centres	13 laboratory tests	TSH, FT4
Sue et al., USA [67]	2019	Before and after, multiple sites	All outpatients within urban tertiary/quaternary care academic health system	TFTs only	T3
Taher et al., Canada [68]	2020	Before and after, single site	Tertiary hospital	TFTs only	FT3, FT4
Wintemute et al., Canada [69]	2019	Controlled study, multiple sites	6 family health teams	TFTs only	TSH
Studies included in previous review (n = 27)
Adlan et al., UK [70]	2011	Before and after, single site	Medical Assessment Unit (hospital, acutely ill patients)	TFTs only	TFTs (TSH, FT4, TPOAb, TRAb)
Baker et al., UK [71]	2003	Cluster RCT	33 general practices	5 laboratory test groups	TFTs (TSH, FT4)
Berwick and Coltin, USA [23]	1986	Controlled cross-over, multiple sites	3 ambulatory centres (same health maintenance organization)	13 laboratory and imaging tests	T4
Chu et al., Australia [72]	2013	Before and after, single site	tertiary teaching hospital emergency department	23 laboratory tests	TFTs (unspecified)
Cipullo and Mostoufizadeh, USA [73]	1996	Before and after, single site	Community hospital	20 laboratory tests and preoperative testing (unspecified)	TFTs (T3RU)
Daucourt et al., France [25]	2000	Cluster RCT	General and psychiatric hospitals	TFTs only	TFTs (TSH, FT3, FT4, TRH)
Dowling et al., USA [74]	1989	Before and after, single site	Inner city community health centre	TSH + CBC	TSH
Emerson and Emerson, USA [26]	2001	Before and after, single site	University medical centre	All laboratory tests	TSH, T3, FT4, T4, FTI/T3RU
Feldkamp and Carey, USA [75]	1996	Before and after, single site	Metropolitan hospital and 22 satellite clinics	TFTs only	TSH, T3, T4, FTI/T3RU
Gama et al., UK [76]	1991	Controlled study, single site	District general hospital	5 laboratory test groups + others	TFTs (TSH, FT4)
Grivell et al., Australia [77]	1981	Before and after, single site	Tertiary-care community hospital	55 laboratory tests	T4
Hardwick et al., Canada [29]	1982	Before and after, multiple sites	Outpatient laboratories	TFTs only	T3, T4, ETR
Horn et al., USA [78]	2013	Interrupted time series with control group, multiple sites	Alliance of 5 multispecialty group practices	27 laboratory tests	TSH
Larsson et al., Sweden [22]	1999	Before and after, multiple sites	19 primary care centres	14 laboratory test groups	TSH, T3, FT4
Mindemark and Larsson, Sweden (follow up) [79]	2009	Before and after, multiple sites	16 primary healthcare centres	12 laboratory test groups	TSH, T3, T4, FT4
Nightingale et al., UK [80]	1994	Before and after, single site	Supra-regional liver unit (teaching hospital)	Various laboratory tests	TSH
Rhyne and Gehlbach, USA [81]	1979	Before and after, single site	Family Medicine group practice	TFTs only	TFTs (T3RU and T4)
Schectman et al., USA [82]	1991	Controlled study, single site	Primary care health maintenance organization practice	TFTs only	TFTs (TSH, T3-RIA, T3RU, T4)
Stuart et al., Australia [83]	2002	Before and after, single site	Public hospital emergency department	14 laboratory and 10 imaging tests	TFTs (unspecified)
Thomas et al., UK [24]	2006	Cluster RCT	85 primary-care practices	9 laboratory tests	TSH
Tierney et al., USA [27]	1988	RCT, single site	Academic internal medicine practice	8 laboratory tests	TSH
Tomlin et al., New Zealand [84]	2011	Controlled study, multiple sites	New Zealand primary care	8 laboratory tests	TSH, FT3, FT4
Toubert et al., France [85]	2000	Before and after, single site	Teaching hospital	TFTs only	TSH, FT3, FT4, TPOAb, TGAb, TRAb
Van Walraven et al., Canada [28]	1998	Interrupted time series, multiple sites	All clinical laboratories (not based in hospitals)	7 laboratory tests	TSH, T3RU, T4
Vidal-Trecan et al., France [86]	2003	Before and after, multiple sites	50 university hospitals	TFTs only	TSH, T3, FT3, T4, FT4
Willis and Datta, UK [87]	2013	Before and after, single site	Medical admissions unit in a district general hospital	3 laboratory test groups	TFTs (unspecified)
Wong et al., USA [88]	1983	Controlled study, single site	University teaching hospital	6 laboratory tests	TSH, T3RU, T3-RIA, T4-RIA

CBC complete blood count, ETR effective thyroxine ratio, FT4 free thyroxine, FT3 free triiodothyronine, FTI free T4 index or free thyroxine index, RCT randomised controlled trial, RIA radioimmunoassay, TFTs thyroid function tests, TGAb thyroglobulin antibodies, TPOAb thyroid peroxidase antibodies, TSH thyroid stimulating hormone (thyrotropin), T3 triiodothyronine, T3RU triiodothyronine resin uptake, TRH TSH-releasing hormone, TRAb TSH-receptor antibodies

Table 3.

Interventions of the included studies

Study and country	Setting	Soft interventions				Structural interventions
		Audit and feedback	Educational programmes	Guidelines and protocols	Reminders	Changes to funding	CDSS	Description of decision tool
Studies identified in present review (n = 21)
Bateman et al., Canada [52]	Inpatient	X	X					-
Bejjanki et al., USA [53]	Inpatient						X	Alert
Bellodi et al., Italy [54]	Inpatient						X	Alert
Bradshaw et al., USA [55]	Inpatient + ED		X	X			X	Alert
Caldarelli et al., Italy [56]	Outpatient						X	Reflex/Discharge
Chami et al., Canada [45]	Outpatient						X	Change of order form
Dalal et al., USA [57]	Inpatient						X	Reflex/Discharge
Delvaux et al., Belgium [58]	Outpatient						X	Alert
Elrewini et al., Saudi Arabia [59]	Unspecified		X	X			X	Alert
Gilmour et al., Canada [30]	Outpatient		X				X	Reflex/Discharge
Janssens et al., Netherlands [44]	Inpatient + outpatient			X				-
Krouss et al., USA [60]	Inpatient + outpatient						X	Alert
Leis et al., Canada [61]	Inpatient						X	Change of order form
Leung et al., USA [62]	Inpatient		X		X			-
MacPherson et al., Australia [63]	Inpatient			X			X	Change of order form
Muris et al., Netherlands [64]	Outpatient						X	Cost display
Notas et al., Greece [65]	Inpatient + outpatient						X	Reflex/Automatic discharge
Salinas et al., Spain [66]	Inpatient + outpatient						X	Reflex/Automatic discharge
Sue et al., USA [67]	Outpatient						X	Alert
Taher et al., Canada [68]	Inpatient + outpatient						X	Change of order form + Reflex/Automatic discharge
Wintemute et al., Canada [69]	Outpatient	X		X	X			-
Studies included in previous review (n = 27)
Adlan et al., UK [70]	Inpatient			X				-
Baker et al., UK [71]	Outpatient	X		X				-
Berwick and Coltin, USA PCF$ [23]	Outpatient	X						-
Berwick and Coltin, USA PCFY [23]	Outpatient	X						-
Berwick and Coltin, USA TSE [23]	Outpatient		X					-
Chu et al., Australia [72]	ED						X	Change of order form
Cipullo and Mostoufizadeh, USA [73]	Inpatient			X				-
Daucourt et al., France MPC [25]	Inpatient				X			-
Daucourt et al., France TRF [25]	Inpatient						X	Change of order form
Daucourt et al., France Both [25]	Inpatient				X		X	Change of order form
Dowling et al., USA [74]	Outpatient	X	X					-
Emerson and Emerson, USA [26]	Outpatient						X	Change of order form
Feldkamp and Carey, USA [75]	Inpatient + outpatient						X	Reflex/Automatic discharge
Gama et al., UK [76]	Inpatient + outpatient	X						-
Grivell et al., Australia [77]	Inpatient	X						-
Hardwick et al., Canada [29]	Outpatient			X		X		-
Horn et al., USA [78]	Outpatient						X	Cost display
Larsson et al., Sweden [22]	Outpatient		X					-
Mindemark and Larsson, Sweden (follow up) [79]	Outpatient		X					-
Nightingale et al., UK [80]	Inpatient	X		X			X	Change of order form
Rhyne and Gehlbach, USA [81]	Outpatient		X	X				-
Schectman et al., USA Reminder + Feedback [82]	Outpatient	X	X		X			-
Schectman et al., USA Reminder [82]	Outpatient		X		X			-
Stuart et al., Australia [83]	ED	X	X	X				-
Thomas et al., UK Feedback [24]	Outpatient	X						-
Thomas et al., UK Reminder [24]	Outpatient				X			-
Thomas et al., UK Both [24]	Outpatient	X			X			-
Tierney et al., USA [27]	Outpatient						X	Alert
Tomlin et al., New Zealand [84]	Outpatient	X	X	X				-
Toubert et al., France [85]	Inpatient + outpatient			X	X			-
Van Walraven et al., Canada [28]	Outpatient			X		X	X	Change of order form
Vidal-Trecan et al., France [86]	Inpatient + outpatient		X	X			X	Change of order form
Willis and Datta, UK [87]	Inpatient		X	X				-
Wong et al., USA [88]	Inpatient			X			X	Change of order form

A study may comprise multiple interventions (indicated in the study column), and a single intervention may combine multiple intervention types (multi-component interventions). Alerts include best practice pop-ups, test suggestions, or information (e.g. test rates, disease probabilities).

CDSS clinical decision support system, ED emergency department, MPC memorandum pocket card, TRF test request form, TSE test-specific education, PCFY peer comparison feedback on yield of tests, PCF$ peer comparison feedback on cost of test use

The average total observation period of the studies was 25 months, with an average 12-month pre-intervention phase and a 14-month intervention/post-intervention period (additional information on study characteristics is provided in Additional file 9). In total, 19 studies targeted TFTs only, while the remaining 28 studies aimed to reduce a broader number of laboratory and imaging tests. A total of 54 distinct interventions were performed within the 47 studies (Table 2).

While the previous review identified soft interventions (education, guidelines/protocols, reminders, and audit/feedback) as the most common types [34], our review revealed that most of the studies introduced structural interventions (CDSS and changes in funding, n = 30). Of these, only two studies (neither of them newly identified) assessed changes to funding (Table 3). CDSS was therefore the most widely used type (n = 28), with 17 of the 21 newly identified studies employing CDSS. The most common CDSS involved changes of the order form (n = 12), alerts (n = 8), or reflex testing with automatic discharge (n = 7). Alerts included information about best practice, suggestions for testing and discharge, or probabilities of thyroid disease. Additionally, two studies displayed the costs of the tests being ordered. Among the soft interventions, guidelines/protocols (n = 18) and education (n = 16) were most commonly employed, followed by audit/feedback (n = 14) and reminders (n = 9). In total, 19 interventions consisted solely of soft components, 25 involved structural changes, and ten combined both components (Table 3).

Bias assessment

The majority of the studies did not have a control group (n = 33). Following an approach outlined by HTACG, we refrained from conducting a formal RoB assessment for these studies, as their inherent lack of internal validity is unlikely to be changed by a RoB assessment [89].

Two RCTs were of low RoB; the remaining three had some concerns due to potential selection bias. Primarily due to the risk of confounding, the RoB in controlled studies (n = 9) was critical in three studies (n = 4 outcomes), serious in six studies (n = 8 outcomes), and moderate in one study. The outcomes within a study showed the same RoB for each domain due to the relation of the outcome measures. The full RoB assessment can be found in Additional file 10.

Effectiveness of the interventions

Of the 54 interventions, the majority showed a positive direction of change (n = 52), a considerable number had effects ≥ 20% (n = 40), and many reported significant changes (n = 29, with 19 not reported; Table 4) (RQ1). Further synthesis was impracticable due to the heterogeneity of measures, tests, and reporting standards. Only 12 interventions reported confidence intervals, while the difference in means was even less common (n = 7). Relative change ranged from a 32% decrease to a 172% improvement (additional information on study results and reported effect measures is provided in Additional file 11). We refrained from computing means due to the variability of measures within the outcomes. The most frequently assessed outcome was the number or rates of tests (n = 43), with 41 showing a positive effect. A large proportion of these had effects of ≥ 20% (n = 30). Significant changes were reported in 22 of the 43 interventions, while 17 did not report significance. Expenditure outcomes were less common (n = 9), but all showed positive changes, with significant changes reported in two interventions (six not reported). Relative improvements ≥ 20% were shown by five interventions assessing the expenditure. A total of 49 interventions assessed volume-related outcomes (test numbers/rates, expenditure; Table 4).
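As a minimal illustration of how the direction-of-effect and ≥ 20% columns in Table 4 can be read, the following Python sketch derives a relative change from a pair of pre- and postintervention values. The formula (post − pre)/pre, the function names, the `lower_is_better` flag, and the example values are assumptions made for illustration only; they are not taken from the included studies, and the measures actually reported are documented in Additional file 11.

```python
def relative_change(pre: float, post: float) -> float:
    """Percentage change from the pre- to the postintervention value."""
    if pre == 0:
        raise ValueError("preintervention value must be non-zero")
    return (post - pre) / pre * 100.0


def summarise_effect(pre: float, post: float,
                     lower_is_better: bool = True,
                     threshold: float = 20.0) -> dict:
    """Classify an outcome in the style of the Table 4 columns (assumed rule).

    lower_is_better=True suits volume outcomes (test rates, expenditure);
    lower_is_better=False suits outcomes such as appropriateness.
    """
    change = relative_change(pre, post)
    # Express the change so that a positive number always means improvement.
    improvement = -change if lower_is_better else change
    if improvement > 0:
        direction = "+"
    elif improvement < 0:
        direction = "-"
    else:
        direction = "0"
    return {
        "relative_change_pct": round(change, 1),
        "direction": direction,
        "meets_threshold": improvement >= threshold,
    }


# Hypothetical volume outcome: 50 tests per 1,000 patients before, 38 after.
volume = summarise_effect(pre=50.0, post=38.0)

# Hypothetical appropriateness outcome: 40% of orders appropriate before, 55% after.
appropriateness = summarise_effect(pre=40.0, post=55.0, lower_is_better=False)
```

Under these assumptions, a fall in test rates and a rise in appropriateness are both recorded as a positive direction, which is why a single threshold can be applied across outcome types with opposite desirable directions.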

Table 4.

Results of the interventions

Study and country	Type of study	Intervention	Direction of effect	Relative change ≥ 20%	Significant effect	Notes on outcome measure
Appropriateness
Daucourt et al., France MPC [25]	RCT	Reminder + Decision tool	+	−*	−	Proportion of TFTs ordered in accordance with guidelines
Daucourt et al., France TRF [25]	RCT	Reminder + Decision tool	+	+ *	+	See above
Daucourt et al., France Both [25]	RCT	Reminder + Decision tool	+	+ *	NR	See above
Delvaux et al., Belgium [58]	RCT	Decision tool	+	−	+	Appropriate tests/total orders
Schectman et al., USA Reminder + Feedback [82]	Controlled	Reminder + Feedback	+	+ *	NR	Compliance rate with TFT protocol
Schectman et al., USA Reminder [82]	Controlled	Reminder	+	+ *	+	See above
Caldarelli et al., Italy [56]	Uncontrolled	Decision tool	+	−*	NR	Prescriptive appropriateness through the ratios TSH/FT4, TSH/FT3 and the ratio “TSH Reflex”/TSH
Dowling et al., USA [74]	Uncontrolled	Education + Feedback	+	+ *	−	Indicated TSH/visit
Elrewini et al., Saudi Arabia [59]	Uncontrolled	Education + Guidelines	+	−	+	Unnecessary requests/total TSH requests
Feldkamp and Carey, USA [75]	Uncontrolled	Decision tool	+	+ *	NR	Shift towards TSH
Leis et al., Canada [61]	Uncontrolled	Decision tool	+	+ *	+	Proportion of not indicated physician TSH orders
Nightingale et al., UK [80]	Uncontrolled	Guidelines + Decision tool + Feedback	+	+ *	NR	Patients requiring an investigation/patients tested
Rhyne and Gehlbach, USA [81]	Uncontrolled	Education + Guidelines	+	+ *	−	High/low indication test proportion
Toubert et al., France [85]	Uncontrolled	Guidelines + Reminders	+	+ *	+	Frequency of appropriate use of thyroid function tests
Coefficient of variation (CoV)
Berwick and Coltin, USA PCF$ [23]	Controlled	Feedback	+	+	NR	CoV of rate of test use among physicians within centres
Berwick and Coltin, USA PCFY [23]	Controlled	Feedback	+	+	NR	See above
Berwick and Coltin, USA TSE [23]	Controlled	Education	+	−	NR	See above
Expenditure
Tierney et al., USA [27]	RCT	Decision tool	+	−	−	Charges per visit
Tomlin et al., New Zealand [84]	Controlled	Education + Guidelines + Feedback	+	−	NR
Bejjanki et al., USA [53]	Uncontrolled	Decision tool	+	+ *	NR	Cost savings from reducing duplicates
Caldarelli et al., Italy [56]	Uncontrolled	Decision tool	+	−*	NR
Elrewini et al., Saudi Arabia [59]	Uncontrolled	Education + Guidelines	+	+ *	NR	Cost spent on the unnecessary requests of TSH tests
Hardwick et al., Canada [29]	Uncontrolled	Guidelines + Change in Funding	+	+	NR	Change in total expected costs
Janssens et al., Netherlands [44]	Uncontrolled	Guidelines	+	+ *	NR
Leung et al., USA [62]	Uncontrolled	Education + Reminder	+	NR	+	Change of laboratory costs
Stuart et al., Australia [83]	Uncontrolled	Education + Guidelines + Feedback	+	+	+	Mean costs per patient
Pattern
Tomlin et al., New Zealand [84]	Controlled	Education + Guidelines + Feedback	+	+	+	Shift towards TSH
Wong et al., USA [88]	Controlled	Guidelines + Decision tool	+	+ *	+	Sought to reduce complete thyroid panels
Emerson and Emerson, USA [26]	Uncontrolled	Decision tool	+	+ *	+	Shift towards FT4 and thyroid cascade
Hardwick et al., Canada [29]	Uncontrolled	Guidelines + Change in Funding	+	+ *	NR	Sought to reduce proportion of T3 tests
Larsson et al., Sweden [22]	Uncontrolled	Education	+	−*	TSH/TFTs: + T3/TSH: + T4/TSH: −	Shift towards TSH (primary care centres)
Larsson et al., Sweden [22]	Uncontrolled	Education	+	+ *	NR	Shift towards TSH (individual physicians)
Mindemark and Larsson, Sweden (follow up) [79]	Uncontrolled	Education	+	T3/TSH + * T4 + FT4/ TSH −*	−	Median of physicians; shift towards TSH
Toubert et al., France [85]	Uncontrolled	Guidelines + Reminders	+	+ *	NR	Shift towards TSH
Van Walraven et al., Canada [28]	Uncontrolled	Guidelines + Change in Funding + Decision tool	+	+	+	Shift towards TSH
Vidal-Trecan et al., France [86]	Uncontrolled	Education + Guidelines + Reminders + Decision tool	+	−	NR	Shift towards TSH
Test numbers or rates
Baker et al., UK [71]	RCT	Guidelines + Feedback	+	−*	−	Tests per 1,000 patients
Thomas et al., UK Feedback [24]	RCT	Feedback	+	−*	+	Tests per 10,000 patients
Thomas et al., UK Reminder [24]	RCT	Reminder	+	−*	+	See above
Thomas et al., UK Both [24]	RCT	Reminder + Feedback	+	NR	NR	See above
Bellodi et al., Italy [54]	Controlled	Decision tool	+	Delta: − Cento: + Ferrara: −	NR	Number of laboratory tests requested by wards
Berwick and Coltin, USA PCF$ [23]	Controlled	Feedback	+	−	NR	Tests per 1,000 encounters per physician
Berwick and Coltin, USA PCFY [23]	Controlled	Feedback	−	+	NR	See above
Berwick and Coltin, USA TSE [23]	Controlled	Education	+	−	NR	See above
Chami et al., Canada [45]	Controlled	Decision tool	+	−	−	Number of thyroid tests
Gama et al., UK [76]	Controlled	Feedback	+	+	I: + C: −	Tests per outpatient visit
Horn et al., USA [78]	Controlled	Decision tool	+	NA	−	Monthly orders per 1,000 patients
Schectman et al., USA [82]	Controlled	Reminder + Feedback	+	−*	+	Number of TFTs per patient; Feedback and Non-Feedback group combined
Tomlin et al., New Zealand [84]	Controlled	Education + Guidelines + Feedback	+	TSH: − FT3/FT4: +	+	Tests per year per GP
Wintemute et al., Canada [69]	Controlled	Guidelines + Feedback	+	−	+
Wong et al., USA [88]	Controlled	Guidelines + Decision tool	+	TSH and T3RIA: + T3RU and T4RIA: −	NR	Tests per month
Adlan et al., UK [70]	Uncontrolled	Guidelines	+	+ *	+	Proportion of admitted patients offered TFTs
Bateman et al., Canada [52]	Uncontrolled	Education + Feedback	+	+	NR	Proportion of admitted patients offered TFTs
Bejjanki et al., USA [53]	Uncontrolled	Decision tool	+	FT4: − TSH: +	FT4: − TSH: +	Percentage change in the number of inpatient duplicate orders
Bejjanki et al., USA [53]	Uncontrolled	Decision tool	+	FT4: − TSH: +	FT4: − TSH: +	Odds of percentage duplicate
Bradshaw et al., USA [55]	Uncontrolled	Decision tool	+	+	TSH: − FT4: +	Number of inappropriate TSH tests ordered; FT3 excluded due to low baseline numbers
Caldarelli et al., Italy [56]	Uncontrolled	Decision tool	+	TSH: − FT4: − FT3: +	NR	Number of thyroid tests
Chu et al., Australia [72]	Uncontrolled	Decision tool	+	+ *	+	Number of tests ordered per 100 ED presentations
Cipullo and Mostoufizadeh, USA [73]	Uncontrolled	Guidelines	+	−	NR	Tests/discharge
Dalal et al., USA [57]	Uncontrolled	Decision tool	+	+	+	Number of tests of fT3 and fT4 orders per total TSH orders
Dowling et al., USA [74]	Uncontrolled	Education + Feedback	+	+ *	−	Rates of ordering TSH tests per visit
Emerson and Emerson, USA [26]	Uncontrolled	Decision tool	+	+ *	+	Test sets ordered (significance for total TFTs)
Feldkamp and Carey, USA [75]	Uncontrolled	Decision tool	+	TSH: −* T4: + * T3RU: + *	NR	Tests per 1,000 patients (T3 not reported)
Gilmour et al., Canada [30]	Uncontrolled	Education + Decision tool	+	+	+	Median number of tests performed (FT3 and FT4; TSH used for initial appropriateness)
Grivell et al., Australia [77]	Uncontrolled	Feedback	−	+ *	NR	Tests per 1,000 patients
Hardwick et al., Canada [29]	Uncontrolled	Guidelines + Change in Funding	+	+ *	NR
Janssens et al., Netherlands [44]	Uncontrolled	Guidelines	+	+ *	NR
Krouss et al., USA [60]	Uncontrolled	Decision tool	+	+ *	+	Orders per 1,000 patient days (inpatient)/per 1,000 encounters (outpatient)
Leis et al., Canada [61]	Uncontrolled	Decision tool	+	+ *	+	Patients with any TSH assay request/patients with physician-signed order
MacPherson et al., Australia [63]	Uncontrolled	Guidelines + Decision tool	+	+ *	+
Muris et al., Netherlands [64]	Uncontrolled	Decision tool	+	+ *	+	Mean test ordering rate per 1,000 patients per month per general practice
Notas et al., Greece [65]	Uncontrolled	Decision tool	+	+	+	Number of TFTs per TSH ordered (FT4 and FT3) and per cent patients with TFT order, inpatients
Notas et al., Greece [65]	Uncontrolled	Decision tool	+	+ *	NR	Number of TFTs per TSH ordered (FT4 and FT3), outpatients
Rhyne and Gehlbach, USA [81]	Uncontrolled	Education + Guidelines	+	−*	+	TFTs per 100 patients
Salinas et al., Spain [66]	Uncontrolled	Decision tool	+	+ *	NR	Ratio of FT4/TSH
Sue et al., USA [67]	Uncontrolled	Decision tool	+	+	+	T3 laboratory tests/10,000 patients per week
Taher et al., Canada [68]	Uncontrolled	Decision tool	+	+	NR	Total number of fT4 and fT3 tests per month
Toubert et al., France [85]	Uncontrolled	Guidelines + Reminders	+	+ *	NR
Van Walraven et al., Canada [28]	Uncontrolled	Guidelines + Change in Funding + Decision tool	+	TSH: − T4: +	+	Tests per 100,000 patients per month; comparison with expected values (T3RU not reported)
Vidal-Trecan et al., France [86]	Uncontrolled	Education + Guidelines + Reminders + Decision tool	+	−	NR
Willis and Datta, UK [87]	Uncontrolled	Education + Guidelines	+	+ *	+	Tests per admission

Interventions sorted by outcome and type of study. Effects based on numerical results that can be found in Additional file 11. Deviations from standard outcome measure listed in last column

*Based on authors’ calculations (for values pre/postintervention see Additional file 11)

ED emergency department, FT4 free thyroxine, FT3 free triiodothyronine, GP general practitioner, MPC memorandum pocket card, NR not reported, TFTs thyroid function tests, TRF test request form, TSE test-specific education, TSH thyroid stimulating hormone (thyrotropin), T3 triiodothyronine, T3RU triiodothyronine resin uptake, PCFY peer comparison feedback on yield of tests, PCF$ peer comparison feedback on cost of test use, RCT randomised controlled trial, RIA radioimmunoassay

Improvement-related outcomes (appropriateness, pattern, and CoV) were assessed in 24 interventions. Appropriateness was frequently studied (n = 14), with all showing positive direction, many with effects ≥ 20% (n = 10), and significant changes in six studies (five not reported). Pattern changes were less common (n = 8), but all showed positive effects, with significant changes in some (n = 4, with three not reported). Effects of ≥ 20% were shown in three interventions. CoV outcomes were least assessed (n = 3), with no significant changes reported (two interventions showed relative improvements ≥ 20%). Across all outcomes and interventions, the results of structural interventions were slightly more positive than those of combined and soft interventions: 100% of structural interventions showed positive effects (61% significant) and 74% showed large effects, compared with 100% positive (40% significant) and 60% large effects for combined interventions, and 94% positive (44% significant) and 47% large effects for soft interventions (Additional file 11).
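For readers unfamiliar with the CoV outcome, the following Python sketch shows one way a coefficient of variation of test rates among physicians could be computed before and after an intervention. It assumes the common definition (sample standard deviation divided by the mean); the included studies may have used a different variant, and the per-physician rates below are invented for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(rates: list[float]) -> float:
    """Sample standard deviation divided by the mean (assumed definition)."""
    if len(rates) < 2:
        raise ValueError("at least two physicians are needed")
    average = mean(rates)
    if average == 0:
        raise ValueError("mean rate must be non-zero")
    return stdev(rates) / average


# Hypothetical tests per 1,000 encounters for three physicians in one centre.
rates_before = [30.0, 50.0, 70.0]
rates_after = [45.0, 50.0, 55.0]

cov_before = coefficient_of_variation(rates_before)  # about 0.40
cov_after = coefficient_of_variation(rates_after)    # about 0.10

# Relative reduction in variation between physicians, in percent.
cov_reduction_pct = (cov_before - cov_after) / cov_before * 100.0
```

In this constructed example the mean test rate is unchanged while the variation between physicians falls, which illustrates why CoV is treated as an improvement-of-care outcome rather than a volume outcome.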

To contextualise these findings, we next evaluated the certainty of evidence using GRADE. For structural interventions (CDSS, changes in funding), we found a significant positive effect on outcomes measuring improvement of care, based on two cluster RCTs (n = 1 effect ≥ 20%, moderate certainty of evidence (CoE)). Four uncontrolled studies supported these findings (positive direction, n = 3 effects ≥ 20%, two significant). For volume reduction, one RCT indicated a trend towards reducing test rates (positive, not significant (NS), low CoE), with 16 non-randomised studies pointing in the same direction (positive direction, n = 12 effects ≥ 20%, n = 9 significant). For soft interventions, one cluster RCT indicated positive improvement of care (NS, moderate CoE), as well as ten non-randomised interventions (n = 9 effects ≥ 20%, four significant). Two cluster RCTs featuring four soft interventions indicated they could achieve volume reduction (two significant, moderate CoE). Of 19 non-randomised interventions, 17 showed a positive direction as well (n = 9 effects ≥ 20%, nine significant). Similarly, combined interventions showed positive effects on the improvement of care based on one cluster RCT (positive, effect ≥ 20%, NS, moderate CoE) and six non-randomised interventions (n = 3 effects ≥ 20%, n = 3 significant). No RCT assessed volume reduction, but eight non-randomised interventions showed a positive trend (n = 5 effects ≥ 20%, n = 3 significant). The full GRADE evidence profiles can be found in Additional file 7, organised by intervention type (Table 7.1) and outcome category (Table 7.2).

Funding was most often unreported (n = 21); other studies were funded by non-profit sources (n = 14), received no specific grant (n = 12), or had unclear funding (n = 1). Twenty-two studies reported ethics committee approval (n = 18) or stated that approval was not required (n = 4). Twenty-five studies did not report on ethics approval (Additional file 12 includes information on ethics approval, funding, and conflict of interest). Although we cannot entirely rule out an overestimation of the predominantly positive findings, there is no indication of significant publication bias. No relevant trials were found in the extended registry search. All evidence of non-randomised trials was rated to be of very low certainty (Additional file 7).

Theoretical foundations and contextual factors

Information on theoretical foundations and contextual factors provided by the included studies was sparse (RQ2). Four interventions reported on the theoretical foundations of their interventions, going further than conventional references to systematic reviews or guidelines (relevant text passages included in Additional file 9). We did not find additional literature reporting on theoretical foundations or contextual factors. Consideration of contextual factors beyond theoretical models was not explicitly mentioned in any included study.

Elrewini et al. (education + guidelines + retest alert) performed a root cause analysis to develop a corresponding action plan that addresses the identified root causes [59]. Leis et al. (change of order form) used a simulated setting to assess unnecessary test ordering through a hypothetical patient scenario with a quasi-randomised sample of participants. They concluded that the presence of a checkbox influences ordering behaviour [61, 90]. Stuart et al. (feedback + education + guidelines) based the components of their intervention on the core elements of the PRECEDE framework (Predisposing, Reinforcing, and Enabling Causes in Educational Diagnosis and Evaluation, [41, 83]). Wintemute et al. (guidelines + feedback + reminder) based their choice of intervention on Rogers’ theory of diffusion of innovations, supplementing evidence-based recommendations with active reminders and local feedback ([69, 91], Additional file 9).

Additionally, three interventions performed Plan-Do-Study-Act (PDSA) cycles based on different approaches. Bateman et al. (feedback + education) applied a systematic approach using process measures and evaluation based on quality improvement literature [52, 92, 93]. Gilmour et al. (education + reflex testing) and Taher et al. (reflex testing) developed their interventions based on PDSA cycles using the model for improvement framework for continuous quality improvement by Provost et al. ([30, 68, 94, 95], Additional file 9).

Discussion

Our review sought to evaluate interventions aimed at reducing unnecessary TFTs by reviewing and synthesising recent studies, building on the review by Zhelev et al. from 2016 [34]. We identified 21 new studies, contributing to a total of 47 unique studies included in our review. The interventions comprised soft (education, guidelines/protocols, reminders, and audit/feedback) and structural (CDSS and changes in funding) interventions. The synthesis of 54 interventions across the included studies revealed predominantly positive outcomes, with 52 interventions associated with reductions or improvements in at least one outcome. Most studies reported relative reductions of at least 20%. Restricting the evidence to the five included (cluster) RCTs reaffirmed this pattern, as all five trials reported some degree of beneficial impact. In this review, we applied the Cochrane methodology and the GRADE approach, enhancing the methodological rigour [38, 50]. Nevertheless, the overall certainty of evidence was rated as low, indicating that the observed effects should be interpreted with caution and primarily viewed as indicative of promising trends rather than definitive evidence of effectiveness.

We observed a shift towards structural interventions, particularly CDSS. Of the 21 newly identified studies, 17 employed some form of CDSS, with alerts being the most commonly reported. The clustered GRADE assessment suggested potential effectiveness of de-implementation interventions, where structural interventions showed slightly more compelling results (RQ1). RCT evidence for structural interventions indicated improvement of care (moderate CoE, n = 2 RCTs) and volume reduction (low CoE, n = 1 RCT). Though RCT evidence for soft interventions shows similar results (volume reduction n = 2 RCTs, improvement of care n = 1 RCT, both moderate CoE), observational evidence suggests a higher success rate for structural interventions (structural interventions: 100% positive, 61% significant; soft interventions: 94% positive, 44% significant). The increased use of structural interventions aligns with the expectation that direct approaches at the point of care may be more likely to yield a reduction in test orders compared to soft interventions [45]. A systematic review performed by Cliff et al. on the effectiveness of CW interventions concluded that structural interventions are more effective than soft approaches [21]. Similarly, a CDSS was found to be the most effective intervention in de-implementing low-value cancer care [96]. Further research is needed to evaluate the effectiveness of structural interventions under various conditions such as system usability, integration with existing practices, and user engagement, which are often underreported [97]. In particular, it remains unclear how differences in CDSS design, implementation context, and integration into clinical workflows shape both the magnitude and durability of observed effects.

Next to potentially promising structural interventions, our study also identified soft interventions that, while less impactful, showed compelling effects in some settings. These findings suggest that both structural and soft interventions may be suitable options for reducing TFTs. Similarly, research by Kobewka et al., which examined interventions aimed at reducing all sorts of laboratory test utilisation, found positive results across all types of interventions [98]. In particular, soft interventions can be considered in settings where profound structural interventions are not feasible. Regardless of the specific intervention type, a recent overview of reviews by Kien et al. indicates that de-implementation strategies are effective across various low-value services [6]. This overview complements our review and can inform decision-makers about the range of interventions available for reducing low-value care, as well as the potential for de-implementation strategies beyond individual indications.

Further, our review sought to identify theoretical foundations considered during implementation and contextual factors that are associated with the effectiveness of the interventions (RQ2). However, reporting on these aspects was sparse in the identified studies and grey literature. Only a few studies reported using frameworks to guide their interventions. This limits our understanding of the mechanisms driving the observed effects and reduces the ability to replicate successful interventions in different settings, as the lack of theory-driven design makes it difficult to explain how contextual and behavioural factors influence effectiveness. For example, Stuart et al. based the components of their intervention on the core elements of the PRECEDE framework, thereby aligning components with identified determinants of behaviour ([83], significant positive effect with RD ≥ 20%). Similarly, Wintemute et al. drew on Rogers’ theory of diffusion of innovations to supplement evidence-based recommendations with active reminders and local feedback ([69], significant positive effect). These examples illustrate how theoretical models can strengthen intervention design by making explicit the mechanisms through which behaviour change is expected to occur. However, despite the availability of several de-implementation frameworks, they appear to be rarely applied in practice [99]. A more consistent use of such frameworks across interventions would not only facilitate comparability but also strengthen the theoretical grounding of de-implementation strategies. Integrating examples such as digital readiness or organisational culture within such frameworks could help clarify how contextual factors interact with intervention mechanisms. Policymakers should support evidence-based interventions built on robust theoretical foundations and evaluation frameworks to ensure effectiveness and lasting impact by avoiding inefficient components.
Once solid evidence has been generated through primary studies, a realist review may help to fully understand how and why different components of the intervention(s) work in what contexts and for whom [100]. The success of such system-level changes depends heavily on contextual conditions, such as the availability and interoperability of local infrastructure, prevailing funding mechanisms, and regulatory frameworks. These factors determine whether an intervention can be feasibly implemented and sustained, and they should be carefully considered when transferring findings to other settings. Although some studies reported larger effects for changes in the clinical ordering system compared to soft interventions, it cannot be assumed that such approaches are simultaneously less expensive, despite the absence of recurring training sessions and evaluations [98]. While CDSS may prove cost-efficient over time through automation and scalability, they often demand substantial initial investments in digital infrastructure. In contrast, soft interventions are typically less costly to implement initially but may require ongoing efforts to sustain their effects. This trade-off should be carefully considered when designing de-implementation strategies. Cost-effectiveness analysis is necessary to evaluate the costs of interventions relative to their savings, as interventions aimed at reducing low-value care can themselves be resource-intensive.

Limitations

There are several limitations to this review that should be acknowledged. First, we included observational studies due to the complexity of the interventions. Observational studies are generally more prone to internal validity concerns compared to RCTs [38]. Second, we did not pool the effects of the interventions quantitatively because of the heterogeneity of the interventions and outcome measures. Instead, we focused on the evaluation of clustered results in order to give a concise overview. While the classification into structural or soft interventions is partly based on literature, the distinction is not always clear-cut, as some interventions span both categories or include borderline elements, and several interventions in this review explicitly combined soft and structural components. Alternative ways of grouping interventions and outcomes may therefore lead to different interpretations of the findings. In addition, conducting an extended mixed-methods synthesis integrating qualitative and quantitative evidence was beyond the scope of this review, given the sparse and inconsistent reporting of theoretical foundations and contextual factors. Third, we frequently observed small effects, which may limit the strength of the conclusions, though the overall trends were generally positive. Still, only 12 of the included interventions reported confidence intervals, limiting the ability to assess the reliability of effect estimates. The lack of confidence intervals makes it more difficult to determine the statistical robustness of reported changes, and increases the uncertainty surrounding the true effectiveness of the interventions. Furthermore, the majority of studies reported outcomes over relatively short timeframes, with few studies providing extensive follow-up. 
Consequently, it remains unclear whether reductions in TFT ordering were sustained once interventions ended or whether rebound effects occurred, for example due to alert fatigue, CDSS-related workflow integration issues, or system changes. Future research should address the long-term sustainability of de-implementation efforts. Fourth, most of the included studies were conducted in North America and Europe, particularly the USA, Canada, and the UK. This may limit the generalisability to health systems in other regions, especially those with low resources or limited digital infrastructure. In such settings, soft interventions may be more readily feasible than CDSS-based approaches that require specific digital and regulatory prerequisites. In addition, the interventions under study may not be readily transferable to other settings in the countries of interest. For example, in the German healthcare system, there are various electronic health data management systems and providers of practice management software, each with varying levels of interoperability. The context in which interventions are deployed could produce different outcomes based on the technical and regulatory landscape. Regulatory efforts are necessary in order to incorporate customised solutions across multiple institutions simultaneously. Fifth, the literature search was restricted to English and German publications. While no German-language studies were identified and several included studies originated from non-English-speaking countries, the exclusion of other languages may have resulted in the omission of a small number of relevant studies.

Regarding publication bias, research on TFT reduction is predominantly publicly funded, with fewer incentives for researchers to withhold information in comparison to pharmaceutical trials and related reviews. Any intervention aiming to reduce low-value TFTs is likely to lead to positive results compared to usual care. Meanwhile, significant outcomes (31 interventions, 57%) were not reported with excessive frequency in relation to non-significant outcomes or outcomes with no reported level of significance. Thus, we concluded that publication bias is not a major concern. However, the predominance of non-RCTs led to a generally low CoE, limiting generalisability. Most controlled studies had a serious or critical RoB due to potential confounding, while observational studies were categorically classified as having a critical RoB. Last, we cannot rule out the possibility that our findings are influenced by selective reporting of outcomes. However, similar to the issue of potential publication bias, selective reporting is unlikely to pose a significant problem, as the body of evidence includes enough compelling and consistently positive results. The identified limitations coincide with those found in similar reviews and research on related low-value care topics [21, 96, 101]. Thus, the implications for policy should be interpreted with the appropriate caution, taking into account the GRADE results (Additional file 7).

Conclusion

The evidence on de-implementation strategies for TFT ordering suggests that behaviour change interventions have the potential to significantly reduce excessive thyroid function testing. CDSS in particular appear to be associated with promising results, though most studies are at high or critical risk of bias. If these findings hold in more rigorous trials, this would strengthen the evidence base for feasible workflow modifications to improve care. Policy and practice could then consider implementing controlled TFT reduction as a means to enhance appropriateness and possibly reduce costs. Continued research on cost-effectiveness will be essential to inform large-scale implementation.

Future research should focus on developing well-designed interventions based on a solid theoretical foundation and higher methodological rigour. In particular, RCTs and the use of hybrid effectiveness-implementation designs would allow more reliable evaluation and applicability. By systematically reporting contextual factors and mechanisms of change, future studies can strengthen the evidence base and support the replication of potentially effective de-implementation strategies across settings. In the short term, standardised cluster RCTs across diverse contexts could test the effectiveness of CDSS, while medium-term studies should assess the sustainability of their effects. In the longer term, theory-based realist syntheses may help clarify what works, for whom, and why. This review can help inform the design of such interventions by identifying which specific interventions or components may be associated with greater effectiveness. Ongoing evaluation of these studies can identify the mechanisms of change and facilitate the replication of successful de-implementation interventions across various settings.

Supplementary Information

Additional file 1. AMSTAR assessment of the review by Zhelev et al. [34] (13643_2026_3119_MOESM1_ESM.docx, 51.6 KB).

Additional file 2. ROBIS assessment of the review by Zhelev et al. [34] (13643_2026_3119_MOESM2_ESM.docx, 43.4 KB).

Additional file 3. PRISMA 2020 Checklist (13643_2026_3119_MOESM3_ESM.docx, 64.8 KB).

Additional file 4. Changes made to the information provided at registration (13643_2026_3119_MOESM4_ESM.docx, 18 KB).

Additional file 5. Search strategies in Embase, Medline, Scopus, Cochrane, and Google Scholar (13643_2026_3119_MOESM5_ESM.docx, 65.5 KB).

Additional file 6. List of data items extracted in the review (13643_2026_3119_MOESM6_ESM.docx, 21.5 KB).

Additional file 7. Full GRADE assessment of the interventions (13643_2026_3119_MOESM7_ESM.docx, 469.1 KB).

Additional file 8. Information on all articles excluded after full-text screening, including reason for exclusion (13643_2026_3119_MOESM8_ESM.docx, 37.5 KB).

Additional file 9. Additional information on study characteristics, i.e. reported outcomes, reporting on theoretical foundations, and study period (13643_2026_3119_MOESM9_ESM.docx, 206.4 KB).

Additional file 10. Visualisation of the Risk of Bias assessment for the (cluster) RCTs and controlled studies (13643_2026_3119_MOESM10_ESM.docx, 473.9 KB).

Additional file 11. Additional information on study results, i.e. the outcome values (pre/postintervention), notes on outcome measures and statistical indicators (confidence interval, p-value, relative reduction, difference in means) (13643_2026_3119_MOESM11_ESM.docx, 263.3 KB).

Additional file 12. Additional information on study characteristics, i.e. funding and reported conflict of interest (13643_2026_3119_MOESM12_ESM.docx, 167.2 KB).

Acknowledgements

We are grateful to Zhivko Zhelev for his invaluable insights and critical analyses in the prior review, which served as a foundation for the results presented in this article. We thank him for his guidance and continued support throughout the process.

Abbreviations

CDSS: Clinical decision support systems
CoE: Certainty of evidence
CoV: Coefficient of variation
CW: Choosing Wisely
GRADE: Grading of Recommendations, Assessment, Development, and Evaluations
NRSI: Non-randomised studies of interventions
NS: Not significant
PDSA: Plan-Do-Study-Act
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RCT: Randomised controlled trial
RQ: Research question
RoB: Risk of bias
RoB 2: Revised Cochrane risk-of-bias tool for randomised trials
ROBINS-I: RoB In Non-randomised Studies—of Interventions
ROBIS: RoB In Systematic reviews tool
T3: Triiodothyronine
T4: Thyroxine
TDF: Theoretical Domains Framework
TFT: Thyroid function test
TSH: Thyroid-stimulating hormone

Authors’ contributions

CP and MH contributed to the design, analysis, and interpretation and drafted the manuscript. GG contributed to the design, analysis, and interpretation. VV contributed to the design and interpretation and finalised the manuscript. All authors read and approved the final manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL. The systematic review was funded by the Federal Joint Committee (GBA), the highest body of self-administration in the German healthcare system (funding code 01VSF19038) [102]. The funders were not involved in the development of the review.

Data availability

All data generated or analysed during this study are included in this published article and its supplementary information files.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Carolina Pioch and Meik Hildebrandt should be considered joint first authors.

References

1. Horton S, Fleming KA, Kuti M, Looi L-M, Pai SA, Sayed S, et al. The top 25 laboratory tests by volume and revenue in five different countries. Am J Clin Pathol. 2019;151(5):446–51.
2. Dufour DR. Laboratory tests of thyroid function: uses and limitations. Endocrinol Metab Clin North Am. 2007;36(3):579–94.
3. Kluesner JK, Beckman DJ, Tate JM, Beauvais AA, Kravchenko MI, Wardian JL, et al. Analysis of current thyroid function test ordering practices. J Eval Clin Pract. 2018;24(2):347–52.
4. Beckett GJ, Toft AD. First-line thyroid function tests - TSH alone is not enough. Clin Endocrinol. 2003;58(1):20–1.
5. Premawardhana LD. Thyroid testing in acutely ill patients may be an expensive distraction. Biochemia Medica. 2017;27(2):300–7.
6. Kien C, Daxenbichler J, Titscher V, Baenziger J, Klingenstein P, Naef R, et al. Effectiveness of de-implementation of low-value healthcare practices: an overview of systematic reviews. Implement Sci. 2024;19(1):56.
7. Garmendia Madariaga A, Santos Palacios S, Guillén-Grima F, Galofré JC. The incidence and prevalence of thyroid dysfunction in Europe: a meta-analysis. J Clin Endocrinol Metab. 2014;99(3):923–31.
8. El Kawkgi OM, Brito JP. Screening for thyroid dysfunction: prevention of overdiagnosis and overtreatment. CMAJ. 2019;191(46):E1260–1.
9. Hueber S, Biermann V, Tomandl J, Warkentin L, Schedlbauer A, Tauchmann H, et al. Consequences of early thyroid ultrasound on subsequent tests, morbidity and costs: an explorative analysis of routine health data from German ambulatory care. BMJ Open. 2023;13(3):e059016.
10. Schübel J, Voigt K, Uebel T. Elevated TSH values in primary care: DEGAM-Guideline. AWMF-Register-Nr. 053–046; 2023 [cited 2024 Jul 24]. Available from: https://register.awmf.org/assets/guidelines/053_D_Ges_fuer_Allgemeinmedizin_und_Familienmedizin/053-046k-eng_S2k_Erhoehter-TSH-Wert-in-der-Hausarztpraxis_2023-07.pdf.
11. Cure C. Screening for thyroid dysfunction: do not routinely order TSH in all patients: Canadian Task Force on Preventive Health Care; 2024 [cited 2024 Jul 24]. Available from: URL: https://canadiantaskforce.ca/screening-for-thyroid-dysfunction-do-not-routinely-order-tsh-in-all-patients/.
12. Canadian Society of Endocrinology and Metabolism. Five things patients and physicians should question: endocrinology and metabolism - Choosing Wisely Canada; 2020 [cited 2024 Jul 24]. Available from: URL: https://choosingwiselycanada.org/recommendation/endocrinology-and-metabolism/.
13. Choosing Wisely Australia. The Endocrine Society of Australia: recommendations; 2024 [cited 2024 Jul 1]. Available from: URL: https://www.choosingwisely.org.au/recommendations/esa5.
14. Canadian Society of Endocrinology and Metabolism. CSEM review and response: thyroid testing and management; 2024 [cited 2024 Jul 24]. Available from: URL: https://www.endo-metab.ca/cpgs-qi/thyroid-testing.
15. Gupta S, Verma M, Gupta AK, Kaur A, Kaur V, Singh K. Are we using thyroid function tests appropriately? Indian J Clin Biochem. 2011;26(2):178–81.
16. Baranek H, Lee J. Less is more with T3 and T4: Choosing Wisely Canada; 2018 [cited 2024 Jul 24]. Available from: URL: https://choosingwiselycanada.org/less-t3-t4/.
17. Hildebrandt M, Pioch C, Dammertz L, Ihle P, Nothacker M, Schneider U, et al. Quantifying low-value care in Germany: an observational study using statutory health insurance data from 2018 to 2021. Value Health. 2024;28(6):884–93.
18. Pioch C, Neubert A, Dammertz L, Ermann H, Hildebrandt M, Ihle P, et al. Selecting indicators for the measurement of low-value care using German claims data: a three-round modified Delphi panel. PLoS One. 2025;20(2):e0314864.
19. Crampton N, Kalia S, Del Giudice ME, Wintemute K, Sullivan F, Aliarzadeh B, et al. Over-use of thyroid testing in Canadian and UK primary care in frequent attenders: a cross-sectional study. Int J Clin Pract. 2021;75(6):e14144.
20. Berthe E, Bencheqroun S, Mentaverri R. Recommendations for improved clinical practices for total thyroxine (T4) assay. J Appl Lab Med. 2025;10(3):764–7.
21. Cliff BQ, Avanceña ALV, Hirth RA, Lee S-Y. The impact of choosing wisely interventions on low-value medical services: a systematic review. Milbank Q. 2021;99(4):1024–58.
22. Larsson A, Blom S, Wernroth ML, Hultén G, Tryding N. Effects of an education programme to change clinical laboratory testing habits in primary care. Scand J Prim Health Care. 1999;17(4):238–43.
23. Berwick DM, Coltin KL. Feedback reduces test use in a health maintenance organization. JAMA. 1986;255(11):1450–4.
24. Thomas RE, Croal BL, Ramsay C, Eccles M, Grimshaw J. Effect of enhanced feedback and brief educational reminder messages on laboratory test requesting in primary care: a cluster randomised trial. Lancet. 2006;367(9527):1990–6.
25. Daucourt V, Saillour-Glénisson F, Michel P, Jutand MA, Abouelfath A. A multicenter cluster randomized controlled trial of strategies to improve thyroid function testing. Med Care. 2003;41(3):432–41.
26. Emerson JF, Emerson SS. The impact of requisition design on laboratory utilization. Am J Clin Pathol. 2001;116(6):879–84.
27. Tierney WM, McDonald CJ, Hui SL, Martin DK. Computer predictions of abnormal test results. Effects on outpatient testing. JAMA. 1988;259(8):1194–8.
28. van Walraven C, Goel V, Chan B. Effect of population-based interventions on laboratory utilization: a time-series analysis. JAMA. 1998;280(23):2028–33.
29. Hardwick DF, Morrison JI, Tydeman J, Cassidy PA, Chase WH. Structuring complexity of testing: a process oriented approach to limiting unnecessary laboratory use. Am J Med Technol. 1982;48(7):605–8.
30. Gilmour JA, Weisman A, Orlov S, Goldberg RJ, Goldberg A, Baranek H, et al. Promoting resource stewardship: reducing inappropriate free thyroid hormone testing. J Eval Clin Pract. 2017;23(3):670–5.
31. French SD, Green SE, O'Connor DA, McKenzie JE, Francis JJ, Michie S, et al. Developing theory-informed behaviour change interventions to implement evidence into practice: a systematic approach using the Theoretical Domains Framework. Implement Sci. 2012;7(1):38.
32. Gangathimmaiah V, Drever N, Evans R, Moodley N, Sen Gupta T, Cardona M, et al. What works for and what hinders deimplementation of low-value care in emergency medicine practice? A scoping review. BMJ Open. 2023;13(11):e072762.
33. Grimshaw JM, Patey AM, Kirkham KR, Hall A, Dowling SK, Rodondi N, et al. De-implementing wisely: developing the evidence base to reduce low-value care. BMJ Qual Saf. 2020;29(5):409–17.
34. Zhelev Z, Abbott R, Rogers M, Fleming S, Patterson A, Hamilton WT, et al. Effectiveness of interventions to reduce ordering of thyroid function tests: a systematic review. BMJ Open. 2016;6(6):e010065.
35. Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.
36. Wittich L, Tsatsaronis C, Kuklinski D, Schöner L, Steinbeck V, Busse R, et al. Patient-reported outcome measures as an intervention: a comprehensive overview of systematic reviews on the effects of feedback. Value Health. 2024;27(10):1436–53.
37. Whiting P, Savović J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34.
38. Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023): Cochrane; 2023. Available from: URL: www.training.cochrane.org/handbook.
39. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
40. Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372:n160.
41. Solomon DH, Hashimoto H, Daltroy L, Liang MH. Techniques to improve physicians' use of diagnostic tests: a new conceptual framework. JAMA. 1998;280(23):2020–7.
42. Oxman AD, Thomson MA, Davis DA, Haynes RB. No magic bullets: a systematic review of 102 trials of interventions to improve professional practice. CMAJ. 1995;153(10):1423–31.
43. The EndNote Team. EndNote X9 (64-bit). Philadelphia (PA): Clarivate; 2013. Available from: https://endnote.com.
44. Janssens PMW, Staring W, Winkelman K, Krist G. Active intervention in hospital test request panels pays. Clin Chem Lab Med. 2015;53(5):731–42.
45. Chami N, Li Y, Weir S, Wright JG, Kantarevic J. Effect of strict and soft policy interventions on laboratory diagnostic testing in Ontario, Canada: a Bayesian structural time series analysis. Health Policy. 2021;125(2):254–60.
46. Brown AF, Ma GX, Miranda J, Eng E, Castille D, Brockie T, et al. Structural interventions to reduce and eliminate health disparities. Am J Public Health. 2019;109(S1):S72–8.
47. Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.
48. Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.
49. McGuinness LA, Higgins JPT. Risk-of-bias VISualization (robvis): an R package and Shiny web app for visualizing risk-of-bias assessments. Res Synth Methods. 2021;12(1):55–61.
50. Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383–94.
51. Murad MH, Almasri J, Alsawas M, Farah W. Grading the quality of evidence in complex interventions: a guide for evidence-based practitioners. Evid Based Med. 2017;22(1):20–2.
52. Bateman EA, Gob A, Chin-Yee I, MacKenzie HM. Reducing waste: a guidelines-based approach to reducing inappropriate vitamin D and TSH testing in the inpatient rehabilitation setting. BMJ Open Qual. 2019;8(4):e000674.
53. Bejjanki H, Mramba LK, Beal SG, Radhakrishnan N, Bishnoi R, Shah C, et al. The role of a best practice alert in the electronic medical record in reducing repetitive lab tests. ClinicoEconomics and Outcomes Research. 2018;10:611–8.
54. Bellodi E, Vagnoni E, Bonvento B, Lamma E. Economic and organizational impact of a clinical decision support system on laboratory test ordering. BMC Med Inform Decis Mak. 2017;17(1):179.
55. Bradshaw AB, Bonnecaze AK, Burns CA, Beardsley JR. Impact of an interprofessional collaborative quality improvement initiative to decrease inappropriate thyroid function testing. Hosp Pharm. 2021;56(5):481–5.
56. Caldarelli G, Troiano G, Rosadini D, Nante N. Adoption of TSH reflex algorithm in an Italian clinical laboratory. Annali di igiene: medicina preventiva e di comunità. 2017;29(4):317–22.
57. Dalal S, Bhesania S, Silber S, Mehta P. Use of electronic clinical decision support and hard stops to decrease unnecessary thyroid function testing. BMJ Quality Improvement Reports. 2017;6(1):u223041.w8346.
58. Delvaux N, Piessens V, Burghgraeve T, Mamouris P, Vaes B, Stichele RV, et al. Clinical decision support improves the appropriateness of laboratory test ordering in primary care without increasing diagnostic error: the ELMO cluster randomized trial. Implement Sci. 2020;15(1):100.
59. Elrewini AM, Zubair M, Afridi NK, Dildar MT, Javed H, Alwalah SM. To determine the effectiveness of different interventions to reduce unnecessary requests of serum thyroid stimulating hormone levels in a hospital. The Professional Medical Journal. 2022;29(05):686–92.
60. Krouss M, Israilov S, Alaiev D, Hupart K, Da Shin W, Mestari N, et al. Free the T3: implementation of best practice advisory to reduce unnecessary orders. Am J Med. 2022;135(12):1437–42.
61. Leis B, Frost A, Bryce R, Lyon AW, Coverett K. Altering standard admission order sets to promote clinical laboratory stewardship: a cohort quality improvement study. BMJ Qual Saf. 2019;28(10):846–52.
62. Leung E, Song S, Al-Abboud O, Shams S, English J, Naji W, et al. An educational intervention to increase awareness reduces unnecessary laboratory testing in an internal medicine resident-run clinic. J Community Hosp Intern Med Perspect. 2017;7(3):168–72.
63. MacPherson RD, Reeve SA, Stewart TV, Cunningham AES, Craven ML, Fox G, et al. Effective strategy to guide pathology test ordering in surgical patients. ANZ J Surg. 2005;75(3):138–43.
64. Muris DMJ, Molenaers M, Nguyen T, Bergmans PWMP, van Acker BAC, Krekels MME, et al. Effect of a price display intervention on laboratory test ordering behavior of general practitioners. BMC Fam Pract. 2021;22(1):242.
65. Notas G, Kampa M, Malliaraki N, Petrodaskalaki M, Papavasileiou S, Castanas E. Implementation of thyroid function tests algorithms by clinical laboratories: a four-year experience of good clinical and diagnostic practice in a tertiary hospital in Greece. Eur J Intern Med. 2018;54:81–6.
66. Salinas M, López-Garrigós M, Flores E, Leiva-Salinas M, Asencio A, Lugo J, et al. Managing inappropriate requests of laboratory tests: from detection to monitoring. Am J Manag Care. 2016;22(9):e311–6.
67. Sue LY, Kim JE, Oza H, Chong T, Woo HE, Cheng EM, et al. Reducing inappropriate serum T3 laboratory test ordering in patients with treated hypothyroidism. Endocr Pract. 2019;25(12):1312–6.
68. Taher J, Beriault DR, Yip D, Tahir S, Hicks LK, Gilmour JA. Reducing free thyroid hormone testing through multiple plan-do-study-act cycles. Clin Biochem. 2020;81:41–6.
69. Wintemute K, Greiver M, McIsaac W, Del Giudice ME, Sullivan F, Aliarzadeh B, et al. Choosing Wisely Canada campaign associated with less overuse of thyroid testing: retrospective parallel cohort study. Can Fam Physician. 2019;65(11):E487–96.
70. Adlan MA, Neel V, Lakra SS, Bondugulapati LNR, Premawardhana LDKE. Targeted thyroid testing in acute illness: achieving success through audit. J Endocrinol Invest. 2011;34(8 Suppl):e210–3.
71. Baker R, Smith JF, Lambert PC. Randomised controlled trial of the effectiveness of feedback in improving test ordering in general practice. Scand J Prim Health Care. 2003;21(4):219–23.
72. Chu KH, Wagholikar AS, Greenslade JH, O’Dwyer JA, Brown AF. Sustained reductions in emergency department laboratory test orders: impact of a simple intervention. Postgrad Med J. 2013;89(1056):566–71.
73. Cipullo JA, Mostoufizadeh M. Bringing order to test orders: one lab’s story. CAP Today. 1996;10(1):20–2.
74. Dowling PT, Alfonsi G, Brown MI, Culpepper L. An education program to reduce unnecessary laboratory tests by residents. J Med Educ. 1989;64(7):410–2.
75. Feldkamp CS, Carey JL. An algorithmic approach to thyroid function testing in a managed care setting: 3-year experience. Am J Clin Pathol. 1996;105(1):11–6.
76. Gama R, Nightingale PG, Broughton PM, Peters M, Bradby GV, Berg J, et al. Feedback of laboratory usage and cost data to clinicians: does it alter requesting behaviour? Ann Clin Biochem. 1991;28(Pt 2):143–9.
77. Grivell AR, Forgie HJ, Fraser CG, Berry MN. Effect of feedback to clinical staff of information on clinical biochemistry requesting patterns. Clin Chem. 1981;27(10):1717–20.
78. Horn DM, Koplan KE, Senese MD, Orav EJ, Sequist TD. The impact of cost displays on primary care physician laboratory test ordering. J Gen Intern Med. 2014;29(5):708–14.
79. Mindemark M, Larsson A. Long-term effects of an education programme on the optimal use of clinical chemistry testing in primary health care. Scand J Clin Lab Invest. 2009;69(4):481–6.
80. Nightingale PG, Peters M, Mutimer D, Neuberger JM. Effects of a computerised protocol management system on ordering of clinical tests. Qual Health Care. 1994;3(1):23–8.
81. Rhyne RL, Gehlbach SH. Effects of an educational feedback strategy on physician utilization of thyroid function panels. J Fam Pract. 1979;8(5):1003–7.
82. Schectman JM, Elinsky EG, Pawlson LG. Effect of education and feedback on thyroid function testing strategies of primary care clinicians. Arch Intern Med. 1991;151(11):2163–6.
83. Stuart PJ, Crooks S, Porton M. An interventional program for diagnostic testing in the emergency department. Med J Aust. 2002;177(3):131–4.
84. Tomlin A, Dovey S, Gauld R, Tilyard M. Better use of primary care laboratory services following interventions to ‘market’ clinical guidelines in New Zealand: a controlled before-and-after study. BMJ Qual Saf. 2011;20(3):282–90.
85. Toubert ME, Chevret S, Cassinat B, Schlageter MH, Beressi JP, Rain JD. From guidelines to hospital practice: reducing inappropriate ordering of thyroid hormone and antibody tests. Eur J Endocrinol. 2000;142(6):605–10.
86. Vidal-Trécan G, Toubert ME, Coste J, Paycha F, Durand-Zaleski I, Fulla Y, et al. Reducing the number of T3 orders in the Paris hospital network: towards better appropriateness of thyroid function test prescription. Ann Endocrinol. 2003;64(3):210–5.
87. Willis EA, Datta BN. Effect of an educational intervention on requesting behaviour by a medical admission unit. Ann Clin Biochem. 2013;50(2):166–8.
88. Wong ET, McCarron MM, Shaw ST. Ordering of laboratory tests in a teaching hospital: can it be improved? JAMA. 1983;249(22):3076–80.
89. European Commission. Member State Coordination Group on HTA (HTACG). Guidance on the validity of clinical studies for joint clinical assessments. Directorate General for Health and Food Safety. 2024. Available from: https://health.ec.europa.eu/publications/guidance-validity-clinical-studies-joint-clinical-assessments_en.
90. Leis B, Frost A, Bryce R, Coverett K. Standard admission order sets promote ordering of unnecessary investigations: a quasi-randomised evaluation in a simulated setting. BMJ Qual Saf. 2017;26(11):938–40.
91. Rogers E. Diffusion of innovations. New York: The Free Press; 1995.
92. van Walraven C, Naylor CD. Do we know what inappropriate laboratory utilization is? A systematic review of laboratory clinical audits. JAMA. 1998;280(6):550–8.
93. Hulscher MEJL, Laurant MGH, Grol RPTM. Process evaluation on quality improvement interventions. Qual Saf Health Care. 2003;12(1):40–6.
94. Provost LP, Murray SK. The health care data guide: learning from data for improvement. Second edition. Hoboken, NJ: John Wiley & Sons; 2022.
95. Institute for Healthcare Improvement. How to improve: model for improvement; 2024 [cited 2024 Jul 25]. Available from: URL: https://www.ihi.org/resources/how-improve-model-improvement.
96. Alishahi Tabriz A, Turner K, Clary A, Hong Y-R, Nguyen OT, Wei G, et al. De-implementing low-value care in cancer care delivery: a systematic review. Implement Sci. 2022;17(1):24.
97. Mucha H, Robert S, Breitschwerdt R, Fellmann M. Usability of clinical decision support systems. Z Arb Wiss. 2023;77(1):92–101.
98. Kobewka DM, Ronksley PE, McKay JA, Forster AJ, van Walraven C. Influence of educational, audit and feedback, system based, and incentive and penalty interventions to reduce laboratory test utilization: a systematic review. Clin Chem Lab Med. 2015;53(2):157–83.
99. Nilsen P, Ingvarsson S, Hasson H, von Thiele Schwarz U, Augustsson H. Theories, models, and frameworks for de-implementation of low-value care: a scoping review of the literature. Implement Res Pract. 2020;1:2633489520953762.
100. Pawson R, Greenhalgh T, Harvey G, Walshe K. Realist review - a new method of systematic review designed for complex policy interventions. J Health Serv Res Policy. 2005;10 Suppl 1:21–34.
101. Augustsson H, Casales Morici B, Hasson H, von Thiele Schwarz U, Schalling SK, Ingvarsson S, et al. National governance of de-implementation of low-value care: a qualitative study in Sweden. Health Res Policy Syst. 2022;20(1):92.
102. Gemeinsamer Bundesausschuss (GBA). IndiQ – Entwicklung eines Tools zur Messung von Indikationsqualität in Routinedaten und Identifikation von Handlungsbedarfen und -strategien - G-BA Innovationsfonds; 2024 [cited 2024 Aug 2]. Available from: URL: https://innovationsfonds.g-ba.de/projekte/versorgungsforschung/indiq-entwicklung-eines-tools-zur-messung-von-indikationsqualitaet-in-routinedaten-und-identifikation-von-handlungsbedarfen-und-strategien.325.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

13643_2026_3119_MOESM1_ESM.docx^{(51.6KB, docx)}

Additional file 1. Additional file 1 includes the AMSTAR assessment of the review by Zhelev et al [34].

13643_2026_3119_MOESM2_ESM.docx^{(43.4KB, docx)}

Additional file 2. Additional file 2 includes the ROBIS assessment of the review by Zhelev et al. [34].

13643_2026_3119_MOESM3_ESM.docx^{(64.8KB, docx)}

Additional file 3. Additional file 3 includes the PRISMA 2020 Checklist.

13643_2026_3119_MOESM4_ESM.docx^{(18KB, docx)}

Additional file 4. Additional file 4 includes changes made to the information provided at registration.

13643_2026_3119_MOESM5_ESM.docx^{(65.5KB, docx)}

Additional file 5. Additional file 5 includes the search strategies in Embase, Medline, Scopus, Cochrane, and Google Scholar.

13643_2026_3119_MOESM6_ESM.docx^{(21.5KB, docx)}

Additional file 6. Additional file 6 includes the list of data items extracted in the review.

13643_2026_3119_MOESM7_ESM.docx^{(469.1KB, docx)}

Additional file 7. Additional file 7 includes the full GRADE assessment of the interventions.

13643_2026_3119_MOESM8_ESM.docx^{(37.5KB, docx)}

Additional file 8. Additional file 8 includes the information on all articles excluded after full-text screening, including reason for exclusion.

13643_2026_3119_MOESM9_ESM.docx^{(206.4KB, docx)}

Additional file 9. Additional file 9 includes additional information on study characteristics, i.e. reported outcomes, reporting on theoretical foundations, and study period.

13643_2026_3119_MOESM10_ESM.docx^{(473.9KB, docx)}

Additional file 10. Additional file 10 includes the visualisation of the Risk of Bias assessment for the (cluster) RCTs and controlled studies.

13643_2026_3119_MOESM11_ESM.docx^{(263.3KB, docx)}

13643_2026_3119_MOESM12_ESM.docx^{(167.2KB, docx)}

Additional file 12. Additional file 12 includes additional information on study characteristics, i.e. funding and reported conflict of interest.

Data Availability Statement

All data generated or analysed during this study are included in this published article and its supplementary information files.

[CR1] 1.Horton S, Fleming KA, Kuti M, Looi L-M, Pai SA, Sayed S, et al. The top 25 laboratory tests by volume and revenue in five different countries. Am J Clin Pathol. 2019;151(5):446–51. [DOI] [PubMed] [Google Scholar]

[CR2] 2.Dufour DR. Laboratory tests of thyroid function: uses and limitations. Endocrinol Metab Clin North Am. 2007;36(3):579–94. [DOI] [PubMed] [Google Scholar]

[CR3] 3.Kluesner JK, Beckman DJ, Tate JM, Beauvais AA, Kravchenko MI, Wardian JL, et al. Analysis of current thyroid function test ordering practices. J Eval Clin Pract. 2018;24(2):347–52. [DOI] [PubMed] [Google Scholar]

[CR4] 4.Beckett GJ, Toft AD. First-line thyroid function tests - - TSH alone is not enough. Clin Endocrinol. 2003;58(1):20–1. [DOI] [PubMed] [Google Scholar]

[CR5] 5.Premawardhana LD. Thyroid testing in acutely ill patients may be an expensive distraction. Biochemia medica. 2017;27(2):300–7. [DOI] [PMC free article] [PubMed]

[CR6] 6.Kien C, Daxenbichler J, Titscher V, Baenziger J, Klingenstein P, Naef R et al. Effectiveness of de-implementation of low-value healthcare practices: an overview of systematic reviews. Implement sci: IS. 2024;19(1):56. [DOI] [PMC free article] [PubMed]

[CR7] 7.Garmendia Madariaga A, Santos Palacios S, Guillén-Grima F, Galofré JC. The incidence and prevalence of thyroid dysfunction in Europe: a meta-analysis. J Clin Endocrinol Metab. 2014;99(3):923–31. [DOI] [PubMed] [Google Scholar]

[CR8] 8.El Kawkgi OM, Brito JP. Screening for thyroid dysfunction: prevention of overdiagnosis and overtreatment. CMAJ. 2019;191(46):E1260–1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Hueber S, Biermann V, Tomandl J, Warkentin L, Schedlbauer A, Tauchmann H, et al. Consequences of early thyroid ultrasound on subsequent tests, morbidity and costs: an explorative analysis of routine health data from German ambulatory care. BMJ Open. 2023;13(3):e059016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Schübel J, Voigt K, Uebel T. Elevated TSH values in primary care: DEGAM-Guideline. AWMF-Register-Nr. 053–046; 2023 [cited 2024 Jul 24]. Available from: https://register.awmf.org/assets/guidelines/053_D_Ges_fuer_Allgemeinmedizin_und_Familienmedizin/053-046k-eng_S2k_Erhoehter-TSH-Wert-in-der-Hausarztpraxis_2023-07.pdf.

[CR11] 11.Cure C. Screening for thyroid dysfunction: do not routinely order TSH in all patients: Canadian Task Force on Preventive Health Care; 2024 [cited 2024 Jul 24]. Available from: URL: https://canadiantaskforce.ca/screening-for-thyroid-dysfunction-do-not-routinely-order-tsh-in-all-patients/.

[CR12] 12.Canadian Society of Endocrinology and Metabolism. Five things patients and physicians should question: endocrinology and metabolism - Choosing wisely Canada; 2020 [cited 2024 Jul 24]. Available from: URL: https://choosingwiselycanada.org/recommendation/endocrinology-and-metabolism/.

[CR13] 13.Choosing Wisely Australia. The Endocrine Society of Australia: recommendations; 2024 [cited 2024 Jul 1]. Available from: URL: https://www.choosingwisely.org.au/recommendations/esa5.

[CR14] 14.Canadian Society of Endocrinology and Metabolism. CSEM review and response: thyroid testing and management; 2024 [cited 2024 Jul 24]. Available from: https://www.endo-metab.ca/cpgs-qi/thyroid-testing.

[CR15] 15.Gupta S, Verma M, Gupta AK, Kaur A, Kaur V, Singh K. Are we using thyroid function tests appropriately? Indian J Clin Biochem. 2011;26(2):178–81. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Baranek H, Lee J. Less is more with T3 and T4: Choosing Wisely Canada; 2018 [cited 2024 Jul 24]. Available from: https://choosingwiselycanada.org/less-t3-t4/.

[CR17] 17.Hildebrandt M, Pioch C, Dammertz L, Ihle P, Nothacker M, Schneider U et al. Quantifying low-value care in Germany: an observational study using statutory health insurance data from 2018 to 2021. Value Health. 2024;28(6):884–93. [DOI] [PubMed]

[CR18] 18.Pioch C, Neubert A, Dammertz L, Ermann H, Hildebrandt M, Ihle P, et al. Selecting indicators for the measurement of low-value care using German claims data: a three-round modified Delphi panel. PLoS One. 2025;20(2):e0314864. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Crampton N, Kalia S, Del Giudice ME, Wintemute K, Sullivan F, Aliarzadeh B, et al. Over-use of thyroid testing in Canadian and UK primary care in frequent attenders: a cross-sectional study. Int J Clin Pract. 2021;75(6):e14144. [DOI] [PubMed] [Google Scholar]

[CR20] 20.Berthe E, Bencheqroun S, Mentaverri R. Recommendations for improved clinical practices for total thyroxine (T4) assay. J Appl Lab Med. 2025;10(3):764–7. [DOI] [PubMed] [Google Scholar]

[CR21] 21.Cliff BQ, Avanceña ALV, Hirth RA, Lee S-Y. The impact of Choosing Wisely interventions on low-value medical services: a systematic review. Milbank Q. 2021;99(4):1024–58. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Larsson A, Biom S, Wernroth ML, Hultén G, Tryding N. Effects of an education programme to change clinical laboratory testing habits in primary care. Scand J Prim Health Care. 1999;17(4):238–43. [DOI] [PubMed] [Google Scholar]

[CR23] 23.Berwick DM, Coltin KL. Feedback reduces test use in a health maintenance organization. J Am Med Assoc. 1986;255(11):1450–4. [PubMed] [Google Scholar]

[CR24] 24.Thomas RE, Croal BL, Ramsay C, Eccles M, Grimshaw J. Effect of enhanced feedback and brief educational reminder messages on laboratory test requesting in primary care: a cluster randomised trial. Lancet. 2006;367(9527):1990–6. [DOI] [PubMed] [Google Scholar]

[CR25] 25.Daucourt V, Saillour-Glénisson F, Michel P, Jutand MA, Abouelfath A. A multicenter cluster randomized controlled trial of strategies to improve thyroid function testing. Med Care. 2003;41(3):432–41. [DOI] [PubMed] [Google Scholar]

[CR26] 26.Emerson JF, Emerson SS. The impact of requisition design on laboratory utilization. Am J Clin Pathol. 2001;116(6):879–84. [DOI] [PubMed] [Google Scholar]

[CR27] 27.Tierney WM, McDonald CJ, Hui SL, Martin DK. Computer predictions of abnormal test results. Effects on outpatient testing. J Am Med Assoc. 1988;259(8):1194–8. [PubMed] [Google Scholar]

[CR28] 28.van Walraven C, Goel V, Chan B. Effect of population-based interventions on laboratory utilization: a time-series analysis. J Am Med Assoc. 1998;280(23):2028–33. [DOI] [PubMed] [Google Scholar]

[CR29] 29.Hardwick DF, Morrison JI, Tydeman J, Cassidy PA, Chase WH. Structuring complexity of testing: a process oriented approach to limiting unnecessary laboratory use. Am J Med Technol. 1982;48(7):605–8. [PubMed] [Google Scholar]

[CR30] 30.Gilmour JA, Weisman A, Orlov S, Goldberg RJ, Goldberg A, Baranek H, et al. Promoting resource stewardship: reducing inappropriate free thyroid hormone testing. J Eval Clin Pract. 2017;23(3):670–5. [DOI] [PubMed] [Google Scholar]

[CR31] 31.French SD, Green SE, O'Connor DA, McKenzie JE, Francis JJ, Michie S, et al. Developing theory-informed behaviour change interventions to implement evidence into practice: a systematic approach using the Theoretical Domains Framework. Implement Sci. 2012;7(1):38. [DOI] [PMC free article] [PubMed]

[CR32] 32.Gangathimmaiah V, Drever N, Evans R, Moodley N, Sen Gupta T, Cardona M, et al. What works for and what hinders deimplementation of low-value care in emergency medicine practice? A scoping review. BMJ Open. 2023;13(11):e072762. [DOI] [PMC free article] [PubMed]

[CR33] 33.Grimshaw JM, Patey AM, Kirkham KR, Hall A, Dowling SK, Rodondi N et al. De-implementing wisely: developing the evidence base to reduce low-value care. BMJ Qual Saf. 2020;29(5):409–17. [DOI] [PMC free article] [PubMed]

[CR34] 34.Zhelev Z, Abbott R, Rogers M, Fleming S, Patterson A, Hamilton WT, et al. Effectiveness of interventions to reduce ordering of thyroid function tests: a systematic review. BMJ Open. 2016;6(6):e010065. [DOI] [PMC free article] [PubMed]

[CR35] 35.Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR36] 36.Wittich L, Tsatsaronis C, Kuklinski D, Schöner L, Steinbeck V, Busse R, et al. Patient-reported outcome measures as an intervention: a comprehensive overview of systematic reviews on the effects of feedback. Value Health. 2024;27(10):1436–53. [DOI] [PubMed] [Google Scholar]

[CR37] 37.Whiting P, Savović J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023): Cochrane; 2023. Available from: www.training.cochrane.org/handbook.

[CR39] 39.Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] 40.Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372:n160. [DOI] [PMC free article] [PubMed]

[CR41] 41.Solomon DH, Hashimoto H, Daltroy L, Liang MH. Techniques to improve physicians' use of diagnostic tests: a new conceptual framework. J Am Med Assoc. 1998;280(23):2020–7. [DOI] [PubMed]

[CR42] 42.Oxman AD, Thomson MA, Davis DA, Haynes RB. No magic bullets: a systematic review of 102 trials of interventions to improve professional practice. CMAJ. 1995;153(10):1423–31. [PMC free article] [PubMed]

[CR43] 43.The EndNote Team. EndNote X9 (64-bit). Philadelphia (PA): Clarivate; 2013. Available from: https://endnote.com.

[CR44] 44.Janssens PMW, Staring W, Winkelman K, Krist G. Active intervention in hospital test request panels pays. Clin Chem Lab Med. 2015;53(5):731–42. [DOI] [PubMed] [Google Scholar]

[CR45] 45.Chami N, Li Y, Weir S, Wright JG, Kantarevic J. Effect of strict and soft policy interventions on laboratory diagnostic testing in Ontario, Canada: a Bayesian structural time series analysis. Health Policy. 2021;125(2):254–60. [DOI] [PubMed]

[CR46] 46.Brown AF, Ma GX, Miranda J, Eng E, Castille D, Brockie T, et al. Structural interventions to reduce and eliminate health disparities. Am J Public Health. 2019;109(S1):S72–8. [DOI] [PMC free article] [PubMed]

[CR47] 47.Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898. [DOI] [PubMed] [Google Scholar]

[CR48] 48.Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR49] 49.McGuinness LA, Higgins JPT. Risk-of-bias VISualization (robvis): an R package and Shiny web app for visualizing risk-of-bias assessments. Res Synth Methods. 2021;12(1):55–61. [DOI] [PubMed] [Google Scholar]

[CR50] 50.Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383–94. [DOI] [PubMed]

[CR51] 51.Murad MH, Almasri J, Alsawas M, Farah W. Grading the quality of evidence in complex interventions: a guide for evidence-based practitioners. Evid Based Med. 2017;22(1):20–2. [DOI] [PubMed] [Google Scholar]

[CR52] 52.Bateman EA, Gob A, Chin-Yee I, MacKenzie HM. Reducing waste: a guidelines-based approach to reducing inappropriate vitamin D and TSH testing in the inpatient rehabilitation setting. BMJ Open Qual. 2019;8(4):e000674. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR53] 53.Bejjanki H, Mramba LK, Beal SG, Radhakrishnan N, Bishnoi R, Shah C, et al. The role of a best practice alert in the electronic medical record in reducing repetitive lab tests. Clinicoecon Outcomes Res. 2018;10:611–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR54] 54.Bellodi E, Vagnoni E, Bonvento B, Lamma E. Economic and organizational impact of a clinical decision support system on laboratory test ordering. BMC Med Inform Decis Mak. 2017;17(1):179. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR55] 55.Bradshaw AB, Bonnecaze AK, Burns CA, Beardsley JR. Impact of an interprofessional collaborative quality improvement initiative to decrease inappropriate thyroid function testing. Hosp Pharm. 2021;56(5):481–5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR56] 56.Caldarelli G, Troiano G, Rosadini D, Nante N. Adoption of TSH reflex algorithm in an Italian clinical laboratory. Ann Ig. 2017;29(4):317–22. [DOI] [PubMed] [Google Scholar]

[CR57] 57.Dalal S, Bhesania S, Silber S, Mehta P. Use of electronic clinical decision support and hard stops to decrease unnecessary thyroid function testing. BMJ Qual Improv Rep. 2017;6(1):u223041.w8346. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR58] 58.Delvaux N, Piessens V, Burghgraeve T, Mamouris P, Vaes B, Stichele RV, et al. Clinical decision support improves the appropriateness of laboratory test ordering in primary care without increasing diagnostic error: the ELMO cluster randomized trial. Implement Sci. 2020;15(1):100. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR59] 59.Elrewini AM, Zubair M, Afridi NK, Dildar MT, Javed H, Alwalah SM. To determine the effectiveness of different interventions to reduce unnecessary requests of serum thyroid stimulating hormone levels in a hospital. The Professional Medical Journal. 2022;29(5):686–92. [Google Scholar]

[CR60] 60.Krouss M, Israilov S, Alaiev D, Hupart K, Da Shin W, Mestari N, et al. Free the T3: implementation of best practice advisory to reduce unnecessary orders. Am J Med. 2022;135(12):1437–42. [DOI] [PubMed] [Google Scholar]

[CR61] 61.Leis B, Frost A, Bryce R, Lyon AW, Coverett K. Altering standard admission order sets to promote clinical laboratory stewardship: a cohort quality improvement study. BMJ Qual Saf. 2019;28(10):846–52. [DOI] [PubMed] [Google Scholar]

[CR62] 62.Leung E, Song S, Al-Abboud O, Shams S, English J, Naji W, et al. An educational intervention to increase awareness reduces unnecessary laboratory testing in an internal medicine resident-run clinic. J Community Hosp Intern Med Perspect. 2017;7(3):168–72. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR63] 63.MacPherson RD, Reeve SA, Stewart TV, Cunningham AES, Craven ML, Fox G, et al. Effective strategy to guide pathology test ordering in surgical patients. ANZ J Surg. 2005;75(3):138–43. [DOI] [PubMed] [Google Scholar]

[CR64] 64.Muris DMJ, Molenaers M, Nguyen T, Bergmans PWMP, van Acker BAC, Krekels MME, et al. Effect of a price display intervention on laboratory test ordering behavior of general practitioners. BMC Fam Pract. 2021;22(1):242. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR65] 65.Notas G, Kampa M, Malliaraki N, Petrodaskalaki M, Papavasileiou S, Castanas E. Implementation of thyroid function tests algorithms by clinical laboratories: a four-year experience of good clinical and diagnostic practice in a tertiary hospital in Greece. Eur J Intern Med. 2018;54:81–6. [DOI] [PubMed] [Google Scholar]

[CR66] 66.Salinas M, López-Garrigós M, Flores E, Leiva-Salinas M, Asencio A, Lugo J, et al. Managing inappropriate requests of laboratory tests: from detection to monitoring. Am J Manag Care. 2016;22(9):e311–6. [PubMed] [Google Scholar]

[CR67] 67.Sue LY, Kim JE, Oza H, Chong T, Woo HE, Cheng EM, et al. Reducing inappropriate serum T3 laboratory test ordering in patients with treated hypothyroidism. Endocr Pract. 2019;25(12):1312–6. [DOI] [PubMed] [Google Scholar]

[CR68] 68.Taher J, Beriault DR, Yip D, Tahir S, Hicks LK, Gilmour JA. Reducing free thyroid hormone testing through multiple plan-do-study-act cycles. Clin Biochem. 2020;81:41–6. [DOI] [PubMed] [Google Scholar]

[CR69] 69.Wintemute K, Greiver M, McIsaac W, Del Giudice ME, Sullivan F, Aliarzadeh B, et al. Choosing Wisely Canada campaign associated with less overuse of thyroid testing: retrospective parallel cohort study. Can Fam Physician. 2019;65(11):E487–96. [PMC free article] [PubMed] [Google Scholar]

[CR70] 70.Adlan MA, Neel V, Lakra SS, Bondugulapati LNR, Premawardhana LDKE. Targeted thyroid testing in acute illness: achieving success through audit. J Endocrinol Invest. 2011;34(8 SUPPL.):e210–3. [DOI] [PubMed] [Google Scholar]

[CR71] 71.Baker R, Smith JF, Lambert PC. Randomised controlled trial of the effectiveness of feedback in improving test ordering in general practice. Scand J Prim Health Care. 2003;21(4):219–23. [DOI] [PubMed] [Google Scholar]

[CR72] 72.Chu KH, Wagholikar AS, Greenslade JH, O’Dwyer JA, Brown AF. Sustained reductions in emergency department laboratory test orders: impact of a simple intervention. Postgrad Med J. 2013;89(1056):566–71. [DOI] [PubMed] [Google Scholar]

[CR73] 73.Cipullo JA, Mostoufizadeh M. Bringing order to test orders: one lab’s story. CAP Today. 1996;10(1):20–2. [PubMed] [Google Scholar]

[CR74] 74.Dowling PT, Alfonsi G, Brown MI, Culpepper L. An education program to reduce unnecessary laboratory tests by residents. J Med Educ. 1989;64(7):410–2. [DOI] [PubMed] [Google Scholar]

[CR75] 75.Feldkamp CS, Carey JL. An algorithmic approach to thyroid function testing in a managed care setting: 3-year experience. Am J Clin Pathol. 1996;105(1):11–6. [DOI] [PubMed] [Google Scholar]

[CR76] 76.Gama R, Nightingale PG, Broughton PM, Peters M, Bradby GV, Berg J, et al. Feedback of laboratory usage and cost data to clinicians: does it alter requesting behaviour? Ann Clin Biochem. 1991;28(Pt 2):143–9. [DOI] [PubMed] [Google Scholar]

[CR77] 77.Grivell AR, Forgie HJ, Fraser CG, Berry MN. Effect of feedback to clinical staff of information on clinical biochemistry requesting patterns. Clin Chem. 1981;27(10):1717–20. [PubMed] [Google Scholar]

[CR78] 78.Horn DM, Koplan KE, Senese MD, Orav EJ, Sequist TD. The impact of cost displays on primary care physician laboratory test ordering. J Gen Intern Med. 2014;29(5):708–14. [DOI] [PMC free article] [PubMed]

[CR79] 79.Mindemark M, Larsson A. Long-term effects of an education programme on the optimal use of clinical chemistry testing in primary health care. Scand J Clin Lab Invest. 2009;69(4):481–6. [DOI] [PubMed] [Google Scholar]

[CR80] 80.Nightingale PG, Peters M, Mutimer D, Neuberger JM. Effects of a computerised protocol management system on ordering of clinical tests. Qual Health Care. 1994;3(1):23–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR81] 81.Rhyne RL, Gehlbach SH. Effects of an educational feedback strategy on physician utilization of thyroid function panels. J Fam Pract. 1979;8(5):1003–7. [PubMed] [Google Scholar]

[CR82] 82.Schectman JM, Elinsky EG, Pawlson LG. Effect of education and feedback on thyroid function testing strategies of primary care clinicians. Arch Intern Med. 1991;151(11):2163–6. [PubMed] [Google Scholar]

[CR83] 83.Stuart PJ, Crooks S, Porton M. An interventional program for diagnostic testing in the emergency department. Med J Aust. 2002;177(3):131–4. [DOI] [PubMed] [Google Scholar]

[CR84] 84.Tomlin A, Dovey S, Gauld R, Tilyard M. Better use of primary care laboratory services following interventions to ‘market’ clinical guidelines in New Zealand: a controlled before-and-after study. BMJ Qual Saf. 2011;20(3):282–90. [DOI] [PubMed] [Google Scholar]

[CR85] 85.Toubert ME, Chevret S, Cassinat B, Schlageter MH, Beressi JP, Rain JD. From guidelines to hospital practice: reducing inappropriate ordering of thyroid hormone and antibody tests. Eur J Endocrinol. 2000;142(6):605–10. [DOI] [PubMed] [Google Scholar]

[CR86] 86.Vidal-Trécan G, Toubert ME, Coste J, Paycha F, Durand-Zaleski I, Fulla Y, et al. Reducing the number of T3 orders in the Paris hospital network: towards better appropriateness of thyroid function test prescription. Ann Endocrinol. 2003;64(3):210–5. [PubMed] [Google Scholar]

[CR87] 87.Willis EA, Datta BN. Effect of an educational intervention on requesting behaviour by a medical admission unit. Ann Clin Biochem. 2013;50(2):166–8. [DOI] [PubMed] [Google Scholar]

[CR88] 88.Wong ET, McCarron MM, Shaw ST. Ordering of laboratory tests in a teaching hospital: can it be improved? JAMA. 1983;249(22):3076–80. [PubMed] [Google Scholar]

[CR89] 89.European Commission. Member State Coordination Group on HTA (HTACG). Guidance on the validity of clinical studies for joint clinical assessments. Directorate General for Health and Food Safety. 2024. Available from: https://health.ec.europa.eu/publications/guidance-validity-clinical-studies-joint-clinical-assessments_en.

[CR90] 90.Leis B, Frost A, Bryce R, Coverett K. Standard admission order sets promote ordering of unnecessary investigations: a quasi-randomised evaluation in a simulated setting. BMJ Qual Saf. 2017;26(11):938–40. [DOI] [PubMed] [Google Scholar]

[CR91] 91.Rogers E. Diffusion of innovations. New York: The Free Press; 1995. [Google Scholar]

[CR92] 92.van Walraven C, Naylor CD. Do we know what inappropriate laboratory utilization is? A systematic review of laboratory clinical audits. JAMA. 1998;280(6):550–8. [DOI] [PubMed] [Google Scholar]

[CR93] 93.Hulscher MEJL, Laurant MGH, Grol RPTM. Process evaluation on quality improvement interventions. Qual Saf Health Care. 2003;12(1):40–6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR94] 94.Provost LP, Murray SK. The health care data guide: learning from data for improvement. Second edition. Hoboken, NJ: John Wiley & Sons; 2022.

[CR95] 95.Institute for Healthcare Improvement. How to improve: model for improvement; 2024 [cited 2024 Jul 25]. Available from: https://www.ihi.org/resources/how-improve-model-improvement.

[CR96] 96.Alishahi Tabriz A, Turner K, Clary A, Hong Y-R, Nguyen OT, Wei G, et al. De-implementing low-value care in cancer care delivery: a systematic review. Implement Sci. 2022;17(1):24. [DOI] [PMC free article] [PubMed]

[CR97] 97.Mucha H, Robert S, Breitschwerdt R, Fellmann M. Usability of clinical decision support systems. Z Arb Wiss. 2023;77(1):92–101.

[CR98] 98.Kobewka DM, Ronksley PE, McKay JA, Forster AJ, van Walraven C. Influence of educational, audit and feedback, system based, and incentive and penalty interventions to reduce laboratory test utilization: a systematic review. Clin Chem Lab Med. 2015;53(2):157–83. [DOI] [PubMed] [Google Scholar]

[CR99] 99.Nilsen P, Ingvarsson S, Hasson H, von Thiele Schwarz U, Augustsson H. Theories, models, and frameworks for de-implementation of low-value care: a scoping review of the literature. Implement Res Pract. 2020;1:2633489520953762. [DOI] [PMC free article] [PubMed]

[CR100] 100.Pawson R, Greenhalgh T, Harvey G, Walshe K. Realist review - a new method of systematic review designed for complex policy interventions. J Health Serv Res Policy. 2005;10 Suppl 1:21–34. [DOI] [PubMed]

[CR101] 101.Augustsson H, Casales Morici B, Hasson H, von Thiele Schwarz U, Schalling SK, Ingvarsson S, et al. National governance of de-implementation of low-value care: a qualitative study in Sweden. Health Res Policy Syst. 2022;20(1):92. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR102] 102.Gemeinsamer Bundesausschuss (G-BA). IndiQ – Entwicklung eines Tools zur Messung von Indikationsqualität in Routinedaten und Identifikation von Handlungsbedarfen und -strategien - G-BA Innovationsfonds; 2024 [cited 2024 Aug 2]. Available from: https://innovationsfonds.g-ba.de/projekte/versorgungsforschung/indiq-entwicklung-eines-tools-zur-messung-von-indikationsqualitaet-in-routinedaten-und-identifikation-von-handlungsbedarfen-und-strategien.325.
