Abstract
Background
A systematic and extensive search for as many eligible studies as possible is essential in any systematic review. When searching for diagnostic test accuracy (DTA) studies in bibliographic databases, it is recommended that terms for disease (target condition) are combined with terms for the diagnostic test (index test). Researchers have developed methodological filters to try to increase the precision of these searches. These consist of text words and database indexing terms and would be added to the target condition and index test searches.
Efficiently identifying reports of DTA studies presents challenges because the methods are often not well reported in their titles and abstracts, suitable indexing terms may not be available, and relevant indexing terms do not seem to be consistently assigned. A consequence of using search filters to identify records for diagnostic reviews is that relevant studies might be missed, while the number of irrelevant studies that need to be assessed may not be reduced. Current guidance for Cochrane DTA reviews recommends against adding a methodological search filter to the target condition and index test search as the only search approach.
Objectives
To systematically review empirical studies that report the development or evaluation, or both, of methodological search filters designed to retrieve DTA studies in MEDLINE and EMBASE.
Search methods
We searched MEDLINE (1950 to week 1 November 2012); EMBASE (1980 to 2012 Week 48); the Cochrane Methodology Register (Issue 3, 2012); ISI Web of Science (11 January 2013); PsycINFO (13 March 2013); Library and Information Science Abstracts (LISA) (31 May 2010); and Library, Information Science & Technology Abstracts (LISTA) (13 March 2013). We undertook citation searches on Web of Science, checked the reference lists of relevant studies, and searched the Search Filters Resource website of the InterTASC Information Specialists' Sub‐Group (ISSG).
Selection criteria
Studies that reported the development or evaluation, or both, of a MEDLINE or EMBASE search filter aimed at retrieving DTA studies were eligible, provided they reported a measure of the filter's performance.
Data collection and analysis
The main outcome was a measure of filter performance, such as sensitivity or precision. We extracted data on the identification of the reference set (including the gold standard and, if used, the non‐gold standard records), how the reference set was used and any limitations, the identification and combination of the search terms in the filters, internal and external validity testing, the number of filters evaluated, the date the study was conducted, the date the searches were completed, and the databases and search interfaces used. Where 2 x 2 data were available on filter performance, we used these to calculate sensitivity, specificity, precision and Number Needed to Read (NNR), and 95% confidence intervals (CIs). We compared the performance of a filter as reported by the original development study and any subsequent studies that evaluated the same filter.
Main results
Nineteen studies were included, reporting on 57 MEDLINE filters and 13 EMBASE filters. Thirty MEDLINE and four EMBASE filters were tested in an evaluation study where the performance of one or more filters was tested against one or more gold standards. The reported outcome measures varied. Some studies reported specificity as well as sensitivity if a reference set containing non‐gold standard records in addition to gold standard records was used. In some cases, the original development study did not report any performance data on the filters. Original performance from the development study was not available for 17 filters that were subsequently tested in evaluation studies. All 19 studies reported the sensitivity of the filters that they developed or evaluated, nine studies reported the specificities and 14 studies reported the precision.
No filter which had original performance data from its development study, and was subsequently tested in an evaluation study, had what we defined a priori as acceptable sensitivity (> 90%) and precision (> 10%). In studies that developed MEDLINE filters that were evaluated in another study (n = 13), the sensitivity ranged from 55% to 100% (median 86%) and specificity from 73% to 98% (median 95%). Estimates of performance were lower in eight studies that evaluated the same 13 MEDLINE filters, with sensitivities ranging from 14% to 100% (median 73%) and specificities ranging from 15% to 96% (median 81%). Precision ranged from 1.1% to 40% (median 9.5%) in studies that developed MEDLINE filters and from 0.2% to 16.7% (median 4%) in studies that evaluated these filters. A similar range of specificities and precision was reported amongst the evaluation studies for MEDLINE filters without an original performance measure: sensitivities ranged from 31% to 100% (median 71%), specificities from 13% to 90% (median 55.5%) and precision from 1.0% to 11.0% (median 3.35%).
For the EMBASE filters, the original sensitivities reported in two development studies ranged from 74% to 100% (median 90%) for three filters, and precision ranged from 1.2% to 17.6% (median 3.7%). Evaluation studies of these filters had sensitivities from 72% to 97% (median 86%) and precision from 1.2% to 9% (median 3.7%). The performance of EMBASE search filters in development and evaluation studies was more alike than the performance of MEDLINE filters in development and evaluation studies. None of the EMBASE filters in either type of study had both a sensitivity above 90% and a precision above 10%.
Authors' conclusions
None of the current methodological filters designed to identify reports of primary DTA studies in MEDLINE or EMBASE combine sufficiently high sensitivity, required for systematic reviews, with a reasonable degree of precision. This finding supports the current recommendation in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy that the combination of methodological filter search terms with terms for the index test and target condition should not be used as the only approach when conducting formal searches to inform systematic reviews of DTA.
Plain language summary
Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE
A diagnostic test is any kind of medical test performed to help with the diagnosis or detection of a disease. A systematic review of a particular diagnostic test for a disease aims to bring together and assess all the available research evidence. Bibliographic databases are usually searched by combining terms for the disease with terms for the diagnostic test. However, depending on the topic area, the number of articles retrieved by such searches may be very large. Methodological filters consisting of text words and database indexing terms have been developed in the hope of improving the searches by increasing their precision when these filters are added to the search terms for the disease and diagnostic test. On the other hand, using filters to identify records for diagnostic reviews may miss relevant studies while at the same time not making a big difference to the number of studies that have to be assessed for inclusion. This review assessed the performance of 70 filters (reported in 19 studies) for identifying diagnostic studies in the two main bibliographic databases in health, MEDLINE and EMBASE. The results showed that search filters do not perform consistently, and should not be used as the only approach in formal searches to inform systematic reviews of diagnostic studies. None of the filters reached our minimum criteria of a sensitivity greater than 90% and a precision above 10%.
Background
As with Cochrane reviews of interventions, Cochrane diagnostic test accuracy (DTA) reviews should aim to identify and evaluate as much available evidence about a specific topic as possible within the available resources (DeVet 2008). Thus, a systematic and extensive search for eligible studies is an essential step in any review. Recommendations for searching for DTA studies are that electronic bibliographic databases, such as MEDLINE and EMBASE, should be searched by combining search terms for disease indicators (target condition) with terms for the diagnostic test (index test) (DeVet 2008). Depending on the topic area, the number of articles retrieved by such searches may be too large to be processed with the available resources. A number of methodological filters consisting of text words and database specific indexing terms (such as MEDLINE Medical Subject Headings (MeSH)) have been developed in an attempt to increase the precision of searches and reduce the resources required to process results. These search filters are typically added to a search strategy consisting of the target condition and index test(s).
Methodological search filters have been developed for retrieving articles relating to many types of clinical question, including those about aetiology, diagnosis, prognosis and therapy. These filters are typically combinations of database indexing terms or text words, or both, that reflect the study design and statistical methods reported by the articles’ authors. For example, Haynes and co‐workers have developed a series of filters to assist searchers to retrieve articles according to aetiology, diagnosis, prognosis or therapy (Haynes 1994; Haynes 2005; Haynes 2005a; Wilczynski 2003; Wilczynski 2004). They are available as ‘Clinical Queries’ limits in both PubMed and via the OvidSP interfaces for MEDLINE and EMBASE (NLM 2005; OvidSP 2013; OvidSP 2013a).
Methodological search filters have proved to be particularly effective in identifying intervention (therapy) studies. Within The Cochrane Collaboration, a highly sensitive search strategy is widely used for identifying reports of randomised trials in MEDLINE (Lefebvre 2011).
For DTA studies, however, the relevant methodology is often not well reported by authors in their titles and abstracts. In addition, MEDLINE lacks a suitable publication type indexing term to apply to DTA studies. EMBASE has recently introduced a check tag for DTA studies (diagnostic test accuracy), but this is only being applied prospectively. Some relevant indexing terms do exist in both EMBASE and MEDLINE, for example 'sensitivity and specificity'; however, they are inconsistently assigned by indexers to DTA studies (Fielding 2002; Wilczynski 1995; Wilczynski 2005a; Wilczynski 2007). A consequence of adding filters to subject and index term strategies to identify records for DTA reviews is that relevant studies might be missed without, at the same time, significantly reducing the number of studies that have to be assessed for inclusion (Doust 2005; Leeflang 2006; Whiting 2008; Whiting 2011).
We conducted a methodology review of empirical studies that reported the development and evaluation of methodological search filters to retrieve reports of DTA studies in MEDLINE and EMBASE, to assess the value of adding such filters to search strategies used to identify records for inclusion in DTA reviews. Until now, a comprehensive and systematic review of studies that develop or evaluate diagnostic search filters has not been published. The findings of this review will help to elucidate how well these filters find studies relevant to diagnostic systematic reviews and will inform a recommendation for or against their use when conducting literature searches.
Objectives
To systematically review empirical studies that report the development or evaluation, or both, of methodological search filters designed to retrieve diagnostic test accuracy (DTA) studies in MEDLINE and EMBASE.
Methods
Criteria for considering studies for this review
Types of studies
Primary studies of any design were included. Studies in which the main objective was the development or evaluation, or both, of a methodological filter for the purpose of searching for DTA studies in MEDLINE and EMBASE were eligible. We defined a development study as one in which a new filter was conceived, tested in a reference set of diagnostic studies, and the performance reported. An evaluation study was one in which a filter from a development study publication was tested in a new reference set and the performance reported. A study could be both a development and an evaluation study if it reported on the development and performance of a newly designed filter and evaluated a filter which had previously been published by a different development study. We also included filters assessed in evaluation studies for which there was no corresponding development study publication. We excluded studies that developed or evaluated filters designed to retrieve clinical prediction studies or prognostic studies.
Types of data
Eligible studies must have reported the performance of search filters using a recognised measure, such as sensitivity or precision.
Types of methods
Assessments of the performance of search strategies for identifying reports of DTA studies in MEDLINE and EMBASE.
Types of outcome measures
Eligible outcome measures were those that assessed the accuracy of the search.
Primary outcomes
Measures of search performance, including:
sensitivity (proportion of relevant reports correctly retrieved by the filter);
specificity (proportion of irrelevant reports correctly not retrieved by the filter);
accuracy (proportion of all records in the reference set, relevant and irrelevant, correctly classified by the filter);
precision (the number of relevant reports retrieved divided by the total number of records retrieved by the filter).
We defined a priori a sensitivity greater than 90% combined with a precision greater than 10%, as measured in external validation or evaluation studies, as the acceptable threshold for a filter to be used when searching for DTA studies.
Secondary outcomes
Number Needed to Read (NNR) (also called Number Needed to Screen), which is the inverse of the precision (Bachmann 2002). For example, a filter with a precision of 4% has an NNR of 25: on average, 25 records must be read to find one relevant study.
Search methods for identification of studies
Electronic searches
The following databases were searched to identify relevant studies: MEDLINE (1950 to week 1 November 2012); EMBASE (1980 to 2012 Week 48); the Cochrane Methodology Register (Issue 3, 2012); ISI Web of Science (11 January 2013); PsycINFO (13 March 2013); Library and Information Science Abstracts (LISA) (31 May 2010); and Library, Information Science & Technology Abstracts (LISTA) (13 March 2013). Three information specialists developed and conducted the searches. The search strategies are listed in the appendices (Appendix 1; Appendix 2; Appendix 3; Appendix 4; Appendix 5; Appendix 6; Appendix 7). No language restrictions were applied.
Searching other resources
We also undertook citation searches of the included studies on Web of Science. Furthermore, reference lists of all relevant studies were assessed (Horsley 2011) and the Search Filters Resource website of the InterTASC Information Specialists' Sub‐Group (ISSG) was screened (InterTASC 2011). InterTASC is a collaboration of six academic units in the UK that conduct and critique systematic reviews for the National Institute for Health and Care Excellence.
Data collection and analysis
Selection of studies
Two authors independently screened the titles and abstracts of all retrieved records. Inclusion assessment of full papers was conducted by one author and checked by a second. Any disagreements were resolved through discussion or referral to a third author.
Data extraction and management
Data extraction was performed by one author and checked by a second; disagreements were resolved through discussion. The ISSG Search Filter Appraisal Checklist (Glanville 2008) was used to structure the data extraction and assessment of methodological quality. This checklist was developed using consensus methods and tested on several filters. It assesses the scope of the filter (limitations, generalisability and obsolescence), and the methods used to develop the filter, including the generation of the reference set.
Data were extracted on the characteristics of the reference set (inclusion of gold and non‐gold standard records, years of publication of the records, journals covered, inclusion criteria, size); how search terms were identified; presence of internal and external validity testing; and any limitations or comparisons between studies. In the context of filter development, the reference set plays the role of the reference (gold) standard in a DTA study: the gold standard records are equivalent to the diseased individuals in a diagnostic accuracy study (that is, the 'relevant' studies) and the non‐gold standard records are equivalent to the non‐diseased individuals (that is, the non‐relevant studies).
Data were also extracted on the date the study was conducted; the date the searches were completed; the database(s) and search interface(s) used; the outcome measures of performance (sensitivity, specificity, precision) and their definitions; and whether the search strategy was developed for specific clinical areas or to identify diagnostic studies over a broad range of topics. We assessed whether the search strategies were described in sufficient detail to be reproducible (that is were the search terms and their combination reported, were the dates of the search reported, and was the interface and database reported?).
Where studies reported data on multiple filters, results were extracted for each filter. However, for filter development studies that also presented sensitivity and precision data on every individual term tested, we extracted all multiple‐term filters but, of the single‐term filters, only those that the original authors identified as performing best.
Assessment of risk of bias in included studies
Bias occurs if systematic flaws or limitations in the design or conduct of a study distort the results. Applicability refers to the generalisability of results: can the results of the filter development or evaluation study be applied to other settings with different populations, index tests, reference standards or target conditions?
We identified three areas that we considered to have the potential to introduce bias or affect the applicability of the included studies.
1. Absence of DTA search strategy in reference set development: bias may be introduced when either a development or an evaluation study used a systematic review (or reviews) to provide studies for the reference set, and this systematic review used a search strategy containing diagnostic terms to find primary studies. The performance of a filter tested in such a reference set will be artificially high, because the difficult‐to‐retrieve studies will already have been missed by the reference set search.
2. Choice of gold standard: concerns about applicability may be introduced in both development and evaluation studies in the generalisability of the filter to all diagnostic studies. Some filters have been developed or evaluated using a reference set that is composed of topic specific studies (such as studies on the diagnosis of deep vein thrombosis), whereas other reference sets will be generic (studies covering a wide range of diagnostic tests and conditions). Ideally, a filter will perform equally well across different topic areas but if it is only evaluated in one specific topic area its performance in other areas will be unclear.
3. Validation of filters in development studies: the process of validation can be split into two parts: the method of internal validation can introduce bias, while the method of external validation (if done) can raise both applicability and bias concerns. Internal validity is the ability of the filter to find studies from the reference set from which it was developed. A study could be at risk of bias if the internal validation set contained the references from which the filter terms were derived. External validity is the ability of the filter to find studies in a real‐world setting (that is, using a reference set composed of topic specific studies). This relates to how generalisable the results are to searching for diagnostic studies for different systematic review topics and most closely reflects how the filters would be used in practice by systematic reviewers. This issue only applies to development studies. A study that used external validation in a real‐world setting will be judged to have low levels of concern about applicability. However, a study that includes external validity testing could still be at risk of bias if the validity testing occurred in a validation set containing the references used to derive the terms.
Data synthesis
We synthesised performance measures of the filters separately for MEDLINE and EMBASE. We tabulated the performance measures reported by development and evaluation studies grouped by individual filters, so that a comparison could be made between the original reported performance of a filter and its performance in subsequent evaluation studies. If sensitivity, specificity or precision together with 95% confidence intervals (CIs) were not reported in the original reports, these were calculated from the 2 x 2 data, where possible.
Each of the performance measures can be calculated as shown by the formulae below (a further description of performance measures is available in Appendix 8).
| Reference set | Gold standard records | Non‐gold standard records |
| Searches incorporating methodological filter: detected | a (true positive) | b (false positive) |
| Searches incorporating methodological filter: not detected | c (false negative) | d (true negative) |
Sensitivity = a/(a + c)
Precision = a/(a + b)
Specificity = d/(b + d)
Accuracy = (a + d)/(a + b + c + d)
Number needed to read (NNR) = 1/precision = (a + b)/a
Reference set = gold standard + non‐gold standard records = (a + b + c + d)
Gold standard = relevant DTA studies = a + c
NB. This is different to the gold (reference) standard in DTA studies, which is equivalent to the reference set in filter evaluations. The gold standard in DTA studies is able to correctly identify the true positives as well as the true negatives, unlike the gold standard in a filter evaluation study, which is limited to the true positives.
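For illustration, these measures can be computed directly from the 2 x 2 counts. The sketch below (Python) is our own illustration rather than code from any included study; it uses the Wilson score interval for the 95% CIs, which is an assumption, since the included studies did not always state which interval method they used.

```python
from math import sqrt

def wilson_ci(k, n, z=1.96):
    # 95% Wilson score interval for a proportion k/n
    # (interval method assumed, not taken from the included studies).
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half

def filter_performance(a, b, c, d):
    # a = gold standard records retrieved (true positives)
    # b = non-gold standard records retrieved (false positives)
    # c = gold standard records missed (false negatives)
    # d = non-gold standard records not retrieved (true negatives)
    precision = a / (a + b)
    return {
        "sensitivity": (a / (a + c), wilson_ci(a, a + c)),
        "specificity": (d / (b + d), wilson_ci(d, b + d)),
        "precision": (precision, wilson_ci(a, a + b)),
        "accuracy": ((a + d) / (a + b + c + d),
                     wilson_ci(a + d, a + b + c + d)),
        "NNR": 1 / precision,  # number needed to read = (a + b)/a
    }

# Hypothetical counts for a filter tested against a reference set:
print(filter_performance(a=92, b=426, c=19, d=2000))
```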
Paired results of either sensitivity and specificity or sensitivity and precision for each filter were displayed in receiver operating characteristic (ROC) plots. The original individual filter performance estimates from the development studies were plotted in the same ROC space as the individual filter performance estimates from the evaluation studies, to allow for visual inspection of disparities and similarities. We did not pool data due to heterogeneity across studies.
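As a sketch of this display, the fragment below (Python with matplotlib; the paired estimates are invented for illustration, not values from the included studies) plots development and evaluation estimates for each filter in the same ROC space and joins each pair with a line so that disparities are visible.

```python
import matplotlib.pyplot as plt

# Invented (sensitivity, specificity) pairs for three hypothetical filters,
# as estimated in their development study and in a later evaluation study.
dev = [(0.95, 0.74), (0.86, 0.95), (0.92, 0.73)]
evl = [(0.73, 0.81), (0.60, 0.90), (0.88, 0.55)]

fig, ax = plt.subplots(figsize=(5, 5))
for (s1, sp1), (s2, sp2) in zip(dev, evl):
    # Join each filter's paired estimates; ROC convention: x = 1 - specificity.
    ax.plot([1 - sp1, 1 - sp2], [s1, s2], color="grey", linewidth=0.8)
ax.scatter([1 - sp for _, sp in dev], [s for s, _ in dev],
           marker="o", label="Development study")
ax.scatter([1 - sp for _, sp in evl], [s for s, _ in evl],
           marker="x", label="Evaluation study")
ax.set(xlabel="1 - specificity", ylabel="Sensitivity", xlim=(0, 1), ylim=(0, 1))
ax.legend(loc="lower right")
plt.show()
```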
Results
Description of studies
The searches retrieved 5628 records; 19 studies, reported in 21 papers, met the inclusion criteria (Figure 1). These studies assessed 57 MEDLINE filters and 13 EMBASE filters.
MEDLINE search filters
Description of development studies
Ten studies reported on the development of 40 MEDLINE filters (range 1 to 12 filters per study). Key features of each study are summarized in the Characteristics of included studies table and Table 1. Thirty‐one filters were composed of multiple terms and nine filters were single term strategies. Nine filters consisted of MeSH terms only, six filters had text words only, and 25 filters combined MeSH with text words. Full details of methods used in each study and the size of the reference set are given in Table 2. A description of each filter and its performance are listed in Table 3.
1. Summary of study designs of MEDLINE filter development studies.
Author (year) | ||||||||||
Astin 2008 | Berg 2005 | van der Weijden 1997 | Deville 2002 | Deville 2000 | Haynes 2004 | Haynes 1994 | Bachmann 2002 | Vincent 2003 | Noel‐Storr 2011 | |
Method of identification of reference set records (one from list below selected for each study) | ||||||||||
Hand‐searching for primary studies | ✓ | ✓ | ‐ | ‐ | ✓ | ✓ | ✓ | ✓ | ‐ | ‐ |
DTA systematic reviews | ‐ | ‐ | ‐ | ✓ | ‐ | ‐ | ‐ | ‐ | ✓ | ✓ |
Personal literature database | ‐ | ‐ | ✓ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ |
If systematic reviews used in reference set development, did they include DTA search terms in search strategy? | ||||||||||
‐ | ‐ | ‐ | Unclear | ‐ | ‐ | ‐ | ‐ | ✓ | X | |
Reference set also contained non‐gold standard records | ||||||||||
✓ | ✓ | X | NR | ✓ | ✓ | ✓ | ✓ | X | ✓ | |
Description of non‐gold standard records if used in reference set | NR | ‐ | ‐ | ‐ | ‐ | NR | NR | NR | ‐ | ‐ |
All studies retrieved by search not classified as gold standard records | ‐ | ✓ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ✓ |
False positive papers selected by a previously published search strategy, exclusion of some publication types e.g. reviews and meta‐analyses. | ‐ | ‐ | ‐ | ‐ | ✓ | ‐ | ‐ | ‐ | ‐ | ‐ |
Generic gold standard records i.e. not topic specific | ||||||||||
X | X | X | X | X | ✓ | ✓ | ✓ | X | X | |
Method of deriving filter terms (a combination of methods could be used) | ||||||||||
Analysis of reference set | ✓ | ✓ | ‐ | ‐ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓* |
Expert knowledge | ‐ | ‐ | ‐ | ‐ | ‐ | ✓ | ✓ | ‐ | ‐ | ‐ |
Adaption of existing filter | ‐ | ✓ | ‐ | ✓ | ‐ | ‐ | ‐ | ‐ | ✓ | ‐ |
Checking key publications for terms and language used | ‐ | ‐ | ✓ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ |
Internal validation in reference set independent from records used to derive filter terms | ||||||||||
✓ | X | N/A** | N/A** | X | X | X | ✓ | X | X | |
External validation in reference set independent from records used to derive filter terms and internal validation set | ||||||||||
X | X | ✓ | ✓ | ✓ | X | X | ✓ | X | X |
*Noel‐Storr derived filter terms by running published search filters in MEDLINE combined with a subject search, locating 10 papers that all filters missed and choosing a term from the title/abstract or keywords of each.
** Only external validation was carried out (no internal validation) in real‐world topics.
Abbreviations used: NR= not reported; N/A= not applicable
2. Study characteristics and methods of MEDLINE development studies.
Author (Year) Study ID | Identification of reference set | How was reference set used | How were search terms identified for filter | Ref set years | # gold standard records | # non‐gold standard records | # journals ref set |
Astin 2008 | Hand search. Articles reporting on imaging as a diagnostic test in imaging journals. 6 high impact journals used to find studies for development set and 6 lower impact journals used to find studies for validation set. Journals indexed in MEDLINE and were also selected to cover general radiology, specific modalities and specific systems. | Two independent sets of records developed. Test set used to derive terms and test strategies. Validation set used to test external validity | Performed statistical analysis of terms in test set | development set 1985 Clin Radiol, 1988 Am J Neuroradiol; validation set 2000 | 333 in development set; 186 in validation set | 2222 in development set; 1070 in validation set | 12 (6 in development set; 6 in validation set) |
Berg 2005 | Manual review of a certain set of articles found using a search (via PubMed) combining sensitive terms for nursing literature plus cancer‐related fatigue diagnosis terms. Manual review of these articles carried out to find diagnostic studies. | To derive terms and test strategies. Did not validate in a separate set of references | Existing PubMed Clinical Queries filter with extra terms from filters for CINAHL, medical publications, published recommendations & diagnosis definitions. Inductively collected terms derived from indexing of included citations: MeSH terms and frequently used text words in titles/abstracts. | NR | NR | 238 | NR |
van der Weijden 1997 | Personal literature database compiled over 10 years 'by every means of literature searching' of studies reporting on erythrocyte sedimentation rate as a diagnostic test. | To test strategies. | Checking key publications for definitions & terms used. | 1985‐1994 | 221 | 0 | NR |
Deville 2002 | Studies included in two systematic reviews (relative recall). | To test strategies | Adapted three published search strategies | NR | NR | NR | NR |
Deville 2000 | Reference set of publications found through handsearch of 9 highest rank family medicine journals available on MEDLINE for years 1992‐95. A ‘control’ set of publications for testing validity of strategies was found by adapting Haynes 1991 most sensitive and most specific searches by adding terms, then run in MEDLINE to retrieve all diagnostic primary studies, then limited to the 9 journals. | To derive terms from reference set; to test strategies in control set; to test external validity the best performing filters were compared against Haynes filters in a systematic review (SR) of meniscal lesions in the knee. | Performed statistical analysis of terms in reference set. Univariate analysis to calculate sensitivity, specificity & diagnostic odds ratio (DOR) of all relevant MeSH terms & text words. Models developed by forward stepwise logistic regression analysis. | 1992‐1995 | 75; 33 in meniscal lesions set | 2392; NR meniscal lesions set | 9 |
Haynes 2004 | Manual review of 161 journals indexed on MEDLINE for year 2000. Journal titles regularly reviewed for appraisal for Evidence Based Medicine, Evidence Based Nursing, Evidence Based Mental Health and ACP Journal Club. | Test strategies and validate. The reference standard could not be divided into a test set and validation set. | MeSH terms and text words listed using expert knowledge of the field. | 2000 | 147 | 48881 | 161 |
Haynes 1994 | Manual review of 10 high impact journals for the years 1986 and 1991. The 10 journals searched were American Journal of Medicine, Annals of Internal Medicine, Archives of Internal Medicine, BMJ, Circulation, Diabetes Care, Journal of Internal Medicine, JAMA, Lancet and NEJM | To test strategies and validate. | MeSH terms and text words listed using expert knowledge of the field. | 1986 and 1991 | 92 in 1986 set; 111 in 1991 set | 426 in 1986 set; 301 in 1991 set. | 10 |
Bachmann 2002 | Hand search European Journal of Paediatrics, Gastroenterology, American Journal of Obstetrics and Gynecology, and Thorax for years 1989 and 1994. Four different journals searched in 1999: NEJM, JAMA, BMJ and Lancet. | 1989 set search used to derive terms and test strategies, 1994 and 1999 sets used to validate | Word frequency analysis on titles, abstracts and subject indexes of all references in 1989 set. | 1989, 1994 and 1999 | 83 in 1989 test set; 53 in 1994 validation set; 61 in 1999 validation set. | 1646 in 1989 test set; 1744 in 1994 validation set; 7875 in 1999 validation set | 8 |
Vincent 2003 | SRs retrieved from MEDLINE and EMBASE on OVID reporting on diagnostic tests for DVT. 16 SRs selected and all articles included that were indexed on MEDLINE became the reference set. Only English language articles included | To test strategies | Adapted from 5 published strategies: CASP, PubMed, Rochester, Deville, and North Thames | 1969‐2000 | 126 | 0 | NR |
Noel‐Storr 2011 | SR on the volume of evidence in biomarker studies in those with mild cognitive impairment, conducted by the authors. | To derive terms; to test strategies | Published search filters applied in MEDLINE combined with a subject search (Southampton A, Van der Weijden, and Southampton E), 10 papers were missed by all filters. One term from the title/abstract or keywords of each of 10 papers combined in the new filter. | 2000‐2011 | 128 in Sept 2010 set; additional 16 found in update search therefore 144 in August 2011 | 17266 in Sept 2010 set; additional 1654 found in update search therefore 18920 in August 2011 | NR |
Abbreviations used: NR=Not reported; ref set= reference set
3. Performance of diagnostic filters from MEDLINE development studies.
Author | Filter Description | Interface | Reference set |
Sensitivity % (95% CI) |
Specificity % (95% CI) |
Accuracy (95% CI) |
Precision% (95% CI) |
NNR (95% CI) |
Astin 2008 | 1. Exp "sensitivity and specificity"/ 2. False positive reactions/ 3. False negative reactions/ 4. du.fs 5. sensitivity.tw 6. (predictive adj4 value$).tw 7. distinguish$.tw 8. differentiat$.tw 9. enhancement.tw 10. identif$.tw 11. detect$.tw 12. diagnos$.tw 13. accura$.tw 14. comparison.tw 15. or/1‐14 | Ovid | Derivation set | 95.8 (93.1, 97.5) |
52.3 (50.2, 54.3) |
23.1 (21.0‐25.4) |
0.04* | |
Validation set | 96.8 (93.1, 98.5) |
43.9 (41.0, 46.9) |
23.1 (20.3‐26.2) |
0.04* | ||||
Berg 2005 | Some search terms were combined using "OR" thus increasing sensitivity and reducing specificity (e.g. nursing assessment [MeSH: noexp] AND questionnaire [Text Word])
Exemplary MeSH terms ‐ Diagnosis, Differential; psychological tests; Likelihood functions; Area Under Curve; diagnostic tests; routine; diagnosis [MeSH subheading]; Diagnostic Techniques and Procedures; nursing assessment. Exemplary text words: sensitivity; specificity; predictive value; validity; reliability; likelihood ratio; questionnaire. |
PubMed | 87 | 73 | Positive likelihood ratio (PLR)=3.2 | 2.3 | ||
Some search terms were combined using "AND" thus increasing specificity and reducing sensitivity (e.g. nursing assessment [MeSH: noexp] AND questionnaire [Text Word])
Exemplary MeSH terms ‐ Diagnosis, Differential; psychological tests; Likelihood functions; Area Under Curve; diagnostic tests; routine; diagnosis [MeSH subheading]; Diagnostic Techniques and Procedures; nursing assessment. Exemplary text words: sensitivity; specificity; predictive value; validity; reliability; likelihood ratio; questionnaire. |
PubMed | 76 | 83 | PLR= 6.3 | 1.7 | |||
Haynes 2004 | sensitiv:.mp OR diagnos:.mp OR di.fs | Ovid | 98.6 (96.8‐100) |
74.3 (73.9‐74.7) |
74.3 (74.0‐74.7) |
1.1 (1.0‐1.3) |
0.9* | |
High specificity: specificity.tw |
Ovid | 64.6 (56.9‐72.4) |
98.4 (98.2‐98.5) |
98.3 (98.1‐98.4) |
10.6 (8.6‐12.6) |
0.09* | ||
High Sensitivity: di.xs. |
Ovid | 91.8 (87.4‐96.3) |
68.3 (67.9‐68.7) |
68.4 (68.0‐68.8) |
0.9 (0.7‐1.0) |
1.11* | ||
sensitiv:.mp OR predictive value:.mp OR accurac:.tw | Ovid | 92.5 (88.3‐96.8) |
92.1 (91.8‐92.3) |
92.1 (91.8‐92.3) |
3.4 (2.8‐3.9) |
0.29* | ||
Optimising sensitivity and specificity: exp "diagnostic techniques and procedures" |
Ovid | 66.7 (59.1‐74.3) |
74.6 (74.2‐75.0) |
74.5 (74.2‐74.9) |
0.8 (0.6‐0.9) |
1.25* | ||
Sensitive:.mp. OR diagnos:.mp. OR accuracy.tw. | Ovid | 98.0 (95.7‐100.0) |
82.7 (82.4‐83.1) |
82.8 (82.5‐83.1) |
1.7 (1.4‐2.0) |
0.59* | ||
Sensitive:.mp. OR diagnos:mp. OR test:.tw. | Ovid | 98.0 (95.7‐100.0) |
75.1 (74.8‐75.5) |
75.2 (74.8‐75.6) |
1.2 (1.0‐1.4) |
0.83* | ||
Specificity.tw. OR predictive value:.tw. | Ovid | 72.8 (65.6‐80.0) |
97.9 (97.8‐98.1) |
97.9 (97.7‐98.0) |
9.6 (7.9‐11.3) |
0.10* | ||
Accuracy:.tw. OR predictive value:tw. | Ovid | 52.4 (44.3‐60.5) |
97.9 (97.8‐98.1) |
97.8 (97.7‐97.9) |
7.1 (5.6‐8.6) |
0.14* | ||
Sensitive:.mp. OR diagnostic.mp. OR predictive value:.tw. | Ovid | 92.5 (88.3‐96.8) |
91.8 (91.6‐92.1) |
91.8 (91.6‐92.1) |
3.3 (2.8‐3.8) |
0.30* | ||
Exp sensitivity and specificity OR predictive value:.tw. | Ovid | 79.6 | 94.9 | 94.8 | 4.5 | 0.22* |
Haynes 1994 |
Best sensitivity: diagnosis (subheading pre‐explosion) OR specificity (tw) |
NR | 86 | 73 | 73 | 7 | 0.14* | |
Best accuracy: Exp sensitivity and specificity OR diagnosis (subheading) OR diagnostic use (subheading) OR specificity (tw) OR (predictive (tw) AND value (tw)) |
NR | 86 | 84 | 84 | 13 | 0.08* | ||
Best specificity: specificity (tw) OR (predictive (tw) AND value (tw)) OR (false (tw) and positive (tw)) |
NR | 49 | 98 | 36 | 0.03* | |||
Best specificity: Exp sensitivity and specificity OR predictive (tw) AND value (tw) |
NR | 55 | 98 | 40 | 0.03* | |||
Diagnosis (subheading pre‐explosion) OR Specificity (tw) | NR | 86 | 73 | 7 | 0.14* | |||
Best sensitivity: Exp sensitivity and specificity OR diagnosis (subheading pre‐explosion) OR diagnostic use (subheading) OR sensitivity (tw) OR specificity (tw) |
NR | 92 | 73 | 9 | 0.11* | |||
Haynes (2004) 2000 ref set | 96.6 | 65 | 0.008 | 65.7 | 0.02* | |||
Diagnostic use (sh) | NR | 1986 set | 16 | 96 | 10 | 0.10* | ||
1991 set | 26 | 96 | 18 | 0.06* | ||||
Diagnosis (sh) | NR | 1986 set | 62 | 89 | 9 | 0.11* | ||
1991 set | 59 | 88 | 13 | 0.08* | ||||
Diagnosis& (px) | NR | 1986 set | 79 | 74 | 60 | 0.02* | ||
1991 set | 80 | 77 | 90 | 0.01* | ||||
Exp Sensitivity and Specificity | NR | 1991 set | 50 | 98 | 3 | 0.33* | ||
Specificity (tw) | NR | 1991 set | 54 | 96 | ||||
Sensitivity (tw) | NR | 1991 set | 57 | 97 | ||||
1986 set | 43 | 98 | 3 | 0.33* | ||||
van der Weijden 1997 |
MeSH short strategy (terms OR'd together) explode DIAGNOSIS/diagnosis DIAGNOSIS‐DIFFERENTIAL/all subheadings. explode SENSITIVITY‐AND‐SPECIFICITY REFERENCE‐VALUES/all subheadings . FALSE‐NEGATIVE‐REACTIONS/ all subheadings . FALSE‐POSITIVE‐REACTIONS/ all subheadings . explode MASS‐SCREENING/ all subheadings . |
OVID | 31 | 34 | 0.03* | |||
MeSH extended strategy (terms OR'd together) explode DIAGNOSIS/ all subheadings . explode SENSITIVITY‐AND‐SPECIFICITY REFERENCE‐VALUES/all subheadings . FALSE‐NEGATIVE‐REACTIONS/ all subheadings . FALSE‐POSITIVE‐REACTIONS/ all subheadings . Explode MASS‐SCREENING/ all subheadings . |
OVID | 69 | 11 | 0.09* | ||||
MeSH extended and free text strategy explode DIAGNOSIS/ all subheadings . explode SENSITIVITY‐AND‐SPECIFICITY REFERENCE‐VALUES/all subheadings . FALSE‐NEGATIVE‐REACTIONS/ all subheadings . FALSE‐POSITIVE‐REACTIONS/ all subheadings . Explode MASS‐SCREENING/ all subheadings . diagnos* OR sensitivity or specificity OR predictive value* OR reference value* OR ROC* OR likelihood ratio* OR monitoring |
OVID | 91 | 10 | 0.1* | ||||
Deville 2002 | Sensitivity and specificity [Mesh; exploded] OR mass screening [Mesh; exploded] OR reference values [Mesh] OR false positive reactions [Mesh] OR false negative reactions [Mesh] OR specificit$.tw OR screening.tw OR false positive$.tw OR false negative$.tw | NR | Knee lesions SR | 70 | ||||
Urine dipstick SR | 92 | |||||||
Bachmann 2002 | "SENSITIVITY AND SPECIFICITY"# OR predict* OR diagnos* OR sensitiv* | Datastar | 1989 test set | 92.8 (84.9‐97.3) |
15.6 | 6.4 (5.2‐8.0) |
||
1994 validation set | 98.1 | 10.9 | 9.2 | |||||
1999 validation set | 91.8 | 4.7 | 21.3 | |||||
"SENSITIVITY AND SPECIFICITY"# OR predict* OR diagnos* OR accura* | Datastar | 1989 test set | 95.2 (88.1‐98.7) |
16.9 | 5.9 (4.8‐7.3) |
|||
1994 validation set | 98.1 (89.9‐99.9) |
12 (9.1‐1.4) |
8.3 (6.7‐11.3) |
|||||
1999 validation set | 95.1 | 5 | 20.0 | |||||
Vincent 2003 |
Strategy A 1. exp 'sensitivity and specificity'/; 2. (sensitivity or specificity or accuracy).tw.; 3. ((predictive adj3 value$) or (roc adj curve$)).tw.; 4. ((false adj positiv$) or (false negativ$)).tw.; 5. (observer adj variation$) or (likelihood adj3 ratio$)).tw.; 6. likelihood function/; 7. exp mass screening/; 8. diagnosis, differential/ or exp Diagnostic errors/; 9. di.xs or du.fs; 10. or/1‐9 |
Ovid | 100 | 3* | 0.33* | |||
Strategy B 1. exp 'sensitivity and specificity'/; 2. (sensitivity or specificity or accuracy).tw.; 3. (predictive adj3 value$); 4. exp Diagnostic errors/; 5. ((false adj positiv$) or (false adj negativ$)).tw; 6. (observer adj variation$).tw; 7. (roc adj curve$).tw; 8. (likelihood adj3 ratio$).tw.; 9. likelihood function/; 10. exp *venous thrombosis/di, ra, ri, us; 11. exp *thrombophlebitis/di, ra, ri, us; 12. or/1‐11 |
Ovid | 98.4 | 5* | 0.2* | ||||
Strategy C 1. exp 'sensitivity and specificity'/; 2. (sensitivity or specificity or accuracy).tw.; 3. ((predictive adj3 value$) or (roc adj curve$)).tw.; 4. ((false adj positiv$) or (false negativ$)).tw.; 5. (observer adj variation$); 6. likelihood function/ or; 7. exp Diagnostic errors/; 8. (likelihood adj3 ratio$).tw.; 9. or /1‐8 |
Ovid | 79.4 | 10* | 0.1* | ||||
Deville 2000 |
Strategy 4 SENSITIVITY AND SPECIFICITY (exp) OR specificity (tw) OR false negative (tw) OR accuracy (tw) OR screening (tw) |
NR | 89.3 (82.3‐96.3) |
91.9 (90.8‐93) |
DOR 95 | |||
Meniscal lesion | 61 (42.1‐77.1) |
4.7 | 0.22* | |||||
Strategy 3 SENSITIVITY AND SPECIFICITY (exp) OR specificity (tw) OR false negative (tw) OR accuracy (tw) |
NR | 80.0 (71.0‐89.1) |
97.3 (96.6‐97.9) |
48 (40‐56) |
DOR 149 | |||
Strategy 2 SENSITIVITY AND SPECIFICITY (exp) OR specificity (tw) OR false negative (tw) |
NR | 73.3 (63.3‐83.3) |
98.4 (97.9‐98.9) |
DOR 170 | ||||
Strategy 1 SENSITIVITY AND SPECIFICITY (exp) OR specificity (tw) |
NR | 70.7 (60.4‐81.0) |
98.5 (98.0‐98.9) |
DOR 158 | ||||
Noel‐Storr 2011 | 1. Disease progression/ 2. di.fs. 3. logitudinal*.ab. 4. Follow‐up studies/ 5. conversion.ab. 6. transition.ab. 7. converters.ab. 8. progressive.ab. 9. “increased risk”.ab. 10. “follow‐up”.ab. |
Ovid | 2000‐Sept 2010 | 97 (92‐99) |
38 (37‐39) |
1.1 (0.95‐1.4) |
||
2000‐Aug 2011 | 98 (94‐100) |
38 (37‐39) |
1.2 (1.0‐1.4) |
NR=Not reported
Method of identification of reference set records
Different methods were used to compile the reference sets. Six studies handsearched journals to obtain a database of ‘gold standard’ references reporting relevant DTA studies (Astin 2008; Bachmann 2002; Berg 2005; Deville 2000; Haynes 1994; Haynes 2004).
Three studies used a relative recall reference standard, that is, the reference set was based on studies included in systematic reviews. Deville 2002 used references from two published systematic reviews (on diagnosing knee lesions and the accuracy of urine dipstick testing) that had formed part of the first author's thesis. Noel‐Storr 2011 used the references from a systematic review on the volume of evidence in biomarker studies in people with mild cognitive impairment. The third study used a validated filter to locate systematic reviews indexed in MEDLINE and EMBASE reporting on diagnostic tests for deep vein thrombosis, and used the studies included in these reviews as the reference set (Vincent 2003). Finally, one study (van der Weijden 1997) developed a reference set based on a personal literature database on erythrocyte sedimentation rate as a diagnostic test, compiled over 10 years ‘by every means of literature searching’.
Two of the 10 studies described above used, as the non‐gold standard records in the reference set, all articles that were retrieved by the search for gold standard records but subsequently rejected from the gold standard (Berg 2005; Noel‐Storr 2011). A third study used as non‐gold standard records the false positive articles retrieved by a previously published diagnostic search strategy (Deville 2000); this study further restricted the non‐gold standard records by excluding reviews, meta‐analyses, comments, editorials and animal studies. The remaining studies that included non‐gold standard records in their reference set did not provide details on how these were identified.
Composition of reference set
Seven studies included both gold and non‐gold standard references in their reference sets (Astin 2008; Bachmann 2002; Berg 2005; Deville 2000; Haynes 1994; Haynes 2004; Noel‐Storr 2011) and two studies used only gold standard studies (van der Weijden 1997; Vincent 2003). One study did not give any details on the composition of the reference set (Deville 2002). Where 2 x 2 data were available, sensitivity, specificity, precision and NNR could be calculated for the studies whose reference set comprised both included DTA studies (gold standard references) and studies that did not meet the criteria of a DTA study (non‐gold standard references). However, specificity and precision could not be calculated from a reference set composed only of included DTA references, because the proportion of records correctly not retrieved cannot be determined when data for only half of the 2 x 2 table are available.
Of the six studies that used handsearching to develop the reference set, two studies concentrated on specific topic areas. Astin 2008 included records on imaging as a diagnostic test and Berg 2005 included articles from the nursing literature on cancer‐related fatigue diagnosis. The remaining studies that had a handsearched reference set were not topic specific. The two studies that used published systematic reviews to compile the reference set, and the study which used a personal literature database, were all topic specific.
Where reported, the mean number of gold standard studies in the reference set was 128 (range 33 to 333) from a mean of 35 journals (range 9 to 161). Of the studies that used reference sets which included non‐gold standard as well as gold standard records, the mean number of overall references included was 8582 (range 238 to 48,881).
Method of identification of search terms
Three studies used the reference set to derive search terms by performing statistical analysis on terms found in titles, abstracts and subject headings (Astin 2008; Bachmann 2002; Deville 2000). Three studies adapted existing search strategies (Berg 2005; Deville 2002; Vincent 2003), one of which expanded the existing filters by adding frequently occurring MeSH terms and text words found in titles and abstracts of the reference set (Berg 2005). Vincent 2003 also combined the use of existing filters with the results of reference set analysis. Of the remaining four studies, one used expert knowledge of the field to generate a list of terms (Haynes 1994), one used expert knowledge and analysis of the reference set (Haynes 2004), one checked key publications for the definitions and terms used (van der Weijden 1997), and one analysed terms in 10 studies missed by the three most sensitive published filters (Noel‐Storr 2011).
Description of studies that evaluated published MEDLINE filters
Ten evaluation studies that assessed 30 MEDLINE filters were included (Table 4; Table 5). Of these, three were development studies that also evaluated published filters and were therefore classed as both development and evaluation studies (Deville 2000; Noel‐Storr 2011; Vincent 2003). Most filters (n = 23) were evaluated by at least two studies. The median number of filters evaluated in each study was 6, ranging from 1 (Deville 2000; Kastner 2009) to 22 (Noel‐Storr 2011; Ritchie 2007; Whiting 2010).
4. Summary of study design characteristics of MEDLINE filter evaluation studies.
Evaluation study: Author (year) | ||||||||||
Kastner 2009 | Ritchie 2007 | Leeflang 2006 | Kassai 2006 | Doust 2005 | Whiting 2010 | Vincent 2003 | Deville 2000 | Mitchell 2005 | Noel‐Storr 2011 | |
Method of identification of reference set records (one from list below selected for each study) | | | | | | | | | | |
Hand‐searching for primary studies | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ✓ | ✓ | ‐ |
DTA systematic reviews located by electronic search | ✓ | ‐ | ✓ | ‐ | ‐ | ‐ | ✓ | ‐ | ‐ | ‐ |
DTA systematic reviews conducted by the authors | ‐ | ✓ | ‐ | ‐ | ✓ | ✓ | ‐ | ‐ | ‐ | ✓ |
Electronic search for primary studies | ‐ | ‐ | ‐ | ✓ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ |
If systematic reviews used in reference set development, did they include DTA search terms in search strategy? | ✓ | X | Unclear | ‐ | ✓ | X | ✓ | ‐ | ‐ | X |
Reference set also contained non‐gold standard records | X | ✓ | X | ✓ | X | ✓ | X | ✓ | ✓ | ✓ |
Description of non‐gold standard records if contained in reference set | ‐ | ‐ | ‐ | ‐ | ‐ | NR | ‐ | ‐ | NR | ‐ |
|
‐ | ✓ | ‐ | ✓ | ‐ | ‐ | ‐ | ‐ | ‐ | ✓ |
|
‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ | ✓ | ‐ | ‐ |
Generic gold standard records i.e. not topic specific | ✓ | X | ✓ | X | X | ✓ | ✓ | X | X | X |
5. Study characteristics and methods of included MEDLINE filter evaluation studies.
Study | Identification of reference set | Reference set selection criteria | Ref set years | # gold standard records | # non‐gold standard records | # journals ref set if handsearched | Definition of DTA study if handsearched gold standard identified | Description of filter allows reproducibility | Definitions of Se & Sp | Number of filters evaluated |
Kastner 2009 | Included studies from 12 published SRs on the ACP Journal Club website and indexed on MEDLINE or EMBASE. | Eligibility criteria for including SRs were: published in 2006; incorporated a MEDLINE and EMBASE search as a data source; and available and downloadable in electronic format. In addition, the review must not have used the Clinical Queries filter, but other search filters were permissible. | 2006 (date of publication of the SRs) | 441 | 0 | Not given. The 12 SRs were from 9 journals. | The study compared at least two diagnostic test procedures with one another. | yes | no | 1 |
Ritchie 2007 | SR of DTA studies for UTI in young children carried out by the authors | Included studies that could be identified in Ovid MEDLINE | 1966‐2003 | 160 | 27804 | NA | NA | no | no | 22 |
Leeflang 2006 | Included studies from 27 published SR. Reviews selected after an electronic search for SRs of DTA studies published between January 1999 and April 2002 in MEDLINE, EMBASE, DARE and Medion | Criteria for inclusion SRs: assessment of DTA, the inclusion of >10 original studies with inclusion not based on design characteristics, and sufficient data to reproduce the contingency table. Exclusion of reviews that reported the application of a diagnostic search filter. | 1999‐2002 | 820 | 0 | NA | NA | yes | no | 12 |
Kassai 2006 | Used PubMed interface to search MEDLINE, Science Citation Index, EMBASE and Pascal Biomed for relevant articles using search strategies with terms (MeSH and free text for MEDLINE) related to venous thrombosis, venography and ultrasonography in all databases. | Any relevant article retrieved through topic search on MEDLINE, Science Citation Index, EMBASE and Pascal Biomed | 1966‐2002 | 237 | 1236 | NR | NR | yes | 3 | |
Doust 2005 | Included studies from two SRs: tympanometry (TR) for the diagnosis of otitis media with effusion in children, and natriuretic peptides (NPR). Initial list of citations was generated from MEDLINE using the search strategy used by the sensitivity option of the Clinical Queries filter for DTA in PubMed. Reference lists of potentially relevant papers and review articles were checked for further possible papers. | Included in two SRs conducted by the authors | TR 1966‐2001; NPR 1994‐2002 | TR n=33; NPR n=20 | TR n=0; NPR n=0 | TR n=22; NPR n=16 |
NR | yes | yes | 5 |
Whiting 2010 | Test accuracy studies indexed on MEDLINE from 7 SRs carried out by authors. Relative recall reference set. | All included studies indexed on MEDLINE from 7 SRs of DTA. SRs that conducted extensive searches that were not limited using methodological filters or search terms relating to measures of test accuracy | NR | 506 | 25880** | NR | Studies in which cross‐tabulation data comparing the results of the index test with the reference standard were available. | yes | yes | 22 |
Vincent 2003 | SRs retrieved from MEDLINE and EMBASE on OVID using validated SR filter on diagnostic tests for DVT. 16 SR selected and all articles included that were indexed on MEDLINE became the reference set. Only English language articles included | Studies included in 16 SRs that compared one of the specified diagnostic tests for DVT against a venogram. | 1969‐2000 | 126 | 0 | NR | Compared specified diagnostic test to reference standard | yes | yes | 5 |
Deville 2000 | Adapted Haynes 1991 most sensitive and specific filter by adding terms. Ran search in MEDLINE to retrieve all primary DTA studies. Second set of references selected on diagnosis of meniscal lesions of the knee for external validity testing. No further details on how this set was selected are provided. | Primary DTA studies indexed on MEDLINE; studies included on physical tests for the diagnosis of meniscal lesions of the knee. | 1992‐1995 | 75; 33 in meniscal lesions set | 2392; NR in meniscal lesions set | Diagnostic test was compared with a reference standard | yes | yes | 1 | |
Mitchell 2005 | Handsearch of the 3 top ranking renal journals for the years 1990‐1991 and 2002‐2003. | Primary DTA studies that could be identified in MEDLINE on the diagnosis of kidney disease | 1991‐1992 2002‐2003 |
99 | 4409 | 3 | A test or tests being compared to a reference standard in a human population | yes* | NR | 6 |
Noel‐Storr 2011 | SR on the volume of evidence in biomarker studies in those with mild cognitive impairment, conducted by the authors. | Primary DTA longitudinal studies indexed on MEDLINE with at least one follow‐up period; at least one of biomarkers of interest used as test of interest; included subjects with objective cognitive impairment at baseline, no dementia. | 2000‐Sept 2010; 2000‐Aug 2011 | 128 Sept 2010; 144 Aug 2011 | 17266 Sept 2010; 18920 Aug 2011 | NR | NA | yes* | no | 22 |
Abbreviations used: TR= Tympanometry review; NPR= Natriuretic peptides review; SR= systematic review; NA= not applicable; NR= not reported; ref set= reference set; Se= sensitivity; Sp= specificity.
* Full strategy obtained from authors
** Number of gold‐standard records obtained from authors
Method of identification of reference set records
Seven studies used a relative recall reference set consisting of studies included in DTA systematic reviews (Doust 2005; Kastner 2009; Leeflang 2006; Noel‐Storr 2011; Ritchie 2007; Vincent 2003; Whiting 2010). Of these, three studies located systematic reviews through electronic searches (Kastner 2009; Leeflang 2006; Vincent 2003) and four studies used a convenience sample of systematic reviews that either the authors or colleagues had undertaken themselves (Doust 2005; Noel‐Storr 2011; Ritchie 2007; Whiting 2010). One study used references located through handsearching of the nine highest ranking journals available on MEDLINE (Deville 2000); one study handsearched three high ranking renal journals (as identified by the authors) for primary studies on the diagnosis of renal disease (Mitchell 2005); and one study used an electronic search for primary DTA studies related to venous thrombosis, venography and ultrasonography (Kassai 2006).
Three of the studies that used a relative recall reference set included reviews which used a methodological filter to find diagnostic studies in addition to terms for test and condition (Doust 2005; Kastner 2009; Vincent 2003). One of these studies supplemented the search, which had first used the Clinical Queries diagnostic filter in PubMed, by searching the reference lists of included studies (Doust 2005).
Two studies used, as the non‐gold standard records in the reference set, all articles that were retrieved by the search for gold standard records but subsequently rejected from the gold standard (Kassai 2006; Ritchie 2007). A third study used as non‐gold standard records the false positive articles retrieved by a previously published diagnostic search strategy (Deville 2000); this study further restricted the non‐gold standard records by excluding reviews, meta‐analyses, comments, editorials and animal studies. The remaining studies that included non‐gold standard records in their reference set did not provide details on how these were identified.
Composition of reference set
Three of the seven studies that derived their reference set from systematic reviews included both gold standard and non‐gold standard studies (Noel‐Storr 2011; Ritchie 2007; Whiting 2010); the remaining four used a reference set comprising only gold standard studies (Doust 2005; Kastner 2009; Leeflang 2006; Vincent 2003). The three studies that used an electronic search or a handsearch to find primary studies also included non‐gold standard studies in their reference sets (Deville 2000; Kassai 2006; Mitchell 2005).
The number of gold standard studies included in the reference set ranged from 53 from two systematic reviews (Doust 2005) to 820 from 27 reviews (Leeflang 2006). Across the studies that also included non‐gold standard studies, the number of irrelevant studies ranged from 1236 to 27,804.
Description of evaluated filters
All but one of the search strategies combined MeSH terms and text words; the exception used the single term strategy “specificity.tw” (Whiting 2010). Two of the evaluated filters were based on the same original strategy by Haynes 1994: Falck‐Ytter 2004 presented an alternative interpretation of the original filter in a PubMed format.
EMBASE search filters
Description of development studies
Two studies reported the development of 12 search filters for finding DTA studies indexed in EMBASE (Table 6; Table 7) (Bachmann 2003; Wilczynski 2005). Eleven of the filters were composed of multiple terms. Table 6 gives a summary of the study design characteristics of the included studies.
6. Summary of study design characteristics of EMBASE filter development studies.
Author | ||
Bachmann 2003 | Wilczynski 2005 | |
Method of identification of reference set records (one from list below selected for each study) | ||
Hand‐searching for primary studies | ✓ | ✓ |
DTA systematic reviews | ‐ | ‐ |
Personal literature database | ‐ | ‐ |
Reference set also contained non‐gold standard records | ✓ | ✓ |
Description of non‐gold standard records if contained in reference set | ‐ | NR |
All studies retrieved by search not classified as gold standard records | ✓ | ‐ |
Generic gold standard records i.e. not topic specific | ✓ | ✓ |
Method of deriving filter terms (a combination of methods could be used) | ||
|
✓ | ✓ |
|
‐ | ✓ |
|
‐ | ‐ |
|
‐ | ‐ |
Internal validation in reference set independent from records used to derive filter terms | x | X |
External validation in reference set independent from records used to derive filter terms and internal validation set | x | x |
Abbreviations used: NR= not reported
7. Study characteristics and methods of EMBASE filter development studies.

Study | Identification of reference set | How was reference set used | How were search terms identified for filter | Ref set years | # gold standard records | # non‐gold standard records | # journals in ref set
---|---|---|---|---|---|---|---
Bachmann 2003 | Handsearching of all issues of NEJM, Lancet, JAMA and BMJ published in 1999 | To derive terms; to test strategies | Word frequency analysis on title, abstract and subject indexing of handsearched records | 1999 | 61 | 6082 | 4
Wilczynski 2005 | Handsearching each issue of 55 journals in 2000 | To test strategies | Initial list of MeSH terms and text words compiled using knowledge of the field and input from librarians and clinicians; stepwise logistic regression used to improve performance of filters | 2000 | 97 | 27,672 | 55

Abbreviations used: ref set= reference set
Method of identification of reference set records
In both studies the reference set was generated by handsearching journals, and included both gold standard and non‐gold standard records. One study reported that the non‐gold standard records were identified as all articles retrieved by the search that were not classified as gold‐standard records (Bachmann 2003). The other study was not clear about how non‐gold standard records were selected (Wilczynski 2005).
Composition of reference set
Both studies included both gold standard and non‐gold standard records in the reference set.
Method of identification of search terms
One study used the reference set to derive filter terms using word frequency analysis (Bachmann 2003). The other study initially identified candidate terms by consulting experts and then entered the terms into a stepwise logistic regression model to identify the best‐performing combinations of terms (Wilczynski 2005).
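These two approaches can be illustrated in outline. The sketch below shows what a word frequency analysis for filter term selection might look like: it ranks the words occurring in gold standard records by their sensitivity and precision in the reference set. The function, the 200‐word candidate pool and the ranking rule are illustrative assumptions, not details taken from either study.

```python
import re
from collections import Counter

def tokenize(record):
    """Lowercased word tokens from a record's title, abstract and indexing."""
    return set(re.findall(r"[a-z]+", record.lower()))

def rank_terms(gold, non_gold, top_n=10):
    """Rank single words by how well they discriminate gold standard
    records (DTA studies) from non-gold standard records."""
    gold_tokens = [tokenize(r) for r in gold]
    non_gold_tokens = [tokenize(r) for r in non_gold]

    # How often each word appears across the gold standard records
    freq = Counter(w for toks in gold_tokens for w in toks)

    ranked = []
    for word, _ in freq.most_common(200):  # illustrative candidate pool
        tp = sum(1 for toks in gold_tokens if word in toks)
        fp = sum(1 for toks in non_gold_tokens if word in toks)
        sensitivity = tp / len(gold_tokens)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        ranked.append((word, sensitivity, precision))

    # Highest-sensitivity words first; precision breaks ties
    ranked.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return ranked[:top_n]
```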
Description of studies that evaluated published EMBASE filters
Three studies evaluated four filters designed to find DTA studies in EMBASE (Table 8; Table 9) (Kastner 2009; Mitchell 2005; Wilczynski 2005). One filter was evaluated by two studies, and three filters were evaluated by only one study. A summary of the study design characteristics of included studies is in Table 8.
8. Summary of study design characteristics of EMBASE filter evaluation studies.

Study design characteristic | Kastner 2009 | Wilczynski 2005 | Mitchell 2005
---|---|---|---
Method of identification of reference set records | Studies included in published systematic reviews | Handsearching of journals | Handsearching of journals
If systematic reviews used in reference set development, did they include DTA search terms in search strategy? | ✓ | ‐ | ‐
Reference set also contained non‐gold standard records | x | ✓ | ✓
Description of non‐gold standard records if contained in reference set | NR | NR | NR
Generic gold standard records i.e. not topic specific | ✓ | ✓ | x

Abbreviations used: NR= not reported
9. Study characteristics and methods of studies evaluating EMBASE filters.

Study | Identification of gold standard | Reference set selection criteria | Ref set years | # gold standard studies in ref set | # non‐gold standard studies in ref set | # journals in ref set for handsearched gold standard | Definition of DTA study | Description of filter allows reproducibility | Definitions of Se & Sp | Number of filters evaluated
---|---|---|---|---|---|---|---|---|---|---
Kastner 2009 | Included studies from 12 published SRs on the ACP Journal Club website and indexed in MEDLINE or EMBASE | Eligibility criteria for including SRs: published in 2006; incorporated a MEDLINE and EMBASE search as a data source; available and downloadable in electronic format; and the review cannot have used the Clinical Queries filter | 2006 (date SRs published) | 441 | 441 | NA | The study compared at least two diagnostic test procedures with one another | yes | no | 1
Wilczynski 2005 | Handsearch of each issue of 55 journals in 2000 | Studies indexed in EMBASE found through handsearching which met the methodological criteria for a diagnostic study | 2000 | 97 | 27,575 | 55 | Inclusion of a spectrum of participants; reference standard; participants received both the new test and the reference standard; interpretation of index test without knowledge of reference standard and vice versa; analysis consistent with study design | yes | yes | 2
Mitchell 2005 | Handsearch of the 3 top ranking renal journals for the years 1990‐1991 and 2002‐2003 | Primary DTA studies that could be identified in EMBASE reporting on the accuracy of tests for kidney disease diagnosis | 1991‐1992, 2002‐2003 | 96 | 3984 | 3 | A test or tests being compared to a reference standard in a human population | yes* | no | 4

Abbreviations used: ref set= reference set; Se= sensitivity; Sp= specificity
Method of identification of reference set records
One study used the included studies from 12 published systematic reviews to construct the gold standard (Kastner 2009). The other two EMBASE filter studies identified primary DTA studies through handsearching (Mitchell 2005; Wilczynski 2005). Neither of the studies that included non‐gold standard records described how those articles were identified.
Composition of reference set
Two studies included both gold standard and non‐gold standard records in the reference set (Mitchell 2005; Wilczynski 2005). The number of gold standard records ranged from 96 to 441. The number of non‐gold standard records ranged from 3984 to 27,575.
Description of evaluated filters
One evaluated filter consisted of MeSH terms and text words; the other three consisted of text words only. Every filter combined multiple terms.
Risk of bias in included studies
The methodological quality of the identified studies was not formally assessed using a validated tool, but we identified three areas that could affect the methodological quality of the studies in terms of the risk of bias and applicability as described above (see Assessment of risk of bias in included studies).
1. Use of systematic reviews to compile reference set search strategy
MEDLINE development and evaluation studies
Of the eight studies that used systematic reviews to compile their reference sets, three used reviews that did not include diagnostic terms in their search strategies and were at low risk of bias; one development and evaluation study and two evaluation studies specified that they only included systematic reviews that had not used a diagnostic search filter (Noel‐Storr 2011; Ritchie 2007; Whiting 2010). The systematic reviews used by Whiting and Noel‐Storr had been conducted by the authors themselves, so the reviewers could be sure that no such filter was applied. Ritchie also used a systematic review carried out by Whiting, which did not use a diagnostic filter.
Three studies used reviews with diagnostic terms in their search strategies and were therefore at high risk of bias. One was a development and evaluation study whose reference set contained the references from 16 systematic reviews, at least one of which used a diagnostic filter (Vincent 2003); some of the other systematic reviews did not report whether they used a diagnostic filter, and the remainder were not available. Two evaluation studies also used reviews with diagnostic filter terms. Kastner’s reference set contained the studies from 12 systematic reviews, just over half of which used diagnostic terms in their search strategies (Kastner 2009). Doust 2005 conducted the two systematic reviews used in reference set development, and the search strategies for these applied the PubMed Clinical Queries filter for diagnostic studies.
For one development and one evaluation study, it was not clear whether the systematic reviews used a diagnostic filter in their searches, so the risk of bias was unclear (Deville 2002; Leeflang 2006). The original source of the review used by Deville (the author’s thesis) was not available, but a meta‐analysis published by the same author on the same topic did describe the use of diagnostic terms in the search strategy. Leeflang stated in their discussion that, although they attempted to exclude any review that used a diagnostic filter in its literature search, seven of the 27 reviews whose studies were included did not describe their searches in detail.
EMBASE development and evaluation studies
Only one evaluation study, reporting an EMBASE filter, used the studies from systematic reviews to compile the reference set, and just over half of the 12 systematic reviews used diagnostic terms in their search strategies (Kastner 2009). This study was, therefore, judged to be at high risk of bias.
2. Choice of gold standard records
MEDLINE development and evaluation studies
Of 17 studies, three development and three evaluation studies used generic gold standard records and caused a low level of concern regarding applicability (Bachmann 2002; Haynes 1994; Haynes 2004; Kastner 2009; Leeflang 2006; Whiting 2010). Of these, the development studies handsearched a broad range of general medical journals while the evaluation studies used the included studies from systematic reviews covering a range of diagnostic tests and conditions.
Four development studies used topic specific gold standard records to develop their filters (Astin 2008; Berg 2005; Deville 2002; van der Weijden 1997). In addition, the three studies which both developed and evaluated filters also used topic specific records (Deville 2000; Noel‐Storr 2011; Vincent 2003). Four evaluation studies used topic specific gold standard records to test the performance of published filters (Doust 2005; Kassai 2006; Mitchell 2005; Ritchie 2007). These studies caused high levels of concern regarding applicability as they were only likely to be applicable to the particular topic area in which they were developed or evaluated. The topics included in these studies varied in their breadth, for example a very narrow topic was used by Kassai 2006 (limited to studies comparing ultrasound to venography for the diagnosis of deep vein thrombosis), whereas Deville 2000 included studies on diagnostic tests from nine family medicine journals. Other topics included diagnostic tests in radiology and biomarkers for mild cognitive impairment. Noel‐Storr 2011 designed their filter to specifically retrieve longitudinal DTA studies and evaluated published filters for their ability to retrieve delayed cross‐sectional DTA studies.
EMBASE development and evaluation studies
All but one of the four studies that developed or evaluated a diagnostic EMBASE filter used a set of gold standard records covering a broad range of topics and tests. The exception handsearched the three top ranking renal journals for studies on the diagnosis of kidney disease (Mitchell 2005).
3. Validation of filters
MEDLINE development studies
Of the 10 studies reporting the development of a MEDLINE filter, two studies used discrete derivation and validation sets of references to test internal validity and were considered to be at low risk of bias (Astin 2008; Bachmann 2002). Astin handsearched six high ranking radiology journals to find studies for the derivation set and used a different set of six journals to compile studies for the validation set. Bachmann handsearched journals in different years; the studies found in 1989 comprised the set of references used to derive terms, while the studies from 1994 comprised the validation set.
Six of the remaining studies used an internal validation set which contained the references used to derive the terms for the filter and the studies were therefore judged to be at high risk of bias (Berg 2005; Deville 2000; Haynes 1994; Haynes 2004; Noel‐Storr 2011; Vincent 2003). Of these studies, three independently selected terms to use as part of their filters, but the final strategies (made up of those terms) were derived from testing in the same set of references (Haynes 1994; Haynes 2004; Vincent 2003). Also of note, Noel‐Storr 2011 derived filter terms by running published search filters in MEDLINE combined with a subject search, locating 10 papers that all filters missed and choosing a term from the title, abstract or keywords of each. These 10 papers were included in the reference set of 144 studies.
Two studies did not perform internal validity testing of the filters they had developed; instead, reviews on specific diagnostic topics were used only for external validation (Deville 2002; van der Weijden 1997). These studies reported sensitivities > 90% for their most sensitive filters.
Four studies carried out external validation of their filters in a validation set that represented real‐world settings, and the filters were judged to cause low levels of concern about applicability (Bachmann 2002; Deville 2000; Deville 2002; van der Weijden 1997). The remaining studies did not validate their filters in real‐world settings and were considered to cause high levels of concern regarding applicability (Astin 2008; Berg 2005; Haynes 1994; Haynes 2004; Noel‐Storr 2011; Vincent 2003).
EMBASE development studies
Both EMBASE development studies were at high risk of bias in this domain because neither study used a set of records independent from those used to derive the terms to internally validate their strategies (Bachmann 2003; Wilczynski 2005). Bachmann used word frequency analysis of all the titles and abstracts of studies included in the reference set to find and combine the 10 terms with the highest sensitivity and precision. Wilczynski first derived a list of potential diagnostic terms from clinical studies and then from clinicians and librarians. The individual search terms with sensitivity > 25% and specificity > 75%, when tested in the reference set, were then combined into the search strategies.
Neither study externally validated their newly developed filters and were therefore judged to have high concerns regarding applicability in this domain.
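The single‐term screening step reported for Wilczynski 2005 lends itself to a brief sketch. In the code below the 25% sensitivity and 75% specificity thresholds come from the study report; the data structures and function name are assumptions, and the subsequent combination of terms by stepwise logistic regression is not shown.

```python
def screen_terms(term_hits, gold_ids, all_ids, min_sens=0.25, min_spec=0.75):
    """Keep candidate terms whose single-term performance in the reference
    set clears the thresholds reported for Wilczynski 2005 (sensitivity
    > 25% and specificity > 75%).

    term_hits maps each candidate term to the set of record ids it
    retrieves; gold_ids and all_ids are sets of record ids.
    """
    non_gold = all_ids - gold_ids
    kept = []
    for term, hits in term_hits.items():
        tp = len(hits & gold_ids)      # gold standard records retrieved
        fp = len(hits & non_gold)      # non-gold standard records retrieved
        sens = tp / len(gold_ids)
        spec = (len(non_gold) - fp) / len(non_gold)
        if sens > min_sens and spec > min_spec:
            kept.append(term)
    # Surviving terms would then be combined (e.g. with OR) and retested
    return kept
```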
Effect of methods
1. Performance of MEDLINE filters as reported in development studies
Sensitivity ranged from 16% to 100% (median 86%; 39 filters, 10 studies), specificity ranged from 38% to 99% (median 88.5%; 30 filters, 6 studies) and precision ranged from 0.8% to 90% (median 9.3%; 32 filters, 8 studies) (Table 3).
2. Performance of evaluated MEDLINE filters
Performance data on each evaluated filter can be found in Table 10 and full search strategies can be found in Appendix 9. Thirteen of the 30 MEDLINE filters assessed by the evaluation studies had original performance data available from development studies. The other 17 filters were reported without any details on how they were developed or their performance.
10. MEDLINE filters evaluated by two or more studies (values given in percentages).
SENSITIVITY | SPECIFICITY | PRECISION | ||||||||||||||||||||||||||||||||||||||||||||
ORIGINAL DEVELOPMENT STUDY | RITCHIE | WHITING | LEEFLANG | KASTNER | *DOUST TR | *DOUST NPR | VINCENT | DEVILLE | DEVILLE ML | KASSAI | MITCHELL | NOEL‐STORR | ORIGINAL DEVELOPMENT STUDY | WHITING | MITCHELL | NOEL‐STORR | ORIGINAL DEVELOPMENT STUDY | RITCHIE | WHITING | *DOUST TR | *DOUST NPR | DEVILLE ALL | DEVILLE ML | MITCHELL | NOEL‐STORR | |||||||||||||||||||||
Original development study did report performance data | ||||||||||||||||||||||||||||||||||||||||||||||
Bachmann 2002 Sensitive | 95 | 74 | 87 | 88 | 70 | 90 | 84 | 84 | NR | 37 | 80 | 36 | 5.0 | 1.4 | 3.0 | 5.0 | 4.0 | 8.8 | 0.2 | |||||||||||||||||||||||||||
Haynes 2004 Sensitive | 99 | 69 | 80 | 87 | 88 | 70 | 100 | 67 | 69 | 74 | 41 | 85 | 45 | 1.1 | 1.3 | 3.0 | 4.0 | 5.0 | 9.1 | 0.9 | ||||||||||||||||||||||||||
Haynes 2004 Specific | 65 | 21 | 43 | 28 | 14 | 98 | 94 | 95 | 10.6 | 6.7 | 15.0 | 2.0 | ||||||||||||||||||||||||||||||||||
Deville 2000 Strategy 4 | ALL=89 | 46 | 68 | 46 | 58 | 100 | 75 | 49 | 55 | 92 | 81 | 95 | 82 | NR | 4.4 | 7.0 | 9.0 | 9.0 | 16.7 | 2.2 | ||||||||||||||||||||||||||
ML=61 | 4.7 | |||||||||||||||||||||||||||||||||||||||||||||
Haynes 1994 Specific | 55 | 33 | 55 | 29 | 51 | 98 | 90 | 88 | 40.0 | 7.4 | 3.0 | |||||||||||||||||||||||||||||||||||
Haynes 1994 Sensitive** | 92 | 70 | 85 | 81 | 96 | 73 | 45 | 95 | 80 | 91 | 73 | 23 | 80 | 32 | 9.0 | 1.5 | 29 | 3.4 | 5.3 | 1.0 | ||||||||||||||||||||||||||
Vincent 2003 Strategy C | 79 | 87 | 67 | 44 | 54 | NR | 85 | 83 | 10.0 | 3.3 | 9.0 | 2.3 | ||||||||||||||||||||||||||||||||||
van der Weijden 1997 Sensitive | 91 | 87 | 92 | 73 | 100 | 96 | 93 | NR | 15 | 96 | 30 | NR | 2.0 | 4.0 | 4.0 | 5.6 | 1.0 | |||||||||||||||||||||||||||||
Deville 2002 Accurate | KSR=70 USR=92 | 51 |
Haynes 1994 Accurate | 86 | 81 | 84 | 13.0 | ||||||||||||||||||||||||||||||||||||||||||
Deville 2000 Strategy 3 | 80 | 41 | 97 | 48.0 | ||||||||||||||||||||||||||||||||||||||||||
Deville 2000 Strategy 1 | 71 | 76 | 99 | |||||||||||||||||||||||||||||||||||||||||||
Vincent 2003 Strategy A | 100 | 87 | 81 | NR | 81 | 2.5 | 3.3 | 5.5 |
Original development study did NOT report performance data | ||||||||||||||||||||||||||||||||||||||||||||||
Falck‐Ytter 2004** | NR | 74 | 85 | 71 | NR | 39 | 51 | NR | 1.3 | 3.0 | 1.1 | |||||||||||||||||||||||||||||||||||
CASP 2002$ | NR | 73 | 83 | 100 | 95 | 67 | NR | 53 | 49 | NR | 1.2 | 3.0 | 1.0 | |||||||||||||||||||||||||||||||||
Deville 2002a Extended | NR | 52 | 71 | 58 | 100 | 60 | NR | 78 | 78 | NR | 3.9 | 7.0 | 8.0 | 6.0 | 2.0 | |||||||||||||||||||||||||||||||
Aberdeen InterTASC 2011$ | NR | 69 | 86 | 87 | NR | 39 | 33 | NR | 1.2 | 3.0 | 1.0 |
Southampton A InterTASC 2011$ | NR | 71 | 86 | 93 | NR | 13 | 29 | NR | 1.0 | 2.0 | 1.0 |
Southampton B InterTASC 2011$ | NR | 45 | 69 | 55 | NR | 80 | 81 | NR | 4.6 | 7.0 | 2.1 |
Southampton C InterTASC 2011 | NR | 31 | 56 | 51 | NR | 90 | 88 | NR | 8.5 | 11.0 | 3.0 |
Southampton D InterTASC 2011 | NR | 66 | 84 | 89 | NR | 21 | 42 | NR | 1.1 | 2.0 | 1.1 |
Southampton E InterTASC 2011$ | NR | 71 | 87 | 92 | NR | 14 | 31 | NR | 1.0 | 2.0 | 1.0 |
CRD A InterTASC 2011 | NR | 53 | 73 | 70 | NR | 62 | 58 | NR | 2.2 | 4.0 | 1.2 |
CRD B InterTASC 2011 | NR | 40 | 64 | 67 | NR | 81 | 71 | NR | 4.1 | 7.0 | 1.7 |
CRD C InterTASC 2011 | NR | 69 | 85 | 90 | NR | 24 | 43 | NR | 1.2 | 2.0 | 1.2 |
HTBS InterTASC 2011 | NR | 46 | 69 | 56 | NR | 83 | 80 | NR | 3.7 | 8.0 | 2.0 |
Shipley Miner 2002 | NR | 48 | 72 | 63 | NR | 73 | 73 | NR | 1.8 | 5.0 | 1.7 | |||||||||||||||||||||||||||||||||||
Deville 2002a Accurate | NR | 88 | NR | NR | ||||||||||||||||||||||||||||||||||||||||||
University of Rochester 2002$ | NR | 79 | NR | NR | ||||||||||||||||||||||||||||||||||||||||||
North Thames 2002$ | NR | 53 | NR | NR |
* Doust combined each methodological filter with a content filter for a tympanometry systematic review (TR) and a natriuretic peptides systematic review (NPR); this is why two results are reported for each filter.
Similarly, Deville 2000 used an independent set of references to externally validate their own filter and the Haynes 1994 sensitive filter; ALL= all references in main reference set; ML= references on the diagnosis of meniscal lesions of the knee.
** The Falck‐Ytter filter is an adaptation of the Haynes 1994 sensitive filter (developed for Ovid) into a PubMed format (an alternative to the PubMed Clinical Queries adaptation of the same filter).
Abbreviations used: KSR= Knee lesion systematic review; USR= Urine dipstick systematic review.
$ Filter no longer available from source cited by evaluation studies.
None of the filters tested in development or evaluation studies had sensitivity > 90% and precision > 10%. The original studies reported sensitivities ranging from 55% to 100% (median 86%); evaluation studies reporting on the same 13 filters had sensitivities ranging from 14% to 100% (median 73%). Doust 2005 evaluated the two strategies with 100% sensitivity in a reference set composed of included studies from a systematic review of natriuretic peptides. The original searches for the two systematic reviews used the PubMed Clinical Queries filter (from Haynes 2004), supplemented by screening the reference lists of included studies. This might explain why the evaluated filters performed so well in this reference set. The sensitivities of the 18 evaluated filters that did not have accompanying original performance data ranged from 40% to 100% (median 71%).
Specificity was reported in both the original study and evaluation studies (Mitchell 2005; Noel‐Storr 2011; Whiting 2010) for only four filters; it ranged from 73% to 98% (median 94.5%) in the original studies and from 15% to 96% (median 81%) in the evaluation studies. Similarly, precision was reported in both the original study and evaluation studies for only seven filters, ranging from 1.1% to 40% (median 9.5%) in the original studies and from 0.2% to 16.7% (median 4%) in the evaluation studies. Similar ranges of specificities and precision were reported in the evaluation studies for the 17 filters without an original performance measure: sensitivities ranged from 31% to 100% (median 71%), specificity from 13% to 90% (median 55.5%) and precision from 1.0% to 11.0% (median 3.35%).
Original estimates of sensitivity were higher than those reported in the evaluation studies in 43 of 53 comparisons. (If an evaluation study had two reference sets, it contributed twice to the total number of comparisons for each filter evaluated.) Original estimates of specificity were higher in 10 of 14 comparisons, and precision was higher in 16 of 25 comparisons. None of the evaluated filters performed consistently well for any of the performance measures reported by evaluation studies (Table 10).
Seven filters had data on both sensitivity and specificity from the original development study and at least one evaluation study (Figure 2). Original estimates showed greater sensitivity and specificity than the estimates from the evaluation studies. The results from the development studies followed a more uniform pattern along a curve, whereas the estimates from the evaluation studies were more heterogeneous, especially for specificity. There were two outliers in the evaluation study results: Mitchell’s (Mitchell 2005) measure of van der Weijden’s sensitive filter (van der Weijden 1997), with very high sensitivity and specificity relative to the other estimates (96% sensitivity; 96% specificity); and Noel‐Storr’s (Noel‐Storr 2011) measure of the Haynes 2004 specific filter, with very low sensitivity compared to the other estimates (14% sensitivity; 95% specificity). No apparent reason could be found for these anomalous results.
Ten filters had data on both sensitivity and precision from the original development study and at least one evaluation study (Figure 3). The estimates from both development and evaluation studies showed a wide range in precision, and there was substantial variation in sensitivity in the evaluation studies. Precision was generally lower in the evaluation studies, but the pattern was not uniform. There were a number of outliers amongst both the development study and the evaluation study data points. Three outliers had much higher precision than the other estimates: the original performance estimate of the Haynes 1994 specific filter, the original estimate of Deville 2000 strategy 3, and Mitchell’s (Mitchell 2005) evaluation of Deville 2000 strategy 4. It was not clear why these precision estimates were high.
3. Performance of EMBASE filters as reported by development studies
Table 11 shows the 12 filters and their performance data (Bachmann 2003; Wilczynski 2005). Sensitivity ranged from 46% to 100% (median 90%), and precision ranged from 1.2% to 27.7% (median 9%). Half of the filters had a sensitivity greater than 90% (median 90.2%), but of these six filters only one had a precision greater than 10% (median 10.4%) (Bachmann 2003).
11. Performance of EMBASE filters from development studies.

Author (year) ID | Filter description | Filter interface | Sensitivity % (95% CI) | Specificity % (95% CI) | Precision % (95% CI) | NNR
---|---|---|---|---|---|---
Bachmann 2003 | sensitiv* OR detect* (specific filter) | Datastar, Ovid and Silverplatter | 73.7 (60.9‐84.2) | | 17.6 | 5.7 (4.4‐7.6)
 | sensitiv* OR detect* OR accura* OR specific* OR reliab* OR positive OR negative OR diagnos* | Datastar, Ovid and Silverplatter | 100 (94.1‐100) | | 3.7 | 27.0 (21.0‐34.8)
 | sensitiv* OR detect* OR accura* | Datastar, Ovid and Silverplatter | 85.2 | | 14.2 | 7.0
 | sensitiv* OR detect* OR accura* OR specific* | Datastar, Ovid and Silverplatter | 86.9 | | 10.4 | 9.6
 | sensitiv* OR detect* OR accura* OR specific* OR reliab* | Datastar, Ovid and Silverplatter | 90.2 | | 10.4 | 9.6
 | sensitiv* OR detect* OR accura* OR specific* OR reliab* OR positive | Datastar, Ovid and Silverplatter | 91.8 | | 9.2 | 10.9
 | sensitiv* OR detect* OR accura* OR specific* OR reliab* OR positive OR negative | Datastar, Ovid and Silverplatter | 91.8 | | 8.5 | 11.8
 | sensitiv* | Datastar, Ovid and Silverplatter | 45.9 | | 27.7 | 3.6
Wilczynski 2005 | Best sensitivity: di.fs OR predict:.tw OR specificity.tw | Ovid | 100 (100‐100) | 70.4 (69.8‐70.9) | 1.2 (0.9‐1.4) |
 | Small drop in sensitivity with substantive gain in specificity: diagnos:.mp OR predict:.tw OR specificity.tw | Ovid | 96.9 (93.5‐100) | 78.2 (77.7‐78.7) | 1.5 (1.2‐1.8) |
 | Small drop in specificity with a substantive gain in sensitivity: specificity.tw OR accurac:.tw | Ovid | 73.2 (64.4‐82.0) | 97.4 (97.2‐97.5) | 8.8 (6.9‐10.8) |
 | Best optimal strategy: sensitiv:.tw OR diagnostic accuracy.sh OR diagnostic.tw | Ovid | 89.7 (83.6‐95.7) | 91.6 (91.3‐91.9) | 3.3 (2.9‐4.4) |
4. Performance of evaluated EMBASE filters
The original studies reported sensitivities ranging from 74% to 100% (median 90%); evaluation studies reporting on the same filters had sensitivities ranging from 72% to 97% (median 86%). The original studies reported precision ranging from 1.2% to 17.6% (median 3.7%); evaluation studies reporting on the same filters had precision ranging from 1.2% to 9% (median 3.7%) (Table 12). One of the evaluated filters did not have an original estimate of performance from the development study (Ovid 2010). Figure 4 shows that in general filters performed better in the original development studies than in the evaluation studies for both sensitivity and precision. None of the filters offered both high sensitivity (> 90%) and high precision (> 10%). The original development studies did not report specificity estimates for the filters that were also tested in evaluation studies, hence a ROC plot of sensitivity and specificity has not been prepared.
12. Performance of evaluated EMBASE filters.

Filter (original reference) | Author (year) of evaluation study | Description of filter from evaluation paper | Interface filter developed for | Sensitivity % | Precision % | Comments and other measures
---|---|---|---|---|---|---
PubMed Clinical Queries (Ovid 2010) | ORIGINAL | sensitiv:.mp. OR diagnos:.mp. OR di.fs. | Ovid | NR | NR |
 | Kastner 2009 | sensitiv:.mp. OR diagnos:.mp. OR di.fs. | Ovid | 88 | |
Bachmann 2003 Sensitive | ORIGINAL | sensitiv* OR detect* OR accura* OR specific* OR reliab* OR positive OR negative OR diagnos* | Datastar, Ovid and Silverplatter | 100 | 3.7 |
 | Wilczynski 2005 | sensitiv:.tw. OR detect:.tw. OR accura:.tw. OR specific:.tw. OR reliab:.tw. OR positive:.tw. OR negative:.tw. OR diagnos:.tw. | Ovid | 97 | 1.2 | Specificity=72%; Accuracy=72%
 | Mitchell 2005 | sensitive* OR detect* OR accura* OR specific* OR reliab* OR positive OR negative OR diagnos* | Ovid | 86 | 4.4 | Specificity=60%
Bachmann 2003 Specific | ORIGINAL | sensitiv* OR detect* | Datastar, Ovid and Silverplatter | 74 | 17.6 | NNR=5.7
 | Mitchell 2005 | sensitiv* .tw. OR detect* .tw. | Ovid | 79 | 3.0 | Specificity=91%; Accuracy=91%
Wilczynski 2005 Sensitive | ORIGINAL | di.fs OR predict:.tw OR specificity.tw | Ovid | 100 | 1.2 | Specificity=70%; Accuracy=71%
 | Mitchell 2005 | di.fs OR predict* .tw. OR specificity.tw. | Ovid | 72 | 9 | Specificity=83%

Abbreviations used: NR= not reported
Discussion
Summary of main results
Nineteen studies, reporting 57 MEDLINE filters and 13 EMBASE filters, were eligible for this review. We pre‐specified that filters should have a sensitivity > 90% and a precision > 10% to be considered acceptable when searching for studies for systematic reviews of diagnostic test accuracy. We acknowledge that other researchers may set alternative performance levels.
Reports of filter performance were available from studies using a variety of designs, ranging from authors’ reports of their filter development process to evaluations of filters carried out by independent researchers using one or more different gold standards. The latter study design should provide best evidence of the performance of filters outside of the original authors’ test environment and the consistency of a filter’s performance across different sets of records.
Several filters reported performance levels in the development studies which met the pre‐specified performance criteria. However, these performance levels typically declined when the filters were validated in the evaluation studies. Thirty MEDLINE filters and four EMBASE filters were tested in an evaluation study against one or more gold standards. Whether the evaluation studies developed their reference sets from studies included in several systematic reviews on a broad spectrum of topics, covering a number of publication years, or from handsearching, no single filter achieved the sensitivity (> 90%) and precision (> 10%) that we pre‐specified as 'acceptable'. This means that no filter is suitable for combination with the search terms for the target condition and index tests to create a single search strategy with which to identify studies for systematic reviews of diagnostic test accuracy.
As well as not reaching our pre‐specified performance criteria, none of the evaluated filters for use in MEDLINE or EMBASE gave consistent sensitivity and precision measures. This may be caused by translation from one platform to another, or by mistakes made in the transcription of the filters. Another reason may be differences in the indexing and reporting of studies from different scientific fields. For these reasons, the degree of reduction in performance cannot be assessed consistently, making the filters unreliable tools for searching when sensitivity is an important consideration.
Overall completeness and applicability of evidence
The search filters were identified by extensive sensitive searches, checking reference lists of published filters and filter evaluations (Horsley 2011), and by searching a key website which identifies and collects search filters: the ISSG Search Filter Resource (InterTASC 2011). We are confident that we have identified the vast majority of published search filters, in particular those filters developed using a research method and those tested by independent researchers.
We did not, however, search for unpublished search filters, such as those which might have been developed by people conducting systematic reviews of diagnostic test accuracy studies. There are likely to be many unpublished filters reported in the search strategies of such reviews. These 'unpublished' filters could be identified and evaluated against gold standard sets of relevant records, in the same way that published filters have been evaluated. However, the evidence from the evaluations of the many published filters developed using research methods that we have compiled in this review suggests that unpublished filters may be subject to the same difficulties in achieving the pre‐specified performance criteria if those filters consist of variants of the search terms used in the published search filters.
Quality of the evidence
The most reliable filter development studies are likely to be those where the authors used handsearched gold standards, tested their filters against internal validation record sets that were different from the record sets used to develop the filters, and externally validated the filters in a real‐world topic. In the one study where this occurred, the performance of the MEDLINE filter was maintained on external validation, with an even higher sensitivity (Bachmann 2002).
The nature of the most reliable filter evaluation studies is a matter for debate. Testing filters against a handsearched gold standard set of records would seem to be the most reliable technique because it should yield a range of different DTA study types. However, the disadvantage of handsearching is that researchers are often limited to a small number of journals, which limits the generalisability of the evaluation. Handsearching can be limited by a narrow range of topics and publication years and so impede judgments about the generalisability of the search filters to other topics and time periods. Only two evaluations of MEDLINE filters used handsearched reference sets, which were both topic specific (Deville 2000; Mitchell 2005). In those two studies, some filters maintained their sensitivity as reported in their development papers and others experienced large drops in sensitivity.
Another method of reference set development is to use the studies included in systematic reviews. Whereas handsearching of journals for reference set studies is limited to a small number of journals, using systematic reviews broadens the journal base and the number of publication years covered. However, the primary diagnostic studies in systematic reviews may have been retrieved using a search strategy containing diagnostic terms, which could introduce bias. By including systematic reviews that used a methodological filter to find diagnostic studies, the performance of the evaluated filters in the reference set may be exaggerated. Precision is improved because irrelevant records will be removed but sensitivity may suffer because ‘difficult to find’ studies may not be retrieved by the filter. This was discussed by Leeflang et al who also used reviews to compile their reference set for the evaluation of 12 filters (Leeflang 2006). Only seven of the reviews in the initial set of 28 reviews used in their study reported search terms. If those seven reviews also used one of the search filters evaluated by Leeflang et al, then the results are likely to be overestimated and the real percentage of missed studies could be even higher than reported. Three other evaluation studies used systematic reviews to compile the reference set, and some of these reviews had included a DTA methodological filter in the original search for eligible studies (Doust 2005; Kastner 2009; Vincent 2003).
How the reference set is used can be a source of bias. If the records used to derive the search terms for the filter are also included in the set of references used in the validation process, this can introduce bias by artificially inflating performance. A discrete set of derivation records and validation records should be used to avoid this. Only two MEDLINE development studies (but neither EMBASE development study) used this approach (Astin 2008; Bachmann 2002).
External validation relates to how generalisable (applicable) the results are to searching for diagnostic studies for different systematic review topics, and only applies to development studies. Four MEDLINE studies carried out external validation of their filters in a real‐world setting and were judged to have low concerns about applicability (Bachmann 2002; Deville 2000; Deville 2002; van der Weijden 1997).
The date of the filter may also raise concerns. The problem of missed studies is greater in older reference sets, as shown by Haynes et al, whose filter did not perform as well in the 1986 reference set as it did in the 1991 reference set. This may be a feature of the reporting of DTA studies. The STARD statement, published in 2003, aimed to improve the standard of reporting of DTA studies (Bossuyt 2003). STARD’s first recommendation is that authors should identify their publication as a study of diagnostic accuracy. If authors and editors support STARD, this alone will enhance the efficient retrievability of DTA studies.
There are concerns that the same filter may not have been implemented uniformly across evaluation studies and that this may hamper an evaluation of the consistency of filter performance. Some researchers have translated filters across searching platforms, for example from Ovid MEDLINE to PubMed. The translation process may influence the performance of the filters, although the likely effect of this is unclear. Translations may change the number of missed studies and may impact sensitivity and precision. PubMed, in particular, carries out automatic mapping of search terms and this factor needs to be taken into account when translating from PubMed to other interfaces and when translating a strategy to make it suitable for use in PubMed. An example is the different adaptations made by the Haynes team in translating the original Haynes 1994 sensitive search filter developed in Ovid into the PubMed Clinical Queries sensitive filter, and Falck‐Ytter’s adaptation of the same filter for use in PubMed. Sensitivities reported by the evaluation studies varied between each of the three filters, which may be due to differences in translation. Furthermore, some evaluators report strategies with mistakes; the mistakes might have been made in the conduct of the strategies or might have been introduced at the reporting stage. This uncertainty leads to doubts about the performance data reported for some of the filters, and we were unable to make any judgement about whether the original filters were applied correctly in the evaluation studies.
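As an illustration of how much scope a translation leaves for variation, the snippet below stores side by side two renderings of the Haynes 2004 sensitive filter quoted in Appendix 9: the Ovid original and the PubMed translation used by Leeflang 2006. The strings are quoted from the evaluation studies; the comments are interpretive.

```python
# Two published renderings of the same filter (both quoted in Appendix 9)
HAYNES_2004_SENSITIVE = {
    # Ovid: .mp. searches several fields at once and di.fs. is the
    # floating subheading "diagnosis"
    "Ovid": "sensitiv:.mp. OR diagnos:.mp. OR di.fs.",
    # PubMed (Leeflang 2006): the implicit multi-field search has to be
    # expanded into explicit field tags, one place where renderings of
    # the "same" filter can drift apart
    "PubMed": (
        "sensitiv*[Title/Abstract] OR sensitivity and specificity"
        "[MeSH Terms] OR diagnos*[Title/Abstract] OR diagnosis[MeSH:noexp]"
        " OR diagnostic * [MeSH:noexp] OR diagnosis, differential"
        "[MeSH:noexp] OR diagnosis[Subheading:noexp]"
    ),
}
```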
Potential biases in the review process
It can be difficult to identify the filters reported in evaluation studies because the filters may be named differently and the filter used is not always listed in the paper or appendix (that is, only a reference may be provided). In those circumstances, it is unclear whether the strategy was used accurately or whether it was adapted. In some cases the original source of a filter has disappeared because of changes to websites. The Shipley Miner and University of Rochester filters evaluated by Ritchie, Whiting and Vincent are no longer available online, so we had to rely on the evaluators’ listings of the strategies rather than being able to consult the original websites. This means that our review may have erroneously assigned some performance data to a named filter or to a filter that is a variant of a published filter.
Agreements and disagreements with other studies or reviews
Many of the search filters included in this systematic review have been extensively evaluated in other studies with different but relevant gold standards. This systematic review of evaluation studies draws the same conclusions as the most comprehensive evaluation study, by Whiting and colleagues, which concluded that filtered searches miss additional studies compared with searches based on index test and target condition alone (Whiting 2010). None of the filters evaluated by Whiting reduced the Number Needed to Read while maintaining acceptable sensitivity, and so none should be used to identify studies for inclusion in systematic reviews (Whiting 2010). A key strength of the Whiting study is the size of the reference set and the homogeneity of its creation: the team used seven systematic reviews published on a broad range of topics that had been conducted by the authors using extensive, rigorous and, for the first time, reproducible search methods without the inclusion of a methodological search filter. The inclusion criteria for each review required sufficient data to allow cross‐tabulation of results comparing index tests with a reference standard, which meant that only true test accuracy studies were included.
Authors' conclusions
Implications for methodological research
The information retrieval environment is not static and better reporting of DTA studies as advocated by STARD, additional indexing terms (as recently introduced by EMBASE) and more consistent indexing of diagnostic studies could help to make published methodological filters more sensitive or create the opportunity for the development of new filters.
Search filters which make more use of proximity operators and careful exclusion may also yield improvements in performance in traditional database interfaces reliant on Boolean searching. Beyond Boolean approaches, developments in information retrieval such as semantic textual analysis may lead to filtering programs or record matching rules which can better identify diagnostic test accuracy studies from batches of records retrieved by sensitive searches. The increasing availability of full text journals may also improve the retrieval of DTA studies as there will be the whole paper to search and DTA performance measures may be more consistently identified.
In the absence of currently suitable search filters, the impact of different search approaches could be investigated. The effectiveness of multi‐strand searching is unexplored, as is the effect of restricted searching on the results of a systematic review. A combination of search approaches, in which the results of strategies using filters are augmented with more extensive reference checking and citation searching, could also be investigated as an alternative way of identifying as many relevant DTA studies as possible.
What's new
Date | Event | Description |
---|---|---|
30 November 2011 | Amended | Updated the protocol and added an author |
3 July 2009 | Amended | Updated the protocol with other authors and revised text |
27 December 2007 | Amended | Converted to new review format |
History
Protocol first published: Issue 2, 2006
Review first published: Issue 9, 2013
Date | Event | Description |
---|---|---|
20 February 2007 | New citation required and major changes | Substantive amendment |
Acknowledgements
We thank Marit Johansen for her help in designing the search strategies.
Appendices
Appendix 1. MEDLINE search strategy
MEDLINE ® OvidSP 1950 to week 1 November 2012
1 “Information Storage and Retrieval”/
2 ((information or literature) adj5 retriev$).tw.
3 Databases, Bibliographic/
4 ((bibliographic adj1 database$) or (electronic adj1 database$) or (online adj1 database$)).tw.
5 Medline/
6 PubMed/
7 Medlars/
8 Grateful Med/
9 (medline or pubmed or medlars or grateful‐med or gratefulmed or embase$ or excerpta medica).tw.
10 or/1‐9
11 (search$ adj5 (strateg$ or filter$ or hedge$ or technique$ or term$1)).tw.
12 (retriev$ adj5 (strateg$ or filter$ or hedge$ or technique$)).tw.
13 ((methodology or methodologic$) adj5 (strateg$ or filter$ or hedge$ or search$ or term$1)).tw.
14 (search$ adj5 (precision or recall or accura$ or sensitiv*)).tw.
15 (retriev$ adj5 (precision or recall or accura$ or sensitiv$)).tw.
16 or/11‐15
17 (diagnos$ adj5 (strateg$ or filter$ or hedge$ or search$ or term$1)).tw.
18 exp Diagnosis/
19 diagnos$.tw.
20 "Sensitivity and Specificity"/
21 (sensitiv$ and specific$).tw.
22 or/18‐21
23 10 and 16 and 22
24 10 and 17
25 23 or 24
26 "cochrane database of systematic reviews".so.
27 25 not 26
Appendix 2. EMBASE search strategy
EMBASE OvidSP 1980 to 2012 Week 48
1 Information Retrieval/
2 ((information or literature) adj5 retriev$).tw.
3 Bibliographic Database/
4 ((bibliographic adj1 database$) or (electronic adj1 database$) or (online adj1 database$)).tw.
5 Medline/ or Embase/
6 (medline or pubmed or medlars or grateful‐med or gratefulmed or embase$ or excerpta medica).tw.
7 or/1‐6
8 (search$ adj5 (strateg$ or filter$ or hedge$ or technique$ or term$1)).tw.
9 (retriev$ adj5 (strateg$ or filter$ or hedge$ or technique$)).tw.
10 (search$ adj5 (precision or recall or accura$ or sensitiv*)).tw.
11 (retriev$ adj5 (precision or recall or accura$ or sensitiv$)).tw.
12 ((methodology or methodologic$) adj5 (strateg$ or filter$ or hedge$ or search$ or term$1)).tw.
13 or/8‐12
14 (diagnos$ adj5 (strateg$ or filter$ or hedge$ or search$ or term$1)).tw.
15 exp "Diagnosis, Measurement and Analysis"/
16 diagnos$.tw.
17 "Sensitivity and Specificity"/
18 (sensitiv$ and specific$).tw.
19 or/15‐18
20 7 and 13 and 19
21 7 and 14
22 20 or 21
23 "cochrane database of systematic reviews".so.
24 “cochrane database of systematic reviews (online)”.so.
25 23 or 24
26 22 not 25
Appendix 3. ISI Web of Science search strategy
ISI Web of Science searched 11 January 2013
ISI Web of Science Databases=SCI‐EXPANDED, SSCI, A&HCI, CPCI‐S, CPCI‐SSH, CCR‐EXPANDED, IC Timespan=All Years
# 6 #3 AND #4 AND #5
# 5 #1 OR #2
# 4 TS=diagnos*
# 3 TS=(information retriev* OR literature retriev* OR bibliographic database OR medline OR pubmed OR medlars OR grateful med OR gratefulmed OR embase* OR psycinfo)
# 2 TS=(retriev* same (strateg* OR filter* OR hedge* OR technique*))
# 1 TS=(search* same (strateg* OR filter* OR hedge* OR technique* OR term*))
Appendix 4. PsycINFO search strategy
PsycINFO (OvidSP) searched 13 March 2013
1. exp Automated Information Retrieval/
2. Databases/
3. Information Seeking/
4. Computer Searching/
5. ((information or literature) adj2 retriev$).tw.
6. ((bibliographic adj1 database?) or (electronic adj1 database?)).tw.
7. (medline or pubmed or medlars or grateful med or gratefulmed or embase$ or excerpta medica).tw.
8. psycinfo.ti.
9. psycinfo.ab. /freq=2
10. or/1‐9
11. (search$ adj2 (strateg$ or filter$ or hedge? or technique? or term$1)).tw.
12. (retriev$ adj2 (strateg$ or filter$ or hedge? or technique?)).tw.
13. (sensitiv$ or specific$ or recall or precision or precise or number needed to read or NNR).tw.
14. or/11‐13
15. Diagnosis/
16. diagnos$.tw.
17. or/15‐16
18. and/10,14,17
Appendix 5. Library, Information Science and Technology Abstracts (LISTA) search strategy
Library, Information Science and Technology Abstracts (LISTA) strategy searched 13 March 2013
S41 S37 or S40
S40 S16 and S29 and S39
S39 S28 or S38
S38 S30 or S31 or S32 or S33 or S34 or S35
S37 S16 and S28 and S36
S36 S29 or S30 or S31 or S32 or S33 or S34 or S35
S35 NNR
S34 "number needed to read"
S33 precision
S32 recall
S31 specificity
S30 sensitivity
S29 diagnos*
S28 (S17 or S18 or S19 or S20 or S21 or S22 or S23 or S24 or S25 or S26 or S27)
S27 retriev* N2 techniqu*
S26 retriev* N2 hedge*
S25 retriev* N2 filter*
S24 retriev* N2 strateg*
S23 search* N2 terms
S22 search* N2 term
S21 search* N2 techniqu*
S20 search* N2 hedge*
S19 search* N2 filter*
S18 search* N2 strateg*
S17 DE Search Algorithms
S16 (S1 or S2 or S3 or S4 or S5 or S6 or S7 or S8 or S9 or S10 or S11 or S12 or S13 or S14 or S15)
S15 medline OR pubmed or medlars or "grateful med" or gratefulmed or embase* or "excerpta medica"
S14 DE Electronic Information Resources
S13 DE Bibliographic Databases
S12 DE Databases
S11 DE PubMed
S10 DE EMBASE
S9 DE MEDLINE
S8 DE "Information Storage & Retrieval Systems"
S7 information N2 search*
S6 literature N2 search*
S5 literature N2 retriev*
S4 information N2 retriev*
S3 DE "electronic information resource searching"
S2 DE "database searching"
S1 DE "information retrieval"
Appendix 6. Cochrane Methodology Register search strategy
Cochrane Methodology Register 2012, Issue 3 in The Cochrane Library (Wiley InterScience Online)
#1 ("diagnostic test accuracy" NEXT "search strategies"):kw in Methods Studies
#2 ("study identification" next general) or ("study identification" next "prospective registration") or ("study identification" next "internet") or ("information retrieval" next general) or ("information retrieval" next "retrieval techniques") or ("information retrieval" next "comparisons of methods") or ("information retrieval" next indexing):kw in Methods Studies
#3 search*:ti NEAR/5 (strateg* or filter* or hedge* or technique* or term or terms or precision or recall or accura*):ti in Methods Studies
#4 retriev*:ti NEAR/5 (strateg* or filter* or hedge* or technique* or term or terms or precision or recall or accura*):ti in Methods Studies
#5 search*:ab NEAR/5 (strateg* or filter* or hedge* or technique* or term or terms or precision or recall or accura*):ab in Methods Studies
#6 retriev*:ab NEAR/5 (strateg* or filter* or hedge* or technique* or term or terms or precision or recall or accura*):ab in Methods Studies
#7 methodology:ti NEAR/5 (strateg* or filter* or hedge* or term or terms):ti in Methods Studies
#8 methodologic*:ti NEAR/5 (strateg* or filter* or hedge* or term or terms):ti in Methods Studies
#9 methodology:ab NEAR/5 (strateg* or filter* or hedge* or term or terms):ab in Methods Studies
#10 methodologic*:ab NEAR/5 (strateg* or filter* or hedge* or term or terms):ab in Methods Studies
#11 (medline or pubmed or medlars or "grateful med" or gratefulmed or embase* or excerpta medica):ti in Methods Studies
#12 (medline or pubmed or medlars or "grateful med" or gratefulmed or embase* or excerpta medica):ab in Methods Studies
#13 (diagnos* or sensitiv* or specific*):ti in Methods Studies
#14 (diagnos* or sensitiv* or specific*):ab in Methods Studies
#15 (#2 OR #3 OR #4 OR #5 OR #6 OR #7 OR #8 OR #9 OR #10) AND (#11 OR #12)
#16 (#2 OR #3 OR #4 OR #5 OR #6 OR #7 OR #8 OR #9 OR #10) AND (#13 OR #14)
#17 diagnos*:ti NEAR/5 (strateg* or filter* or hedge* or search* or term or terms):ti in Methods Studies
#18 diagnos*:ab NEAR/5 (strateg* or filter* or hedge* or search* or term or terms):ab in Methods Studies
#19 (#17 OR #18)
#20 (#1 OR #15 OR #16 OR #19)
Appendix 7. Library and Information Science Abstracts (LISA) search strategy
LISA: Library and Information Science Abstracts (Cambridge Scientific Abstracts) ‐ searched 31 May 2010
(((DE=("databases" or "bibliographic databases" or "cd rom
databases" or "database producers" or "online databases" or "computerized
information retrieval" or "multiple database searches" or "online
information retrieval")) or (TI=((literature or information) within 2
retriev*)) or (AB=((literature or information) within 2 retriev*))
or (TI=((bibliographic or electronic) within 2 database*))
or (AB=((bibliographic or electronic) within 2 database*)) or (TI=(medline
or medlars or pubmed or grateful med or gratefulmed or embase* or
excerpta medica)) or (AB=(medline or medlars or pubmed or grateful med or
gratefulmed or embase* or excerpta medica))) and ((DE=("search strategies"
or "searching" or "boolean strategies" or "non boolean strategies" or
"term selection" or "free text searching" or "full text searching" or
"ranking")) or (DE=("boolean strategies" or "non boolean strategies"))
or (TI=(search* within 2 (strateg* or filter? or hedge? or technique? or
term?))) or (AB=(search* within 2 (strateg* or filter? or hedge? or
technique? or term?))) or (TI=(retriev* within 2 (strateg* or filter? or
hedge? or technique?))) or (AB=(retriev* within 2 (strateg* or filter? or
hedge? or technique?))))) and ((TI=diagnos* or AB=diagnos*)
or (DE=("recall" or "retrieval performance measures" or "exhaustivity" or
"pertinence" or "relevance")) or (DE="retrieval performance measures")
or (TI=(sensitivity or specificity or recall or precision or accuracy or
(number within3 read))) or (AB=(sensitivity or specificity or recall or
precision or accuracy or (number within3 read))))
Appendix 8. Definition of terms used in this review
Accuracy – proportion of all articles correctly categorised
Development study – a study which aims to develop and test a search strategy for locating diagnostic test accuracy studies
Diagnostic odds ratio – positive likelihood ratio/negative likelihood ratio
Diagnostic test accuracy study – a study which compares the results of the test of interest, the index test, to those of a reference standard, which should be the best available method of determining disease status
Evaluation study – a study which quantitatively evaluates existing search strategies for locating diagnostic test accuracy studies
Gold standard record – a record included in the reference set that meets the criteria for a diagnostic test accuracy study
Non‐gold standard record – a record included in the reference set that does not meet the criteria for a diagnostic test accuracy study
Number Needed to Read – the number of articles needed to read to identify one relevant article, calculated as 1 divided by precision
Positive likelihood ratio – the ratio of the probability of retrieving a true positive record to the probability of retrieving a false positive record
Precision/positive predictive value – proportion of retrieved records meeting diagnostic test accuracy criteria, that is, the proportion of gold standard records in the result set
Reference set – compilation of records which can be used to derive terms for search filter development and test the performance of search filters. The reference set can be composed of gold standard and non‐gold standard records, or gold standard records alone
Sensitivity – percentage of correctly identified gold standard studies
Specificity – percentage of non‐gold standard studies correctly not retrieved.
Reference set

 | Gold standard records | Non‐gold standard records
---|---|---
Detected by search terms | a | b
Not detected by search terms | c | d

Sensitivity = a/(a + c); precision = a/(a + b); specificity = d/(b + d); accuracy = (a + d)/(a + b + c + d). Total records in the reference set (included and excluded) = a + b + c + d.
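A worked example may help to fix these formulas. The sketch below applies them to the reference set totals reported for Bachmann 2003 (61 gold standard and 6082 non‐gold standard records), using an invented retrieved/missed split, and adds a Wilson score interval as one common way of computing 95% confidence intervals; the included studies do not all state which interval method they used.

```python
import math

def filter_performance(a, b, c, d):
    """Performance measures from the 2 x 2 table above: a = gold standard
    records retrieved, b = non-gold standard records retrieved,
    c = gold standard records missed, d = non-gold standard records
    not retrieved."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    precision = a / (a + b)
    accuracy = (a + d) / (a + b + c + d)
    nnr = 1 / precision  # Number Needed to Read
    return sensitivity, specificity, precision, accuracy, nnr

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# 61 gold standard and 6082 non-gold standard records (Bachmann 2003);
# the 45/16 and 300/5782 splits are invented to illustrate the arithmetic.
a, c = 45, 16
b, d = 300, 5782
sens, spec, prec, acc, nnr = filter_performance(a, b, c, d)
lo, hi = wilson_ci(a, a + c)
print(f"sensitivity {sens:.1%} (95% CI {lo:.1%} to {hi:.1%}), "
      f"specificity {spec:.1%}, precision {prec:.1%}, NNR {nnr:.1f}")
```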
Appendix 9. MEDLINE filters with full strategies as used by evaluation studies
Filter (original reference) | Author (year) of evaluation study | Description of filter as appears in evaluation study | Interface used by evaluation study | Sensitivity (95% CI) | Precision (95% CI) | Comments
Bachmann 2002 Sensitive |
ORIGINAL | exp sensitivity and specificity or predict$ or diagnos$ or di.fs. or du.fs. or accura$ | ||||
Ritchie 2007 | NR | Ovid | 74 | 1.36 | ||
Leeflang 2006 | "sensitivity and specificity"[MeSH] OR predict* OR diagnose* OR diagnosi* OR diagnost* OR accura* | PubMed | 88 | |||
Doust 2005 | Sensitivity and specificity [MeSH] predict* [tw] diagnos* [tw] accura* [tw] | Datastar, Ovid, PubMed, Silverplatter | 70 | 5 | Methodological & content filter for TSR | |
90 | 4 | Methodological & content filter for NPSR | ||||
88 | Methodological filter for TSR | |||||
90 | Methodological filter for NPSR | |||||
Whiting 2010 | Exp "sensitivity and specificity"/ Diagnos$ OR di.fs. or du.fs. Predict$ Accura$ | Ovid | 87 (81‐98) | 3 (1‐22) | NNR 36 (4‐98) |
Noel‐Storr 2011 | NR | Ovid | 84 (77‐90) | 0.17 (0.14‐0.20) ||
Mitchell 2005 | 1. exp "Sensitivity and Specificity"/ 2. predict$.tw. 3. diagnos$.tw. 4. di.fs. 5. du.fs. 6. accura$.tw. 7. or/1‐6 | Ovid | 84 | 8.8 | Strategy from Table 3 |
Haynes 2004 Sensitive |
ORIGINAL | sensitiv:.mp. OR diagnos:.mp. OR di.fs. | ||||
Ritchie 2007 | NR | Ovid | 69 | 1.3 | ||
Leeflang 2006 | sensitiv*[Title/Abstract] OR sensitivity and specificity[MeSH Terms] OR diagnos*[Title/Abstract] OR diagnosis[MeSH:noexp] OR diagnostic * [MeSH:noexp] OR diagnosis, differential[MeSH:noexp] OR diagnosis[Subheading:noexp] | PubMed | 87 | |||
Whiting 2010 | Sensitive$.ti,ab. "sensitivity and specificity"/ Diagnos$.ti,ab. Diagnosis/ Diagnostic$.hw. Diagnosis, Differential/ di.fs. | Ovid | 82 | 3 | NNR 36 |
Noel‐Storr 2011 | Sensitive$.ti,ab. "sensitivity and specificity"/ Diagnos$.ti,ab. Diagnosis/ Diagnostic$.hw. Diagnosis, Differential/ di.fs. | Ovid | 69 (60‐77) | 0.92 (0.74‐1.10) ||
Mitchell 2005 | 1. sensitiv$.mp. 2. diagnos$.mp. 3. di.fs. 4. or/1‐3. | Ovid | 67 | 9.1 ||
Kastner 2009 | sensitiv:.mp. OR diagnos:.mp. OR di.fs. | Ovid | 88 | |||
Doust 2005 | sensitiv:.mp. OR diagnos:.mp. OR di.fs. | Ovid | 100 | Methodological filter for NPSR | ||
100 | 5 | Methodological & content filter for NPSR | ||||
88 | Methodological filter for TSR | |||||
70 | 4 | Methodological & content filter for NPSR | ||||
Haynes 2004 Specific |
ORIGINAL | Specificity.tw. |
Ritchie 2007 | NR | Ovid | 21 | 6.7 | ||
Whiting 2010 | Specificity.ti,ab. | Ovid | 43 | 15 | NNR 7 | |
Noel‐Storr 2011 | Specificity.ti,ab. | Ovid | 14 (9‐21) | 2.04 (1.22‐3.21) ||
Haynes 1994 Accurate |
ORIGINAL | Exp Sensitivity and Specificity Or Diagnosis (sh) Or Diagnostic Use (sh) Or Specificity (tw) Or Predictive (tw) and Value: (tw) ||||
Leeflang 2006 | ‘‘sensitivity and specificity’’[MeSH] OR ‘‘Diagnosis’’[MeSH] OR ‘‘diagnostic use’’[subheading] OR specificity[tw] OR (predictive[tw] AND value[tw]) | PubMed | 81 | |||
Haynes 1994 Specific |
ORIGINAL | Exp Sensitivity and Specificity OR Predictive (tw) AND Value: (tw) | ||||
Ritchie 2007 | NR | Ovid | 33 | 7.4 | ||
Leeflang 2006 | ‘‘sensitivity and specificity’’[MeSH] OR (predictive[tw] AND value[tw]) | PubMed | 29 | |||
Whiting 2010 | exp "sensitivity and specificity"/ (predictive and value$).ti,ab. |
Ovid | 56 | 11 | NNR 9 | |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ (predictive and value$).ti,ab. | Ovid | 51 (42‐60) | 3.04 (2.36‐3.86) | |
Haynes 1994 Sensitive |
ORIGINAL | Exp Sensitivity and Specificity or Diagnosis& (px) or Diagnostic Use (sh) or Sensitivity (tw) or Specificity (tw) | ||||
Ritchie 2007 | NR | Ovid | 70 | 1.5 | ||
Leeflang 2006 | "sensitivity and specificity"[MeSH] OR diagnosis[subheading:noexp] OR "diagnostic use"[subheading] OR sensitivity[tw] OR specificity[tw] | PubMed | 81 | |||
Kassai 2006 | NR | PubMed | 95 | |||
Whiting 2010 | exp "sensitivity and specificity"/ di.xs. Du.fs. Sensitivity.ti,ab. Specificity.ti,ab. |
Ovid | 87 | 2 | NNR 45 | |
Vincent 2003 | 1 exp ‘sensitivity and specificity’/ 2 sensitivity.tw. 3 di.fs. 4 du.fs. 5 specificity.tw. 6 or/1‐5 |
NR | 96 | |||
Deville 2000 | Sensitivity and specificity (exploded) (sh) Diagnosis& (sh) Diagnostic use (sh) Sensitivity (tw) Specificity (tw) | NR | 73 (63‐8) | | Specificity=94.3 (93.3‐95.2); DOR=45 |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ di.xs. Du.fs. Sensitivity.ti,ab. Specificity.ti,ab. | Ovid | 91 (84‐95) | 0.98 (0.80‐1.17) | |
Mitchell 2005 | 1. exp "Sensitivity and Specificity"/ 2. di.xs. 3. du.fs. 4. sensitivity.tw. 5. specificity.tw. 6. or/1‐5 |
Ovid | 80 | 5.3 | ||
Falck‐Ytter 2004 | ORIGINAL | sensitiv:.tw. or exp "sensitivity and specificity"/ or diagnos:.tw,ot,hw,rw. or (di or du).fs. | ||||
Ritchie 2007 | NR | Ovid | 74 | 1.3 | ||
Whiting 2010 | Sensitive:.tw. exp "sensitivity and specificity"/ Diagnos:.tw,ot,hw,rw. (di or du).fs. | Ovid | 85 (80‐93) | 3 (1‐19) | NNR 36 (5‐106) |
Noel‐Storr 2011 | Sensitive:.tw. exp "sensitivity and specificity"/ Diagnos:.tw,ot,hw,rw. (di or du).fs. | Ovid | 71 (62‐79) | 1.06 (0.86‐1.31) | |
Deville 2000 Strategy 1 |
ORIGINAL | sensitivity and specificity (exploded)(sh) specificity (tw) |
||||
Kassai 2006 | NR | PubMed | 75.5 | |||
Deville 2000 Strategy 3 |
ORIGINAL | sensitivity and specificity (exploded)(sh) Specificity (tw) False negative (tw) Accuracy (tw) |
||||
Leeflang 2006 | "sensitivity and specificity"[MeSH] OR specificity[tw] OR false negative[tw] OR accuracy [tw] | PubMed | 41 | |||
Deville 2000 Strategy 4 |
ORIGINAL | sensitivity and specificity (exploded) (sh) specificity (tw) false negative (tw) accuracy (tw) screening (tw) | ||||
Ritchie 2007 | NR | Ovid | 46 | 4.4 | ||
Leeflang 2006 | ‘‘sensitivity and specificity’’[MeSH] OR specificity[tw] OR false negative[tw] OR accuracy[tw] OR screening[tw] | PubMed | 46 | |||
Doust 2005 | Sensitivity and specificity [MeSH] Specificity [tw] False negative [tw] Accuracy [tw] Screening [tw] | Ovid | 58 | 9 | Methodological & content filter for TSR | |
100 | 9 | Methodological & content filter for NPSR | ||||
100 | Methodological filter for NPSR | |||||
73 | Methodological filter for TSR | |||||
Whiting 2010 | exp "sensitivity and specificity"/ Specificity.ti,ab. False negative.ti,ab. Accuracy.ti,ab. Screening.ti,ab. |
Ovid | 68 | 7 | NNR 14 | |
Vincent 2003 | 1 exp sensitivity and specificity/ 2 specificit$.tw. 3 false negative$.tw. 4 Accuracy.tw. 5 screening.tw. 6 or/1‐5 |
NR | 75 | Authors say they tested the Deville specific strategy; however, they listed the Deville sensitive strategy in the appendix. |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ Specificity.ti,ab. False negative.ti,ab. Accuracy.ti,ab. Screening.ti,ab. | Ovid | 55 (46‐64) | 2.20 (1.70‐2.77) | |
Mitchell 2005 | 1. exp "Sensitivity and Specificity"/ 2. specificity.tw. 3. false negative.tw. 4. accuracy.tw. 5. screening.tw. 6. or/1‐5 |
Ovid | 49 | 16.7 | ||
Deville 2002 Extended |
ORIGINAL | (((((((((((("sensitivity and specificity"[All Fields] OR "sensitivity and specificity/standards"[All Fields]) OR "specificity"[All Fields]) OR "screening"[All Fields]) OR "false positive"[All Fields]) OR "false negative"[All Fields]) OR "accuracy"[All Fields]) OR (((("predictive value"[All Fields] OR "predictive value of tests"[All Fields]) OR "predictive value of tests/standards"[All Fields]) OR "predictive values"[All Fields]) OR "predictive values of tests"[All Fields])) OR (("reference value"[All Fields] OR "reference values"[All Fields]) OR"reference values/ standards"[All Fields])) OR ((((((((((("roc"[All Fields] OR "roc analyses"[All Fields]) OR "roc analysis"[All Fields]) OR "roc and"[All Fields]) OR "roc area"[All Fields]) OR "roc auc"[All Fields]) OR "roc characteristics"[All Fields]) OR "roc curve"[All Fields]) OR "roc curve method"[All Fields]) OR "roc curves"[All Fields]) OR "roc estimated"[All Fields]) OR "roc evaluation"[All Fields])) OR "likelihood ratio"[All Fields]) AND notpubref [sb]) AND "human"[MeSH Terms]) | ||||
Ritchie 2007 | NR | Ovid | 52 | 3.9 | ||
Doust 2005 | (((((((((((("sensitivity and specificity"[All Fields] OR "sensitivity and specificity/standards"[All Fields]) OR "specificity"[All Fields]) OR "screening"[All Fields]) OR "false positive"[All Fields]) OR "false negative"[All Fields]) OR "accuracy"[All Fields]) OR (((("predictive value"[All Fields] OR "predictive value of tests"[All Fields]) OR "predictive value of tests/standards"[All Fields]) OR "predictive values"[All Fields]) OR "predictive values of tests"[All Fields])) OR (("reference value"[All Fields] OR "reference values"[All Fields]) OR"reference values/ standards"[All Fields])) OR ((((((((((("roc"[All Fields] OR "roc analyses"[All Fields]) OR "roc analysis"[All Fields]) OR "roc and"[All Fields]) OR "roc area"[All Fields]) OR "roc auc"[All Fields]) OR "roc characteristics"[All Fields]) OR "roc curve"[All Fields]) OR "roc curve method"[All Fields]) OR "roc curves"[All Fields]) OR "roc estimated"[All Fields]) OR "roc evaluation"[All Fields])) OR "likelihood ratio"[All Fields]) AND notpubref [sb]) AND "human"[MeSH Terms]) | WebSpirs | 58 | 8 | Methodological & content filter for TSR | |
100 | 6 | Methodological & content filter for NPSR | ||||
100 | Methodological filter for NPSR | |||||
76 | Methodological filter for TSR | |||||
Whiting 2010 | “sensitivity and specificity”.mp. “sensitivity and specificity”/st Specificity.mp. Screening.mp. False positive.mp. False negative.mp. Accuracy.mp. Predictive value.mp. Predictive values.mp. Reference value.mp. Reference values.mp. Roc.mp. Likelihood ratio.mp. Humans/ |
Ovid | 71 | 7 | NNR 15 | |
Noel‐Storr 2011 | “sensitivity and specificity”.mp. “sensitivity and specificity”/st Specificity.mp. Screening.mp. False positive.mp. False negative.mp. Accuracy.mp. Predictive value.mp. Predictive values.mp. Reference value.mp. Reference values.mp. Roc.mp. Likelihood ratio.mp. Humans/ | Ovid | 60 (51‐69) | 1.99 (1.57‐2.47) | |
Deville 2002a Accurate |
ORIGINAL | 1. sensitivity and specificity[Mesh; exploded] 2. mass screening [Mesh; exploded] 3. reference values [Mesh] 4. false positive reactions [Mesh] 5. false negative reactions [Mesh] 6. specificit$.tw 7. screening.tw 8. false positive$.tw 9. false negative$.tw 10. accuracy.tw 11. predictive value$.tw 12. reference value$.tw 13. roc$.tw 14. likelihood ratio$.tw or/1‐14 |
||||
Leeflang 2006 | ‘‘Sensitivity and Specificity’’[MeSH] OR ‘‘mass screening’’[MeSH] OR ‘‘Reference values’’[MeSH] OR specificit*[tw] OR screening[tw] OR false positive*[tw] OR false negative*[tw] OR accuracy[tw] OR predictive value*[tw] OR reference value*[tw] OR roc*[tw] OR likelihood ratio*[tw] | PubMed | 51 | |||
Vincent 2003 Strategy A |
ORIGINAL | 1. exp ‘sensitivity and specificity’/ 2. (sensitivity or specificity or accuracy).tw. 3. ((predictive adj3 value$) or (roc adj curve$)).tw. 4. ((false adj positiv$) or (false negativ$)).tw. 5. ((observer adj variation$) or (likelihood adj3 ratio$)).tw. 6. likelihood function/ 7. exp mass screening/ 8. diagnosis, differential/ or exp Diagnostic errors/ 9. di.xs or du.fs 10. or/1‐9 |
||||
Ritchie 2007 | NR | Ovid | 87 | 3.3 | ||
Mitchell 2005 | 1. exp “Sensitivity and Specificity”/ 2. (sensitivity or specificity).tw. 3. ((predictive adj3 value$) or (roc adj curve$)).tw. 4. ((false adj positiv$) or (false negativ$)).tw. 5. ((observer adj variation$) or (likelihood adj3 ratio$)).tw. 6. Likelihood Function/ 7. exp Mass Screening/ 8. Diagnosis, Differential/ or exp Diagnostic Errors/ 9. di.xs or du.fs 10. or/1‐9 |
Ovid | 81 | 5.5 | ||
Vincent 2003 Strategy C |
ORIGINAL | 1. exp ‘sensitivity and specificity’/ 2. sensitivity.tw. or specificity.tw. 3. (predictive adj3 value$).tw. 4. exp Diagnostic errors/ 5. ((false adj positive$) or (false adj negative$)).tw. 6. (observer adj variation$).tw. 7. (roc adj curve$).tw. 8. (likelihood adj3 ratio$).tw. 9. likelihood function/ 10. exp *venous thrombosis/di, ra, ri, us 11. exp *thrombophlebitis/di, ra, ri, us 12. or/1‐11 |
||||
Leeflang 2006 | ‘‘sensitivity and specificity’’[MeSH] OR sensitivity[tw] OR specificity[tw] OR predictive value*[tw] OR false positiv*[tw] OR false negativ*[tw] OR observer variation*[tw] OR roc curve*[tw] OR likelihood ratio*[tw] OR ‘‘Likelihood Functions’’[MeSH] | PubMed | 44 | |||
Whiting 2010 | exp "sensitivity and specificity"/ Sensitivity.tw. Specificity.tw. (predictive adj3 value$).tw. Exp diagnostic errors/ (false adj positiv$).tw. (false adj negativ$).tw. (observer adj variation$).tw. (roc adj curve$).tw. (likelihood adj3 ratio$).tw. Likelihood functions/ |
Ovid | 67 | 9 | NNR 12 | |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ Sensitivity.tw. Specificity.tw. (predictive adj3 value$).tw. Exp diagnostic errors/ (false adj positiv$).tw. (false adj negativ$).tw. (observer adj variation$).tw. (roc adj curve$).tw. (likelihood adj3 ratio$).tw. Likelihood functions/ | Ovid | 54 (45‐63) | 2.30 (1.79‐2.89) | |
van der Weijden 1997 Sensitive |
ORIGINAL |
MeSH terms explode DIAGNOSIS/all.s explode SENSITIVITY‐AND‐SPECIFICITY REFERENCE‐VALUES/all.s FALSE‐NEGATIVE‐REACTIONS/all.s FALSE‐POSITIVE‐REACTIONS/all.s explode MASS‐SCREENING/all.s Freetext terms diagnos* sensitivity or specificity predictive value* reference value* ROC* Likelihood ratio* monitoring |
||||
Leeflang 2006 | "Diagnosis"[MeSH] OR "sensitivity and specificity"[MeSH] OR "Reference values"[MeSH] OR "False Positive Reactions"[MeSH] OR "False Negative Reactions"[MeSH] OR "Mass Screening"[MeSH] OR diagnos* OR sensitvity OR specificity OR predictive value* OR reference value* OR ROC* OR likelihood ratio* OR monitoring | PubMed | 92 | |||
Doust 2005 | Diagnosis [subheading] Sensitivity and Specificity [MeSH] Sensitivity [tw] Specificity [tw] Diagnosis differential [MeSH] Reference values [MeSH] False negative reactions [MeSH] False positive reactions [MeSH] Mass screening [MeSH] diagnos* [tw] predictive value [tw] reference value* [tw] ROC* [tw] | CD‐ROM Ovid | Error noted in strategy – original does not include Diagnosis differential [MeSH] and Doust has omitted to add likelihood ratio* and monitoring textwords | |||
73 | 4 | Methodological & content filter for TSR | ||||
100 | 4 | Methodological & content filter for NPSR | ||||
91 | Methodological filter for TSR | |||||
100 | Methodological filter for NPSR | |||||
Whiting 2010 | Exp diagnosis/ exp "sensitivity and specificity"/ Reference values/ False negative reactions/ False positive reactions/ Exp Mass screening/ Diagnos$.ti,ab. Sensitivity.ti,ab. Specificity.ti,ab. Predictive value$.ti,ab. Reference value$.ti,ab. Roc$.ti,ab. Likelihood ratio$.ti,ab. Monitoring.ti,ab. |
Ovid | 87 | 2 | NNR 50 | |
Noel‐Storr 2011 | Exp diagnosis/ exp "sensitivity and specificity"/ Reference values/ False negative reactions/ False positive reactions/ Exp Mass screening/ Diagnos$.ti,ab. Sensitivity.ti,ab. Specificity.ti,ab. Predictive value$.ti,ab. Reference value$.ti,ab. Roc$.ti,ab. Likelihood ratio$.ti,ab. Monitoring.ti,ab. | Ovid | 93 (87‐97) | 0.98 (0.80‐1.17) | |
Mitchell 2005 | exp Diagnosis/ exp "Sensitivity and Specificity"/ Reference Values/ False Negative Reactions/ False Positive Reactions/ exp Mass Screening/ diagnos$.ti,ab. sensitivity.ti,ab. specificity.ti,ab. predictive value$.ti,ab. reference value$.ti,ab. roc$.ti,ab. likelihood ratio$.ti,ab. monitoring.ti,ab. |
Ovid | 96 | 5.6 | ||
CASP 2002$ | ORIGINAL | sensitivity‐specificity (s) sensitivity (t) di.fs. ri.fs du.fs specificity (t) |
||||
Ritchie 2007 | NR | Ovid | 73 | 1.2 | ||
Kassai 2006 | NR | PubMed | 95 | |||
Whiting 2010 | “sensitivity and specificity”/ Sensitivity.ti,ab. di.fs. Ri.fs. Du.fs. Specificity.ti,ab. | Ovid | 83 (78‐95) | 3 (1‐24) | NNR 29 (4‐89) |
Vincent 2003 | 1 exp ‘sensitivity and specificity’/ 2 sensitivity.tw. 3 di.xs. 4 du.fs. 5 specificity.tw. 6 or/1‐5 | NR | 100 | | |
Noel‐Storr 2011 | “sensitivity and specificity”/ Sensitivity.ti,ab. di.fs. Ri.fs. Du.fs. Specificity.ti,ab. | Ovid | 67 (58‐75) | 0.97 (0.77‐1.19) | |
InterTASC 2011 Aberdeen$ | ORIGINAL |
MeSH
Exp sensitivity and specificity/
False positive reactions/
False negative reactions/
Du.fs
Text words .tw
Sensitivity
Distinguish$ Differentiat$ enhancement Predictive adj4 value$ Identif$ Detect$ Diagnos$ Compar$ |
||||
Ritchie 2007 | NR | Ovid | 69 | 1.2 | ||
Whiting 2010 | exp "sensitivity and specificity"/
False positive reactions/
False negative reactions/
Du.fs.
Sensitivity.tw.
(Predictive adj4 value$).tw.
Distinguish$.tw. Differential$.tw. Enhancement.tw. Identif$.tw. Detect$.tw. Diagnos$.tw. Compare$.tw. |
Ovid | 86 (81‐94) | 3 (1‐19) | NNR 35 (5‐97) |
Noel‐Storr 2011 | exp "sensitivity and specificity"/
False positive reactions/
False negative reactions/
Du.fs.
Sensitivity.tw.
(Predictive adj4 value$).tw.
Distinguish$.tw. Differential$.tw. Enhancement.tw. Identif$.tw. Detect$.tw. Diagnos$.tw. Compare$.tw. |
Ovid | 87 (80‐92) | 0.95 (0.78‐1.14) | |
InterTASC 2011 Southampton A$ Unclear how terms combined |
ORIGINAL | MeSH Exp sensitivity and specificity/ False positive reactions/ False negative reactions Exp diagnosis/ Reference‐values Exp mass screening/ Text words Diagnos* Sensitivity Specificity ‘sensitivity and specificity’ predictive value* Reference value* Roc Roc in AD (NOT) Likelihood ratio* Monitoring | ||||
Ritchie 2007 | NR | Ovid | 71 | 1.0 | ||
Whiting 2010 | Exp diagnosis/ exp "sensitivity and specificity"/ Reference values/ False negative reactions/ False positive reactions/ Exp mass screening/ Diagnos$.mp. Sensitivity.mp. Specificity.mp. Predictive value$.mp. Reference value$.mp. Roc.mp. NOT roc.in. Likelihood ratio$.mp. Monitoring.mp. |
Ovid | 86 | 2 | NNR 51 | |
Noel‐Storr 2011 | Exp diagnosis/ exp "sensitivity and specificity"/ Reference values/ False negative reactions/ False positive reactions/ Exp mass screening/ Diagnos$.mp. Sensitivity.mp. Specificity.mp. Predictive value$.mp. Reference value$.mp. Roc.mp. NOT roc.in. Likelihood ratio$.mp. Monitoring.mp. |
Ovid | 93 (87‐97) | 0.96 (0.80‐1.15) | |
InterTASC 2011 Southampton B$ Unclear how terms combined |
ORIGINAL | MeSH Exp sensitivity and specificity/ Text words Specificity False negative Accuracy screening | ||||
Ritchie 2007 | NR | Ovid | 45 | 4.6 | ||
Whiting 2010 | exp "sensitivity and specificity"/ Specificity.mp. False negative.mp. Accuracy.mp. Screening.mp. |
Ovid | 69 | 7 | NNR 14 | |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ Specificity.mp. False negative.mp. Accuracy.mp. Screening.mp. |
Ovid | 55 (46‐64) | 2.09 (1.63‐2.63) | |
InterTASC 2011 Southampton C$ Unclear how terms combined |
ORIGINAL | MeSH Exp sensitivity and specificity/ Text words ti,ab,mesh Predictive and value | ||||
Ritchie 2007 | NR | Ovid | 31 | 8.5 | ||
Whiting 2010 | exp "sensitivity and specificity"/ (Predictive and value$).ti,ab,sh. |
Ovid | 56 | 11 | NNR 9 | |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ (Predictive and value$).ti,ab,sh. |
Ovid | 51 (42‐60) | 3.04 (2.36‐3.86) | |
InterTASC 2011 Southampton D$ Unclear how terms combined |
ORIGINAL | MeSH Exp sensitivity and specificity/ Exp diagnosis/ Exp pathology/ Text words Sensitivity Specificity | ||||
Ritchie 2007 | NR | Ovid | 66 | 1.1 | ||
Whiting 2010 | exp "sensitivity and specificity"/ Exp diagnosis/ Exp pathology/ Sensitivity.mp. Specificity.mp. |
Ovid | 84 | 2 | NNR 48 | |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ Exp diagnosis/ Exp pathology/ Sensitivity.mp. Specificity.mp. |
Ovid | 89 (82‐94) | 1.13 (0.93‐1.35) | |
InterTASC 2011 Southampton E$ Unclear how terms combined |
ORIGINAL | MeSH Exp Diagnosis/ Exp sensitivity and specificity False positive reactions/ False negative reactions/ Text words ti,ab Diagnos$ ti,ab hw Specificit$ Sensitivit$ Predictive value$ Roc Sroc Receiver operat$ characteristic$ Receiver operat$ adj2 curve False positiv$ False negative$ accuracy | ||||
Ritchie 2007 | NR | Ovid | 71 | 1.0 | ||
Whiting 2010 | Exp diagnosis/ exp "sensitivity and specificity"/ False positive reactions/ False negative reactions/ Diagnos$.ti,ab,hw. Specificit$.ti,ab. Sensitivit$.ti,ab. Predictive value$.ti,ab. Roc.ti,ab. Sroc.ti,ab. Receiver operat$ characteristic$.ti,ab. (Receiver operat$ adj2 curve).ti,ab False positiv$.ti,ab. False negative$.ti,ab. Accuracy.ti,ab. |
Ovid | 87 | 2 | NNR 50 | |
Noel‐Storr 2011 | Exp diagnosis/ exp "sensitivity and specificity"/ False positive reactions/ False negative reactions/ Diagnos$.ti,ab,hw. Specificit$.ti,ab. Sensitivit$.ti,ab. Predictive value$.ti,ab. Roc.ti,ab. Sroc.ti,ab. Receiver operat$ characteristic$.ti,ab. (Receiver operat$ adj2 curve).ti,ab False positiv$.ti,ab. False negative$.ti,ab. Accuracy.ti,ab. |
Ovid | 92 (86‐96) | 0.98 (0.81‐1.17) | |
InterTASC 2011 CRD A Unclear how terms combined |
ORIGINAL | MeSH Exp sensitivity and specificity/ all subheadings Exp diagnostic errors/ all subheadings Text Words .ti,ab Predictive value* Reproducibility Logistic regression Ability near predict* Logistic model* Sroc Roc Positive rate Positive rates Likelihood ratio* Negative rate Negative rates Receiver operating characteristic Correlation Correlated Test or tests near accuracy Curve Curves Test outcome Pretest probabilities Posttest probabilities Roc‐curve.mp Logistic‐models.mp Likelihood‐functions.mp diagnosis | ||||
Ritchie 2007 | NR | Ovid | 53 | 2.2 | ||
Whiting 2010 | exp "sensitivity and specificity"/ Exp diagnostic errors/ Predictive value$.ti,ab. Reproducibility.ti,ab. Logistic regression.ti,ab. (ability adj5 predict$).ti,ab. Logistic model$.ti,ab. Sroc.ti,ab. Roc.ti,ab. Positive rate.ti,ab. Positive rates.ti,ab. Likelihood ratio$.ti,ab. Negative rate.ti,ab. Negative rates.ti,ab. Receiver operating characteristic.ti,ab. correlation.ti,ab. correlated.ti,ab. ((test or tests) adj5 accuracy).ti,ab. curve.ti,ab. curves.ti,ab. Test outcome.ti,ab. Pretest probabilities.ti,ab. Posttest probabilities.ti,ab. Roc curve.mp. Logistic models.mp. Likelihood functions.mp. diagnosis.ti,ab. |
Ovid | 73 | 4 | NNR 26 | |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ Exp diagnostic errors/ Predictive value$.ti,ab. Reproducibility.ti,ab. Logistic regression.ti,ab. (ability adj5 predict$).ti,ab. Logistic model$.ti,ab. Sroc.ti,ab. Roc.ti,ab. Positive rate.ti,ab. Positive rates.ti,ab. Likelihood ratio$.ti,ab. Negative rate.ti,ab. Negative rates.ti,ab. Receiver operating characteristic.ti,ab. correlation.ti,ab. correlated.ti,ab. ((test or tests) adj5 accuracy).ti,ab. curve.ti,ab. curves.ti,ab. Test outcome.ti,ab. Pretest probabilities.ti,ab. Posttest probabilities.ti,ab. Roc curve.mp. Logistic models.mp. Likelihood functions.mp. diagnosis.ti,ab. |
Ovid | 70 (62‐78) | 1.23 (0.99‐1.50) | |
InterTASC 2011 CRD B Unclear how terms combined |
ORIGINAL |
MeSH
Exp sensitivity and specificity/
Predictive value of tests/
Logistic models/
Roc curve/
Likelihood functions/
Reference standards/
Reference values/
Severity of illness index/
Reproducibility of results/
Observer variation/
Decision making/
Text words ti,ab
Diagnos* near5 efficac*
Diagnos* near5 efficien*
Diagnos* near5 effective*
Diagnos* near5 accura*
Diagnos* near5 correct*
Diagnos* near5 reliable
Diagnos* near5 reliability
Diagnos* near5 error*
Diagnos* near5 mistake*
Diagnos* near5 inaccura*
Diagnos* near5 incorrect
Diagnos* near5 unreliable
Decision making
Sensitivity near5 test
Sensitivity near5 tests
Specificity near5 test Specificity near5 tests Predictive standard* Predictive value* Predictive model* Predictive factor* Roc Reliability near2 standard* Reliability near2 score* Reliability near2 tool* Reliability near2 aid Reliability near2 aids Performance near2 test Performance near2 tests Performance near2 testing Performance near2 standard* Performance near2 score* Performance near2 tool* Performance near2 aid Performance near2 aids Reference value* sroc Receiver operat* characteristic Receiver operat* curve Likelihood ratio* |
||||
Ritchie 2007 | NR | Ovid | 40 | 4.1 | ||
Whiting 2010 | exp "sensitivity and specificity"/ Predictive value of tests/ Logistic models/ Roc curve/ Likelihood functions/ Reference standards/ Reference values/ Severity of illness index/ Reproducibility of results/ Observer variation/ Decision making/ (Diagnos$ adj5 efficac$).ti,ab. (Diagnos$ adj5 efficien$).ti,ab. (Diagnos$ adj5 effective$).ti,ab. (Diagnos$ adj5 accura$).ti,ab. (Diagnos$ adj5 correct$).ti,ab. (Diagnos$ adj5 reliable).ti,ab. (Diagnos$ adj5 reliability).ti,ab. (Diagnos$ adj5 error$).ti,ab. (Diagnos$ adj5 mistake$).ti,ab. (Diagnos$ adj5 inaccura$).ti,ab. (Diagnos$ adj5 incorrect).ti,ab. (Diagnos$ adj5 unreliable).ti,ab. Decision making.ti,ab. (sensitivity adj5 test).ti,ab. (sensitivity adj5 tests).ti,ab. (specificity adj5 test).ti,ab. (specificity adj5 tests).ti,ab. Predictive standard$.ti,ab. Predictive value$.ti,ab. Predictive model$.ti,ab. Predictive factor$.ti,ab. Roc.ti,ab. Receiver operat$ characteristic.ti,ab. Receiver operat$ curve.ti,ab. Likelihood ratio$.ti,ab. Likelihood function.ti,ab. (false adj2 reaction$).ti,ab. False positive$.ti,ab. False negative$.ti,ab. Gold standard$.ti,ab. Reference test.ti,ab. Reference tests.ti,ab. Reference standard$.ti,ab. Criter$ standard$.ti,ab. Criter$ bias.ti,ab. Criter$ test.ti,ab. Criter$ tests.ti,ab. Validat$ standard$.ti,ab. Validat$ test.ti,ab. Validat$ tests.ti,ab. Validat$ bias.ti,ab. Verificat$ bias.ti,ab. Work?up bias.ti,ab. Expectation bias.ti,ab. Indeterminate result$.ti,ab. (observer adj2 bias) .ti,ab. (observer adj10 different) .ti,ab. Observer variat$.ti,ab. Interrater reliability.ti,ab. Interater reliability.ti,ab. Observer reliability.ti,ab. (intra$ adj4 reliability) .ti,ab. (accura$ adj2 test).ti,ab. (accura$ adj2 tests).ti,ab. (accura$ adj2 testing).ti,ab. (accura$ adj2 standard$).ti,ab. (accura$ adj2 score$).ti,ab. (accura$ adj2 tool$).ti,ab. (accura$ adj2 aid).ti,ab. (accura$ adj2 aids).ti,ab. (reliability adj2 test).ti,ab. (reliability adj2 tests).ti,ab. (reliability adj2 testing).ti,ab. (reliability adj2 standard$).ti,ab. (reliability adj2 score$).ti,ab. (reliability adj2 tool$).ti,ab. (reliability adj2 aid).ti,ab. (reliability adj2 aids).ti,ab. (performance adj2 test).ti,ab. (performance adj2 tests).ti,ab. (performance adj2 testing).ti,ab. (performance adj2 standard$).ti,ab. (performance adj2 score$).ti,ab. (performance adj2 tool$).ti,ab. (performance adj2 aid).ti,ab. (performance adj2 aids).ti,ab. Reference value$.ti,ab. Sroc.ti,ab. |
Ovid | 64 | 7 | NNR 15 | |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ Predictive value of tests/ Logistic models/ Roc curve/ Likelihood functions/ Reference standards/ Reference values/ Severity of illness index/ Reproducibility of results/ Observer variation/ Decision making/ (Diagnos$ adj5 efficac$).ti,ab. (Diagnos$ adj5 efficien$).ti,ab. (Diagnos$ adj5 effective$).ti,ab. (Diagnos$ adj5 accura$).ti,ab. (Diagnos$ adj5 correct$).ti,ab. (Diagnos$ adj5 reliable).ti,ab. (Diagnos$ adj5 reliability).ti,ab. (Diagnos$ adj5 error$).ti,ab. (Diagnos$ adj5 mistake$).ti,ab. (Diagnos$ adj5 inaccura$).ti,ab. (Diagnos$ adj5 incorrect).ti,ab. (Diagnos$ adj5 unreliable).ti,ab. Decision making.ti,ab. (sensitivity adj5 test).ti,ab. (sensitivity adj5 tests).ti,ab. (specificity adj5 test).ti,ab. (specificity adj5 tests).ti,ab. Predictive standard$.ti,ab. Predictive value$.ti,ab. Predictive model$.ti,ab. Predictive factor$.ti,ab. Roc.ti,ab. Receiver operat$ characteristic.ti,ab. Receiver operat$ curve.ti,ab. Likelihood ratio$.ti,ab. Likelihood function.ti,ab. (false adj2 reaction$).ti,ab. False positive$.ti,ab. False negative$.ti,ab. Gold standard$.ti,ab. Reference test.ti,ab. Reference tests.ti,ab. Reference standard$.ti,ab. Criter$ standard$.ti,ab. Criter$ bias.ti,ab. Criter$ test.ti,ab. Criter$ tests.ti,ab. Validat$ standard$.ti,ab. Validat$ test.ti,ab. Validat$ tests.ti,ab. Validat$ bias.ti,ab. Verificat$ bias.ti,ab. Work?up bias.ti,ab. Expectation bias.ti,ab. Indeterminate result$.ti,ab. (observer adj2 bias) .ti,ab. (observer adj10 different) .ti,ab. Observer variat$.ti,ab. Interrater reliability.ti,ab. Interater reliability.ti,ab. Observer reliability.ti,ab. (intra$ adj4 reliability) .ti,ab. (accura$ adj2 test).ti,ab. (accura$ adj2 tests).ti,ab. (accura$ adj2 testing).ti,ab. (accura$ adj2 standard$).ti,ab. (accura$ adj2 score$).ti,ab. (accura$ adj2 tool$).ti,ab. (accura$ adj2 aid).ti,ab. (accura$ adj2 aids).ti,ab. (reliability adj2 test).ti,ab. (reliability adj2 tests).ti,ab. (reliability adj2 testing).ti,ab. (reliability adj2 standard$).ti,ab. (reliability adj2 score$).ti,ab. (reliability adj2 tool$).ti,ab. (reliability adj2 aid).ti,ab. (reliability adj2 aids).ti,ab. (performance adj2 test).ti,ab. (performance adj2 tests).ti,ab. (performance adj2 testing).ti,ab. (performance adj2 standard$).ti,ab. (performance adj2 score$).ti,ab. (performance adj2 tool$).ti,ab. (performance adj2 aid).ti,ab. (performance adj2 aids).ti,ab. Reference value$.ti,ab. Sroc.ti,ab. |
Ovid | 67 (58‐75) | 1.69 (1.40‐2.10) | |
InterTASC 2011 CRD C Unclear how terms combined |
ORIGINAL |
MeSH
Exp Sensitivity and specificity/
False positive reactions/
False negative reactions/
Logistic models/
Roc curve/
Likelihood functions/
diagnosis/
Exp diagnostic errors/
Exp diagnostic techniques and procedures/
Exp laboratory techniques and procedures/
Text words ti,ab
Specificit$
Sensitivit$
False negative$
False positive$
True negative$
True positive$
Positive rate$
Negative rate$
Screening
Accuracy
Reference value$
Likelihood ratio$
Sroc
Srocs
Roc
Rocs
Receiver operat$ curve$
Receiver operat$ character$
Diagnos$ adj3 efficac$
Diagnos$ adj3 efficien$
Diagnos$ adj3 effectiv$ Diagnos$ adj3 accura$ Diagnos$ adj3 correct$ Diagnos$ adj3 reliable Diagnos$ adj3 reliability Diagnos$ adj3 error$ Diagnos$ adj3 mistake$ Diagnos$ adj3 inaccura$ Diagnos$ adj3 incorrect Diagnos$ adj3 unreliable Diagnostic yield.mp Misdiagnos$ Reproductivity.mp Logistical regression.mp Logistical model$ Ability adj2 predict$ Reliable adj3 test Reliable adj3 tests Reliable adj3 testing Reliable adj3 standard Reliability adj3 test Reliability adj3 tests Reliability adj3 testing Reliability adj3 standard Performance adj3 test Performance adj3 tests Performance adj3 testing Performance adj3 standard$ Predictive adj value$ Predictive adj standard$ Predictive adj model$ Predictive adj factor$ Reference adj test Reference adj tests Reference adj testing Index adj test Index adj tests Index adj testing |
||||
Ritchie 2007 | NR | Ovid | 69 | 1.2 | ||
Whiting 2010 | exp "sensitivity and specificity"/ False positive reactions/ False negative reactions/ Logistic models/ Roc curve/ Likelihood functions/ Diagnosis/ Exp diagnostic errors/ exp "Diagnostic Techniques and Procedures"/ exp "laboratory techniques and procedures"/ Specificit$.ti,ab. Sensitivity$.ti,ab. False negative$.ti,ab. False positive$.ti,ab. True negative$.ti,ab. True positive$.ti,ab. Positive rate$.ti,ab. Negative rate$.ti,ab. Screening.ti,ab. Accuracy.ti,ab. Reference value$.ti,ab. Likelihood ratio$.ti,ab. Sroc.ti,ab. Srocs.ti,ab. Roc.ti,ab. Rocs.ti,ab. Receiver operat$ curve$.ti,ab. Receiver operat$ character$.ti,ab. (Diagnos$ adj3 efficac$).ti,ab. (Diagnos$ adj3 efficien$).ti,ab. (Diagnos$ adj3 effectiv$).ti,ab. (Diagnos$ adj3 accura$).ti,ab. (Diagnos$ adj3 correct$).ti,ab. (Diagnos$ adj3 reliable).ti,ab. (Diagnos$ adj3 reliability).ti,ab. (Diagnos$ adj3 error$).ti,ab. (Diagnos$ adj3 mistake$).ti,ab. (Diagnos$ adj3 inaccura$).ti,ab. (Diagnos$ adj3 incorrect$).ti,ab. (Diagnos$ adj3 unreliable).ti,ab. Diagnostic yield.mp. Misdiagnos$.ti,ab. Reproductivity.mp. Logistical regression.mp. Logistical model$.ti,ab. (ability adj2 predict$).ti,ab. (reliable adj3 test).ti,ab. (reliable adj3 tests).ti,ab. (reliable adj3 testing).ti,ab. (reliable adj3 standard).ti,ab. (reliability adj3 test).ti,ab. (reliability adj3 tests).ti,ab. (reliability adj3 testing).ti,ab. (reliability adj3 standard).ti,ab. (performance adj3 test).ti,ab. (performance adj3 tests).ti,ab. (performance adj3 testing).ti,ab. (performance adj3 standard$).ti,ab. (Predictive adj value$).ti,ab. (Predictive adj standard$).ti,ab. (Predictive adj model$).ti,ab. (Predictive adj factor$).ti,ab. (Reference adj test).ti,ab. (Reference adj tests).ti,ab. (Reference adj testing).ti,ab. (index adj test).ti,ab. (index adj tests).ti,ab. (index adj testing).ti,ab. |
Ovid | 85 | 2 | NNR 46 | |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ False positive reactions/ False negative reactions/ Logistic models/ Roc curve/ Likelihood functions/ Diagnosis/ Exp diagnostic errors/ exp "Diagnostic Techniques and Procedures"/ exp "laboratory techniques and procedures"/ Specificit$.ti,ab. Sensitivity$.ti,ab. False negative$.ti,ab. False positive$.ti,ab. True negative$.ti,ab. True positive$.ti,ab. Positive rate$.ti,ab. Negative rate$.ti,ab. Screening.ti,ab. Accuracy.ti,ab. Reference value$.ti,ab. Likelihood ratio$.ti,ab. Sroc.ti,ab. Srocs.ti,ab. Roc.ti,ab. Rocs.ti,ab. Receiver operat$ curve$.ti,ab. Receiver operat$ character$.ti,ab. (Diagnos$ adj3 efficac$).ti,ab. (Diagnos$ adj3 efficien$).ti,ab. (Diagnos$ adj3 effectiv$).ti,ab. (Diagnos$ adj3 accura$).ti,ab. (Diagnos$ adj3 correct$).ti,ab. (Diagnos$ adj3 reliable).ti,ab. (Diagnos$ adj3 reliability).ti,ab. (Diagnos$ adj3 error$).ti,ab. (Diagnos$ adj3 mistake$).ti,ab. (Diagnos$ adj3 inaccura$).ti,ab. (Diagnos$ adj3 incorrect$).ti,ab. (Diagnos$ adj3 unreliable).ti,ab. Diagnostic yield.mp. Misdiagnos$.ti,ab. Reproductivity.mp. Logistical regression.mp. Logistical model$.ti,ab. (ability adj2 predict$).ti,ab. (reliable adj3 test).ti,ab. (reliable adj3 tests).ti,ab. (reliable adj3 testing).ti,ab. (reliable adj3 standard).ti,ab. (reliability adj3 test).ti,ab. (reliability adj3 tests).ti,ab. (reliability adj3 testing).ti,ab. (reliability adj3 standard).ti,ab. (performance adj3 test).ti,ab. (performance adj3 tests).ti,ab. (performance adj3 testing).ti,ab. (performance adj3 standard$).ti,ab. (Predictive adj value$).ti,ab. (Predictive adj standard$).ti,ab. (Predictive adj model$).ti,ab. (Predictive adj factor$).ti,ab. (Reference adj test).ti,ab. (Reference adj tests).ti,ab. (Reference adj testing).ti,ab. (index adj test).ti,ab. (index adj tests).ti,ab. (index adj testing).ti,ab. |
Ovid | 90 (83‐94) | 1.15 (0.95‐1.38) | |
InterTASC 2011 HTBS Unclear how terms combined |
ORIGINAL |
MeSH
Exp Sensitivity and specificity/
Exp Diagnostic errors/
Likelihood functions/
Reproducibility of results/
Text words .tw
Sensitivit$
Specificit$
Accurac$
Predictive adj2 value$ False$ adj2 positive$ False$ adj2 negative$ False$ adj2 rate$ roc Receiver operat$ adj2 curve$ Receiver operat$ characteristic$ Likelihood$ adj2 ratio$ Likelihood$ adj2 function$ |
||||
Ritchie 2007 | NR | Ovid | 46 | 3.7 | ||
Whiting 2010 | exp "sensitivity and specificity"/ Exp diagnostic errors/ Likelihood functions/ Reproducibility of results/ Sensitivity$.tw. Specificit$.tw. Accuracy$.tw. (Predictive adj2 value$).tw. (False$ adj2 positive$).tw. (false$ adj2 negative$).tw. (false$ adj2 rate$).tw. Roc.tw. (receiver operat$ adj2 curve$).tw. (receiver operat$ characteristic$).tw (likelihood$ adj2 ratio$).tw. (likelihood$ adj2 function$).tw. |
Ovid | 69 | 8 | NNR 12 | |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ Exp diagnostic errors/ Likelihood functions/ Reproducibility of results/ Sensitivity$.tw. Specificit$.tw. Accuracy$.tw. (Predictive adj2 value$).tw. (False$ adj2 positive$).tw. (false$ adj2 negative$).tw. (false$ adj2 rate$).tw. Roc.tw. (receiver operat$ adj2 curve$).tw. (receiver operat$ characteristic$).tw (likelihood$ adj2 ratio$).tw. (likelihood$ adj2 function$).tw. |
Ovid | 56 (47‐65) | 2.04 (1.60‐2.57) | |
Shipley Miner 2002$ | ORIGINAL | 1 exp "sensitivity and specificity"/
2 (sensitivity or specificity).ti,ab.
3 likelihood functions/
4 exp diagnostic errors/
5 area under curve/
6 reproducibility of results/
7 (predictive adj value$1).ti,ab.
8 (likelihood adj ratio$1).ti,ab.
9 (false adj (negative$1 or positive$1)).ti,ab.
10 diagnosis, differential/
11 random allocations/
12 random$.ti,ab. 13 ((single or double or triple) adj blind$3).ti,ab. 14 double blind method/ or single blind method/ 15 (randomized controlled trial or controlled clinical trial).pt. 16 practice guideline.pt. 17 consensus development conference$.pt. 18 1 or 2 or 8 or 3 19 or/1‐17 |
||||
Ritchie 2007 | NR | Ovid | 48 | 1.8 | ||
Whiting 2010 | exp "sensitivity and specificity"/ (sensitivity or specificity).ti,ab. Likelihood functions/ Exp diagnostic errors/ Area under curve/ Reproducibility of results/ (predictive adj value$1).ti,ab. (likelihood adj ratio$1).ti,ab. (false adj (negative$1 or positive$1)).ti,ab. Diagnosis, differential/ Random allocation/ Random$.ti,ab. ((single or double or triple) adj blind$3).ti,ab. Double blind method/ Single blind method/ Randomized controlled trial.pt. Controlled clinical trial.pt. Practice guideline.pt. Consensus development conference$.pt. |
Ovid | 72 | 5 | NNR 19 | |
Noel‐Storr 2011 | exp "sensitivity and specificity"/ (sensitivity or specificity).ti,ab. Likelihood functions/ Exp diagnostic errors/ Area under curve/ Reproducibility of results/ (predictive adj value$1).ti,ab. (likelihood adj ratio$1).ti,ab. (false adj (negative$1 or positive$1)).ti,ab. Diagnosis, differential/ Random allocation/ Random$.ti,ab. ((single or double or triple) adj blind$3).ti,ab. Double blind method/ Single blind method/ Randomized controlled trial.pt. Controlled clinical trial.pt. Practice guideline.pt. Consensus development conference$.pt. |
Ovid | 63 (54‐72) | 1.71 (1.35‐2.12) | |
University of Rochester 2002$ | ORIGINAL | Unable to access – website no longer valid | ||||
Vincent 2003 | 1 exp ‘sensitivity and specificity’/ 2 false negative reactions/ or false positive reactions/ 3 (sensitivity or specificity).ti,ab. 4 (predictive adj value$1).ti,ab. 5 (likelihood adj ratio$1).ti,ab. 6 (false adj (negative$1 or positive$1)).ti,ab. 7 or/1‐6 |
NR | 79 | |||
North Thames 2002 | ORIGINAL | Unable to access – website no longer valid | ||||
Vincent 2003 | 1 exp ‘sensitivity and specificity’ 2 exp diagnostic errors 3 mass screening 4 or/1‐3 |
NR | 53 | |||
Abbreviations used: TSR = Tympanometry systematic review; NPSR = Natriuretic peptides systematic review. $ Filter no longer available from source cited by evaluation studies. |
Characteristics of studies
Characteristics of included studies [ordered by study ID]
Astin 2008.
Methods | Method of identification of reference set records ‐ Handsearching. Method of deriving filter terms ‐ Analysis of reference set |
Data | Reference set years ‐ Development set 1985 Clin Radiol, 1988 Am J Neuroradiol; validation set 2000. Number of gold standard records ‐ 333 in development set; 186 in validation set. Number of non‐gold standard records ‐ 2222 in development set; 1070 in validation set |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ Not reported |
Outcomes | Number of filters developed ‐ 1 | |
Notes | MEDLINE development study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | No | Filter developed to retrieve radiology DTA studies. High concerns about applicability |
Independent internal validation? | Yes | Discrete set of references in a derivation set from six handsearched journals and a validation set from six different handsearched journals |
Externally validated? | No | High concerns about applicability |
Bachmann 2002.
Methods | Method of identification of reference set records ‐ Handsearching. Method of deriving filter terms ‐ Analysis of reference set |
Data | Reference set years ‐ 1989, 1994 and 1999. Number of gold standard records ‐ 83 in 1989 test set; 53 in 1994 validation set; 61 in 1999 validation set. Number of non‐gold standard records ‐ 1646 in 1989 test set; 1744 in 1994 validation set; 7875 in 1999 validation set |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ Not reported |
Outcomes | Number of filters developed ‐ 1 | |
Notes | MEDLINE development study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant, systematic review not used |
Generic gold‐standard records? | Yes | Low concerns about applicability |
Independent internal validation? | Yes | Terms derived from 1989 reference set; filter validated in 1994 validation set |
Externally validated? | Yes | References from search of different journals and year to the derivation and internal validation set. Low concerns about applicability |
Bachmann 2003.
Methods | Method of identification of reference set records ‐ Handsearching. Method of deriving filter terms ‐ Analysis of reference set |
Data | Reference set years ‐ 1999. Number of gold standard records ‐ 61. Number of non‐gold standard records ‐ 6082 |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ All records retrieved by the search that were not classified as gold standard studies |
Outcomes | Number of filters developed ‐ 8 | |
Notes | EMBASE development study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | Yes | Low concerns about applicability |
Independent internal validation? | No | Terms for filters derived through word frequency analysis of the same references as the validation set |
Externally validated? | No |
Berg 2005.
Methods | Method of identification of reference set records ‐ Handsearching. Method of deriving filter terms ‐ Analysis of reference set and adaption of existing filter |
Data | Reference set years ‐ Not reported. Number of gold standard records ‐ Not reported. Number of non‐gold standard records ‐ 238 |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ Not reported |
Outcomes | Number of filters developed ‐ 1 | |
Notes | MEDLINE development study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | No | Cancer‐related fatigue topic specific. High concerns about applicability |
Independent internal validation? | No | Used the indexing of included citations from gold standard references to derive terms; these references were also included in the validation |
Externally validated? | No | High concerns about applicability |
Deville 2000.
Methods | Method of identification of reference set records ‐ Handsearching. Method of deriving filter terms ‐ Analysis of reference set |
Data | Reference set years ‐ 1992‐1995. Number of gold standard records ‐ 75; 33 in meniscal lesions set. Number of non‐gold standard records ‐ 2392; meniscal lesions set not reported |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ False positive papers selected by a previously published search strategy, exclusion of some publication types (e.g. reviews and meta‐analyses) |
Outcomes | Number of filters developed ‐ 4. Number of filters evaluated ‐ 1 |
Notes | MEDLINE development and evaluation study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | No | Family medicine reference set; physical diagnostic tests for meniscal lesion validation set. High concerns about applicability |
Independent internal validation? | No | |
Externally validated? | Yes |
Deville 2002.
Methods | Method of identification of reference set records ‐ DTA systematic reviews. Method of deriving filter terms ‐ Adaption of existing filter |
Data | Reference set years ‐ Not reported. Number of gold standard records ‐ Not reported. Number of non‐gold standard records ‐ Not reported |
Comparisons | Reference set also contained non‐gold standard records ‐ Not reported. Description of non‐gold standard records if used in reference set ‐ Not reported |
Outcomes | Number of filters developed ‐ 1 | |
Notes | MEDLINE development study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | The reference cited by the study for the systematic review that was used is unavailable. A meta‐analysis published by the same author on the topic did use a search strategy containing diagnostic terms |
Generic gold‐standard records? | No | Studies from a systematic review of diagnostic tests for knee lesions and a systematic review of a urine dipstick test comprised the reference set. High concerns about applicability |
Externally validated? | Yes | Real‐world validation sets based on two systematic reviews. Low concerns about applicability |
Doust 2005.
Methods | Method of identification of reference set records ‐ DTA systematic reviews conducted by authors | |
Data | Reference set years ‐ Tympanometry 1966‐2001; natriuretic peptides 1994‐2002. Number of gold standard records ‐ Tympanometry n=33; natriuretic peptides n=20. Number of non‐gold standard records ‐ 0 |
Comparisons | Reference set also contained non‐gold standard records ‐ No. Description of non‐gold standard records if used in reference set ‐ Not applicable |
Outcomes | Number of filters evaluated ‐ 5 | |
Notes | MEDLINE evaluation study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | No | The authors conducted two systematic reviews whose studies comprised the reference set. The clinical queries filter for diagnostic studies available in PubMed was used. |
Generic gold‐standard records? | No | Studies from a systematic review of tympanometry for the diagnosis of otitis media with effusion in children and a systematic review of natriuretic peptides comprised the reference standard. High concerns about applicability |
Independent internal validation? | Unclear | Not relevant |
Externally validated? | Unclear | Not relevant |
Haynes 1994.
Methods | Method of identification of reference set records ‐ Handsearching for primary studies. Method of deriving filter terms ‐ Expert knowledge and analysis of reference set |
Data | Reference set years ‐ 1986 and 1991. Number of gold standard records ‐ 92 in 1986 set; 111 in 1991 set. Number of non‐gold standard records ‐ 426 in 1986 set; 301 in 1991 set |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ Not reported |
Outcomes | Number of filters developed ‐ 12 | |
Notes | MEDLINE development study. All papers listed under Haynes 1994 used for data extraction | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | Yes | Low concerns about applicability |
Independent internal validation? | No | Terms were collected through expert knowledge but their combination into a strategy was not independent of the references used for validation. The reference standard was used to eliminate terms with <10% sensitivity or combinations with <40% sensitivity or <70% specificity |
Externally validated? | No | High concerns about applicability |
Haynes 2004.
Methods | Method of identification of reference set records ‐ Handsearching for primary studies. Method of deriving filter terms ‐ Expert knowledge and analysis of reference set |
Data | Reference set years ‐ 2000. Number of gold standard records ‐ 147. Number of non‐gold standard records ‐ 48,881 |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ Not reported |
Outcomes | Number of filters developed ‐ 11 | |
Notes | MEDLINE development study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | Yes | Low concerns about applicability |
Independent internal validation? | No | Individual search terms with a sensitivity >25% and a specificity >75% (when tested in the reference set) were incorporated into the development of search strategies |
Externally validated? | No | High concerns about applicability |
Kassai 2006.
Methods | Method of identification of reference set records ‐ Primary studies identified through Internet search | |
Data | Reference set years ‐ 1966‐2002. Number of gold standard records ‐ 237. Number of non‐gold standard records ‐ 1236 |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ All studies retrieved by the search not classified as gold standard records |
Outcomes | Number of filters evaluated ‐ 3 | |
Notes | MEDLINE evaluation study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | No | Venous thrombosis and ultrasonography topic specific. High concerns about applicability |
Independent internal validation? | Unclear | Not relevant |
Externally validated? | Unclear | Not relevant |
Kastner 2009.
Methods | Method of identification of reference set records ‐ Internet search for DTA systematic reviews | |
Data | Reference set years ‐ 2006. Number of gold standard records ‐ 441. Number of non‐gold standard records ‐ 0 |
Comparisons | Reference set also contained non‐gold standard records ‐ No. Description of non‐gold standard records if used in reference set ‐ Not applicable |
Outcomes | Number of filters evaluated ‐ 1 | |
Notes | MEDLINE evaluation study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | No | Five of the twelve systematic reviews that provided studies for the reference set used search strategies containing DTA search terms to find primary studies |
Generic gold‐standard records? | Yes | Low concerns about applicability |
Independent internal validation? | Unclear | Not relevant |
Externally validated? | Unclear | Not relevant |
Leeflang 2006.
Methods | Method of identification of reference set records ‐ Internet search for DTA systematic reviews | |
Data | Reference set years ‐ 1999‐2002. Number of gold standard records ‐ 820. Number of non‐gold standard records ‐ 0 |
Comparisons | Reference set also contained non‐gold standard records ‐ No. Description of non‐gold standard records if used in reference set ‐ Not applicable |
Outcomes | Number of filters evaluated ‐ 12 | |
Notes | MEDLINE evaluation study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Of the 27 systematic reviews whose studies comprised the reference set, seven did not describe their search strategy. It is unclear, therefore, whether diagnostic terms would have been applied in the search |
Generic gold‐standard records? | Yes | Low concerns about applicability |
Independent internal validation? | Unclear | Not relevant |
Externally validated? | Unclear | Not relevant |
Mitchell 2005.
Methods | Method of identification of reference set records ‐ Handsearching for primary studies | |
Data | Reference set years ‐ 1991‐1992; 2002‐2003. Number of gold standard records ‐ 99. Number of non‐gold standard records ‐ 4409 |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ Not reported |
Outcomes | Number of filters evaluated ‐ 6 MEDLINE filters and 6 EMBASE filters | |
Notes | MEDLINE and EMBASE evaluation study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | No | Kidney disease topic specific. High concerns about applicability |
Independent internal validation? | Unclear | Not relevant |
Externally validated? | Unclear | Not relevant |
Noel‐Storr 2011.
Methods | Method of identification of reference set records ‐ DTA systematic reviews conducted by the authors. Method of deriving filter terms ‐ Analysis of reference set (the authors ran published search filters in MEDLINE combined with a subject search, located 10 papers that all the filters missed, and chose a term from the title/abstract or keywords of each) |
Data | Reference set years ‐ 2000‐2001. Number of gold standard records ‐ 128 in September 2010 set, with an additional 16 found in the update search; therefore 144 in August 2011. Number of non‐gold standard records ‐ 17,266 in September 2010 set, with an additional 1654 found in the update search; therefore 18,920 in August 2011 |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ All studies retrieved by the search not classified as gold standard records |
Outcomes | Number of filters developed ‐ 1 | |
Notes | MEDLINE development and evaluation study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Yes | |
Generic gold‐standard records? | No | Studies included in a systematic review of biomarkers for diagnosing mild cognitive impairment comprised reference set; filter designed to retrieve longitudinal DTA studies. High concerns about applicability |
Independent internal validation? | No | The reference set was not totally independent of the set used to derive terms: it consisted of 144 gold standard records and 18,920 non‐gold standard records, but the 10 studies used to derive terms for the new filter were included in the reference set during validation |
Externally validated? | No | High concerns about applicability |
Ritchie 2007.
Methods | Method of identification of reference set records ‐ Primary studies identified through Internet search | |
Data | Reference set years ‐ 1966‐2003. Number of gold standard records ‐ 160. Number of non‐gold standard records ‐ 27,804 |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ All studies retrieved by the search not classified as gold standard records |
Outcomes | Number of filters evaluated ‐ 22 | |
Notes | MEDLINE evaluation study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | No | Childhood urinary tract infection diagnosis topic specific. High concerns about applicability |
Independent internal validation? | Unclear | Not relevant |
Externally validated? | Unclear | Not relevant |
van der Weijden 1997.
Methods | Method of identification of reference set records ‐ Personal literature database. Method of deriving filter terms ‐ Checking key publications for terms and language used |
Data | Reference set years ‐ 1985‐1994. Number of gold standard records ‐ 221. Number of non‐gold standard records ‐ 0 |
Comparisons | Reference set also contained non‐gold standard records ‐ No. Description of non‐gold standard records if used in reference set ‐ Not applicable |
Outcomes | Number of filters developed ‐ 3 | |
Notes | MEDLINE development study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | No | Erythrocyte sedimentation as a diagnostic test topic specific. High concerns about applicability |
Independent internal validation? | No | Filters were composed of terms derived from checking the key publications for terms and language used; not judged to be internally validated because only real‐world external validation was carried out |
Externally validated? | Yes | Two systematic reviews on ESR and dipstick testing provided references for validation testing. Low concerns about applicability |
Vincent 2003.
Methods | Method of identification of reference set records ‐ DTA systematic reviews. Method of deriving filter terms ‐ Adaption of existing filter and analysis of reference set |
Data | Reference set years ‐ 1969‐2000. Number of gold standard records ‐ 126. Number of non‐gold standard records ‐ 0 |
Comparisons | Reference set also contained non‐gold standard records ‐ No. Description of non‐gold standard records if used in reference set ‐ Not applicable |
Outcomes | Number of filters developed ‐ 3. Number of filters evaluated ‐ 5 |
Notes | MEDLINE development and evaluation study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | No | At least one of the 16 systematic reviews used to provide studies for the reference set used diagnostic search terms in its search strategy. Many of the systematic reviews did not provide a full search strategy and, therefore, it is unclear whether they would have used a diagnostic filter or not. |
Generic gold‐standard records? | No | Deep vein thrombosis diagnosis topic specific. High concerns about applicability |
Independent internal validation? | No | Published filters were adapted by adding and removing terms based on the results of searches of the reference set |
Externally validated? | No | High concerns about applicability |
Whiting 2010.
Methods | Method of identification of reference set records ‐ Systematic reviews conducted by the authors | |
Data | Reference set years ‐ Not reported. Number of gold standard records ‐ 506. Number of non‐gold standard records ‐ 25,880 (number obtained from authors) |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ Not reported |
Outcomes | Number of filters evaluated ‐ 22 | |
Notes | MEDLINE evaluation study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Yes | The authors conducted the systematic reviews and state that their search strategies did not contain any diagnostic terms |
Generic gold‐standard records? | Yes | DTA studies from seven systematic reviews which covered a range of different types of diagnostic test and condition. Low concerns about applicability |
Independent internal validation? | Unclear | Not relevant |
Externally validated? | Unclear | Not relevant |
Wilczynski 2005.
Methods | Method of identification of reference set records ‐ Handsearching for primary studies. Method of deriving filter terms ‐ Analysis of reference set and expert knowledge |
Data | Reference set years ‐ 2000. Number of gold standard records ‐ 97. Number of non‐gold standard records ‐ 27,672 |
Comparisons | Reference set also contained non‐gold standard records ‐ Yes. Description of non‐gold standard records if used in reference set ‐ Not reported |
Outcomes | Number of filters developed ‐ 4 | |
Notes | EMBASE development study | |
Risk of bias | ||
Item | Authors' judgement | Description |
If relevant, systematic review did not use DTA strategy? | Unclear | Not relevant |
Generic gold‐standard records? | Yes | Low concerns about applicability |
Independent internal validation? | No | |
Externally validated? | No | High concerns about applicability |
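The performance measures reported for the filters in these tables follow from a 2 × 2 comparison of a filter's retrieval against a reference set of gold standard (relevant) and, where available, non‐gold standard (irrelevant) records. As an illustration only, the following Python sketch computes sensitivity, specificity, precision and NNR from such counts; the reference‐set sizes are those in the Wilczynski 2005 table above (97 gold standard, 27,672 non‐gold standard records), but the retrieval counts are hypothetical and are not taken from any included study.

def filter_performance(gold_retrieved, gold_total, non_gold_retrieved, non_gold_total):
    # Sensitivity: proportion of gold standard (relevant) records the filter retrieves
    sensitivity = gold_retrieved / gold_total
    # Specificity: proportion of non-gold standard (irrelevant) records the filter excludes
    specificity = (non_gold_total - non_gold_retrieved) / non_gold_total
    # Precision: proportion of all retrieved records that are gold standard
    precision = gold_retrieved / (gold_retrieved + non_gold_retrieved)
    # Number Needed to Read: records screened per relevant record found
    nnr = 1 / precision
    return sensitivity, specificity, precision, nnr

# Reference-set sizes from the Wilczynski 2005 table; retrieval counts (90 and 2,500) are hypothetical.
sens, spec, prec, nnr = filter_performance(90, 97, 2500, 27672)
print(f"Sensitivity {sens:.1%}; specificity {spec:.1%}; precision {prec:.1%}; NNR {nnr:.0f}")

With these hypothetical counts the sketch returns a sensitivity of about 92.8% and an NNR of about 29, illustrating how a filter can retrieve most relevant records while still leaving many records to screen.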
Contributions of authors
Rebecca Beynon designed the study, ran literature searches, screened literature searches, extracted data, synthesized data and drafted the manuscript. Julie Glanville devised and ran literature searches and drafted the manuscript. Mariska Leeflang designed the study, screened literature searches and edited the manuscript. Ruth Mitchell devised and ran literature searches, screened literature searches and edited the manuscript. Anne Eisinga devised and ran literature searches, and edited the manuscript. Steve McDonald devised and ran literature searches, screened literature searches and edited the manuscript. Penny Whiting edited the manuscript.
Sources of support
Internal sources
- National Institute for Health Research, UK. Incentive award for completion of review.
External sources
No sources of support supplied
Declarations of interest
Julie Glanville, together with colleagues from the InterTASC Information Specialist Subgroup, developed the Search Filter Appraisal Checklist that is used in this review for the methodological assessment of the included studies and has published search filters. Julie Glanville, Mariska Leeflang, Ruth Mitchell, Rebecca Beynon and Penny Whiting have published performance evaluations of search filters.
References
References to studies included in this review
Astin 2008 {published data only}
- Astin MP, Brazzelli MG, Fraser CM, Counsell CE, Needham G, Grimshaw JM. Developing a sensitive search strategy in MEDLINE to retrieve studies on assessment of the diagnostic performance of imaging techniques. Radiology 2008;247(2):365‐73.
Bachmann 2002 {published data only}
- Bachmann LM, Coray R, Estermann P, Ter Riet G. Identifying diagnostic studies in MEDLINE: reducing the number needed to read. Journal of the American Medical Informatics Association 2002;9(6):653‐8.
Bachmann 2003 {published data only}
- Bachmann LM. Identifying diagnostic accuracy studies in EMBASE. Journal of the Medical Library Association 2003;91(3):341‐6.
Berg 2005 {published data only}
- Berg A, Fleischer S, Behrens J. Development of two search strategies for literature in MEDLINE‐PubMed: nursing diagnoses in the context of evidence‐based nursing. International Journal of Nursing Terminologies and Classifications 2005;16(2):26‐32.
Deville 2000 {published data only}
- Deville WL, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. Journal of Clinical Epidemiology 2000;53(1):65‐9.
Deville 2002 {published data only}
- Deville WL, Bossuyt PM, Vet HC, Bezemer PD, Bouter LM, Assendelft WJ. Systematic reviews in practice. X. Searching, selecting and the methodological assessment of diagnostic evaluation research [De praktijk van systematische reviews. X. Zoeken, selecteren en methodologisch beoordelen van diagnostisch evaluatieonderzoek]. Nederlands Tijdschrift voor Geneeskunde 2002;146(48):2281‐4.
Doust 2005 {published data only}
- Doust JA, Pietrzak E, Sanders S, Glasziou PP. Identifying studies for systematic reviews of diagnostic tests was difficult due to the poor sensitivity and precision of methodologic filters and the lack of information in the abstract. Journal of Clinical Epidemiology 2005;58(5):444‐9.
Haynes 1994 {published data only}
- Haynes RB, Wilczynski NL, McKibbon KA, Walker CJ, Sinclair JC. Developing optimal search strategies for detecting clinically sound studies in MEDLINE. Journal of the American Medical Informatics Association 1994;1(6):447‐58.
- Wilczynski NL, Walker CJ, McKibbon KA, Haynes RB. Assessment of methodologic search filters in MEDLINE. Proceedings of the Annual Symposium on Computer Applications in Medical Care 1993:601‐5.
- Wilczynski NL, Walker CJ, McKibbon KA, Haynes RB. Quantitative comparison of pre‐explosions and subheadings with methodologic search terms in MEDLINE. Proceedings of the Annual Symposium on Computer Applications in Medical Care 1994:905‐9.
Haynes 2004 {published data only}
- Haynes RB, Wilczynski NL. Optimal search strategies for retrieving scientifically strong studies of diagnosis from Medline: analytical survey. BMJ 2004;328(7447):1040.
Kassai 2006 {published data only}
- Kassai B, Sonie S, Shah NR, Boissel JP. Literature search parameters marginally improved the pooled estimate accuracy for ultrasound in detecting deep venous thrombosis. Journal of Clinical Epidemiology 2006;59(7):710‐4.
Kastner 2009 {published data only}
- Kastner M, Wilczynski NL, McKibbon AK, Garg AX, Haynes RB. Diagnostic test systematic reviews: bibliographic search filters ("Clinical Queries") for diagnostic accuracy studies perform well. Journal of Clinical Epidemiology 2009;62(9):974‐81.
Leeflang 2006 {published data only}
- Leeflang MM, Scholten RJ, Rutjes AW, Reitsma JB, Bossuyt PM. Use of methodological search filters to identify diagnostic accuracy studies can lead to the omission of relevant studies. Journal of Clinical Epidemiology 2006;59(3):234‐40.
Mitchell 2005 {published data only}
- Mitchell RL, Rinaldi F, Craig JC. Performance of published search strategies for studies of diagnostic test accuracy (SDTAs) in MEDLINE and EMBASE. XIII Cochrane Colloquium; 22‐26 October; Melbourne, Australia. 2005.
Noel‐Storr 2011 {published data only}
- Noel‐Storr A. The development of a methodological filter for studies of diagnostic accuracy in dementia. XIX Cochrane Colloquium; 19‐22 October; Madrid, Spain. 2011.
Ritchie 2007 {published data only}
- Ritchie G, Glanville J, Lefebvre C. Do published search filters to identify diagnostic test accuracy studies perform adequately? Health Information and Libraries Journal 2007;24(3):188‐92.
van der Weijden 1997 {published data only}
- Weijden T, IJzermans CJ, Dinant GJ, Duijn NP, Vet R, Buntinx F. Identifying relevant diagnostic studies in MEDLINE. The diagnostic value of the erythrocyte sedimentation rate (ESR) and dipstick as an example. Family Practice 1997;14(3):204‐8.
Vincent 2003 {published data only}
- Vincent S, Greenley S, Beaven O. Clinical Evidence diagnosis: developing a sensitive search strategy to retrieve diagnostic studies on deep vein thrombosis: a pragmatic approach. Health Information and Libraries Journal 2003;20(3):150‐9.
Whiting 2010 {published data only}
- Whiting P, Westwood M, Beynon R, Burke M, Sterne JA, Glanville J. Inclusion of methodological filters in searches for diagnostic test accuracy studies misses relevant studies. Journal of Clinical Epidemiology 2011;64(6):602‐7.
Wilczynski 2005 {published data only}
- Wilczynski NL, Haynes RB, Hedges Team. EMBASE search strategies for identifying methodologically sound diagnostic studies for use by clinicians and researchers. BMC Medicine 2005;3:7.
Additional references
Bossuyt 2003
- Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clinical Chemistry 2003;49(1):7‐18.
CASP 2002
- Critical Appraisal Skills Programme. Search Filters. http://www.phru.nhs.uk/casp/search_filters.htm (No longer available) 2006.
Deeks 2010
- Deeks JJ, Bossuyt PM, Gatsonis C (editors). Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 1.0. The Cochrane Collaboration, 2010. Available from http://srdta.cochrane.org (accessed 25 April 2013).
DeVet 2008
- Vet HCW, Eisinga A, Riphagen II, Aertgeerts B, Pewsner D, Mitchell R. Chapter 7: Searching for studies. In: Deeks JJ, Bossuyt PM, Gatsonis C (editors). Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 0.4 [updated September 2008]. The Cochrane Collaboration, 2008. Available from http://srdta.cochrane.org (accessed 25 April 2013).
Deville 2002a
- Deville WL, Buntinx F, Bouter LM, Montori VM, Vet HC, Windt DA. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Medical Research Methodology 2002;2:9.
Falck‐Ytter 2004
- Falck‐Ytter YT, Motschall E. New search filter for diagnostic studies: Ovid and PubMed versions not the same [2004]. Available at http://www.bmj.com/content/328/7447/1040?tab=responses (accessed 25 April 2013).
Fielding 2002
- Fielding AM, Powell A. Using Medline to achieve an evidence‐based approach to diagnostic clinical biochemistry. Annals of Clinical Biochemistry 2002;39(Pt 4):345‐50.
Glanville 2008
- Glanville J, Bayliss S, Booth A, Dundar Y, Fleeman ND, Foster L, et al. on behalf of the InterTASC Information Specialists' Subgroup. So many filters, so little time: the development of a search filter appraisal checklist. Journal of the Medical Library Association 2008;96(4):356‐61.
Haynes 2005
- Haynes RB, McKibbon KA, Wilczynski NL, Walter SD, Werre SR, Hedges Team. Optimal search strategies for retrieving scientifically strong studies of treatment from Medline: analytical survey. BMJ 2005;330(7501):1179‐84.
Haynes 2005a
- Haynes RB, Kastner M, Wilczynski NL, Hedges Team. Developing optimal search strategies for detecting clinically sound and relevant causation studies in EMBASE. BMC Medical Informatics and Decision Making 2005;5:8.
Horsley 2011
- Horsley T, Dingwall O, Sampson M. Checking reference lists to find additional studies for systematic reviews. Cochrane Database of Systematic Reviews 2011, Issue 8. [DOI: 10.1002/14651858.MR000026.pub2]
InterTASC 2011
- InterTASC Information Specialists' Sub‐Group (ISSG). The InterTASC Information Specialists' Sub‐Group Search Filter Resource: diagnostic studies. York: Centre for Reviews and Dissemination. Available at http://www.york.ac.uk/inst/crd/intertasc/diag.htm (accessed 25 April 2013).
Lefebvre 2011
- Lefebvre C, Manheimer E, Glanville J. Chapter 6: Searching for studies. In: Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0 [updated March 2011]. The Cochrane Collaboration, 2011. Available from http://www.cochrane‐handbook.org (accessed 25 April 2013).
NLM 2005
- US National Library of Medicine. Clinical Queries using Research Methodology Filters [updated Jan 2005]. Available from http://www.ncbi.nlm.nih.gov/books/NBK3827/#pubmedhelp.Clinical_Queries_Filters (accessed 25 April 2013).
North Thames 2002
- North Thames. Diagnostic procedures. http://www.londonlinks.ac.uk/evidence_strategies/ovid_filters.htm#diagnostic. (No longer available).
Ovid 2010
- Wolters Kluwer Health. Clinical queries in Ovid. Available at: http://ovidsupport.custhelp.com/cgi‐bin/ovidsupport.cfg/php/enduser/std_adp.php?p_faqid=1599&;p_created=1087487498&p_sid=B86UCj8j&p_accessibility=0&p_redirect=&p_lva=&p_sp=cF9zcmNoPTEmcF9zb3J0X2J5PSZwX2dyaWRzb3J0PSZwX3Jvd19jbnQ9MTUsMTUmcF9wcm9kcz0wJnBfY2F0cz0wJnBfcHY9JnBfY3Y9JnBfc2VhcmNoX3R5cGU9YW5zd2Vycy5zZWFyY2hfbmwmcF9wYWdlPTEmcF9zZWFyY2hfdGV4dD1jbGluaWNhbCBxdWVyaWVz&p_li=&p_topview=1 First published 2004; updated 2010.
OvidSP 2013
- Wolters Kluwer Health. MEDLINE® 2013 Database Guide. Available at http://ovidsp.tx.ovid.com/sp‐3.8.1a/ovidweb.cgi?&S=BLIMFPMFDFDDHIFFNCOKFBFBJDCBAA00&Database+Field+Guide=37 [2012] (accessed 25 April 2013).
OvidSP 2013a
- Wolters Kluwer Health. Embase: Excerpta Medica Database Guide. http://ovidsp.tx.ovid.com/sp‐3.8.1a/ovidweb.cgi?&S=BLIMFPMFDFDDHIFFNCOKFBFBJDCBAA00&Database+Field+Guide=10 [2012] (accessed 25 April 2013).
Shipley Miner 2002
- Shipley MC. Evidence based filters for Ovid MEDLINE. http://www.urmc.rochester.edu/hslt/miner/digital_library/tip_sheets/OVID_eb_filters.pdf. Rochester: Edward G Miner Library, University of Rochester.
University of Rochester 2002
- Miner Library Reference Librarians. Evidence based filters for Ovid MEDLINE. Miner Library, University of Rochester 2002.
Whiting 2008
- Whiting P, Westwood M, Burke M, Sterne J, Harbord R, Glanville J. Can diagnostic filters offer similar sensitivity and a reduced number needed to read compared to searches based on index test and target condition? [abstract]. Methods for Evaluating Medical Tests Symposium; 2008 Jul 24‐25.
Whiting 2011
- Whiting P, Westwood M, Beynon R, Burke M, Sterne JA, Glanville J. Inclusion of methodological filters in searches for diagnostic test accuracy studies misses relevant studies. Journal of Clinical Epidemiology 2011;64(6):602‐7.
Wilczynski 1995
- Wilczynski NL, Walker CJ, McKibbon KA, Haynes RB. Reasons for the loss of sensitivity and specificity of methodologic MeSH terms and textwords in MEDLINE. Proceedings of the Annual Symposium on Computer Applications in Medical Care 1995:436‐40.
Wilczynski 2003
- Wilczynski NL, Haynes RB, Hedges Team. Developing optimal search strategies for detecting clinically sound causation studies in MEDLINE. AMIA ‐ Annual Symposium Proceedings/AMIA Symposium 2003:719‐23.
Wilczynski 2004
- Wilczynski NL, Haynes RB, Hedges Team. Developing optimal search strategies for detecting clinically sound prognostic studies in MEDLINE: an analytic survey. BMC Medicine 2004;2(1):23.
Wilczynski 2005a
- Wilczynski NL, Haynes RB. Optimal search strategies for detecting clinically sound prognostic studies in EMBASE: an analytic survey. Journal of the American Medical Informatics Association 2005;12(4):481‐5.
Wilczynski 2007
- Wilczynski NL, Haynes RB. Indexing of diagnosis accuracy studies in MEDLINE and EMBASE. AMIA ‐ Annual Symposium Proceedings/AMIA Symposium 2007:801‐5.