Box 1.
Systematic review: A study design that uses explicit, systematic methods to collect data from primary studies, critically appraises the data, and synthesizes the findings descriptively or quantitatively to address a clearly formulated research question [65, 68, 69]. Typically, a systematic review includes a clearly stated objective, predefined eligibility criteria for primary studies, a systematic search that attempts to identify all studies that meet the eligibility criteria, risk of bias assessments of the included primary studies, and a systematic presentation and synthesis of the findings of the included studies [65]. Systematic reviews can provide high-quality evidence to guide decision making in healthcare, because their findings are derived through systematic approaches that minimize bias [70].

Outcome domain: Refers to what is being measured (e.g., fatigue, physical function, blood glucose, pain) [1, 2]. Other terms include construct, concept, latent trait, factor, and attribute.

Outcome measurement instrument (OMI): Refers to how the outcome is being measured, i.e., the OMI used to measure the outcome domain. Different types of OMIs exist, such as questionnaires or patient-reported outcome measures (PROMs) and their variations, clinical rating scales, performance-based tests, laboratory tests, scores obtained through a physical examination or observation of an image, or responses to single questions [1, 2]. An OMI consists of a set of components and phases, i.e., ‘equipment’, ‘preparatory actions’, ‘collection of raw data’, ‘data processing and storage’, and ‘assignment of the score’ [57]. Clinical outcome assessments (COAs) are a specific type of OMI [71]; they focus on outcomes related to clinical conditions, often emphasizing the patient’s experience and perspective.

Report: A document with information about a particular study or a particular OMI.
It could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information, such as a manual for an OMI or the PROM itself [68]. A study report is a document with information about a particular study, such as a journal article or a preprint.

Record: The title and/or abstract of a report indexed in a database or website. Records that refer to the same report (such as the same journal article) are “duplicates” [68].

Study: The empirical investigation of a measurement property in a specific population, with a specific aim, design, and analysis.

Quality: The technical concept ‘quality’ is used to address three different aspects defined by COSMIN, OMERACT, and GRADE: 1) quality of the OMI refers to its measurement properties; 2) quality of the study refers to its risk of bias; and 3) quality of the evidence refers to the certainty assessment [2, 5, 72].

Measurement properties: The quality aspects of an OMI, referring to the validity, reliability, and responsiveness of the instrument’s score [64]. Each measurement property requires its own study design and statistical methods for evaluation. Different definitions of measurement properties are in use; COSMIN has a taxonomy with consensus-based definitions [64]. Another term for measurement properties is psychometric properties.

Feasibility: The ease of application and the availability of an OMI, e.g., completion time, costs, licensing, length of the OMI, and ease of administration [5, 26]. Feasibility is not a measurement property, but it is important when selecting an OMI [2].

Interpretability: The degree to which one can assign meaning to scores or change in scores of an OMI in particular contexts (e.g., if a patient has a score of 80, what does this mean?) [64].
Norm scores, minimal important change, and minimal important difference are also relevant concepts related to interpretability. Like feasibility, interpretability is not a measurement property, but it is important for interpreting the scores of an OMI and when selecting an OMI [2].

Measurement properties’ results: The findings of a study on a measurement property. The format of these results depends on the measurement property. For example, reliability results might be the estimate of the intraclass correlation coefficient (ICC), and structural validity results might be the factor loadings of items on their respective scales and the percentage of variance explained.

Measurement properties’ ratings: The comparison of measurement properties’ results against quality criteria, to give a judgement (i.e., rating) about the results. For example, the ICC of an OMI might be 0.75; this is the result. A quality criterion might prescribe that the ICC should be >0.7. In this case, the result (0.75) is rated as sufficient.

Risk of bias: The potential that measurement properties’ results in primary studies systematically deviate from the truth due to methodological flaws in the design, conduct, or analysis [68, 73]. Many tools have been developed to assess the risk of bias in primary studies. The COSMIN Risk of Bias checklist for PROMs was specifically developed to evaluate the risk of bias of primary studies on the measurement properties of PROMs [44]; it contains standards referring to design requirements and preferred statistical methods for such studies. The COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of OMIs can be used for any type of OMI [57].

Synthesis: Combining quantitative or qualitative results of two or more studies on the same measurement property and the same OMI.
Results can be synthesized quantitatively or qualitatively. Meta-analysis is a statistical method to synthesize results. Although it can be done for some measurement properties (i.e., internal consistency, reliability, measurement error, construct validity, criterion validity, and responsiveness), it is not very common in systematic reviews of OMIs, because end-users do not use the point estimates of measurement properties’ results; they use the scores obtained with an OMI. End-users therefore only need to know whether the result for a measurement property is sufficient or not. For some measurement properties (i.e., content validity, structural validity, and cross-cultural validity/measurement invariance), it is not even possible to synthesize the results statistically by meta-analysis or pooling. In most cases, the robustness of the results is described (e.g., the factor structure found, or the number of confirmed and unconfirmed hypotheses), or a range of the results is provided (e.g., the range of Cronbach’s alphas or ICCs).

Certainty (or confidence) assessment: The synthesis is often accompanied by an assessment of the certainty (or confidence) in the body of evidence. Authors conduct such an assessment to reflect how certain (or confident) they are that the synthesized result is trustworthy. These assessments are often based on established criteria, including the risk of bias, consistency of findings across studies, sample size, and directness of the result to the research question [2]. A common framework for the assessment of certainty (or confidence) is GRADE (Grading of Recommendations Assessment, Development, and Evaluation) [72].
A modified GRADE approach has been developed for communicating the certainty (or confidence) in systematic reviews of OMIs [2].

OMI recommendations: Systematic reviews of OMIs provide a comprehensive overview of the measurement properties of OMIs and support evidence-based recommendations for the selection of suitable OMIs for a particular use. Unlike systematic reviews of interventions, systematic reviews of OMIs often make recommendations about the suitability of OMIs for a particular use, although in some cases this might not be appropriate (e.g., if restricted by the funder). Making recommendations also facilitates much-needed standardization in the use of OMIs, although the quality and score interpretation of OMIs might be context-dependent. Making recommendations essentially involves conducting a synthesis at the level of the OMI, across different measurement properties, while also taking feasibility and interpretability into account. Various methods and tools for making OMI recommendations exist (e.g., from COSMIN, OMERACT, and others) [2, 74, 75].
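The distinction in Box 1 between measurement properties’ results, their ratings, and a descriptive synthesis can be illustrated with a minimal Python (3.9+) sketch. It assumes the single criterion from the Box 1 example (ICC >0.7 is rated sufficient). The label "insufficient" for results that fail the criterion is an assumption; real criteria sets may define additional rating categories. The study labels and ICC values are hypothetical, chosen only for illustration. The synthesis shown is one of the descriptive approaches named in the Synthesis entry: a range of results plus a count of sufficient ratings, with no meta-analytic pooling.

```python
from dataclasses import dataclass

# Criterion taken from the Box 1 example: an ICC > 0.7 is rated "sufficient".
ICC_THRESHOLD = 0.7


@dataclass
class ReliabilityResult:
    study_id: str  # hypothetical study label
    icc: float     # measurement property result (ICC point estimate)


def rate_icc(icc: float, threshold: float = ICC_THRESHOLD) -> str:
    """Rate a result against the quality criterion.

    "insufficient" for a failing result is an assumed label, not a
    definition from the source.
    """
    return "sufficient" if icc > threshold else "insufficient"


def summarize(results: list[ReliabilityResult]) -> dict:
    """Descriptive synthesis for one OMI and one measurement property.

    Reports the range of ICCs and how many studies were rated
    sufficient, instead of pooling the estimates by meta-analysis.
    """
    iccs = [r.icc for r in results]
    ratings = [rate_icc(r.icc) for r in results]
    return {
        "icc_range": (min(iccs), max(iccs)),
        "n_sufficient": ratings.count("sufficient"),
        "n_studies": len(results),
    }


# Hypothetical body of evidence for a single OMI.
studies = [
    ReliabilityResult("Study A", 0.75),
    ReliabilityResult("Study B", 0.82),
    ReliabilityResult("Study C", 0.68),
]

print(rate_icc(0.75))      # sufficient
print(summarize(studies))  # {'icc_range': (0.68, 0.82), 'n_sufficient': 2, 'n_studies': 3}
```

Keeping rating and synthesis as separate functions mirrors the separate glossary entries. The rating judges one study’s result, and the synthesis describes the body of evidence. A certainty assessment, which would also weigh risk of bias, consistency, sample size, and directness, would be a further step beyond this sketch.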