Published in final edited form as: J Rheumatol. 2014 Apr 1;41(5):1005–1010. doi: 10.3899/jrheum.131311

Updating the OMERACT Filter: Discrimination and Feasibility

George Wells 1, Peter Tugwell 2, Maarten Boers 3, John Richard Kirwan 4, Dorcas Beaton 5, Clifton O Bingham III 6, Annelies Boonen 7, Peter Brooks 8, Philip G Conaghan 9, Maria-Antonietta D'Agostino 10, Maxime Dougados 11, Daniel E Furst 12, Laure Gossec 13, Francis Guillemin 14, P Helliwell 15, Sarah Hewlett 16, Tore K Kvien 17, Robert BM Landewé 18, Lyn March 19, Philip J Mease 20, Mikkel Ostergaard 21, Lee Simon 22, Jasvinder A Singh 23, Vibeke Strand 24, Désirée van der Heijde 25, E Choy 26
PMCID: PMC4212640  NIHMSID: NIHMS636987  PMID: 24692522

Introduction

Discrimination

The “Discrimination” part of the OMERACT Filter asks whether the measure discriminates between situations that are of interest. The situations can be states at one time (for classification or prognosis) or states at different times (to measure change). The term captures the issues of reliability and sensitivity to change (responsiveness). The “Discrimination” part of the Filter has been helpful but was agreed on primarily by consensus of OMERACT participants rather than through explicit evidence-based guidelines. In Filter 2.0 we want to improve this definition and provide specific guidance and advice to participants.

Various conceptual models have been considered in OMERACT for discrimination. For example, a classification system for studies of discrimination was designed to help organize the specific purpose of such studies, and to identify those with the potential to provide information on the minimal clinically important difference (MCID). A 3-dimensional cube was developed (Beaton 2001); a simplified version of the cube is provided in Figure 1, into which studies of discrimination can be categorized according to 3 attributes, namely:

  • setting, which identifies whether the study results were targeted
    • to individuals or
    • to groups;
  • comparison, which identifies whether discrimination was considered as
    • differences between individuals or groups at one point in time,
    • change within individuals or groups over time, or
    • differences in the change within individuals or groups over time;
  • extent of difference, which identifies whether the difference being assessed is
    • the minimum detectable,
    • the minimum relevant or important, or
    • a higher and possibly specified level of importance.

Figure 1. The simplified cube of discrimination

This classification system helps to focus attention on the specific type of discrimination of interest in specific assessment circumstances. It reinforces an understanding that an instrument which is able to discriminate between states as represented by one cell within the cube will not necessarily be able to discriminate between states as represented by another cell. In particular, it is easiest for an instrument to show discrimination in the bottom, left, front corner and most difficult in the top, right, back corner of the cube.
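For illustration only, the following minimal Python sketch enumerates the 18 cells implied by the simplified cube; the attribute labels are paraphrased from the list above, and the placement of a typical trial in the "minimum important" layer is an illustrative assumption rather than part of the original framework.

```python
from itertools import product

# The three attributes of the simplified 'cube of discrimination'
# (labels paraphrased from the text above).
SETTINGS = ["individuals", "groups"]
COMPARISONS = ["difference at one point in time",
               "change over time",
               "difference in the change over time"]
EXTENTS = ["minimum detectable", "minimum important", "higher specified level"]

# Every study of discrimination falls into one of 2 x 3 x 3 = 18 cells.
CELLS = list(product(SETTINGS, COMPARISONS, EXTENTS))

# A typical parallel-group trial compares the change between treatment groups;
# taking 'minimum important' as the extent is an illustrative assumption.
typical_trial = ("groups", "difference in the change over time", "minimum important")
assert typical_trial in CELLS
print(len(CELLS), typical_trial)
```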

In the context of clinical trials the setting is usually treatment groups and the comparison is change within groups, more particularly differences between the changes within groups. The sensitivity required may depend on the stakeholder (e.g. physician, patient or policy maker; group or individual) and the intended use (e.g. clinical service design or research exploration). Various measures have been proposed for considering important changes and states, including the MCID (Jaeschke 1989), which is not without its critics (Kirwan 2001, Rennard 2005, Gatchel 2010), and more recently the patient acceptable symptom state (PASS) (Dougados 2007). Most have been targeted to the individual (patient) level. Other novel methodologies may be informative, such as identifying a system of levels of improvement that, for example, patients are striving to achieve.

As part of the research agenda, various paradigms for considering discrimination that integrate the different measures, perspectives and purposes were explored. Similarly, while the different methods used to define the MCID, clinically important changes or important differences have previously been considered and categorized according to a version of the ‘cube’ classification system (Wells 2001), the literature on other methods within the derived paradigm will also be considered. OMERACT working groups and patient groups were also consulted to identify different ways in which results can be viewed and discrimination assessed by physicians and patients in their areas of interest.

Feasibility

Feasibility in the OMERACT filter encompasses the practical considerations of using an instrument, including its ease of use, time to complete, monetary costs and the interpretability of the question(s) included in the instrument. Considerations include the cost of equipment, training for observers, burden/difficulty for the patient and, in the case of patient self-report, the perceived length, wording of questions, reading level and ease of the response options (clarity, ease of retrieval of information, ease of responding on that scale).

Feinstein coined the term “sensibility” to reflect an enlightened common sense appraisal of the instrument under consideration (Feinstein 1987). He is credited with encouraging the clinical research community to accept that this is as important as statistically based measurement properties. In Feinstein’s framework, feasibility is a key concern addressed by six questions:

  • Is it easy to understand?

  • Are the items, their scaling and the aggregate score simple and easy to use?

  • Does the data collection sheet conform to basic principles of questionnaire design; are instructions and definitions provided; and are procedures standardized?

  • Is it acceptable to the patient/participant and to the observer?

  • Is the format for administration appropriate for your purpose or does it require special tests or special skills?

  • Is the administration time suitable?

Auger reviewed potential instruments for conducting this “common sense appraisal” and suggested the following main domains for the assessment of feasibility, which were termed “applicability”: respondent burden, examiner burden, distributional issues and format issues (Auger 2006). The questions asked within Feinstein’s feasibility assessment are consistent with the domains and subdomains described by Auger (Figure 2). Note that OMERACT has traditionally applied the term ‘applicable’ to a measurement instrument that has passed all the Filter requirements of Truth, Discrimination and Feasibility (Boers 1998).

Figure 2. Auger’s domains of applicability

Feasibility is most often appraised by the researcher or clinician who is selecting the instrument, and it is the most frequently and quickly endorsed step in the OMERACT filter. In Filter 2.0 we are seeking a more thoughtful reflection on each of its components, and consideration of a combined point of view from the researcher/clinician and patients. We proposed a merger of Feinstein’s and Auger’s points.

Breakout discussion groups

Following a plenary presentation of the topics reviewed above, conference participants were divided into 5 pairs of breakout (discussion) groups. Examples of the conduct of some discrimination exercises that had already taken place in different areas of OMERACT activity were presented to each pair of breakout groups. These were taken from work on gout, ultrasound, psoriatic arthritis, MCID and worker productivity (Table 1, column 1). They served to provide concrete examples of the discrimination issues being addressed, and to help the discussions focus on the main questions of discrimination and feasibility for each breakout group.

Table 1.

Summary of the report back and discussion from the breakout groups

Columns: Discrimination topic | Specific discrimination topic | How was the discrimination topic addressed? | Additional feasibility topic*
Discrimination topic: Determining differences when only non-inferiority trial evidence is available
Specific discrimination topic: What general methods and procedures could be considered for determining differences (a) between groups; (b) response in a patient, when only non-inferiority head-to-head trial data are available?
How the topic was addressed:
  • Assess assay sensitivity of treatment to active control and constancy of active control to placebo

  • Use adjusted indirect methods for deriving indirect treatment comparison of treatment to placebo

  • Use indirect estimates as one uses direct estimates for minimum detectable, minimum important and major differences

Discrimination topic: Determining minimum detectable, minimum important and major differences
Specific discrimination topic: What general methods and procedures could be considered for determining minimum detectable, minimum important and major differences (a) between groups; (b) response in a patient?
How the topic was addressed:
  • Consider the ‘cube of discrimination’ for looking at changes within patients and at differences between groups

  • Anchor vs distribution based methods

  • Consider contextual factors

  • Importance of scaling method used and signal-to-noise ratio

  • Determinations for improvement may not be same as for worsening

  • Need for patient involvement

  • Format compatibility needs to be more explicit; does there need to be testing of paper vs computer formats?

  • Should there be a feasibility ‘score’?

  • Capturing longitudinal and frequency data

  • Capturing information by paper or computer in different sites globally

Discrimination topic: Determining the sensitivity to measure
Specific discrimination topic: What different ‘situations of interest’ could be considered for assessing discrimination?
How the topic was addressed:
  • Consider in terms of providing examples for study designs and clinical practice for each situation in ‘cube of discrimination’

  • Need for two RCTs questioned

  • Part of ‘development loop’ to be considered from the beginning to the end of the process in developing an instrument

  • Should be piloted

Discrimination topic: Determining a responder index
Specific discrimination topic: What general methods and procedures could be considered for determining a responder index?
How the topic was addressed:
  • Index must be sensitive to both improvement and worsening

  • Standardization of the technique is needed in order to assess change

  • Need to consider feasibility early in process; integral to the design of the instrument itself

Discrimination topic: Practical checklist for discrimination
Specific discrimination topic: What items constitute a practical checklist for discrimination?
How the topic was addressed:
  • Filter 1 presentation best way, but more examples needed

  • Basic principle for responsiveness - consider a clinical trial where there is a known treatment effect and then assess the change in the outcome

  • Look for responsiveness first at group level then at individual level

  • Anchor treatment effect to a PRO if it is to be understandable to the patient

  • COSMIN was not discussed in great detail and this must be more closely considered

  • Role of effect size important

* How can the feasibility of a measure be assessed, taking into consideration aspects such as cost, burden and interpretability?

Report back and plenary discussion

The summary of the discussions from the breakout groups is provided in Table 1.

Discrimination

Two groups deliberated on general methods and procedures for assessing discrimination when only non-inferiority head-to-head trial data are available. The following points were noted. If a current standard treatment is effective then placebo-controlled trials may not be possible, since they are likely unethical. New treatments can then only be compared with active treatments, so there will be no comparison of the new treatment against placebo and hence no measure of the ‘actual’ effect of the new treatment. If superiority of the new treatment is not anticipated, but the new treatment may be safer, cheaper and/or easier to administer, then a head-to-head non-inferiority trial could be considered. A non-inferiority trial is designed to demonstrate the efficacy of a new treatment by showing that it is not less efficacious than the active control (standard treatment) by more than a specified margin, known as the non-inferiority margin. An important fact is that a well-designed and properly conducted non-inferiority trial that correctly demonstrates the treatments to be similar cannot in itself be distinguished from a poorly executed trial that fails to find a true difference. The ability of a trial to demonstrate a difference between treatments if such a difference truly exists is known as ‘assay sensitivity’. A non-inferiority trial that finds the effects of the treatments to be similar has not demonstrated assay sensitivity, and must rely on an assumption of assay sensitivity on the basis of information external to the trial. Past placebo-controlled trials may accomplish this: historical data must be available in which the standard treatment has been established to be superior to placebo. Further, we must have constancy, namely the assumption that the historical difference between the standard treatment and placebo would also hold in the setting of the new trial had a placebo control been used. How to use information from these trials to determine pertinent differences between groups and within patients must be identified. We would then have direct evidence comparing the new treatment with the standard and the standard with placebo, and using the standard as the common linking treatment we can consider the indirect evidence of the new treatment against placebo, on which these differences could be based. In making this assessment, in addition to assay sensitivity and constancy, an adjusted indirect treatment comparison method must be used, in which the comparison of the treatments of interest is adjusted by the results of their direct comparisons with the standard group, thereby partially retaining the strength of the randomized controlled trial (RCT). In the simplest but widely applicable setting, the approach proposed by Bucher (Bucher 1997) and generalized by Wells (Wells 2007) is one such method.
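To make the adjusted indirect comparison concrete, the following is a minimal Python sketch of the core Bucher-type calculation, assuming the treatment effects are expressed on an additive scale (e.g. a mean difference or log odds ratio); the function name and all numbers are hypothetical and for illustration only.

```python
import math

def adjusted_indirect_comparison(d_ns, se_ns, d_sp, se_sp, z=1.96):
    """Bucher-style adjusted indirect comparison.

    d_ns, se_ns: effect and standard error of new vs standard treatment,
                 from the head-to-head non-inferiority trial.
    d_sp, se_sp: effect and standard error of standard treatment vs placebo,
                 from historical placebo-controlled trials.
    Both effects must be on the same additive scale. Returns the indirect
    estimate of new treatment vs placebo with an approximate 95% CI.
    """
    d_np = d_ns + d_sp                       # effects add across the common comparator
    se_np = math.sqrt(se_ns**2 + se_sp**2)   # variances of independent trials add
    return d_np, (d_np - z * se_np, d_np + z * se_np)

# Hypothetical inputs: new vs standard mean difference 0.1 (SE 0.4),
# standard vs placebo mean difference 2.0 (SE 0.5).
print(adjusted_indirect_comparison(0.1, 0.4, 2.0, 0.5))
```

Because the two direct estimates come from independent trials their variances simply add, which is why an indirect estimate is less precise than a direct comparison of similar size.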

Two breakout groups considered what general methods and procedures could be considered for determining minimum detectable, minimum relevant (important) and major differences. The breakout groups reviewed the ‘discrimination cube’, looking at changes within patients and at differences between groups. Considering the MCID, several issues were raised and discussed. More consideration and explanation is needed of whether the MCID applies only to patient reported outcome (PRO) measures or whether it also applies to composite measures. Can physicians determine what an MCID is for an objective measure such as the erythrocyte sedimentation rate (ESR)? The scaling method used (e.g. numerical rating scale (NRS) versus Likert scaling) may change the signal-to-noise ratio; as an example, the MCID established using anchor based and distribution based methods for the Health Assessment Questionnaire (HAQ) in psoriatic arthritis (PsA) treated with etanercept was greater than that for rheumatoid arthritis (RA), which may be a problem when considering a nonlinear score such as the HAQ. The MCID calculated for improvement may not be the same as the MCID for worsening; in this regard, the MCID for improvement has been found to be greater than that for deterioration for health-related quality of life in systemic lupus erythematosus (SLE) (Strand 2005). Patient involvement in the determination of the MCID was raised, noting that the choice of anchors can determine the MCID and that patient input may be important in determining the appropriate anchor question. It may also make a difference whether the state one would be comfortable in is asked about as a global assessment, as the PASS, or as a determination of the amount of change. The anchor question may also depend on the study design or the primary outcome, and for a composite measure examining multiple aspects of disease (e.g. skin/joints) the MCID may differ for the different areas involved. Finally, the determination of the MCID may depend on contextual factors such as the initial disease state and the disease experience (including duration and coping mechanisms), which could lead to a response shift in the MCID, as well as on expectations of treatment.
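As an illustration of the anchor based versus distribution based approaches discussed above, the following minimal Python sketch computes one common variant of each: the mean change among patients rating themselves ‘minimally improved’ on an external anchor, and the half-SD and standard-error-of-measurement thresholds. The function names, anchor labels and data are hypothetical.

```python
import statistics

def anchor_based_mcid(changes, anchors):
    """Anchor-based MCID: mean change among patients who rate themselves
    'minimally improved' on an external (e.g. patient global) anchor."""
    minimally_improved = [c for c, a in zip(changes, anchors) if a == "minimally improved"]
    return statistics.mean(minimally_improved)

def distribution_based_thresholds(baseline_scores, reliability=None):
    """Distribution-based thresholds: half a baseline SD, and the standard
    error of measurement (SEM) when a reliability estimate is available."""
    sd = statistics.stdev(baseline_scores)
    half_sd = 0.5 * sd
    sem = sd * (1 - reliability) ** 0.5 if reliability is not None else None
    return half_sd, sem

# Hypothetical change scores with anchor ratings, and baseline scores.
changes = [0.1, 0.3, 0.25, 0.6, 0.05]
anchors = ["no change", "minimally improved", "minimally improved", "much improved", "no change"]
baseline = [1.1, 1.6, 0.9, 2.0, 1.4]
print(anchor_based_mcid(changes, anchors))
print(distribution_based_thresholds(baseline, reliability=0.85))
```

Divergence between the anchor based and distribution based values on real data is one concrete way in which the scaling and signal-to-noise issues noted above become visible.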

Two other breakout groups considered what different ‘situations of interest’ could be considered for assessing discrimination. The difficulty in defining ‘situations of interest’ may be most usefully addressed by providing examples for RCTs, longitudinal observational studies and clinical practice for each situation in the ‘cube of discrimination’. In particular, OMERACT has required information from two RCTs before endorsing a measure or a responder index, and the need for two RCTs was questioned; there may be situations where it is not possible to obtain results from two RCTs, or where there is a negative trial with no difference in outcome between groups.

Another two breakout groups deliberated on what general methods and procedures could be considered for determining a responder index. A responder index is a combination of a series of indicators, each a threshold value on a measurement instrument. ACR20 and EULAR response criteria are reasonable examples of responder indices. Such an index should ideally be discriminative in situations of both improvement and worsening.
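To make the notion of a responder index concrete, here is a simplified Python sketch of an ACR20-style definition (at least 20% improvement in both joint counts plus at least 20% improvement in 3 of the 5 remaining core set measures); the dictionary keys and patient values are hypothetical, and the full criteria contain details not reproduced here.

```python
def acr20_style_responder(baseline, followup):
    """Sketch of an ACR20-style responder definition: >=20% improvement in
    tender and swollen joint counts, plus >=20% improvement in at least 3 of
    the 5 remaining core set measures. Lower values are taken to mean better
    for every measure in this simplified example."""
    def improved(key):
        return baseline[key] > 0 and (baseline[key] - followup[key]) / baseline[key] >= 0.20

    core = ["patient_pain", "patient_global", "physician_global", "haq", "acute_phase_reactant"]
    return (improved("tender_joints")
            and improved("swollen_joints")
            and sum(improved(k) for k in core) >= 3)

# Hypothetical patient, before and after treatment.
baseline = {"tender_joints": 20, "swollen_joints": 15, "patient_pain": 60,
            "patient_global": 55, "physician_global": 50, "haq": 1.5,
            "acute_phase_reactant": 30}
followup = {"tender_joints": 14, "swollen_joints": 11, "patient_pain": 45,
            "patient_global": 50, "physician_global": 38, "haq": 1.1,
            "acute_phase_reactant": 22}
print(acr20_style_responder(baseline, followup))  # True for these values
```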

The final pair of breakout groups each considered what items constitute a practical checklist for discrimination. While noting the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist (Mokkink 2010), discussion of this was limited because the necessary information was lacking and time was too short. One breakout group noted that discrimination as presented in Filter 1 (Boers 1998) appears to be the best way to proceed, but that more examples are needed. To test responsiveness, the group believed that one should consider an RCT where there is a known treatment effect and then assess the change in the outcome of interest. Further, one should look for responsiveness first at the group level, and then at the individual level. The treatment effect should be anchored to a patient reported outcome for it to be best understood by patients. The discussion on diminishing the role of effect size in assessing discrimination was not resolved.
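Following the group's suggested principle, a minimal Python sketch of group-level responsiveness in a trial arm with a known treatment effect is shown below, using the common definitions of the effect size (mean change divided by baseline SD) and the standardized response mean (mean change divided by the SD of change); the scores are hypothetical and assume that lower values mean better health.

```python
import statistics

def group_level_responsiveness(baseline, followup):
    """Group-level responsiveness indices for paired scores from a trial arm:
    effect size (mean change / baseline SD) and standardized response mean
    (mean change / SD of change). Improvement is baseline minus follow-up
    because lower scores are assumed to indicate better health."""
    changes = [b - f for b, f in zip(baseline, followup)]
    mean_change = statistics.mean(changes)
    effect_size = mean_change / statistics.stdev(baseline)
    srm = mean_change / statistics.stdev(changes)
    return effect_size, srm

# Hypothetical scores before and after a treatment of known efficacy.
baseline = [4.0, 5.5, 6.0, 3.5, 5.0, 4.5]
followup = [3.0, 4.0, 5.5, 2.5, 3.5, 4.0]
print(group_level_responsiveness(baseline, followup))
```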

Feasibility

Three pairs of breakout groups also deliberated on how the feasibility of a measure or group of measures could be assessed, taking into consideration aspects such as cost, burden and interpretability. Two groups felt that feasibility should be considered early, as part of the ‘development loop’ from the beginning to the end of the process of developing an instrument, and that pilot testing for feasibility should be conducted. Capturing information by paper or by computer at different sites globally was raised as an important issue related to feasibility. Longitudinal data capture and the frequency of data collection can be burdensome issues related to feasibility and, for that matter, to validity. The possibility of developing a feasibility ‘score’ was also raised.

Summary and conclusions

This OMERACT session was designed to evaluate key aspects of discrimination and feasibility proposed for the Filter 2.0 framework, which had been discussed prior to the meeting and/or had arisen as issues in using the discrimination and feasibility descriptions of the original filter over the years. Using specific topics on discrimination and a general question on feasibility, OMERACT 11 participants were able to probe the theoretical and practical implications of the framework, and to examine areas of strength and weakness. Though specific aspects and issues were raised regarding the various topics related to discrimination and feasibility, providing guidance and research agenda topics, there was general agreement that the more explicit explanations of discrimination and feasibility to be included in Filter 2.0 would be a help to developers of core outcome measures.

Contributor Information

George Wells, Cardiovascular Research Methods Centre, Department of Epidemiology and Community Medicine University of Ottawa.

Peter Tugwell, Department of Medicine, University of Ottawa, Canada.

Maarten Boers, Departments of Epidemiology and Biostatistics, and Rheumatology, VU University Medical Center, PK 6Z 185, PO Box 7057, 1007 MB, Amsterdam, Netherlands.

John Richard Kirwan, University of Bristol Academic Rheumatology Unit, Bristol Royal Infirmary, Bristol, UK.

Dorcas Beaton, Department of Occupational Sciences and Occupational Therapy, Institute for Health Policy Management and Evaluation, University of Toronto, Toronto, Canada.

Clifton O. Bingham, III, Division of Rheumatology, Johns Hopkins University, Baltimore, MD, US.

Annelies Boonen, Department of Internal Medicine, Division of Rheumatology, Maastricht University Medical Center and Caphri Research Institute, Maastricht University, The Netherlands.

Peter Brooks, Australian Health Workforce Institute (AHWI), School of Population Health, University of Melbourne, Australia.

Philip G. Conaghan, Division of Musculoskeletal Disease, University of Leeds, & NIHR Leeds Musculoskeletal Biomedical Research Unit, UK.

Maria-Antonietta D'Agostino, Versailles-Saint Quentin En Yvelines University, Department of Rheumatology, Ambroise Paré Hospital, APHP, Boulogne-Billancourt, France.

Maxime Dougados, Paris-Descartes University, Medicine Faculty, APHP, Cochin Hospital, Rheumatology B Dept., PARIS, France.

Daniel E Furst, Department of Rheumatology, Geffen School of Medicine at the University of California in Los Angeles.

Laure Gossec, Université Pierre et Marie Curie (UPMC) - Paris 6, GRC-UMPC 08 (EEMOIS); AP-HP Pitié Salpêtrière Hospital, Department of Rheumatology, Paris, France.

Francis Guillemin, Université de Lorraine, Université Paris Descartes, EA 4360 APEMAC, Nancy, France.

P. Helliwell, University of Leeds Section of Musculoskeletal Disease, LIMM, Chapel Allerton Hospital, Chapeltown Road, Leeds, West Yorkshire LS7 4SA, United Kingdom.

Sarah Hewlett, University of the West of England, Academic Rheumatology Unit, Bristol Royal Infirmary, UK.

Tore K. Kvien, Dept. of Rheumatology, Diakonhjemmet Hospital, Oslo, Norway

Robert B.M. Landewé, Department of Clinical Immunology & Rheumatology, Academic Medical Center, University of Amsterdam & Atrium Medical Center Heerlen, the Netherlands.

Lyn March, Institute of Bone and Joint Research and Conjoint Professor, Sydney Medical School and School of Public Health, University of Sydney, Senior Staff Specialist, Department of Rheumatology Royal North Shore, St Leonards, NSW, Australia.

Philip J. Mease, Seattle Rheumatology Associates, Chief, Swedish Medical Center Rheumatology Research Division, Clinical Professor, University of Washington School of Medicine, Seattle, Washington, USA.

Mikkel Ostergaard, Department of Rheumatology, Copenhagen University Hospital at Glostrup, Copenhagen, Denmark.

Lee Simon, Cambridge, MA 02138, USA.

Jasvinder A. Singh, Division of Immunology/Rheumatology, University of Alabama at Birmingham and Birmingham VA Medical center, Birmingham, Alabama, USA.

Vibeke Strand, Division of Immunology/Rheumatology, Stanford University School of Medicine, Palo Alto, CA, USA.

Désirée van der Heijde, Department of Rheumatology, Leiden University Medical Center, Leiden, Netherlands.

E Choy, Section of Rheumatology, Cardiff University School of Medicine, Cardiff, UK.

References

  1. Wells G, Beaton D, Shea B, Boers M, Simon L, Strand V, Brooks P, Tugwell P. Minimal clinically important differences: review of methods. J Rheumatol. 2001 Feb;28(2):406–412.
  2. Beaton DE, Bombardier C, Katz JN, Wright JG, Wells G, Boers M, Strand V, Shea B. Looking for important change/differences in studies of responsiveness. OMERACT MCID Working Group. Outcome Measures in Rheumatology. Minimal Clinically Important Difference. J Rheumatol. 2001 Feb;28(2):400–405.
  3. Dougados M, Moore A, Yu S, Gitton X. Evaluation of the patient acceptable symptom state in a pooled analysis of two multicentre, randomised, double-blind, placebo-controlled studies evaluating lumiracoxib and celecoxib in patients with osteoarthritis. Arthritis Res Ther. 2007;9:R11. doi: 10.1186/ar2118.
  4. Jaeschke R, Singer J, Guyatt GH. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407–415. doi: 10.1016/0197-2456(89)90005-6.
  5. Kirwan JR. Minimum clinically important difference: the crock of gold at the end of the rainbow. J Rheumatol. 2001 Feb;28(2):439–444.
  6. Rennard SI. Minimal clinically important difference, clinical perspective: an opinion. COPD. 2005 Mar;2(1):51–55. doi: 10.1081/copd-200050641.
  7. Beaton DE, Boers M, Wells GA. Many faces of the minimal clinically important difference (MCID): a literature review and directions for future research. Curr Opin Rheumatol. 2002 Mar;14(2):109–114. doi: 10.1097/00002281-200203000-00006.
  8. Gatchel RJ, Mayer TG. Testing minimal clinically important difference: consensus or conundrum? Spine J. 2010 Apr;10(4):321–327. doi: 10.1016/j.spinee.2009.10.015.
  9. Scientific Advisory Committee of the Medical Outcomes Trust. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res. 2002;11(3):193–205. doi: 10.1023/a:1015291021312.
  10. Rowe BH, Oxman AD. An assessment of the sensibility of a quality of life instrument. Am J Emerg Med. 1993;11(4):374–380. doi: 10.1016/0735-6757(93)90171-7.
  11. Feinstein AR. Clinimetrics. New Haven, CT: Yale University Press; 1987.
  12. Auger C, Demers L, Swaine B. Making sense of pragmatic criteria for the selection of geriatric rehabilitation measurement tools. Arch Gerontol Geriatr. 2006;43(1):65–83. doi: 10.1016/j.archger.2005.09.004.
  13. Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol. 1997;50:683–691. doi: 10.1016/s0895-4356(97)00049-8.
  14. Wells G, Sultan S, Chen L, Coyle D. Indirect evidence: indirect treatment comparisons in meta-analysis. Canadian Agency for Drugs and Technologies in Health (CADTH) Health Technology Report; 2007.
  15. Strand V, Crawford B. Improvement in health-related quality of life in patients with SLE following sustained reductions in anti-dsDNA antibodies. Expert Rev Pharmacoecon Outcomes Res. 2005;5(3):317–326. doi: 10.1586/14737167.5.3.317.
  16. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de Vet HCW. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010 May;19(4):539–549. doi: 10.1007/s11136-010-9606-8.
  17. Boers M, Brooks P, Strand V, Tugwell P. The OMERACT Filter for outcome measures in rheumatology. J Rheumatol. 1998;25:198–199.
