Table 2.
Level | Therapy/prevention, aetiology/harm | Prognosis | Diagnosis | Differential diagnosis/symptom prevalence study | Economic and decision analyses |
---|---|---|---|---|---|
1a | SR (with homogeneity^a) of RCTs | SR (with homogeneity^a) of inception cohort studies; CDR^b validated in different populations | SR (with homogeneity^a) of level 1 diagnostic studies; CDR^b with 1b studies from different clinical centers | SR (with homogeneity^a) of prospective cohort studies | SR (with homogeneity^a) of level 1 economic studies |
1b | Individual RCT (with narrow confidence interval^c) | Individual inception cohort study with >80% follow-up; CDR^b validated in a single population | Validating^d cohort study with good^e reference standards; or CDR^b tested within one clinical center | Prospective cohort study with good follow-up^f | Analysis based on clinically sensible costs or alternatives; systematic review(s) of the evidence; and including multi-way sensitivity analyses |
1c | All or none^g | All or none case-series | Absolute SpPins and SnNouts^h | All or none case-series | Absolute better-value or worse-value analyses^i |
2a | SR (with homogeneity^a) of cohort studies | SR (with homogeneity^a) of either retrospective cohort studies or untreated control groups in RCTs | SR (with homogeneity^a) of level >2 diagnostic studies | SR (with homogeneity^a) of 2b and better studies | SR (with homogeneity^a) of level >2 economic studies |
2b | Individual cohort study (including low-quality RCT; e.g., <80% follow-up) | Retrospective cohort study or follow-up of untreated control patients in an RCT; derivation of CDR^b or validated on split-sample^j only | Exploratory^d cohort study with good^e reference standards; CDR^b after derivation, or validated only on split-sample^j or databases | Retrospective cohort study, or poor follow-up | Analysis based on clinically sensible costs or alternatives; limited review(s) of the evidence, or single studies; and including multi-way sensitivity analyses |
2c | “Outcomes” research; ecological studies | “Outcomes” research | | Ecological studies | Audit or outcomes research |
3a | SR (with homogeneity^a) of case-control studies | | SR (with homogeneity^a) of 3b and better studies | SR (with homogeneity^a) of 3b and better studies | SR (with homogeneity^a) of 3b and better studies |
3b | Individual case-control study | | Non-consecutive study; or study without consistently applied reference standards | Non-consecutive cohort study, or very limited population | Analysis based on limited alternatives or costs, poor-quality estimates of data, but including sensitivity analyses incorporating clinically sensible variations |
4 | Case-series (and poor-quality cohort and case-control studies^k) | Case-series (and poor-quality prognostic cohort studies^l) | Case-control study, poor or non-independent reference standard | Case-series or superseded reference standards | Analysis with no sensitivity analysis |
5 | Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles” | Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles” | Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles” | Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles” | Expert opinion without explicit critical appraisal, or based on economic theory or “first principles” |
Users can add a minus sign “–” to denote a level that fails to provide a conclusive answer because of EITHER a single result with a wide confidence interval (such that, for example, an ARR in an RCT is not statistically significant but its confidence interval fails to exclude clinically important benefit or harm) (Note #1), OR a systematic review with troublesome (and statistically significant) heterogeneity (Note #2). Such evidence is inconclusive and therefore can only generate grade D recommendations (Note #3)
SR, Systematic review; RCT, Randomized controlled trial; ARR, absolute risk reduction
a By homogeneity, we mean a systematic review that is free of worrisome variations (heterogeneity) in the directions and degrees of results between individual studies. Not all systematic reviews with statistically significant heterogeneity need be worrisome, and not all worrisome heterogeneity need be statistically significant. As noted above, studies displaying worrisome heterogeneity should be tagged with a “–” at the end of their designated level
b Clinical decision rule. These are algorithms or scoring systems which lead to a prognostic estimation or a diagnostic category
c See note #2 for advice on how to understand, rate, and use trials or other studies with wide confidence intervals
d Validating studies test the quality of a specific diagnostic test, based on prior evidence. An exploratory study collects information and trawls the data (e.g., using a regression analysis) to find which factors are “significant”
e Good reference standards are independent of the test, and are applied blindly or objectively to all patients. Poor reference standards are haphazardly applied, but still independent of the test. Use of a nonindependent reference standard (where the “test” is included in the “reference”, or where the “testing” affects the “reference”) implies a level 4 study
f Good follow-up in a differential diagnosis study is >80%, with adequate time for alternative diagnoses to emerge (e.g., 1–6 months for acute conditions; 1–5 years for chronic ones)
g Met when all patients died before the Rx became available, but some now survive on it; or when some patients died before the Rx became available, but none now die on it
h An “absolute SpPin” is a diagnostic finding whose specificity is so high that a positive result rules in the diagnosis. An “absolute SnNout” is a diagnostic finding whose sensitivity is so high that a negative result rules out the diagnosis
i Better-value treatments are clearly as good but cheaper, or better at the same or reduced cost. Worse-value treatments are as good and more expensive, or worse and equally or more expensive
j Split-sample validation is achieved by collecting all the information in a single tranche, then artificially dividing this into “derivation” and “validation” samples
k By poor-quality cohort study, we mean one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded), objective way in both exposed and nonexposed individuals, and/or failed to identify or appropriately control known confounders, and/or failed to carry out a sufficiently long and complete follow-up of patients. By poor-quality case-control study, we mean one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded), objective way in both cases and controls and/or failed to identify or appropriately control known confounders
l By poor-quality prognostic cohort study, we mean one in which sampling was biased in favor of patients who already had the target outcome, or the measurement of outcomes was accomplished in <80% of study patients, or outcomes were determined in an unblinded, nonobjective way, or there was no correction for confounding factors
Good, better, bad, and worse refer to the comparisons between treatments in terms of their clinical risks and benefits
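The minus-sign rule in the note below the table can be written as a small check. The following Python sketch covers only the single-result (wide confidence interval) route to a “–”, not the heterogeneity route. It assumes effects are absolute risk reductions with positive values meaning benefit. The clinical-importance thresholds and the function name `level_with_minus` are hypothetical choices, not part of the table.

```python
def level_with_minus(level: str, ci_low: float, ci_high: float,
                     important_benefit: float, important_harm: float) -> str:
    """Append "-" to an evidence level when a single result is inconclusive.

    Assumptions (not defined in the table): the effect is an absolute risk
    reduction (ARR) with positive values meaning benefit; important_benefit > 0
    and important_harm < 0 are user-chosen thresholds of clinical importance.
    The heterogeneity route to a "-" is not modelled here.
    """
    # Not statistically significant: the confidence interval includes zero.
    not_significant = ci_low <= 0.0 <= ci_high
    # The interval still reaches a clinically important benefit or harm,
    # so neither can be excluded.
    fails_to_exclude = ci_high >= important_benefit or ci_low <= important_harm
    if not_significant and fails_to_exclude:
        return level + "-"
    return level


# A 1b trial whose ARR interval runs from -2% to +8% (benefit threshold 5%):
print(level_with_minus("1b", -0.02, 0.08, 0.05, -0.05))  # prints 1b-
```

A narrow interval around zero (say, -1% to +1% with the same thresholds) keeps its level. That result is a precise null rather than an inconclusive one.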
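Footnote a distinguishes statistically significant heterogeneity from worrisome heterogeneity. The table does not name a statistic. Cochran's Q and I² are one common way to quantify between-study variation in a systematic review, and they are swapped in here for illustration. This is a minimal sketch that assumes fixed-effect inverse-variance weights and assumes each study supplies an effect estimate and its variance. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def cochran_q_and_i2(effects: Sequence[float],
                     variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, I^2 in percent) for a set of study effect estimates.

    Assumption: fixed-effect inverse-variance weights w_i = 1 / v_i.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    # Weighted (pooled) mean effect across studies.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of variation beyond chance, truncated at 0 when Q < df.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2
```

A high I² flags heterogeneity for review. Whether it is worrisome, and so earns the “–”, still depends on judging the directions and sizes of the individual results.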
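Footnote g's “all or none” criterion consists of two logical conditions on mortality before and after a treatment became available. The sketch below encodes it literally from death counts. The function and parameter names are hypothetical. The check does not assess whether the historical comparison itself is trustworthy.

```python
def meets_all_or_none(died_before: int, n_before: int,
                      died_on_rx: int, n_on_rx: int) -> bool:
    """Check the 'all or none' criterion of footnote g from simple counts.

    died_before / n_before: deaths among patients seen before the treatment (Rx)
    became available; died_on_rx / n_on_rx: deaths among patients now treated.
    """
    if n_before <= 0 or n_on_rx <= 0:
        raise ValueError("both groups need at least one patient")
    # All died before the Rx, but some now survive on it.
    all_then_some = died_before == n_before and died_on_rx < n_on_rx
    # Some died before the Rx, but none now die on it.
    some_then_none = died_before > 0 and died_on_rx == 0
    return all_then_some or some_then_none
```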
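Footnote h defines SpPin and SnNout in terms of specificity and sensitivity. The sketch below computes both from a 2×2 table of test results against a reference standard and then labels the finding. The table gives no number for how high “so high” must be, so the 0.99 cut-off is an assumption. The class and function names are also hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # test positive, target disorder present
    fp: int  # test positive, target disorder absent
    fn: int  # test negative, target disorder present
    tn: int  # test negative, target disorder absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def rule_in_out(table: TwoByTwo, threshold: float = 0.99) -> str:
    """Label a finding as SpPin, SnNout, both, or neither.

    `threshold` is a hypothetical cut-off for 'so high'.
    """
    labels = []
    if table.specificity >= threshold:
        labels.append("SpPin")   # a positive result rules in the diagnosis
    if table.sensitivity >= threshold:
        labels.append("SnNout")  # a negative result rules out the diagnosis
    return " and ".join(labels) or "neither"
```

A finding can be an SpPin while having modest sensitivity. For example, with tp=40, fp=0, fn=60, tn=100, specificity is 1.0 but sensitivity is only 0.4. Such a finding is useful for ruling in but not for ruling out.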
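Footnote i classifies a treatment by comparing its effect and cost with an alternative. This is a minimal sketch of that decision logic. It assumes effect and cost are given as differences (new minus comparator) and that “as good” means an effect difference within a user-chosen equivalence margin. The names and the margin are hypothetical.

```python
def classify_value(effect_diff: float, cost_diff: float,
                   margin: float = 0.0) -> str:
    """Classify a treatment against a comparator following footnote i.

    effect_diff: new minus comparator benefit (positive = better).
    cost_diff: new minus comparator cost (positive = more expensive).
    margin: hypothetical equivalence margin for treating effects as 'as good'.
    """
    if abs(effect_diff) <= margin:
        effect = "as good"
    elif effect_diff > 0:
        effect = "better"
    else:
        effect = "worse"

    if effect == "as good" and cost_diff < 0:
        return "better value"  # as good but cheaper
    if effect == "better" and cost_diff <= 0:
        return "better value"  # better at the same or reduced cost
    if effect == "as good" and cost_diff > 0:
        return "worse value"   # as good and more expensive
    if effect == "worse" and cost_diff >= 0:
        return "worse value"   # worse and equally or more expensive
    return "not absolute"      # a trade-off, or no difference at all
```

The remaining cases involve a trade-off, for example a treatment that is more effective but more expensive. Those cases fall outside the “absolute” analyses of level 1c and need a full economic evaluation.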
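Footnote j describes split-sample validation. The Python sketch below assumes the single tranche is a list of records and divides it with a seeded random shuffle. The 30% validation fraction and the function name are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sample(records: Sequence[T], validation_fraction: float = 0.3,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Divide one collected dataset into (derivation, validation) samples."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)
    # Seeded shuffle so the same split can be reproduced.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]
```

Both samples come from the same collection, with the same setting and time period. A rule validated only this way is therefore placed at level 2b, below a rule validated in a single population (1b) or in different populations (1a).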