2006 Feb;13(1):2–6. doi: 10.1007/s00534-005-1045-5

Table 1.

Categories of evidence (see footnote for explanation of terms).^6 The evidence-based classification used at the Cochrane Library, the Oxford Centre for Evidence-based Medicine Levels of Evidence (May 2001) (http://www.cebm.net/levels_of_evidence.asp#levels),^5 was used as a basis to evaluate the evidence presented in each item of literature, and the quality of evidence for each parameter associated with the diagnosis and treatment of acute pancreatitis was determined

Level	Therapy/prevention, etiology/harm	Prognosis	Diagnosis	Differential diagnosis/symptom prevalence study	Economic and decision analyses
1a	SR (with homogeneity^*) of RCTs	SR (with homogeneity^*) of inception cohort studies; CDR^† validated in different populations	SR (with homogeneity^*) of Level 1 diagnostic studies; CDR^† with 1b studies from different clinical centers	SR (with homogeneity^*) of prospective cohort studies	SR (with homogeneity^*) of Level 1 economic studies
1b	Individual RCT (with narrow confidence interval^‡)	Individual inception cohort study with >80% follow-up; CDR^† validated in a single population	Validating^** cohort study with good^††† reference standards; or CDR^† tested within one clinical center	Prospective cohort study with good follow-up^****	Analysis based on clinically sensible costs or alternatives; systematic review(s) of the evidence; and including multiway sensitivity analyses
1c	All or none^§	All or none case series	Absolute SpPins and SnNouts^††	All or none case series	Absolute better-value or worse-value analyses^††††
2a	SR (with homogeneity^*) of cohort studies	SR (with homogeneity^*) of either retrospective cohort studies or untreated control groups in RCTs	SR (with homogeneity^*) of Level >2 diagnostic studies	SR (with homogeneity^*) of 2b and better studies	SR (with homogeneity^*) of Level >2 economic studies
2b	Individual cohort study (including low-quality RCT; e.g., <80% follow-up)	Retrospective cohort study or follow-up of untreated control patients in an RCT; derivation of CDR^† or validated on split-sample^§§§ only	Exploratory^** cohort study with good^††† reference standards; CDR^† after derivation, or validated only on split-sample^§§§ or databases	Retrospective cohort study, or poor follow-up	Analysis based on clinically sensible costs or alternatives; limited review(s) of the evidence, or single studies; and including multiway sensitivity analyses
2c	“Outcomes” research; ecological studies	“Outcomes” research		Ecological studies	Audit or outcomes research
3a	SR (with homogeneity^*) of case-control studies		SR (with homogeneity^*) of 3b and better studies	SR (with homogeneity^*) of 3b and better studies	SR (with homogeneity^*) of 3b and better studies
3b	Individual case-control study		Nonconsecutive study; or without consistently applied reference standards	Nonconsecutive cohort study, or very limited population	Analysis based on limited alternatives or costs, poor quality estimates of data, but including sensitivity analyses incorporating clinically sensible variations
4	Case series (and poor-quality cohort and case-control studies^§§)	Case series (and poor-quality prognostic cohort studies^***)	Case control study, poor or nonindependent reference standard	Case series or superseded reference standards	Analysis with no sensitivity analysis
5	Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles”	Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles”	Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles”	Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles”	Expert opinion without explicit critical appraisal, or based on economic theory or “first principles”

Users can add a minus sign to denote the level that fails to provide a conclusive answer because of:

NOTE 1 EITHER a single result with a wide confidence interval (such that, for example, an ARR in an RCT is not statistically significant but its confidence interval fails to exclude clinically important benefit or harm)

NOTE 2 OR a systematic review with troublesome (and statistically significant) heterogeneity

NOTE 3 Such evidence is inconclusive, and therefore can only generate Grade D recommendations

SR, Systematic review; RCT, randomized controlled trial; ARR, absolute risk reduction

^* By “homogeneity,” the Publishing Committee means a systematic review that is free of worrisome variations (heterogeneity) in the directions and degrees of results between individual studies. Not all systematic reviews with statistically significant heterogeneity need be worrisome, and not all worrisome heterogeneity need be statistically significant. As noted above, studies displaying worrisome heterogeneity should be tagged with a minus sign at the end of their designated level

^† Clinical decision rule (these are algorithms or scoring systems that lead to a prognostic estimation or a diagnostic category)

^‡ See NOTE 1 for advice on how to understand, rate, and use trials or other studies with wide confidence intervals
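
As a rough illustration of the first note above (a single result with a wide confidence interval), the sketch below computes an absolute risk reduction (control event rate minus treated event rate) with a normal-approximation 95% confidence interval and flags a result as inconclusive. The function names, the choice of a Wald interval, and the `important_benefit` threshold are assumptions made for illustration only; the table itself does not prescribe any of them.

```python
import math


def arr_with_ci(events_control, n_control, events_treated, n_treated, z=1.96):
    """Return (ARR, lower, upper) using a Wald (normal-approximation) interval.

    ARR = control event rate - treated event rate.
    The Wald interval is an assumption of this sketch, not part of the table.
    """
    cer = events_control / n_control
    eer = events_treated / n_treated
    arr = cer - eer
    se = math.sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_treated)
    return arr, arr - z * se, arr + z * se


def is_inconclusive(lower, upper, important_benefit):
    """Hypothetical reading of the wide-confidence-interval note.

    True when the interval includes 0 (not statistically significant) but
    still reaches a clinically important benefit or harm.
    `important_benefit` is a hypothetical, user-chosen threshold.
    """
    not_significant = lower <= 0.0 <= upper
    fails_to_exclude = upper >= important_benefit or lower <= -important_benefit
    return not_significant and fails_to_exclude


# Small trial: 20/100 events in controls versus 12/100 in treated patients.
arr, lo, hi = arr_with_ci(20, 100, 12, 100)
small_trial_flag = is_inconclusive(lo, hi, important_benefit=0.05)

# Same event rates in a tenfold larger trial.
arr_big, lo_big, hi_big = arr_with_ci(200, 1000, 120, 1000)
large_trial_flag = is_inconclusive(lo_big, hi_big, important_benefit=0.05)
```

Under these assumptions the small trial would carry a minus sign (its interval spans zero yet still includes an 8% reduction), whereas the larger trial with the same event rates would not.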

^§ Met when all patients died before the Rx became available, but some now survive on it; or when some patients died before the Rx became available, but none now die on it

^§§ By “poor-quality cohort study,” the Publishing Committee means one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded) objective way in both exposed and nonexposed individuals, and/or failed to identify or appropriately control known confounders, and/or failed to carry out a sufficiently long and complete follow-up of patients. By “poor-quality case-control study,” the Publishing Committee means one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded) objective way in both cases and controls, and/or failed to identify or appropriately control known confounders

^§§§ Split-sample validation is achieved by collecting all the information in a single tranche, then artificially dividing this into “derivation” and “validation” samples
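
A minimal sketch of split-sample validation as described in this footnote: one dataset, collected in a single tranche, is divided at random into “derivation” and “validation” samples. The function name, the 50/50 default, and the fixed seed are illustrative assumptions.

```python
import random


def split_sample(records, validation_fraction=0.5, seed=0):
    """Randomly divide one dataset into derivation and validation samples.

    The split ratio and seed are hypothetical defaults for this sketch.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * (1 - validation_fraction)))
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical patient identifiers.
derivation, validation = split_sample(range(10))
```

Because both samples come from the same tranche, a rule validated this way has not been tested in a different population or center, which is consistent with the table placing it at Level 2b rather than 1a or 1b.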

^†† An “absolute SpPin” is a diagnostic finding whose specificity is so high that a positive result rules in the diagnosis. An “absolute SnNout” is a diagnostic finding whose sensitivity is so high that a negative result rules out the diagnosis
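
The two definitions can be made concrete with a 2×2 table of test results against the reference standard. The sketch below computes sensitivity and specificity and labels a finding; the function names and the 0.99 cutoff are hypothetical, since the footnote says only that the value must be “so high” that a result rules the diagnosis in or out.

```python
def sensitivity(tp, fn):
    """Proportion of diseased patients with a positive finding."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-diseased patients with a negative finding."""
    return tn / (tn + fp)


def label_finding(tp, fp, fn, tn, threshold=0.99):
    """Return the labels that apply to a finding.

    `threshold` is a hypothetical cutoff chosen for this sketch.
    """
    labels = []
    if specificity(tn, fp) >= threshold:
        labels.append("SpPin")   # a positive result rules the diagnosis in
    if sensitivity(tp, fn) >= threshold:
        labels.append("SnNout")  # a negative result rules the diagnosis out
    return labels


# Hypothetical counts: 40 true positives, 0 false positives,
# 10 false negatives, 50 true negatives.
labels = label_finding(tp=40, fp=0, fn=10, tn=50)
```

With these counts, specificity is 1.0 and sensitivity is 0.8, so the finding would qualify as a SpPin but not as a SnNout.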

^††† “Good reference standards” are independent of the test, and are applied blindly or objectively to all patients. “Poor reference standards” are haphazardly applied, but are still independent of the test. Use of a nonindependent reference standard (where the “test” is included in the “reference,” or where the “testing” affects the “reference”) implies a Level 4 study

^†††† “Better-value treatments” are clearly as good but cheaper, or better at the same or reduced cost. “Worse-value treatments” are as good and more expensive, or worse and equally or more expensive
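
The better-value and worse-value definitions amount to a small decision table over two comparisons with the alternative treatment. The sketch below encodes it; the −1/0/+1 coding and the “indeterminate” label for the remaining combinations are assumptions of this illustration, not terms from the table.

```python
def value_category(effect, cost):
    """Classify a treatment relative to its comparator.

    effect: +1 better, 0 as good, -1 worse (hypothetical coding)
    cost:   +1 more expensive, 0 same cost, -1 cheaper (hypothetical coding)
    """
    if (effect == 0 and cost < 0) or (effect > 0 and cost <= 0):
        return "better-value"
    if (effect == 0 and cost > 0) or (effect < 0 and cost >= 0):
        return "worse-value"
    # e.g., better but more expensive: needs a fuller economic analysis
    return "indeterminate"


better_and_cheaper = value_category(effect=+1, cost=-1)
better_but_costlier = value_category(effect=+1, cost=+1)
```

Only the combinations that fall into the first two branches would correspond to the “absolute” analyses listed at Level 1c.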

^** “Validating studies” test the quality of a specific diagnostic test, based on prior evidence. An “exploratory study” collects information and trawls the data (e.g., using a regression analysis) to find which factors are “significant”

^*** By “poor-quality prognostic cohort study,” the Publishing Committee means a study in which sampling was biased in favor of patients who already had the target outcome, or the measurement of outcomes was accomplished in fewer than 80% of study patients, or outcomes were determined in an unblinded, nonobjective way, or there was no correction for confounding factors

^**** Good follow-up in a differential diagnosis study is more than 80%, with adequate time for alternative diagnoses to emerge (e.g., 1–6 months, acute; 1–5 years, chronic)

“Good,” “better,” “bad,” and “worse” refer to the comparisons between treatments in terms of their clinical risks and benefits