. 2019 Aug 20;19:178. doi: 10.1186/s12874-019-0811-z

Table 3.

Strength-of-evidence (SOE) rating tools

Name of SOE method, year | Audience and purpose for evaluation | Number of levels of SOE | Definition of the highest level of SOE | Placement of prospective cohort studies in the framework of SOE
Tools developed by major agencies, for application in a variety of domains
Grading of Recommendations, Assessment, Development and Evaluation (GRADE), 2004 (35) from Cochrane Collaboration

Audience: Users of systematically developed clinical practice guidelines and recommendations (e.g., clinicians, patients, policymakers)

Purpose: To provide a systematic and explicit approach to making judgments about the quality of evidence and the strength of recommendations

4 levels:

- High

- Moderate

- Low

- Very low

Randomized trials begin as high quality of evidence and observational studies as low quality of evidence. Randomized trials remain high if they provide:

• Direct evidence without important study limitations

• Low imprecision (i.e., a large number of participants and/or a higher number of events, with narrow confidence intervals), and

• Low publication bias

Observational studies without special strengths constitute low quality evidence, though study characteristics can increase or decrease a study’s starting quality. The following strengths can increase the SOE rating from observational studies:

• Strong evidence of association: significant relative risk (RR) > 2 (or < 0.5) based on consistent evidence from ≥2 observational studies, with no plausible confounders (+1), or

• Very strong evidence of association: significant RR > 5 (or < 0.2) based on direct evidence with no major threats to validity (+2);

• Evidence of a dose response gradient (+ 1);

• All plausible residual confounding would have reduced the observed effect (+1)
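
The upgrading rules for observational evidence listed above can be illustrated with a minimal Python sketch. The function names, the representation of levels as an ordered list, and the cap at "high" are assumptions made for illustration; they are not part of GRADE itself. The sketch covers only the upgrading points and does not model downgrading, and the RR helper checks magnitude only: statistical significance, consistency, and absence of plausible confounders still have to be judged by the reviewer.

```python
# Illustrative sketch only: names and the cap at "high" are assumptions,
# not part of the GRADE method as published.
LEVELS = ["very low", "low", "moderate", "high"]


def association_points(rr):
    """Return upgrade points suggested by the magnitude of a relative risk.

    Only the RR thresholds from the table are checked here; the other
    conditions (significance, consistency, no plausible confounders)
    are assumed to have been assessed separately.
    """
    if rr > 5 or rr < 0.2:
        return 2  # very strong evidence of association
    if rr > 2 or rr < 0.5:
        return 1  # strong evidence of association
    return 0


def grade_observational(large_effect=0, dose_response=False,
                        confounding_would_reduce_effect=False):
    """Rate a body of observational evidence that starts at 'low'.

    large_effect is 0, 1, or 2 (the points from association_points).
    """
    if large_effect not in (0, 1, 2):
        raise ValueError("large_effect must be 0, 1, or 2")
    start = LEVELS.index("low")
    points = (large_effect
              + int(dose_response)
              + int(confounding_would_reduce_effect))
    # Assumption: the rating cannot rise above the top level.
    return LEVELS[min(start + points, len(LEVELS) - 1)]
```

For example, under these assumptions an RR of 0.4 earns one point, and observational evidence with that point and nothing else would move from "low" to "moderate".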

Note: Rigorous observational studies provide stronger evidence than uncontrolled case series.

Community Preventive Services Task Force (CPSTF), 2000 (47)

No specific titled tool.

Audience: Community interventionists and clinical practitioners who need effectiveness recommendations for various treatments

Purpose: To develop evidence-based, clinically effective recommendations for community-based interventions, various clinical treatments, and population-based interventions

3 levels:

- Strong

- Sufficient

- Insufficient

3 possible paths to a “Strong” ratingᵃ:

• ≥2 studies with “good” execution, “greatest” design suitability, and consistent effect sizes of “sufficient” size

• ≥5 studies with “good” execution, “greatest or moderate” design suitability, and consistent effect sizes of “sufficient” size

• ≥5 studies with “good or fair” execution, “greatest” design suitability, and consistent effect sizes of “sufficient” size
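
As a rough illustration, the three paths above can be written as a small Python check. This is a sketch under stated assumptions: the function names and the representation of each study as an (execution, suitability) pair are hypothetical, and the sketch interprets each path as a count of qualifying studies within the body of evidence. The execution categories follow the limitation counts given in the table footnote (0–1 good, 2–4 fair, 5 or more limited); whether effect sizes are "sufficient" is a Task Force judgment and is passed in as a flag.

```python
# Illustrative sketch only: names and data layout are assumptions.
def execution_category(n_limitations):
    """Map a count of noted limitations to an execution category."""
    if n_limitations < 0:
        raise ValueError("n_limitations must be non-negative")
    if n_limitations <= 1:
        return "good"
    if n_limitations <= 4:
        return "fair"
    return "limited"


def is_strong(studies, consistent, sufficient_effect):
    """Check whether any of the three paths to 'Strong' is satisfied.

    studies: list of (execution, suitability) pairs, e.g. ("good", "greatest").
    consistent, sufficient_effect: reviewer judgments supplied as booleans.
    """
    if not (consistent and sufficient_effect):
        return False

    def count(executions, suitabilities):
        return sum(1 for e, s in studies
                   if e in executions and s in suitabilities)

    return (
        count({"good"}, {"greatest"}) >= 2
        or count({"good"}, {"greatest", "moderate"}) >= 5
        or count({"good", "fair"}, {"greatest"}) >= 5
    )
```

Under this reading, two studies of good execution and greatest design suitability with consistent, sufficient effects would already qualify, whereas four fair-execution studies would not.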

It is possible for a prospective cohort study to fulfill the requirements for the “Greatest” rating.

Specific study designs are not rigidly placed within the framework; the suitability for answering the research question is assessed in reference to potential threats to validity.

US Preventive Services Task Force (USPSTF), 2012 (48)

No specific titled tool.

Audience:

Primary: primary care clinicians

Secondary: consumer organizations, federal agencies, and other stakeholders involved in primary care delivery

Purpose: To develop evidence-based recommendations about clinical preventive services and health promotion and evidence-based practice to improve the health of Americans

5 levels:

- A: High certainty of substantial net benefit

- B: High certainty of moderate net benefit or moderate certainty of moderate to substantial net benefit

- C: Moderate certainty that net benefit is small

- D: Recommends against the service; no net benefit, or harms outweigh benefits

- I: Insufficient evidence

• > 1 well-designed study

• Consistent study results

• Conducted in representative primary-care populations

• Unlikely to be strongly affected by results of future studies

Prospective cohort studies and other specific study designs are not directly mentioned in this method.

The highest level of evidence is described as coming from “... well-conducted studies in representative, primary care populations… [to]… assess the effects of preventive service on health outcomes...”

US Food and Drug Administration assessment of health claims for food products, 2003 (36)

No specific titled tool.

Audience: Consumers of products with authorized or qualified health claimsᵇ

Purpose: To systematically evaluate the SOE for a proposed health claim,ᵇ including both authorized and qualified health claims

2 levels:

- (1): Authorized health claim (has significant scientific agreement among qualified experts)

- (2): Qualified health claim (supported by weaker scientific evidence; must be accompanied by a disclaimer or be qualified in its wording, e.g., limited, very little, or highly uncertain scientific evidence)

• Studies with overall high methodologic quality rating

• Results from intervention studies (as compared to observational studies) provide stronger evidence

• Larger number of studies and sample sizes

• Body of scientific evidence supports a health claim relationship for the US population or the target subgroup

• Study results supporting the proposed claim have been replicated

• Overall consistency in the total body of evidence showing a beneficial relationship

Observational studies:

• Cannot be used to rule out the findings from well done intervention studies

• Only included when findings are consistent with several RCTs

• Any number of observational studies are trumped by several consistent RCTs

• Hierarchy of evidence: cohort design > nested case-control or case-cohort studies > case-control studies > cross-sectional studies > ecological studies and case reports

American College of Cardiology / American Heart Association Task Force on Practice Guidelines Levels of Evidence, 2005 (54)

Audience: Clinicians and researchers with an interest in cardiovascular health

Purpose: To summarize SOE for the purpose of assigning classes of clinical practice recommendations

3 levels:

- A: Data derived from multiple randomized clinical trials (RCTs) or meta-analyses.

- B: Data derived from a single RCT or non-randomized studies.

- C: Consensus opinion of experts, case studies, or standard of care

Multiple RCTs or meta-analyses of RCTs

Prospective cohort studies are not referenced in this method.

Nutrition Evidence Library Grading Rubric, 2015 (49)

Audience:

Primary: US Dietary Guidelines Committee

Secondary: Health professionals and the public who read the Dietary Guidelines Advisory Committee Report

Purpose: To summarize the SOE so that conclusion statements can be made to inform policy (e.g., informing the Dietary Guidelines)

4 levels:*

- Grade I: Strong

- Grade II: Moderate

- Grade III: Limited

- Grade IV: Grade Not Assignable

*Grading based on 5 elements: risk of bias; quantity of studies; consistency of findings; impact (directness of studied outcomes and magnitude of effect); generalizability to the US population of interest

• Bias - Studies of strong design free from design flaws, bias and execution problems

• Quantity - Several good quality studies; large number of studies with sufficiently large sample size for adequate statistical power

• Consistency - Findings generally consistent in direction, effect size or degree of association, and statistical significance with very minor exceptions

• Impact - Studied outcome relates directly to the question and effect size is clinically meaningful

• Generalizability - Studied populations, intervention and outcomes are free from serious doubts about generalizability

Prospective cohort studies are not directly mentioned in this method.

The “risk of bias” component of the rubric mentions “studies of strong design” and “studies of weaker design for answering the question” but does not define them further.

Evidence Analysis Library® Methodology and Process Evidence Grading System from the Academy of Nutrition and Dietetics, 2016 (50)

Audience: Dietitians, clinicians, and researchers

Purpose: To summarize the SOE for the purpose of making dietary recommendations

5 levels*:

- I: Good/Strong

- II: Fair

- III: Limited/Weak

- IV: Expert Opinion Only

- V: Grade Not Assignable

* Levels based on quality, consistency, quantity, clinical impact, and generalizability

• Quality: Strong study design for question; free from design flaws, bias and execution problems

• Consistency: Findings generally consistent in direction and size of effect or degree of association, and statistical significance with minor exceptions

• Quantity: ≥1 good quality studies with large sample sizes; studies with negative results have sufficiently large sample size for adequate statistical power

• Clinical impact: Studied outcome relates directly to the question; size of effect is clinically meaningful; large, statistically significant difference

• Generalizability: Studied populations, interventions and outcomes are free from serious doubts about generalizability

Specific study designs are not mentioned or explicitly tied to a specific level of evidence.

The quality rating for the highest level of evidence specifies “studies of strong design for the question.”

Evidence-based Practice Center (EPC) method for grading SOE, 2009 (51)

Audience: Clinicians, researchers, and other health professionals

Purpose: Summarize SOE for the purpose of guiding clinical practice recommendations and to improve the quality of healthcare

4 levels:

- High

- Moderate

- Low

- Insufficient

Evaluation is based on 5 required domains and, where appropriate, 3 more optional domains:

5 required domains:

• Study limitations/risk of bias: Low

• Directness: High

• Consistency: High

• Precision: High

• Reporting bias: Low

3 optional domains:

• Dose-response association: Present

• Uncontrolled confounding that can diminish an observed effect: Low

• Strength of association (i.e., large magnitude of effect): High

• Domain and total SOE grading should be done separately for RCT evidence and observational study evidence.

• Initially, RCTs start with a provisional high SOE grade and observational studies with a provisional low SOE grade.

• These grades are adjusted as stronger or weaker based on study limitations or other factors.

Joanna Briggs Institute Levels of Evidence*, 2013 (52)

*No longer in current use; the organization recently switched to using GRADE. Grading for research questions of effectiveness is presented here as the most relevant domain for lifestyle medicine-type interventions.

Audience: Researchers

Purpose: Summarize the SOE

5 levels under effectiveness heading:*

- Level 1: Experimental Designs

- Level 2: Quasi-Experimental Designs

- Level 3: Observational-Analytic Designs

- Level 4: Observational-Descriptive Studies

- Level 5: Expert Opinion and Bench Research

Each level contains sub-levels

Effectiveness Level 1 categories are defined as follows:

• Level 1.a – Systematic review of RCTs

• Level 1.b – Systematic review of RCTs and other study designs

• Level 1.c – RCTs

• Level 1.d – Pseudo-RCTs

Prospective cohort studies* appear only in Level 3 categories (not Levels 1 or 2)

• Level 3.a – Systematic review of comparable cohort studies

• Level 3.b – Systematic review of comparable cohort and other lower study designs

• Level 3.c – Cohort study with control group

• Level 3.e – Observational study without a control group

“Inception cohort studies” do appear in Level 1 under prognosis heading

Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence, 2011 (53)

Audience: Physicians

Purpose: To provide traditional critical appraisal and summarize SOE so that clinicians and patients can quickly reach decisions on clinical questions

5 levels:

- Level 1

- Level 2

- Level 3

- Level 4

- Level 5

Each of the 5 levels is defined separately for each of the 7 clinical questions.

Level 1 evidence definitions for each of seven clinical questions:

• 1. How common is the problem? Local and current random sample surveys (or censuses)

• 2. Is this diagnostic or monitoring test accurate? (Diagnosis) Systematic review of cross-sectional studies with consistently applied reference standard and blinding

• 3. What will happen if we do not add a therapy? (Prognosis) Systematic review of inception cohort studies

• 4. Does this intervention help? (Treatment Benefits) Systematic review of randomized trials or n-of-1 trials

• 5. What are the COMMON harms? (Treatment Harms) Systematic review of randomized trials, systematic review of nested case-control studies, n-of-1 trial with the patient you are raising the question about, or observational study with dramatic effect

• 6. What are the RARE harms? (Treatment Harms) Systematic review of randomized trials or n-of-1 trial

• 7. Is this (early detection) test worthwhile? (Screening) Systematic review of randomized trials

Prospective cohort studiesᶜ appear in the following clinical questions:

• 3. What will happen if we do not add a therapy? (Prognosis)

o Level 1: Systematic review of inception cohort studies

o Level 2: Inception cohort studies

o Level 3: Cohort study or control arm of randomized trial. (Level may be graded down on the basis of study quality, imprecision, indirectness (i.e., the study PICO does not match the question's PICO), inconsistency between studies, or because the absolute effect size is very small; Level may be graded up if there is a large or very large effect size.)

• 4. Does this intervention help? (Treatment Benefits)

o Level 2: includes observational study with dramatic effect

o Level 3: Non-randomized controlled cohort/follow-up study

• 7. Is this (early detection) test worthwhile? (Screening)

o Level 3: Non-randomized controlled cohort/follow-up study

Author-defined / lesser-known methods
Modified form of coding system, 2000 (37)

Audience: Researchers

Purpose: To evaluate SOE related to correlates of physical activity in children and adolescents

3 levels:

- Association (either positive or negative): 60–100% of studies reviewed support association

- Indeterminate: 34–59% of studies reviewed support association

- No association: 0–33% of studies reviewed support association

Highest level is achieved when 60% or more of studies (regardless of design or total N) reviewed have a consistent positive or negative association.
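
The percentage bands of this coding system amount to a simple threshold rule, sketched below in Python. The function name is hypothetical, and the handling of non-integer percentages is an assumption: the table gives integer bands (0–33, 34–59, 60–100), so values falling between bands (e.g., 33.3%) are assigned here by treating 34 and 60 as lower cut-offs.

```python
# Illustrative sketch only: the function name and the treatment of
# fractional percentages between the published bands are assumptions.
def classify_association(n_supporting, n_total):
    """Classify SOE from the share of reviewed studies supporting an association."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_supporting <= n_total:
        raise ValueError("n_supporting must be between 0 and n_total")
    pct = 100 * n_supporting / n_total
    if pct >= 60:
        return "association"
    if pct >= 34:
        return "indeterminate"
    return "no association"
```

Note that, as in the method itself, study design and sample size play no role in this rule; every reviewed study counts equally.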

Study design is not referenced in this method.

All studies’ results would count equally towards SOE score; no instructions are given with respect to weighting of different study designs.

Topic-specific SOE rating system for evaluating research on back pain, 1996 (38, 39)

Audience: Researchers and clinicians with an interest in back pain

Purpose: To guide clinical practice guidelines for back pain

4 levels:

- Strong

- Moderate

- Limited

- No evidence

Multiple high-quality RCTs with consistent positive outcomes

Prospective cohort studies are not referenced (i.e., they are not relevant to this kind of evaluation).

Best evidence synthesis: a rating system based on a best-evidence synthesis used previously for PA interventions, 1995 (40–43)

Audience: Researchers

Purpose: To summarize the SOE

4 levels:

- Level 1: Strong

- Level 2: Moderate

- Level 3: Limited

- Level 4: No evidence

Multiple RCTs of high quality with consistent positive results.

Prospective cohort studies are not referenced (i.e., they are not relevant to this kind of evaluation).

Criteria for determining level of evidence in meta-analyses of RCTs for walking training in stroke, 2008 (44)

Audience: Researchers and clinicians

Purpose: To determine SOE in relation to rehabilitation after stroke

4 levels:

- High

- Moderate

- Low

- No evidence

At least 2 high-quality RCTs with similar results

Prospective cohort studies are not referenced (i.e., they are not relevant to this kind of evaluation).

Overall SOE, 1999 (45, 46)

Audience: Researchers and clinicians

Purpose: To predict the onset of functional status decline in people without initial functional status impairment

4 levels:

+++ [Strong]

++ [Moderate]

+ [Limited]

(+) [Weak]ᵈ

• Evidence in > 3 “high quality studies” with a consistent positive or negative association

• Analyses have no identified methodological limitations

• Studies exclude individuals with functional status impairment at baseline

• Studies report a significant positive association between risk factor and functional status decline in people

Study design is not referenced in this method. All study designs can count equally in the SOE score, provided they were not identified as having methodological limitations (and were therefore classified as “appropriate”); no instructions are given with respect to weighting of different study designs.

ᵃ Sufficient effect sizes are defined on a case-by-case basis and are based on Task Force opinion. Each study is categorized as having good, fair, or limited quality of execution based on the number of limitations noted; studies with 0–1, 2–4, and 5 or more limitations are categorized as having good, fair, and limited execution, respectively. The suitability of study design has 3 levels: Greatest, Moderate, and Least. Greatest: Concurrent comparison groups and prospective measurement of exposure and outcome; Moderate: All retrospective designs or multiple pre or post-measurements but no concurrent comparison group; Least: Single pre and post-measurements and no concurrent comparison group or exposure and outcome measured in a single group at the same point in time.

ᵇ Health claims characterize the relationship between a substance (such as a food or food component) and a disease or health-related condition.

ᶜ Prospective cohort studies are mentioned in question 5 (What are the COMMON harms?) and question 6 (What are the RARE harms?). However, they are not described in this table because of their limited relevance to lifestyle medicine interventions, which typically do not cause the harmful side effects seen in pharmaceutical treatment trials.

ᵈ [descriptors] added for this table