2019 Aug 20;19:178. doi: 10.1186/s12874-019-0811-z

Table 6.

Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM) Strength of Evidence (SOE) Approach*


HEALM contains three scoring** levels of SOE: Grade A (Strong/decisive); Grade B (Moderate/suggestive); Grade C (Insufficient/inconclusive)

As in other SOE evaluation methods, the methodological quality and risk of bias of included studies should be graded, prior to assessment with HEALM, using established tools for rating individual study quality. Two examples are Cochrane’s Risk of Bias Tool⁵⁴ for randomized controlled trials (RCTs) and the Newcastle-Ottawa Tool⁵⁵ for cohort and case-control studies.

Q1: Are there established mechanisms of action?

(a plurality*** of evidence from bench science and animal models)

Yes = 2

Uncertain*** = 1

No = 0

Q2: Are there intervention studies in people that provide evidence of causality/attribution?

(a plurality*** of high-quality intervention trials, randomized controlled trials, interim measures, and surrogate markers as outcomes)

Yes = 3

Uncertain = 1

No = 0

Q3: Are there observational studies to establish generalizability to large populations?

(a plurality*** of high-quality evidence from large, prospective cohort studies)

Yes = 2

Uncertain = 1

No = 0

Q4: Are there observational studies to support effects over time periods measured in decades, lifetimes, or generations?

(a plurality*** of evidence from high-quality, long-term observational studies; retrospective cohort studies; ethnography; transcultural studies)

Yes = 2

Uncertain = 1

No = 0

*The HEALM tool is presented here to illustrate potential approaches to scoring evidence across research categories; it does not represent the single, specific approach recommended by the project expert panel on the basis of a formal consensus process.

**Scoring

Answers to scoring questions should be based on expert consensus in evaluating available evidence. Evidence is conclusive when it can be identified as sufficient in quantity and quality, and consistent in findings, fostering clear consensus among experts. This would generally mean a replicated finding and consistent effects among a clear plurality*** of high-quality, related publications. Evidence is uncertain when studies are few, small, of poor quality, or conflicting, but generally suggestive of a particular finding.

While expert consensus is critical in evaluation, a framework to inform discussion based on quantitative criteria used in previous umbrella reviews⁵⁶ is suggested:

1. Total sample and number of cases of included studies

2. Significance of association based on p-values (highly significant defined as p < 0.0001 vs. nominally significant defined as p < 0.05) and confidence intervals that exclude vs. include the null value

3. When considering studies that include meta-analyses, a target threshold of 1000 cases, no evidence of small-study effects or excess significance bias, a 95% prediction interval excluding the null value and no large, unexplained, between-study heterogeneity (I² < 50%)
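The three quantitative criteria above can be sketched as a simple screening check to inform, not replace, expert discussion. This is a minimal illustration only: the `MetaAnalysisSummary` container, its field names, and the "not significant" label are hypothetical, and treating the 1000-case target as "at least 1000" is an assumption, since the table does not say whether the threshold is inclusive.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MetaAnalysisSummary:
    """Hypothetical container for one meta-analysis; all field names are illustrative."""
    n_cases: int
    p_value: float
    prediction_interval_95: Tuple[float, float]  # (lower, upper) bounds
    null_value: float            # e.g. 1.0 for ratio measures, 0.0 for differences
    i_squared_pct: float         # between-study heterogeneity, in percent
    small_study_effects: bool    # True if evidence of small-study effects was found
    excess_significance_bias: bool


def significance_label(p_value: float) -> str:
    """Apply the p-value cut-offs from criterion 2."""
    if p_value < 0.0001:
        return "highly significant"
    if p_value < 0.05:
        return "nominally significant"
    return "not significant"  # label assumed; the table names only the two tiers above


def meets_meta_analysis_criteria(m: MetaAnalysisSummary) -> bool:
    """Check the conditions listed in criterion 3."""
    low, high = m.prediction_interval_95
    interval_excludes_null = not (low <= m.null_value <= high)
    return (
        m.n_cases >= 1000                   # assumption: threshold is inclusive
        and not m.small_study_effects
        and not m.excess_significance_bias
        and interval_excludes_null
        and m.i_squared_pct < 50            # I² < 50%
    )


# Illustrative values only, not taken from any study.
example = MetaAnalysisSummary(
    n_cases=1500,
    p_value=0.00001,
    prediction_interval_95=(1.1, 1.8),
    null_value=1.0,
    i_squared_pct=30.0,
    small_study_effects=False,
    excess_significance_bias=False,
)
passes = meets_meta_analysis_criteria(example)
label = significance_label(example.p_value)
```

A failed check does not by itself imply a rating of "No"; under the approach described here, the final answer to each question remains a matter of expert consensus.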

Grade A: Strong evidence = ≥7. This would require decisive evidence in all other categories AND at least suggestive evidence from intervention trials in people (2 + 2 + 2 + 1 = 7); OR strong evidence from intervention trials in people and decisive evidence in two other categories (3 + 2 + 2 = 7); OR strong evidence from intervention trials, decisive evidence in any one other category, and suggestive evidence in the remaining two (3 + 2 + 1 + 1 = 7). This lends primacy to RCT evidence but allows for a strong grade even with nothing more than suggestive evidence in the intervention trial category.

Grade B: Moderate/suggestive = 5 or 6. Achievable with decisive intervention trial evidence and strong evidence in ANY one other category (3 + 2 = 5); OR with strong evidence in all categories other than intervention trials (2 + 2 + 2 = 6).

Grade C: Insufficient/weak = <5
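The point values for Q1 to Q4 and the grade cut-offs can be combined in a short sketch. The names `POINTS`, `healm_score`, and `healm_grade` are hypothetical and introduced only for illustration; the answers passed in are assumed to be the outcome of expert consensus, which the code does not attempt to model.

```python
# Points per answer, as listed for Q1-Q4 in the table.
POINTS = {
    "Q1": {"yes": 2, "uncertain": 1, "no": 0},  # mechanisms of action
    "Q2": {"yes": 3, "uncertain": 1, "no": 0},  # intervention studies in people
    "Q3": {"yes": 2, "uncertain": 1, "no": 0},  # large-population observational studies
    "Q4": {"yes": 2, "uncertain": 1, "no": 0},  # long-term observational studies
}


def healm_score(answers: dict) -> int:
    """Sum the points for one answer per question (maximum total is 9)."""
    missing = set(POINTS) - set(answers)
    if missing:
        raise ValueError(f"missing answers for: {sorted(missing)}")
    return sum(POINTS[q][answers[q].lower()] for q in POINTS)


def healm_grade(score: int) -> str:
    """Map a total score to a grade: A is >= 7, B is 5 or 6, C is < 5."""
    if score >= 7:
        return "A"
    if score >= 5:
        return "B"
    return "C"


# Decisive evidence everywhere except intervention trials, which are only suggestive.
answers = {"Q1": "yes", "Q2": "uncertain", "Q3": "yes", "Q4": "yes"}
score = healm_score(answers)   # 2 + 1 + 2 + 2 = 7
grade = healm_grade(score)     # "A"
```

Because Q2 carries 3 points and the other questions 2 each, a "Yes" on Q2 plus a single other "Yes" already reaches Grade B, while Grade A without a "Yes" on Q2 requires "Yes" on all three remaining questions and at least "Uncertain" on Q2.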

***Plurality may vary depending on the total number of existing studies conducted on a particular research question and must be determined on a case-by-case basis. For example, three consistent studies from a variety of study designs with no opposing studies may constitute a plurality. Were there to be opposing studies, the target number would be more than three. A clear numerical plurality of studies of overall poor quality may warrant a rating of “Uncertain”.