Author manuscript; available in PMC: 2013 Oct 1.
Published in final edited form as: World J Surg. 2012 Oct;36(10):2273–2275. doi: 10.1007/s00268-012-1630-0

Lack of Uniformity in Levels of Evidence and Recommendation Grades in Surgical Oncology Guidelines

Haejin In 1,2,*, Caprice C Greenberg 1,3
PMCID: PMC3469720  NIHMSID: NIHMS389351  PMID: 22543721

Practice guidelines exist to bridge the gap between research and clinical practice. “Ensuring Quality Cancer Care,” published in 1999 by the Institute of Medicine’s National Cancer Policy Board [1], promoted practice guidelines as a means to improve the quality of care and prompted a flurry of activity, with various societies producing practice guidelines for oncology. These organizations used a myriad of methods to grade the level of evidence and strength of recommendations [2], resulting in recommendations that are expressed non-uniformly and reach inconsistent conclusions. As guidelines are increasingly used to create performance indicators that measure quality of care [3], the stakes are higher, making variability in guideline recommendations problematic.

Cancer-specific guidelines generated by eight organizations in the United States and applicable to surgical oncology were selected to examine the rating schemes and processes used to assign level of evidence and grade of recommendation. Table 1 describes the guideline organizations considered. For brevity, the full name is spelled out in Table 1 and the text refers to organizations by acronym.

Table 1.

Heterogeneity of annotation schemes used to express level of evidence and recommendation grade

Guideline | Level of evidence | Recommendation grade
American Association of Clinical Endocrinologists (AACE) guidelines (2004) | 1 to 4 | A to D
American Society of Clinical Oncology (ASCO) Guideline Recommendations (2001) | I to V | A to D
American Society of Clinical Oncology (ASCO) Guideline Recommendations (2004) | -- | --
American Society of Colon and Rectal Surgeons (ASCRS), Practice Parameters (2004) | I to III, A to C | --
American Society of Colon and Rectal Surgeons (ASCRS), Practice Parameters (2005) | I to V | A to D
American Thyroid Association (ATA) guidelines | -- | A to I
College of American Pathologists (CAP) Consensus Statement 1999 | -- | Category I to IV
National Comprehensive Cancer Network (NCCN) Practice Guidelines in Oncology | -- | Category 1 to 3
National Cancer Institute (NCI) Treatment Guidelines | 1i to 3iii | --
Society for Surgery of the Alimentary Tract (SSAT) patient care guidelines | -- | --

A guideline recommendation is generally developed through two distinct steps and expressed through organization-specific rating schemes. First, the level of evidence (LOE) is evaluated. This involves an objective determination of a level according to the study design used in the supporting evidence and a subjective interpretation by panel members of the scientific rigor of the studies. Organizations were inconsistent about which inputs they used to determine LOE: some used only the type of study design or only the subjective judgment of panel members, while others used a combination of both. Organizations that used type of study design as their criterion readily documented an LOE, while organizations that relied predominantly on subjective judgment documented LOE less frequently.

Second, a recommendation grade (RG) is determined. The grade expresses how strongly a guideline organization wishes to advocate a recommendation. A guideline panel will generally assign a grade using a combination of the LOE and the judgment and opinion of the expert panel about the practice and policy implications of a recommendation. While LOE was the prevailing consideration in many guidelines (AACE (2004), ASCO (2001), ASCRS (2005)), other guideline organizations (ATA, CAP, NCCN) favored subjective judgment regarding the quality of the evidence in determining RG.

Not all guideline organizations used LOE or RG. Of the 10 guidelines listed in Table 1, 3 used both, 2 ranked only LOE, 3 ranked only RG, and 2 ranked neither LOE nor RG. The annotation used by guideline organizations was also variable. While a numeric annotation for LOE and an alphabetic annotation for RG were most common, this was not always the case. Additionally, LOE and RG were sectioned in various ways, ranging from 3 tiers (‘I’ to ‘III’) to 6 tiers (‘1i’ to ‘3iii’). Although a uni-directional scale was generally used to rank levels of evidence, either a uni-directional or a bi-directional scale could be used to express grades of recommendation. Meta-analyses and systematic reviews of randomized controlled trials (RCTs) were considered by all organizations to provide the highest level of evidence, while expert opinion was considered to provide the lowest. However, the rating for study designs between the 2 extremes varied significantly by organization (Table 2).
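The tally of which guidelines rank LOE, RG, both, or neither can be reproduced from Table 1. The following is a minimal sketch, assuming each "--" entry in Table 1 means that no scheme was used; the names `TABLE_1`, `classify`, and `tally` are illustrative and not part of any guideline.

```python
from collections import Counter

# (guideline, uses an LOE scheme, uses an RG scheme), transcribed from Table 1.
# Assumption: a "--" entry in Table 1 is recorded as False.
TABLE_1 = [
    ("AACE (2004)", True, True),
    ("ASCO (2001)", True, True),
    ("ASCO (2004)", False, False),
    ("ASCRS (2004)", True, False),
    ("ASCRS (2005)", True, True),
    ("ATA", False, True),
    ("CAP (1999)", False, True),
    ("NCCN", False, True),
    ("NCI", True, False),
    ("SSAT", False, False),
]


def classify(has_loe: bool, has_rg: bool) -> str:
    """Label a guideline by which of the two ratings it documents."""
    if has_loe and has_rg:
        return "both"
    if has_loe:
        return "loe_only"
    if has_rg:
        return "rg_only"
    return "neither"


def tally(rows) -> Counter:
    """Count guidelines in each category."""
    return Counter(classify(loe, rg) for _, loe, rg in rows)


if __name__ == "__main__":
    for category, count in tally(TABLE_1).items():
        print(f"{category}: {count}")
```

Run over the ten rows of Table 1, the tally gives 3 guidelines with both ratings, 2 with LOE only, 3 with RG only, and 2 with neither.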

Table 2.

Varying levels of evidence assigned by study design

Guideline | Meta-analysis, systematic reviews of RCTs | Multiple RCTs | Single RCT | Low-quality RCTs (high likelihood of bias) | Multiple cohort studies | Individual cohort studies | Multiple case-control studies | Individual case-control studies | Time series | Case series/descriptive studies | Expert opinion
AACE (2004) 1 1 2 3 2 2/3 3 3 3 4
ASCO (2001) | I | I | II | II | III | III | III | III | III | IV | V
ASCRS (2004) | I (A/B) | I (A/B) | I (A/B) | II (A/B/C) | II (A/B/C) | II (A/B/C) | II (A/B/C) | II (A/B/C) | II (A/B/C) | III (C) | III (C)
ASCRS (2005) | I | I | II | II | III | III | III | III | III | IV | V
NCI | 1 | 1 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | --

AACE, American Association of Clinical Endocrinologists; ASCO, American Society of Clinical Oncology; ASCRS, American Society of Colon and Rectal Surgeons; NCI, National Cancer Institute
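To see how the same study design can land on different levels depending on the scheme, the rows of Table 2 can be treated as lookup tables. The sketch below is illustrative only: it transcribes three rows of Table 2 (ASCO 2001, ASCRS 2005, NCI), and the names `DESIGNS`, `LEVELS`, `level_for`, and `compare` are assumptions made for this example. The labels are not comparable across schemes without a mapping, which is the point of the comparison.

```python
# Study designs in the column order of Table 2.
DESIGNS = [
    "meta-analysis / systematic review of RCTs",
    "multiple RCTs",
    "single RCT",
    "low-quality RCTs",
    "multiple cohort studies",
    "individual cohort study",
    "multiple case-control studies",
    "individual case-control study",
    "time series",
    "case series / descriptive studies",
    "expert opinion",
]

# Level labels transcribed from three rows of Table 2.
# None stands for the "--" entry (no level assigned).
LEVELS = {
    "ASCO (2001)": ["I", "I", "II", "II", "III", "III", "III", "III", "III", "IV", "V"],
    "ASCRS (2005)": ["I", "I", "II", "II", "III", "III", "III", "III", "III", "IV", "V"],
    "NCI": ["1", "1", "1", "2", "3", "3", "3", "3", "3", "3", None],
}


def level_for(scheme: str, design: str):
    """Return the level label a scheme assigns to a study design."""
    return LEVELS[scheme][DESIGNS.index(design)]


def compare(design: str) -> dict:
    """Return the label each transcribed scheme assigns to one design."""
    return {scheme: level_for(scheme, design) for scheme in LEVELS}


if __name__ == "__main__":
    for design in ("single RCT", "case series / descriptive studies"):
        print(design, compare(design))
```

For example, a single RCT is level II under ASCO (2001) and ASCRS (2005) but level 1 under NCI, so a reader cannot interpret a level without knowing which scheme produced it.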

Almost all guideline organizations used rigorous and explicit methodologies to evaluate the evidence and develop guideline recommendations. However, the lack of consistency in the usage of LOE and RG, the non-uniformity of inputs used to determine LOE and RG, and the diversity of rating systems used to express LOE and RG raise two concerns. First, the lack of a uniform methodology to expressly document the quality of evidence behind a recommendation undermines the utility of guidelines as a tool to narrow the evidence-to-practice gap. Studies examining physician behavior show that physicians will change practice to accept recommendations in response to good evidence [4,5]. However, guidelines often offer inconsistent [6] or even conflicting advice [7], complicating integration into practice. Lack of clear documentation exacerbates this concern and throws further doubt on the reliability of guidelines.

A second concern is the increased use of guideline recommendations as quality and performance measures to evaluate and rank institutions and physicians. The lack of a set criterion to determine and express LOE and RG makes it difficult to justify using a particular organization’s guideline recommendation. Given the ongoing call for quality measures by the public and payers, improving standardization across guidelines takes on increased urgency.

Prior efforts have mostly focused on the evaluation of guidelines (“guidelines for guidelines”) and less on standardization. There are at least 24 different instruments available for assessing the quality of clinical practice guidelines [8,9], including the Conference on Guideline Standardization (COGS) and the Appraisal of Guidelines, Research and Evaluation (AGREE) collaboration. Efforts promoting standardization have been scarce, but a universal approach to grading the quality of evidence and strength of recommendations developed by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group [10] has gained momentum. The AACE (version 2010) and various other organizations, such as the American College of Chest Physicians and the National Institute for Clinical Excellence, have recently incorporated the GRADE methodology.

Consumers of medical information are increasingly flooded by information from multiple sources and are forced to decide what they should and should not adopt into practice and policy. Guidelines initially created to aid in this process may instead complicate matters as they increase in number and offer inconsistent conclusions. A systematic and uniform method to develop and present LOE and RG that allows comparisons across different guidelines would 1) enhance usability by lending credence to recommendations, 2) promote the use of guidelines as recommendations to inform practice, and 3) enhance physician buy-in for the use of quality measures derived from practice guidelines. Such an approach would aid efforts to align policy and protocols with clinical practice, as well as administrative incentives with good medicine.

REFERENCES

  • 1. Moulton G. IOM report on quality of cancer care highlights need for research, data expansion. Institute of Medicine. J Natl Cancer Inst. 1999;91(9):761–762. doi: 10.1093/jnci/91.9.761.
  • 2. Schunemann HJ, Fretheim A, Oxman AD. Improving the use of research evidence in guideline development: 9. Grading evidence and recommendations. Health Res Policy Syst. 2006;4:21. doi: 10.1186/1478-4505-4-21.
  • 3. McGlynn EA, Asch SM, Adams J, Keesey J, Hicks J, DeCristofaro A, et al. The quality of health care delivered to adults in the United States. N Engl J Med. 2003;348(26):2635–2645. doi: 10.1056/NEJMsa022615.
  • 4. Stafford RS, Furberg CD, Finkelstein SN, Cockburn IM, Alehegn T, Ma J. Impact of clinical trial results on national trends in alpha-blocker prescribing, 1996–2002. JAMA. 2004;291(1):54–62. doi: 10.1001/jama.291.1.54.
  • 5. Simunovic M, Baxter NN. Knowledge translation research: a review and new concepts from a surgical case study. Surgery. 2009;145(6):639–644. doi: 10.1016/j.surg.2008.11.011.
  • 6. Swales JD. Guidelines on guidelines. J Hypertens. 1993;11(9):899–903. doi: 10.1097/00004872-199309000-00003.
  • 7. Thomson R, McElroy H, Sudlow M. Guidelines on anticoagulant treatment in atrial fibrillation in Great Britain: variation in content and implications for treatment. BMJ. 1998;316(7130):509–513. doi: 10.1136/bmj.316.7130.509.
  • 8. Oxman AD, Schunemann HJ, Fretheim A. Improving the use of research evidence in guideline development: 16. Evaluation. Health Res Policy Syst. 2006;4:28. doi: 10.1186/1478-4505-4-28.
  • 9. Vlayen J, Aertgeerts B, Hannes K, Sermeus W, Ramaekers D. A systematic review of appraisal tools for clinical practice guidelines: multiple similarities and one common deficit. Int J Qual Health Care. 2005;17(3):235–242. doi: 10.1093/intqhc/mzi027.
  • 10. Calonge N. New Promise for Uniform Evidence-based Guideline Development: The GRADE Approach. 2010 [cited June 13, 2010]. Available from: http://www.guideline.gov/expert/commentary.aspx?file=GRADE.inc.