Med Care. Author manuscript; available in PMC: 2020 May 1.
Published in final edited form as: Med Care. 2019 May;57(5):391–398. doi: 10.1097/MLR.0000000000001108

Clinical Scenarios for which Spinal Mobilization and Manipulation Are Considered by an Expert Panel to be Inappropriate (and Appropriate) for Patients with Chronic Low Back Pain

Patricia M Herman 1, Eric L Hurwitz 2, Paul G Shekelle 3, Margaret D Whitley 4, Ian D Coulter 5
PMCID: PMC6459705  NIHMSID: NIHMS1522100  PMID: 30870390

Abstract

Background.

Spinal mobilization and manipulation are two therapies found to be generally safe and effective for chronic low back pain (CLBP). However, the question remains whether they are appropriate for all CLBP patients.

Research Design.

An expert panel used a well-validated approach, including an evidence synthesis and clinical acumen, to develop and then rate the appropriateness of the use of spinal mobilization and manipulation across an exhaustive list of clinical scenarios which could present for CLBP. Decision tree analysis (DTA) was used to identify the key patient characteristics that affected the ratings.

Results.

Nine hundred clinical scenarios were defined and then rated by a 9-member expert panel as to the appropriateness of spinal mobilization and manipulation. Across clinical scenarios, more were rated appropriate than inappropriate. However, the number of patients presenting with each scenario is not yet known. Nevertheless, DTA indicates that all clinical scenarios that included major neurologic findings, and some others involving imaging findings of central herniated nucleus pulposus, spinal stenosis, or free fragments, were rated as inappropriate for both spinal mobilization and manipulation. DTA also identified the absence of these imaging findings and no previous laminectomy as the most important patient characteristics in predicting ratings of appropriate.

Conclusions.

A well-validated expert panel-based approach was used to develop and then rate the appropriateness of the use of spinal mobilization and manipulation across the clinical scenarios which could present for CLBP. Information on the clinical scenarios for which these therapies are inappropriate should be added to clinical guidelines for CLBP.

Keywords: chronic low back pain, appropriateness of care, RAND/UCLA Appropriateness Method, decision tree analysis, spinal mobilization and manipulation

Introduction

The ultimate goal of all medical research is to ensure that patients receive care that is appropriate, or “suitable or proper in the circumstances.”(1) Appropriate care has been defined as: “Health care in which the expected clinical benefits (e.g., improved symptoms) of care outweigh the expected negative effects (e.g. adverse drug effects) to such an extent that the treatment is justified.”(2) It is estimated that 20 percent of health care costs are wasteful—i.e., going to inappropriate or useless care.(3) While most people would agree that all patients should get appropriate care, the challenges are defining appropriateness, determining which therapies are appropriate, and ensuring delivery of this care.

In response to wide variations in clinical practice patterns, the RAND Corporation and the University of California, Los Angeles (UCLA) pioneered a method to study the appropriateness of care.(4–8) This RAND/UCLA Appropriateness Method (RUAM) takes advantage of the available evidence base but also draws on the clinical acumen and experience of practitioners. The RUAM has been the most widely used and studied method for defining and identifying appropriate care. The estimates of appropriateness generated by the RUAM have been found to be reliable, with test-retest reliability >0.9 using the same panelists 6-8 months later.(9) Also, the results across several panels with similar discipline compositions for the same procedure are reproducible, with kappa statistics (0.5 to 0.7) similar to those of some common diagnostic tests.(10, 11) The RUAM estimates have also been found to be valid. Panelists’ ratings of appropriateness were consistent with the literature and followed a logical clinical rationale.(9) When sets of appropriateness results from the RUAM and another approach were applied to a sample of patients who received a procedure, the RUAM clinical scenarios accounted for all patients in the sample, whereas the other approach addressed only 70% of patients and mainly missed patients for whom the procedure was inappropriate.(12) Importantly, the RUAM results have favorable predictive ability; i.e., patients treated in accordance with the criteria have better outcomes than those who receive another or no treatment. Favorable predictive ability has been found for coronary angiography,(13, 14) carotid endarterectomy,(15) and coronary revascularization.(16, 17) It has also been found that later clinical trials targeting specific patient types have validated panelists’ ratings made before that (or much other) evidence existed.(15) Finally, the sensitivity and specificity of the RUAM to identify inappropriate overuse and underuse of healthcare have been estimated at 68% to 99% and 94% to 97%, respectively.(18)

Chronic low back pain (CLBP) is the most common type of chronic pain,(19, 20) and is costly to the healthcare system and employers.(21–25) According to numerous systematic reviews and meta-analyses,(26–28) a number of nonpharmacologic therapies have been found effective for chronic back pain and are now included in guidelines.(29–31) Spinal mobilization and manipulation, two of the recommended therapies,(30–33) are most commonly delivered by chiropractors, osteopaths, and physical therapists.(34) In the US, 30 to 50 percent of those with spinal pain have seen a chiropractor,(23, 24, 35) and 15 to 34 percent have used physical therapy.(23, 24) Therefore, it is reasonable to believe that a large number of those with CLBP are receiving spinal mobilization and manipulation. Although systematic reviews have shown these therapies to be generally safe and effective, the question remains whether they are appropriate for all patients with CLBP.

This study used the RUAM to determine the appropriateness of spinal mobilization and manipulation for different types of patients with CLBP, and the key patient characteristics associated with appropriateness. A separate paper will apply these results to determine the prevalence of appropriate and inappropriate care in chiropractic practice.

Methods

The RAND/UCLA Appropriateness Method(6, 36, 37) (RUAM) used a modified-Delphi panel of clinical and content experts, and their knowledge and clinical acumen, to translate the available evidence on a therapy into ratings of appropriate, inappropriate or equivocal for each patient type (clinical scenario) considered. Nine panelists were chosen based on their clinical expertise across different specialties and disciplines, and diversity of geographic location. Our panel included one orthopedist, one osteopath, one internist, two chiropractors, one physical therapist, one radiologist, and two health services researchers. Each received a $1000 honorarium for their participation.

Panelists were first presented with the latest evidence on the effectiveness and safety of each therapy in terms of a detailed systematic review,(38) and then asked to rate, using a 1-9 scale, the extent to which the benefits of the therapy outweigh its risks for each clinical scenario. Ratings of 7-9 (i.e., the therapy is appropriate) were to be given if: “The expected health benefit (e.g., increased life expectancy, relief of pain, reduction in anxiety, improved functional capacity) exceeds the expected negative consequences (e.g., mortality, morbidity, anxiety, pain, work time lost) by a sufficiently wide margin that the procedure is worth doing, exclusive of cost.”(5) The instructions given to panelists and definitions of terms used in the rating process are found in a detailed, publicly available RAND report.(39)

The clinical scenarios (patient types) to rate were organized into sections for ease of rating.(39) Once the first clinical scenario in a section was rated, the others differed from it by only one or two patient characteristics and could be evaluated quickly. The project staff compiled the list of clinical scenarios using the literature review, clinical expert advice, and the clinical scenario list used for an earlier study on spinal manipulation for acute low back pain.(40, 41) These scenarios categorized patients in terms of their history, symptoms, physical and radiographic findings, and response to prior treatment. The list of clinical scenarios needed to be comprehensive enough to capture all types of patients with CLBP, detailed enough that the procedure would be equally appropriate or inappropriate for all patients in a scenario, and manageable enough that all scenarios could be rated within a reasonable amount of time. Scenarios deemed implausible were dropped. On average, once used to the process, a panelist can rate about 150-200 indications per hour.(6) The RUAM was applied to 900 clinical scenarios for spinal mobilization and 900 for spinal manipulation. For each therapy, the same 450 clinical scenarios were rated under each of two conditions: (1) there had been no other adequate conservative care for this episode, or (2) nonmanipulative conservative care for this episode had failed.

For this study, we defined manipulation of the low back as a controlled, judiciously applied dynamic thrust (adjustment), which could include extension and rotation of the lumbar region, of high or low velocity and low-amplitude force directed to spinal joint segments within patient tolerance. Mobilization of the low back was defined as a controlled, judiciously applied force of low velocity and variable amplitude directed to spinal joint segments. Mobilization procedures usually do not take joints beyond the passive range of motion and do not result in joint cavitation.

Panelists rated each clinical scenario twice: 1) first individually at home; and 2) then during a one-day face-to-face meeting and after discussion with the other panelists. At the beginning of the face-to-face meeting each panelist was given a personalized printout showing their at-home ratings and the distribution, but not the identities, of all other panelists’ ratings. The home and face-to-face rating sessions occurred in February and March of 2015, respectively.

Panelists were asked to make their ratings using their own best clinical judgement and content knowledge (rather than their perceptions of what other experts might say) and considering an average patient currently presenting to an average North American practitioner who performs this procedure in an average care-providing facility. Consensus is reported, but not required. More detail on the RUAM is found in the RUAM manual.(6)

Analysis

The 1-9 appropriateness ratings given by each panelist after the face-to-face meeting were analyzed to assign each clinical scenario one of three overall ratings for spinal mobilization and manipulation: appropriate, equivocal, or inappropriate. The first analysis determined whether there was disagreement across the panelists’ appropriateness ratings for any clinical scenario. For a classic 9-member panel, agreement was defined as at least 7 of the ratings falling in any 3-point region of the scale, and disagreement was defined as at least three panelists’ ratings in the 1-3 range and at least three in the 7-9 range. If there was no disagreement and the median rating across the panel was 1-3, the therapy was rated as inappropriate for that clinical scenario. If there was no disagreement and the median rating was 7-9, the therapy was rated as appropriate. The therapy was rated as equivocal for a clinical scenario if: 1) most panelists gave a rating of 4, 5, or 6 (i.e., most believed that benefits generally equaled risks); 2) panelists gave widely polarized ratings (i.e., there was disagreement); or 3) panelists’ ratings were scattered across the scale (i.e., there was substantial uncertainty as to appropriateness) and the median rating was in the 4-6 range. The last two of these identify potential targets for future research.
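The classification rules above can be expressed as a short function. This is a minimal sketch, not the study’s own Excel and Java calculations; it reads “any 3-point region” as any three consecutive points on the scale (restricting it to the regions 1-3, 4-6, and 7-9 is an alternative reading), and the function names and example ratings are hypothetical.

```python
from statistics import median


def agreement_status(ratings):
    """Label a 9-member panel's 1-9 ratings as agreement, disagreement, or uncertain."""
    if len(ratings) != 9 or any(r not in range(1, 10) for r in ratings):
        raise ValueError("expected nine integer ratings on a 1-9 scale")
    low = sum(1 for r in ratings if r <= 3)
    high = sum(1 for r in ratings if r >= 7)
    if low >= 3 and high >= 3:
        return "disagreement"  # polarized: at least 3 in 1-3 and at least 3 in 7-9
    # Assumption: "any 3-point region" = any window of three consecutive scale points
    best_window = max(
        sum(1 for r in ratings if start <= r <= start + 2) for start in range(1, 8)
    )
    return "agreement" if best_window >= 7 else "uncertain"


def overall_rating(ratings):
    """Map panel ratings to appropriate, equivocal, or inappropriate."""
    status = agreement_status(ratings)
    med = median(ratings)
    if status != "disagreement" and med <= 3:
        return "inappropriate"
    if status != "disagreement" and med >= 7:
        return "appropriate"
    return "equivocal"  # median 4-6, or any disagreement


# Hypothetical panels
print(overall_rating([1, 1, 2, 2, 3, 3, 3, 5, 8]))  # agreement, median 3
print(overall_rating([2, 3, 5, 7, 7, 7, 8, 8, 9]))  # uncertain spread, median 7
print(overall_rating([1, 2, 3, 5, 5, 7, 8, 9, 9]))  # disagreement
```

The second panel shows that a scattered (uncertain) set of ratings can still yield an appropriate rating when its median falls in 7-9, which is why Table 1’s uncertain counts are larger than Table 2’s equivocal counts.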

The amount of agreement and disagreement, the dispersion of the ratings measured by the mean absolute deviation (MAD) from the median, and the proportions of clinical scenarios rated as appropriate, equivocal and inappropriate were compared between the at-home and in-meeting ratings. Calculations of agreement and appropriateness were conducted using Microsoft Excel and Java.
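The dispersion measure can be sketched as follows: for one scenario, take the median of the nine ratings and average the absolute distance of each rating from it. The ratings below are hypothetical, chosen to show dispersion shrinking after discussion; how the per-scenario values were aggregated into the single MAD figures in Table 1 is not stated here, so this sketch stops at one scenario.

```python
from statistics import mean, median


def mad_from_median(ratings):
    """Mean absolute deviation of panel ratings from their median."""
    m = median(ratings)
    return mean(abs(r - m) for r in ratings)


# Hypothetical ratings for one scenario before and after the face-to-face meeting
initial = [1, 3, 4, 5, 5, 6, 8, 9, 9]
final = [4, 4, 5, 5, 5, 5, 6, 6, 7]
print(round(mad_from_median(initial), 2), round(mad_from_median(final), 2))
```

Both sets share a median of 5, so the drop in MAD reflects ratings clustering more tightly, not a shift in the typical rating.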

We used decision tree analysis (DTA) to see if simplified rules could be identified regarding the elements (patient characteristics) of the clinical scenarios that predict the appropriateness of spinal mobilization and spinal manipulation.(42) DTA looks for the smallest number of patient characteristics or combinations of characteristics that can provide an accurate prediction of appropriate or inappropriate ratings. These simplified rules can provide information that is not always obvious from individual ratings across hundreds of clinical scenarios. We identified the set of patient characteristics (Appendix) that variously made up the clinical scenarios, defined each scenario as the presence or absence (or for some such as pain, a particular level) of each characteristic, and included all as predictor variables in the DTA.

Twenty-six clinical scenarios (Chapter 11(39)) were not included in the DTA because they were each made up of single patient characteristics not included in any other scenario. These single characteristics (e.g., Grade IV spondylolisthesis) were each included in a scenario “that would otherwise be rated as appropriate,” and their ratings are described separately. Since the clinical scenarios did not always mention all patient characteristics, we assumed that if a characteristic wasn’t mentioned it was absent in that scenario. When predicting a rating of inappropriate (appropriate) we compared clinical scenarios with that rating to those without that rating—i.e., to those with ratings of either appropriate (inappropriate) or equivocal.

The DTA was conducted using the C4.5 algorithm(43) as implemented in the R statistical environment (available at: https://cran.r-project.org/). At each step, tree branches are formed on the characteristic that provides the most information gain, and the algorithm finishes by pruning back branches that are no longer useful.
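The split criterion can be illustrated on toy data. This sketch computes entropy, information gain, and the gain ratio that C4.5 uses to normalize information gain, then picks the best first split; it does not grow a full tree or prune. The feature names and the six scenarios are hypothetical, and a feature missing from a scenario is treated as absent, following the convention described above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(rows, feature):
    """Split labels by presence of a binary feature; unmentioned = absent."""
    groups = {True: [], False: []}
    for features, label in rows:
        groups[bool(features.get(feature, False))].append(label)
    return groups


def info_gain(rows, feature):
    labels = [label for _, label in rows]
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in partition(rows, feature).values())
    return entropy(labels) - remainder


def gain_ratio(rows, feature):
    """C4.5-style criterion: information gain divided by split information."""
    n = len(rows)
    groups = partition(rows, feature).values()
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups if g)
    return info_gain(rows, feature) / split_info if split_info > 0 else 0.0


def best_split(rows, features):
    return max(features, key=lambda f: gain_ratio(rows, f))


# Hypothetical scenarios: (characteristics present, rated inappropriate?)
ROWS = [
    ({"major_neuro": True, "imaging": True}, True),
    ({"major_neuro": True}, True),
    ({"imaging": True, "laminectomy": True}, True),
    ({"imaging": True}, False),
    ({"laminectomy": True}, False),
    ({}, False),
]
FEATURES = ["major_neuro", "imaging", "laminectomy"]
print(best_split(ROWS, FEATURES))
```

On this toy data the first split falls on the hypothetical `major_neuro` feature because it isolates a pure group of inappropriate scenarios; a real C4.5 run would then recurse on each branch and prune.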

The project was reviewed and determined to be exempt by _____ Human Subjects Protection Committee.

Results

The panelists reported varying times for the ratings done at home, but three hours was roughly the norm. Table 1 compares the initial at-home ratings to the final face-to-face ratings in terms of the ratings given, their dispersion (MAD) across panelists, and the number of clinical scenarios where there was agreement, a spread of ratings (uncertain), and clear disagreement. Ratings and agreement increased, and dispersion and disagreement decreased between the two sets of ratings. For both sets and therapies, appropriateness ratings were significantly higher when nonmanipulative conservative care for this episode had failed than when it had not been tried (paired t-tests p<.01).
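The paired comparison can be illustrated with a minimal sketch. The text does not state exactly what was paired; this example assumes each clinical scenario contributes one pair (its panel median with no other adequate conservative care and with failed nonmanipulative conservative care), and the five scenarios are hypothetical values, not study data. It computes only the t statistic, the mean within-pair difference divided by its standard error; a p-value would come from a t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_rel`).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic: mean within-pair difference over its standard error."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))


# Hypothetical panel medians for the same five scenarios under each condition
no_prior_care = [5, 3, 7, 4, 6]
prior_care_failed = [6, 4, 7, 5, 7]
t = paired_t(no_prior_care, prior_care_failed)  # positive: higher after failed care
print(round(t, 2))
```

Pairing matters because each scenario is its own control: between-scenario variation in ratings drops out and only the effect of the prior-care condition remains.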

Table 1.

Change in median ratings and extent of agreement and disagreement between the initial at-home ratings and final face-to-face meeting ratings by panelists using the RAND/UCLA Appropriateness Method

Item, then eight columns: Spinal Mobilization with no other adequate conservative care (Initial Ratings, Final Ratings); Spinal Mobilization after nonmanipulative conservative care has failed (Initial Ratings, Final Ratings); Spinal Manipulation with no other adequate conservative care (Initial Ratings, Final Ratings); Spinal Manipulation after nonmanipulative conservative care has failed (Initial Ratings, Final Ratings)
Average Median (1-9 scale) 4.9 5.2 5.2 5.5 4.7 5.2 5.0 5.4
MAD from Median 1.3 0.9 1.2 0.9 1.4 1.0 1.4 1.0
Number (%) Agreement 89 (19.8%) 134 (29.8%) 90 (20.0%) 128 (28.4%) 78 (17.3%) 80 (17.8%) 80 (17.8%) 107 (23.8%)
Number (%) Uncertain 349 (77.6%) 315 (70.0%) 346 (76.9%) 322 (71.6%) 348 (77.3%) 369 (82.0%) 350 (77.8%) 343 (76.2%)
Number (%) Disagreement 12 (2.7%) 1 (0.2%) 14 (3.1%) 0 (0.0%) 24 (5.3%) 1 (0.2%) 20 (4.4%) 0 (0.0%)

MAD = mean absolute deviation

Table 2 gives the number of clinical scenarios rated as appropriate, inappropriate or equivocal for the final face-to-face ratings. More clinical scenarios were rated appropriate than inappropriate. However, about two-thirds of clinical scenarios were rated equivocal, and most of these cases were a result of a spread of ratings (uncertain) with a median rating in the 4-6 range. Details on the ratings given to each clinical scenario are found in the published RAND report.(39)

Table 2.

Final appropriateness ratings across clinical scenarios by panelists using the RAND/UCLA Appropriateness Method

Rating, then four column pairs (Number, %): Spinal Mobilization with no other adequate conservative care; Spinal Mobilization after nonmanipulative conservative care has failed; Spinal Manipulation with no other adequate conservative care; Spinal Manipulation after nonmanipulative conservative care has failed
Inappropriate 53 11.8% 45 10.0% 58 12.9% 48 10.7%
Equivocal 328 72.9% 296 65.8% 318 70.7% 303 67.3%
 Agreement and equivocal 73 16.2% 58 12.9% 23 5.1% 43 9.6%
 Disagreement 1 0.2% 0 0.0% 1 0.2% 0 0.0%
 Uncertain and equivocal 254 56.4% 238 52.9% 294 65.3% 260 57.8%
Appropriate 69 15.3% 109 24.2% 74 16.4% 99 22.0%
Total 450 100% 450 100% 450 100% 450 100%

Figures 1 and 2 show the results of the decision tree analyses identifying the patient characteristics that best predict a clinical scenario being rated as inappropriate (versus appropriate or equivocal) for mobilization and manipulation, respectively. As can be seen, the decision trees for the two therapies are very similar. They differ only by the addition, in the mobilization flowchart, of a question about the presence of minor neurologic findings (i.e., at least one of the following: asymmetrically decreased reflexes in lower extremity; documented dermatomal or peripheral nerve sensory changes which may include deficit, paresthesia, and hyperesthesia; non-progressive unilateral muscle weakness and/or paresthesia that follows a radicular pattern).

Figure 1.

Results of the decision tree analysis of the patient characteristics that best define the clinical scenarios where mobilization was rated inappropriate

CLBP = chronic low back pain; HNP = Herniated nucleus pulposus—i.e., herniated disc

1Major neurologic findings = At least one of the following: neurologic signs of lumbar myelopathy; progressive unilateral muscle weakness and/or motor loss documented by repeat exam over time; sensory deficits other than related to dermatomes or peripheral nerves; and/or electrodiagnostic findings of acute and/or progressive radiculopathy.

2Minor neurologic findings = At least one of the following: asymmetrically decreased reflexes in lower extremity; documented dermatomal or peripheral nerve sensory changes which may include deficit, paresthesia, and hyperesthesia; non-progressive unilateral muscle weakness and/or paresthesia that follows a radicular pattern. Note: if major neurologic findings are present and minor neurologic findings are not mentioned, they are assumed to be present.

3Continued biomechanical or psychosocial stress; Biomechanical stress = Postural, lifestyle, or occupational factors associated with low back pain or related complaints. Psychosocial stress = Depression (requiring drug treatment); alcohol or narcotic dependence; recent suicide attempt; severe anxiety; evidence of stressful life situation such as bereavement, job change, job or family dissatisfaction, litigation or compensation issues.

Figure 2.

Results of the decision tree analysis of the patient characteristics that best define the clinical scenarios where manipulation was rated inappropriate

CLBP = chronic low back pain; HNP = Herniated nucleus pulposus, aka herniated disc

1Major neurologic findings = At least one of the following: neurologic signs of lumbar myelopathy; progressive unilateral muscle weakness and/or motor loss documented by repeat exam over time; sensory deficits other than related to dermatomes or peripheral nerves; and/or electrodiagnostic findings of acute and/or progressive radiculopathy.

2Continued biomechanical or psychosocial stress; Biomechanical stress = Postural, lifestyle, or occupational factors associated with low back pain or related complaints. Psychosocial stress = Depression (requiring drug treatment); alcohol or narcotic dependence; recent suicide attempt; severe anxiety; evidence of stressful life situation such as bereavement, job change, job or family dissatisfaction, litigation or compensation issues.

For both therapies the presence of major neurologic findings (i.e., at least one of the following: neurologic signs of lumbar myelopathy; progressive unilateral muscle weakness and/or motor loss documented by repeat exam over time; sensory deficits other than related to dermatomes or peripheral nerves; and/or electrodiagnostic findings of acute and/or progressive radiculopathy) was the best predictor of a clinical scenario being rated inappropriate. Some clinical scenarios without major neurological findings but with imaging findings of central herniated nucleus pulposus, spinal stenosis, or free fragments, AND no physical findings of joint dysfunction—i.e., no clear indication for manipulation or mobilization—were also rated as inappropriate. These cases occurred for mobilization when the patient also had a laminectomy AND minor neurological findings; OR no laminectomy, but biomechanical or psychosocial stress AND did not have a favorable response to prior manipulation or mobilization. The cases where manipulation was rated inappropriate were similar with the exception that minor neurological findings were not required.
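The inappropriate branches described above can be restated as a small function. This is a sketch of the prose summary only, not the fitted trees in Figures 1 and 2, which remain authoritative; the class and field names are hypothetical, and every field defaults to False to mirror the convention that a characteristic not mentioned in a scenario is absent.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    # Hypothetical field names; each defaults to False (unmentioned = absent).
    major_neuro: bool = False               # major neurologic findings
    minor_neuro: bool = False               # minor neurologic findings
    imaging_findings: bool = False          # central HNP, spinal stenosis, or free fragments
    joint_dysfunction: bool = False         # physical findings of joint dysfunction
    laminectomy: bool = False               # previous laminectomy
    stress: bool = False                    # continued biomechanical or psychosocial stress
    favorable_prior_response: bool = False  # favorable response to prior manipulation/mobilization


def predicted_inappropriate(s: Scenario, therapy: str) -> bool:
    """Restate the text's summary of when a scenario was rated inappropriate."""
    if therapy not in ("mobilization", "manipulation"):
        raise ValueError("therapy must be 'mobilization' or 'manipulation'")
    if s.major_neuro:
        return True  # first split: major neurologic findings
    if not s.imaging_findings or s.joint_dysfunction:
        return False  # other inappropriate branches need imaging findings and no joint dysfunction
    if s.laminectomy:
        # Mobilization additionally requires minor neurologic findings; manipulation does not.
        return s.minor_neuro if therapy == "mobilization" else True
    return s.stress and not s.favorable_prior_response


example = Scenario(imaging_findings=True, laminectomy=True)
print(predicted_inappropriate(example, "mobilization"), predicted_inappropriate(example, "manipulation"))
```

The example scenario (imaging findings and a previous laminectomy, no minor neurologic findings) is where the two therapies’ trees diverge.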

Some of the patient characteristics that helped define the clinical scenarios (Appendix; e.g., sciatic nerve irritation, whether spine radiographs were done, current pain, previous conservative care) were not important in terms of a scenario being rated as inappropriate in the DTA. The DTA was also fairly accurate. Only 7 clinical scenarios out of 874 (900 - 26) for mobilization were misclassified as appropriate or equivocal when they were actually rated as inappropriate, and only one clinical scenario was misclassified as inappropriate when it was actually rated as appropriate or equivocal; an overall error rate of 0.9%. The same numbers for manipulation were 6, 3, and 1.0%. The main differences between the DTA and actual ratings seem to relate to whether there were physical findings of joint dysfunction.

We also performed DTA predicting a rating of appropriate, versus inappropriate or equivocal, for both therapies. Predicting appropriateness turned out to be more complex than predicting inappropriateness: the decision trees were less accurate and required inclusion of all patient characteristics. Seventeen clinical scenarios out of 874 for mobilization were misclassified as appropriate when they were actually rated as inappropriate or equivocal, and 13 clinical scenarios were misclassified as inappropriate or equivocal when they were actually rated as appropriate; a 3.4% error rate. The same numbers for manipulation were 26, 9, and 4.0%. In any case, for patient safety it is likely more important that inappropriate care be identified.
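The error rates quoted for both sets of decision trees follow from simple arithmetic: scenarios misclassified in either direction divided by the 874 scenarios entering the DTA (900 minus the 26 excluded). The sketch below recomputes them from the counts in the text; the function and variable names are illustrative.

```python
N_SCENARIOS = 900 - 26  # scenarios included in the DTA for each therapy

# (false negatives, false positives) relative to the rating each tree predicts:
# FN = actually given that rating but predicted otherwise;
# FP = predicted as that rating but actually rated otherwise.
misclassified = {
    ("inappropriate", "mobilization"): (7, 1),
    ("inappropriate", "manipulation"): (6, 3),
    ("appropriate", "mobilization"): (13, 17),
    ("appropriate", "manipulation"): (9, 26),
}


def error_rate(false_negatives: int, false_positives: int, n: int = N_SCENARIOS) -> float:
    """Share of scenarios the tree misclassified in either direction."""
    return (false_negatives + false_positives) / n


for (target, therapy), (fn, fp) in misclassified.items():
    print(f"{target:>13} / {therapy:<12} {100 * error_rate(fn, fp):.1f}%")
```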

Table 3 shows the percent of clinical scenarios that each patient characteristic helped classify and the direction of its influence for both therapies and both predictions. As can be seen, the presence of major neurologic findings, as the first split in the tree, was involved in the accurate prediction of an inappropriate rating for all (100%) of clinical scenarios. Findings on imaging provided information on inappropriateness for 92.6% of scenarios—i.e., all those without major neurologic findings that made it past the first filter. The patient characteristics most useful to the accurate prediction of an appropriate rating were no or minor findings on imaging (100%), followed by previous laminectomy (associated with a lower likelihood of appropriateness). The complexity of predicting appropriateness is illustrated by the spread and substantial size of the percentages across all patient characteristics for those analyses.

Table 3.

The usefulness of each patient characteristic in predicting whether a clinical scenario containing that characteristic was rated inappropriate or appropriate for each therapy—i.e., the percent of scenarios each characteristic helps classify

Patient characteristics*, then four columns: predicting a rating of inappropriate (Mobilization, Manipulation) and predicting a rating of appropriate (Mobilization, Manipulation)
Presence of major neurologic findings 100.0% 100.0% 52.7% 52.9%
No studies, or no or minor findings on imaging 92.6% 92.6% 100.0% 100.0%
Previous laminectomy 20.3% 20.3% 76.0% 76.0%
No biomechanical or psychosocial stress 17.2% 17.1% 1.0% 2.6%
Presence of minor neurologic findings 3.1% --- 48.6% 33.3%
Favorable response to prior manipulation 3.1% 4.7% 15.4% 11.5%
More physical findings of joint dysfunction 3.1% 3.1% 37.8% 39.7%
Pain is worse --- --- 65.1% 65.2%
Presence of sciatic nerve irritation --- --- 36.2% 48.8%
Previous non-manipulative conservative care failed --- --- 22.5% 3.1%
Previous radiographs exist --- --- 16.4% 15.5%
Presence of clinical risk factors --- --- 9.3% 9.1%
*The patient characteristics are all fully defined in the Appendix.

↑ = This patient characteristic predicts this rating.

↓ = This patient characteristic predicts against this rating.

As mentioned above, 26 clinical scenarios were excluded from the DTA because each consisted of a single patient characteristic added to an unspecified scenario “that would otherwise be rated as appropriate.” Both therapies were rated inappropriate for three types of patients (those with possible abdominal aortic aneurysm suspected by physical exam, those with definite abdominal aortic aneurysm by history or imaging, and those with radiographic contraindications to spinal mobilization or manipulation), and manipulation was rated as inappropriate for patients with Grade IV spondylolisthesis. Both therapies were rated as appropriate for Grade I and Grade II spondylolisthesis. All other grades of spondylolisthesis, clotting disorders with various results regarding prothrombin time, and possible abdominal aortic aneurysm with vascular calcifications on radiography but not suspected by physical exam were rated equivocal.

Discussion

This study applied the internationally recognized and well-validated RUAM to obtain expert panel ratings of the appropriateness of spinal mobilization and manipulation for CLBP. Nine hundred clinical scenarios were developed to cover the full range of patients presenting with CLBP. Of these, 178 were rated as appropriate, 98 as inappropriate, and 624 as equivocal (agreement that benefits roughly equal risks, or uncertainty as to whether benefits are greater or less than risks but with a median rating of their being roughly equal) for spinal mobilization. The numbers for spinal manipulation were 173 appropriate, 106 inappropriate, and 621 equivocal. Between the initial (at-home) and final (face-to-face) ratings, agreement and the number of scenarios rated appropriate increased, and the dispersion of ratings and disagreement decreased. Decision tree analyses indicated that the main contributor to a rating of inappropriate for both therapies was the presence of major neurologic findings.
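The totals in this summary come from adding the two conservative-care conditions in Table 2 for each therapy. The sketch below reproduces that sum from the Table 2 counts; the dictionary layout and function name are illustrative.

```python
# Final-rating counts from Table 2 (450 scenarios per condition)
TABLE2 = {
    "mobilization": {
        "no other adequate conservative care": {"appropriate": 69, "inappropriate": 53, "equivocal": 328},
        "nonmanipulative care failed": {"appropriate": 109, "inappropriate": 45, "equivocal": 296},
    },
    "manipulation": {
        "no other adequate conservative care": {"appropriate": 74, "inappropriate": 58, "equivocal": 318},
        "nonmanipulative care failed": {"appropriate": 99, "inappropriate": 48, "equivocal": 303},
    },
}


def totals(therapy):
    """Sum each final rating over both conservative-care conditions."""
    conditions = TABLE2[therapy].values()
    return {
        rating: sum(c[rating] for c in conditions)
        for rating in ("appropriate", "inappropriate", "equivocal")
    }


for therapy in TABLE2:
    print(therapy, totals(therapy), sum(totals(therapy).values()))
```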

Over half of the clinical scenarios received a rating of equivocal because of a lack of agreement in ratings across panelists and a median rating in the 4-6 range (benefits roughly equal to risks). These ratings may indicate that these scenarios require further research.

One feature of the RUAM is that panelists rate a comprehensive array of potential clinical presentations (scenarios) for the procedure of interest. This is done to meet one goal of the RUAM: to enable classification of all possible patients. Studies have shown that the RUAM was far better than informal methods at including descriptions of all patients, especially those for whom the procedure was inappropriate,(12) and that panel disagreement was not concentrated on rarely seen scenarios.(44) Nevertheless, there is some evidence that panel reproducibility is higher for clinical scenarios regularly seen in practice.(10)

Along these lines, note that the numbers of clinical scenarios for which a therapy is rated appropriate, equivocal, and inappropriate provide no indication of the actual number of patients affected. A future article from this study will present results from the examination of the medical records of a representative sample of chiropractic patients with CLBP to determine the proportion of patients that present with each of these scenarios and the proportion of patients receiving appropriate and inappropriate chiropractic care. Our constructed scenarios are intended to provide a category for each type of patient with CLBP and do not reflect any data on how frequently these scenarios are encountered in practice. Nevertheless, information on the clinical scenarios rated inappropriate is still useful for guideline development.

This study had the advantages of using an internationally-recognized and well-validated method to translate available evidence and expert clinical acumen into ratings of the appropriateness of two therapies for an exhaustive list of 900 clinical scenarios which could present as CLBP. The approach, however, is not without its limitations. Panelists were presented with a full synthesis of all available evidence on the safety and effectiveness/efficacy of spinal mobilization and manipulation for CLBP. However, the available evidence has gaps, including the fact that clinical trials only include a subset of patients with CLBP and the analyses of trial data don’t often present results by distinct clinical scenarios. Therefore, panelists’ clinical acumen was essential to the process and not without its own biases. Panelists were also asked to rate the appropriateness of 900 clinical scenarios for each therapy, and it is difficult to perform so many ratings without error. However, the ability of the DTA to predict so accurately was one indication of at least the internal consistency of the ratings. Finally, our results relate to appropriateness but not to medical necessity, which would have required another rating step in the RUAM.

A well-validated expert panel-based approach was used to develop and then rate the appropriateness of the use of spinal mobilization and manipulation across an exhaustive list of clinical scenarios which could present for CLBP. For both therapies, more clinical scenarios were rated appropriate than inappropriate, but the majority were rated equivocal either due to agreement that the benefits of the therapy are roughly equivalent to its risks, or due to a range of ratings whose median lies in the range of rough equivalence. If these scenarios turn out to represent a substantial number of patients seen, this last could be a fruitful target for future research. Nonetheless, all clinical scenarios that included major neurologic findings, and some others involving imaging findings of central herniated nucleus pulposus, or spinal stenosis, or free fragments were found to be rated as inappropriate for both spinal mobilization and manipulation. This information should be added to clinical guidelines recommending these therapies for patients with chronic low back pain.

Supplementary Material

Supplemental Data File (.doc, .tif, pdf, etc.)

Acknowledgments

The authors would like to acknowledge and thank the nine members of the panel for spinal manipulation and mobilization for chronic low back pain. We also wish to acknowledge the contributions of Katharina Best and Seifu Chonde who ran the decision tree analysis.

Funding: This work was funded by a cooperative agreement (U19) from the National Center for Complementary and Integrative Health (NCCIH). Grant No. 1U19AT007912-01.

Footnotes

Conflicts of Interest: The authors report no potential conflicts of interest in the past three years.

Contributor Information

Patricia M Herman, RAND Corporation, 1776 Main Street, PO Box 2138, Santa Monica, CA 90403.

Eric L Hurwitz, Office of Public Health Studies, University of Hawaii, 1960 East-West Road, Honolulu, HI 96822; Ph: 808-956-7425; Fax: 808-956-3368; ehurwitz@hawaii.edu.

Paul G Shekelle, RAND Corporation, 1776 Main Street, PO Box 2138, Santa Monica, CA 90403; Ph: 310-393-0411 x6669; Fax: 310-260-8161; shekelle@rand.org.

Margaret D Whitley, RAND Corporation, 1776 Main Street, PO Box 2138, Santa Monica, CA 90403; Ph: 310-393-0411 x7225; Fax: 310-260-8161; mwhitley@rand.org.

Ian D Coulter, RAND Corporation, 1776 Main Street, PO Box 2138, Santa Monica, CA 90403; Ph: 310-393-0411 x7455; Fax: 310-260-8161; coulter@rand.org.

References

  • 1. Definition of appropriate in US English. Oxford Dictionaries. Oxford, UK: Oxford University Press; 2018.
  • 2. Segen’s Medical Dictionary. Appropriate care. 2011. Available at: https://medical-dictionary.thefreedictionary.com/appropriate+care. Accessed Feb 22, 2018.
  • 3. Berwick DM, Hackbarth AD. Eliminating waste in US health care. JAMA 2012;307:1513–1516.
  • 4. Brook R. Appropriateness: The next frontier. Br Med J 1994;308:218.
  • 5. Brook RH, Chassin MR, Fink A, et al. A method for the detailed assessment of the appropriateness of medical technologies. Int J Technol Assess Health Care 1986;2:53–63.
  • 6. Fitch K, Bernstein SJ, Aguilar MD, et al. RAND/UCLA Appropriateness Method User’s Manual. Santa Monica, CA: RAND Corporation; 2001.
  • 7. McClellan M, Brook RH. Appropriateness of care: A comparison of global and outcome methods to set standards. Med Care 1992:565–586.
  • 8. Shekelle P. The appropriateness method. Med Decis Making 2004;Mar-Apr:228–231.
  • 9. Merrick NJ, Fink A, Park RE, et al. Derivation of clinical indications for carotid endarterectomy by an expert panel. Am J Public Health 1987;77:187–190.
  • 10. Shekelle PG, Kahan JP, Bernstein SJ, et al. The reproducibility of a method to identify the overuse and underuse of medical procedures. New Engl J Med 1998;338:1888–1895.
  • 11. Tobacman JK, Scott IU, Cyphert S, et al. Reproducibility of measures of overuse of cataract surgery by three physician panels. Med Care 1999;37:937–945.
  • 12. Kahn KL, Park RE, Vennes J, et al. Assigning appropriateness ratings for diagnostic upper gastrointestinal endoscopy using two different approaches. Med Care 1992:1016–1028.
  • 13. Selby JV, Fireman BH, Lundstrom RJ, et al. Variation among hospitals in coronary-angiography practices and outcomes after myocardial infarction in a large health maintenance organization. New Engl J Med 1996;335:1888–1896.
  • 14. Normand S-LT, Landrum MB, Guadagnoli E, et al. Validating recommendations for coronary angiography following acute myocardial infarction in the elderly: a matched analysis using propensity scores. J Clin Epidemiol 2001;54:387–398.
  • 15. Shekelle PG, Chassin MR, Park RE. Assessing the predictive validity of the RAND/UCLA appropriateness method criteria for performing carotid endarterectomy. Int J Technol Assess Health Care 1998;14:707–727.
  • 16. Kravitz RL, Laouri M, Kahan JP, et al. Validity of criteria used for detecting underuse of coronary revascularization. JAMA 1995;274:632–638.
  • 17. Hemingway H, Crook AM, Feder G, et al. Underuse of coronary revascularization procedures in patients considered appropriate candidates for revascularization. New Engl J Med 2001;344:645–654.
  • 18. Shekelle PG, Park RE, Kahan JP, et al. Sensitivity and specificity of the RAND/UCLA Appropriateness Method to identify the overuse and underuse of coronary revascularization and hysterectomy. J Clin Epidemiol 2001;54:1004–1010.
  • 19. Johannes CB, Le TK, Zhou X, et al. The prevalence of chronic pain in United States adults: results of an internet-based survey. J Pain 2010;11:1230–1239.
  • 20. Institute of Medicine. Relieving Pain in America: A Blueprint for Transforming Prevention, Care, Education, and Research. Washington, DC: The National Academies Press; 2011.
  • 21. Davis MA. Where the United States spends its spine dollars: expenditures on different ambulatory services for the management of back and neck conditions. Spine 2012;37:1693.
  • 22. Gaskin DJ, Richard P. The economic costs of pain in the United States. J Pain 2012;13:715–724.
  • 23. Gore M, Sadosky A, Stacey BR, et al. The burden of chronic low back pain: clinical comorbidities, treatment patterns, and health care costs in usual care settings. Spine 2012;37:E668–E677.
  • 24. Ivanova JI, Birnbaum HG, Schiller M, et al. Real-world practice patterns, health-care utilization, and costs in patients with low back pain: the long road to guideline-concordant care. Spine J 2011;11:622–632.
  • 25. Smith M, Davis MA, Stano M, et al. Aging baby boomers and the rising cost of chronic back pain: secular trend analysis of longitudinal Medical Expenditures Panel Survey data for years 2000 to 2007. J Manipulative Physiol Ther 2013;36:2–11.
  • 26. Chou R, Deyo R, Friedly J, et al. Nonpharmacologic therapies for low back pain: a systematic review for an American College of Physicians Clinical Practice Guideline. Ann Intern Med 2017;166:493–505.
  • 27. Chou R, Atlas SJ, Stanos SP, et al. Nonsurgical Interventional Therapies for Low Back Pain: A Review of the Evidence for an American Pain Society Clinical Practice Guideline. Spine 2009;34:1066–1077, 1078–1093.
  • 28. Agency for Healthcare Research and Quality. Noninvasive Nonpharmacological Treatment for Chronic Pain: A Systematic Review. Effective Health Care Program. Rockville, MD: Agency for Healthcare Research and Quality; 2018.
  • 29. Brosseau L, Wells GA, Poitras S, et al. Ottawa Panel evidence-based clinical practice guidelines on therapeutic massage for low back pain. J Bodywork Movement Ther 2012;16:424–455.
  • 30. The Diagnosis and Treatment of Low Back Pain Work Group. VA/DoD Clinical Practice Guideline for Diagnosis and Treatment of Low Back Pain, Version 2.0. Washington, DC: The Office of Quality, Safety and Value, VA, & Office of Evidence Based Practice, U.S. Army Medical Command; 2017.
  • 31. Qaseem A, Wilt TJ, McLean RM, et al. Noninvasive treatments for acute, subacute, and chronic low back pain: a clinical practice guideline from the American College of Physicians. Ann Intern Med 2017;166:514–530.
  • 32. Chou R, Qaseem A, Snow V, et al. Diagnosis and treatment of low back pain: a joint clinical practice guideline from the American College of Physicians and the American Pain Society. Ann Intern Med 2007;147:478–491.
  • 33. Wenger HC, Cifu AS. Treatment of low back pain. JAMA 2017;318:743–744.
  • 34. Hurwitz EL. Epidemiology: spinal manipulation utilization. J Electromyography Kinesiol 2012;22:648–654.
  • 35. Martin BI, Gerkovich MM, Deyo RA, et al. The association of complementary and alternative medicine use and health care expenditures for back and neck problems. Med Care 2012;50:1029–1036.
  • 36. Brook RH. Assessing the appropriateness of care—its time has come. JAMA 2009;302:997–998.
  • 37. Nair R, Aggarwal R, Khanna D. Methods of formal consensus in classification/diagnostic criteria and guideline development. Semin Arthritis Rheum: Elsevier; 2011:95–105.
  • 38. Coulter ID, Crawford C, Hurwitz EL, et al. Manipulation and mobilization for treating chronic low back pain: a systematic review and meta-analysis. Spine J 2018;18:866–879.
  • 39. Coulter ID, Whitley MD, Hurwitz EL, et al. Determining the Appropriateness of Spinal Manipulation and Mobilization for Chronic Low Back Pain: Indications and Ratings by a Multidisciplinary Expert Panel. Santa Monica, CA: RAND Corporation; 2018.
  • 40. Shekelle PG, Adams AH, Chassin MR, et al. The appropriateness of spinal manipulation for low-back pain: indications and ratings by an all-chiropractic expert panel. Santa Monica, CA: RAND; 1992.
  • 41. Shekelle PG, Adams AH, Chassin MR, et al. The Appropriateness of Spinal Manipulation for Low-Back Pain: Project Overview and Literature Review. Santa Monica, CA: RAND Corporation; 1991.
  • 42. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer; 2016.
  • 43. Quinlan JR. C4.5: Programs for Machine Learning. Burlington, MA: Morgan Kaufmann Publishers; 1993.
  • 44. Park RE, Fink A, Brook RH, et al. Physician ratings of appropriate indications for three procedures: theoretical indications vs indications used in practice. Am J Public Health 1989;79:445–447.
