A variety of ingenious new therapies have been introduced to relieve back pain. Often, the early reports describing these therapies suggest substantial clinical benefit that later studies do not confirm. This pattern raises several questions: What are the hallmarks of a study that will have lasting scientific validity? What are the design features of a good clinical trial? Evaluating the benefit of therapies for back pain involves complexities not found in every clinical trial, and such studies therefore require specialized strategies. How can studies be structured to assess a novel treatment of back pain objectively?
Scientific validity is obtained in a clinical trial only when all sources of significant bias have been eliminated or minimized. Bias can be defined as any systematic error due to the design or conduct of a study. In any study of efficacy, biases may arise from the way patients are selected, treatment is administered, or outcome data are acquired. Investigators in clinical trials must consider these potential sources of bias and design their studies to minimize them. The benchmark method is the prospective randomized trial, in which outcomes are compared between patients randomly assigned to treatment groups. Although this strategy minimizes enrollment bias by balancing potential confounding factors between groups, it is not the only method for obtaining objective results.
Many of the published studies of back pain therapies can be characterized as cohort studies, in which a series of patients who receive the same therapy is evaluated. Such a design has neither randomization nor a comparison group. A cohort study can provide useful information regarding the costs, complications, and selection criteria for therapy. Its weakness is that the effects of patient selection, of treatment, and of the natural history of the disorder cannot be distinguished. The cohort study is therefore unreliable for measuring the therapeutic effect of a new treatment. Cohort studies of back pain therapies are numerous, and many of them report success rates greater than 80% for treatments such as spinal manipulation, epidural blocks, exercise, and even a placebo procedure (1). In most instances, more rigorous studies fail to confirm the high success rates reported by the early cohorts.
How can high success rates be observed after intervention with ineffective procedures? Positive outcomes in a cohort may be explained by the tendency of patients with persistent symptoms to seek treatment when their symptoms are most severe. For any condition that fluctuates in severity, patients monitored after therapy tend to have less severe symptoms than they had when they sought treatment, whether or not the therapy had any effect. This tendency is an example of the statistical phenomenon known as regression toward the mean. In patients with back pain, fluctuation in severity is the rule. Because of regression toward the mean, patients with back pain who are treated with a placebo may appear to improve.
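The mechanism can be illustrated with a small simulation. The sketch below is a toy model, not data from any study: each hypothetical patient has a stable average pain level plus random day-to-day fluctuation, seeks care only on a day when pain reaches an assumed threshold, and then receives a placebo that has no effect whatsoever. The function name, the unclipped 0-10-like pain scale, and all parameter values are illustrative assumptions.

```python
import random
import statistics


def simulate_placebo_cohort(n_patients=5000, threshold=7.0, seed=1):
    """Toy model of regression toward the mean (all parameters hypothetical).

    Each patient has a stable long-run pain level plus day-to-day noise.
    A patient enters the cohort only on a day when pain is >= threshold,
    and the "treatment" is a placebo that changes nothing.
    Returns (mean pain at enrollment, mean pain at follow-up).
    """
    rng = random.Random(seed)
    at_enrollment, at_follow_up = [], []
    while len(at_enrollment) < n_patients:
        usual_level = rng.gauss(5.0, 1.0)          # patient's long-run average
        today = usual_level + rng.gauss(0.0, 1.5)  # pain on the day care is considered
        if today < threshold:
            continue                               # not bad enough to seek treatment
        later = usual_level + rng.gauss(0.0, 1.5)  # follow-up pain; placebo adds nothing
        at_enrollment.append(today)
        at_follow_up.append(later)
    return statistics.mean(at_enrollment), statistics.mean(at_follow_up)


before, after = simulate_placebo_cohort()
print(f"mean pain at enrollment: {before:.2f}, at follow-up: {after:.2f}")
```

Mean pain falls between enrollment and follow-up even though the simulated treatment does nothing. An uncontrolled cohort would record this drop as therapeutic success, whereas a control group enrolled the same way would show the same drop.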
To determine the effect of treatment, treated patients must be compared with differently treated or untreated patients. Comparing two differently treated cohorts is one possible study design; however, comparing cohorts without randomization may introduce biases that significantly undermine scientific validity. For example, cohorts from different clinics or institutions may differ in sex, previous treatment, age, or the stage at which treatment is initiated. These differences may be recognized by the investigators or may go unrecognized. In general, equivalence between the cohorts in a retrospective cohort study cannot be assured (2). Cohort studies therefore generally have biases not found in prospective, randomized, controlled trials, and the results of blinded randomized controlled trials are more reproducible than those of retrospective cohort studies (2).
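When two nonrandomized cohorts are compared, differences in measured baseline characteristics can at least be quantified. One common approach, not described in the source, is the standardized mean difference (SMD): the difference in group means divided by a pooled standard deviation, with an absolute value above about 0.1 often treated as a meaningful imbalance by convention. The sketch below uses hypothetical per-clinic data and function names; it can only flag differences in factors that were measured, not the unrecognized ones.

```python
import math
import statistics


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    pooled_sd = math.sqrt((statistics.variance(group_a) + statistics.variance(group_b)) / 2)
    if pooled_sd == 0:
        # No spread in either group: equal means are balanced, unequal means are not
        return 0.0 if mean_a == mean_b else math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / pooled_sd


def flag_imbalances(cohort_a, cohort_b, threshold=0.1):
    """Return {covariate: SMD} for covariates whose |SMD| exceeds threshold."""
    flagged = {}
    for name in cohort_a:
        smd = standardized_mean_difference(cohort_a[name], cohort_b[name])
        if abs(smd) > threshold:
            flagged[name] = round(smd, 2)
    return flagged


# Hypothetical baseline data from two clinics (female coded 1, male 0)
clinic_a = {"age": [34, 41, 38, 45, 36, 40], "female": [1, 0, 1, 0, 1, 0]}
clinic_b = {"age": [52, 47, 55, 49, 58, 50], "female": [1, 0, 0, 1, 1, 0]}
print(flag_imbalances(clinic_a, clinic_b))
```

Here age is flagged as badly imbalanced while the proportion of women is identical; a finding like this would argue for matching, stratification, or regression adjustment before the cohorts are compared.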
Choosing the appropriate comparison treatment for a randomized controlled trial of a back pain procedure requires judgment. Although a sham treatment may be scientifically ideal, it may be ethically questionable. In a study of back pain, the control might be no treatment or treatment with another procedure. In a benchmark study of lumbar diskectomy for the treatment of disk herniation, Weber (3) evaluated a series of patients, selected those considered candidates for diskectomy, and then randomly assigned them to surgical treatment or to control treatment without surgery. This study helped to measure the effectiveness of lumbar diskectomy and to establish the natural history of disk herniation. It is not, however, an easy study design to replicate.
Enrollment bias may be the most common flaw in cohort studies (4). If the two groups being compared differ in their initial state (ie, if one group has a factor that confers a better prognosis), differences in outcome may reflect these patient factors rather than differences in therapeutic efficacy. Enrollment criteria should therefore be as concrete and objective as possible. Even the best enrollment criteria usually allow some flexibility, and even with rigid criteria, bias may be present if the enrolling physicians apply the criteria less stringently to one treatment group than to another. Ideally, investigators should not themselves be responsible for assigning patients to the treatment or control group. Randomizing patients to the treatment groups is a more effective way to eliminate enrollment bias, although simple randomization may not suffice in every case: if enrollment practices are suspected to vary across physicians in a randomized trial, randomization can be stratified by physician. To minimize enrollment bias, patients should first be enrolled on the basis of explicit criteria and then be assigned, by a randomization strategy, to a treatment or control group.
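Stratified randomization by physician can be sketched with permuted blocks. The sketch below shows one common scheme, not necessarily the one the author had in mind: within each physician's stratum, assignments are drawn from shuffled blocks containing equal numbers of each arm, so each physician's patients stay balanced between arms. The function name, arm labels, block size of 4, and physician identifiers are hypothetical.

```python
import random
from collections import Counter


def stratified_block_randomization(patients, arms=("treatment", "control"),
                                   block_size=4, seed=2024):
    """Permuted-block randomization within each physician stratum.

    `patients` is a sequence of (patient_id, physician) pairs in enrollment
    order. Each physician gets their own stream of shuffled blocks, so after
    every completed block that physician's patients are split evenly.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    open_blocks = {}   # physician -> assignments left in the current block
    allocation = {}
    for patient_id, physician in patients:
        block = open_blocks.get(physician)
        if not block:  # no block yet, or the previous one is used up
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            open_blocks[physician] = block
        allocation[patient_id] = block.pop()
    return allocation


# Hypothetical enrollment log: (patient id, enrolling physician)
patients = [(f"P{i:02d}", "Dr A" if i % 2 == 0 else "Dr B") for i in range(16)]
allocation = stratified_block_randomization(patients)
per_physician = Counter((physician, allocation[pid]) for pid, physician in patients)
print(per_physician)
```

In practice such a list would be generated in advance and held by someone other than the enrolling physicians, so that no one can predict the next assignment. With a small fixed block size the final assignments in a block become guessable, which is one reason block sizes are sometimes varied.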
Imbalances between groups can produce flawed or misleading results. An example is a randomized trial in which two subgroups with significantly different mortality rates were identified within the placebo group: patients who took the placebo regularly had a different mortality rate than those who took it irregularly. Because the difference in mortality between these two groups of placebo takers cannot be attributed to differences in the placebo dose, it must be explained by other factors associated with whether a patient was compliant (1). One may attempt to adjust for such potential biases by controlling for the baseline status of the study participants. In back pain studies, previous surgery, compensation, duration of complaints, and psychologic makeup may be important predictors of outcome (5).
Even in randomized trials, enrollment biases may be present if the samples are too small to ensure an even distribution of confounding factors. Therefore, even in a randomized trial, the clinical population must be analyzed in enough detail to minimize the possibility of such errors. Differences in the baseline conditions of patients in the different treatment groups can be addressed by stratification: if the treatment groups are subdivided (stratified) by previous treatment, age, sex, and other factors known to be relevant, possible confounders may be identified. In cohort studies, possible enrollment biases may be addressed by matching individual patients in one cohort to individual patients in another cohort who have the same age, sex, and other relevant features. Multicenter trials raise additional problems in eliminating differences between treatment and control groups, and these problems require additional strategies (6). A detailed description of the enrollment criteria and randomization methods in a scientific report suggests that the investigators have seriously considered the confounding effects of possible enrollment bias.
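Matching patients across cohorts can be sketched as a greedy 1:1 procedure. This is a simplified illustration under stated assumptions: patients are hypothetical dictionaries with `id`, `sex`, and `age` fields, matching is exact on sex and nearest on age within an assumed caliper of 5 years, and treated patients without an acceptable partner are reported rather than forced into a poor match.

```python
def match_cohorts(treated, untreated, max_age_gap=5):
    """Greedy 1:1 matching without replacement.

    Exact match on sex, nearest match on age within `max_age_gap` years.
    Returns (list of (treated_id, untreated_id) pairs, list of unmatched treated ids).
    """
    available = list(untreated)
    pairs, unmatched = [], []
    for patient in treated:
        candidates = [
            c for c in available
            if c["sex"] == patient["sex"] and abs(c["age"] - patient["age"]) <= max_age_gap
        ]
        if not candidates:
            unmatched.append(patient["id"])
            continue
        best = min(candidates, key=lambda c: abs(c["age"] - patient["age"]))
        available.remove(best)        # each untreated patient is used at most once
        pairs.append((patient["id"], best["id"]))
    return pairs, unmatched


# Hypothetical patients from two cohorts
treated = [
    {"id": "T1", "sex": "F", "age": 40},
    {"id": "T2", "sex": "M", "age": 62},
    {"id": "T3", "sex": "M", "age": 30},
]
untreated = [
    {"id": "U1", "sex": "F", "age": 43},
    {"id": "U2", "sex": "M", "age": 60},
    {"id": "U3", "sex": "F", "age": 39},
    {"id": "U4", "sex": "M", "age": 50},
]
pairs, unmatched = match_cohorts(treated, untreated)
print("matched pairs:", pairs)
print("no acceptable match:", unmatched)
```

Greedy matching depends on the order in which treated patients are processed; optimal matching and propensity-score matching are alternatives, but with any of these methods only measured features can be matched.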
Multivariate analysis of patient outcomes, although not a substitute for randomization, is a helpful tool for identifying possible biases and adjusting results accordingly. Adjustment for potential biasing factors is essential in cohort studies, in which there is no randomization. Differences between groups may be adjusted for by multivariate regression modeling. To adjust for potential biases arising from differences in initial pain severity, previous treatment, overall health status, sex, and age, one could fit a regression model (linear for continuous outcomes and logistic for binary outcomes) that includes these factors as well as treatment as covariates. The treatment effect estimated by this model is known as the “adjusted effect”; the treatment effect estimated while ignoring the other factors is known as the “unadjusted effect.” Comparing the simple unadjusted analysis with the multivariate analysis gives some indication of how strongly the measured factors confound the comparison between cohorts: a large difference between the two estimates suggests substantial confounding.
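The difference between an adjusted and an unadjusted effect can be seen in a simulated cohort. In the sketch below, which uses invented data and a single confounder for brevity, older patients are more likely to receive the treatment, age independently worsens the outcome, and the true treatment effect is set to -1.0 points of pain. The unadjusted effect is the simple difference in means; the adjusted effect is the treatment coefficient of a linear regression that also includes age, computed here by Frisch–Waugh–Lovell residualization so that only the standard library is needed. A binary outcome would call for logistic regression instead, which is not shown. All names and parameters are hypothetical.

```python
import math
import random


def slope(x, y):
    """Least-squares slope of y on x in a model with an intercept."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def residualize(values, covariate):
    """Residuals of `values` after a least-squares fit on `covariate`."""
    b = slope(covariate, values)
    mean_c, mean_v = sum(covariate) / len(covariate), sum(values) / len(values)
    return [v - (mean_v + b * (c - mean_c)) for v, c in zip(values, covariate)]


# Simulated (hypothetical) cohort with confounding by age
rng = random.Random(7)
age, treated, pain = [], [], []
for _ in range(2000):
    a = rng.gauss(50, 10)
    p_treat = 1 / (1 + math.exp(-(a - 50) / 5))         # older patients treated more often
    t = 1.0 if rng.random() < p_treat else 0.0
    y = 5 + 0.1 * (a - 50) - 1.0 * t + rng.gauss(0, 1)  # true treatment effect: -1.0
    age.append(a)
    treated.append(t)
    pain.append(y)

# Unadjusted: regress pain on treatment alone (equals the difference in group means)
unadjusted = slope(treated, pain)
# Adjusted: treatment coefficient from pain ~ treatment + age, via residualization
adjusted = slope(residualize(treated, age), residualize(pain, age))
print(f"unadjusted effect: {unadjusted:+.2f}, adjusted effect: {adjusted:+.2f}")
```

In this simulation the unadjusted estimate makes the treatment look far less effective than it is, because treated patients are older, while the adjusted estimate lands close to -1.0. Adjustment of this kind can only correct for confounders that were measured and correctly modeled.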
Enrollment criteria for patients undergoing experimental back pain treatments require special consideration. Individual variation may be especially marked among patients with back pain and disk degeneration, so the enrollment criteria must be carefully defined to ensure that the enrolled group is appropriate for the treatment. Selecting patients on the basis of particular disk morphology is generally less acceptable than selecting patients with specific clinical signs and symptoms, because the relationship between back pain and disk degeneration is complex (7). Signs and symptoms, preferably classified in a standardized manner such as the Quebec Task Force classification, are better selection criteria than morphologic features of the spine (8).
Another major source of bias is observer (or detection) bias. Different people assessing clinical outcomes at different times or with different methods may observe different outcomes, so ensuring objective and reproducible measurement of outcomes is a major consideration in study design. Assessment of outcomes by investigators who have a stake in the research and who use subjective ratings is the least valid method. Subjective rating with the usual four-step scale (excellent, good, fair, poor) may produce very divergent results in the hands of different investigators (9). The most reliable outcome measures are those that are objective and based on physiology or activity. For example, in studies of back pain, a good question for an outcome questionnaire would be, “Do you use pain medication more or less frequently since your procedure?” and a poor question would be, “Do you feel better or worse since you began treatment?”
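Disagreement between raters using the four-step scale can be quantified. One standard statistic, not mentioned in the source, is Cohen's kappa, which compares the observed proportion of agreement with the agreement expected by chance from each rater's category frequencies. The ratings and function name below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected if each rater assigned categories independently
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of the same eight patients by two investigators
rater_1 = ["excellent", "good", "good", "fair", "poor", "good", "excellent", "fair"]
rater_2 = ["good", "good", "fair", "fair", "fair", "excellent", "excellent", "poor"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 indicates near-perfect agreement and a kappa near 0 indicates agreement no better than chance. For ordered categories such as this scale, a weighted kappa that gives partial credit for adjacent ratings is often preferred.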
Various strategies can be used to minimize observer bias. Double blinding, in which neither the physician nor the patient knows which treatment has been applied, is the best method of avoiding observer bias but the most difficult to implement in studies of back pain therapy. Double blinding eliminates the possibility that the patient or the physician manipulates the treatment or biases the reporting. In cohort and retrospective studies, double blinding is virtually impossible. For studies of back pain treatment, choosing which outcomes to measure is challenging because of the variety of signs and symptoms and their fluctuation over time. In back pain, outcome measures are typically dissociated: one may improve while another deteriorates. Given the nature of back disease, outcomes in back pain studies cannot be measured with laboratory or imaging tests and must instead rely on patient experience. Valid and reliable methods exist for measuring outcomes of this sort (10). In a clinical trial, a detailed description of the methods used to assess patient outcomes, together with the selection of observers who are independent of the therapeutic team, reassures readers that the investigators have attempted to achieve objectivity.
Another potential systematic error is transfer bias, a differential rate of attrition in the treated group compared with the control group (11). Complete follow-up is not achieved in all cases, and transfer bias may confound results because hostile, noncompliant, or poorly responding patients may be lost to follow-up more often than other patients. In clinical trials, it cannot be assumed that the success rate among patients lost to follow-up is the same as among those who were followed. A high attrition rate may therefore suggest a potential bias. The number of study participants lost to follow-up should be reported in any clinical trial, and analyses to evaluate possible transfer bias should be included. Adjustments or post hoc matching may be used to minimize this source of bias.
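One simple way to evaluate possible transfer bias is to report attrition per arm and to bound each arm's success rate under extreme assumptions about the patients who were lost. The best-case/worst-case bounds below are offered as an illustration rather than a method prescribed by the source; the function name and all counts are hypothetical.

```python
def attrition_summary(assigned, followed, successes):
    """Attrition and success-rate bounds for one study arm.

    worst_case assumes every patient lost to follow-up failed;
    best_case assumes every one of them succeeded.
    """
    if not 0 < followed <= assigned or not 0 <= successes <= followed:
        raise ValueError("inconsistent counts")
    lost = assigned - followed
    return {
        "attrition": lost / assigned,
        "observed_success": successes / followed,
        "worst_case": successes / assigned,
        "best_case": (successes + lost) / assigned,
    }


# Hypothetical counts, invented for illustration
treated = attrition_summary(assigned=100, followed=70, successes=56)
control = attrition_summary(assigned=100, followed=95, successes=57)
for name, arm in (("treated", treated), ("control", control)):
    print(name, {k: round(v, 2) for k, v in arm.items()})
```

With these invented numbers the treated arm looks better among patients actually followed (80% vs 60%), yet its worst-case rate (56%) falls below the control arm's best-case rate (62%), so differential attrition alone could account for the apparent benefit.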
Study designs without controls and randomization may answer questions of clinical importance regarding novel treatments for back pain (11), but the randomized clinical trial remains the criterion standard for assessing clinical efficacy. Good examples of randomized, controlled, prospective trials of surgical and nonsurgical treatments for back pain can be found in the literature (12, 13). The technical and ethical concerns regarding randomization can usually be overcome by one randomization scheme or another (11). If randomized clinical trials are not feasible for ethical, economic, or logistical reasons and a cohort design is chosen, the investigators should attempt to achieve the highest degree of scientific validity possible; matching, adjusted multivariate regression analyses, and other methods may be used effectively. Performing clinical trials is difficult and demands the utmost honesty and neutrality from the researcher (7). Applying standards of investigation similar to those used in clinical trials for cancer or cardiovascular disease to the evaluation of back pain therapies would benefit both patients and physicians who want to know the benefits of spinal therapies.
Can randomized controlled study designs be used to measure the therapeutic efficacy of intradiskal therapies, such as the one described in this issue of the AJNR? Patients seeking treatment for back pain at a medical facility may be asked to participate in a study and informed that their treatment will be randomized. Those who agree to participate are screened further. Those who meet the inclusion and exclusion criteria for duration, character, and severity of pain and who are considered candidates for the experimental treatment are then randomly assigned to the experimental treatment or to other treatment(s) (eg, intradiskal medical ozone, intradiskal steroids, or a combination of the two). At a minimum, the treating physicians must not participate in the assignment of patients to treatment groups. Blinding the treating physician to the treatment administered is an additional option for minimizing bias. For example, a technician who alone knows the randomization plan and keeps a log hidden from the other investigators might provide the therapist with the material to inject into the disk. The patient may be blinded by not being told which therapy was used. An independent observer, blinded to the patient’s treatment group, evaluates each patient at one or more specified time points after treatment with a predetermined, validated instrument such as a questionnaire. Objective measures of the patient’s condition, such as the number of analgesics used per day, are included. Ideally, the same observer evaluates every patient; if this is not feasible, the observers are randomly assigned to patients. The outcomes of the patients in each group are then compared with statistical tests.
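The final comparison across the three arms can be sketched for a binary success outcome. A Pearson chi-square test is one plausible choice, not one prescribed by the source, and the function name, arm labels, and counts below are hypothetical. The sketch relies on the fact that a 3 x 2 table has 2 degrees of freedom, for which the chi-square tail probability has the closed form exp(-x/2).

```python
import math


def chi_square_three_arms(successes, totals):
    """Pearson chi-square test of equal success rates across three arms.

    A 3 x 2 table (arm x success/failure) has (3-1)*(2-1) = 2 degrees of
    freedom, and for 2 df the chi-square tail probability is exactly
    exp(-statistic / 2), so no statistics library is needed.
    Returns (statistic, p_value).
    """
    if len(successes) != 3 or len(totals) != 3:
        raise ValueError("expects exactly three arms")
    pooled = sum(successes) / sum(totals)
    if pooled in (0.0, 1.0):
        raise ValueError("test undefined when all outcomes are identical")
    statistic = 0.0
    for s, n in zip(successes, totals):
        # Observed vs expected counts for success and failure in this arm
        for observed, expected in ((s, n * pooled), (n - s, n * (1 - pooled))):
            statistic += (observed - expected) ** 2 / expected
    return statistic, math.exp(-statistic / 2)


# Hypothetical results: ozone, steroid, combination
stat, p = chi_square_three_arms(successes=[30, 20, 25], totals=[50, 50, 50])
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

With these invented counts the test does not reach conventional significance (p ≈ 0.14), a reminder that a 20-percentage-point spread between arms of 50 patients can arise by chance. A continuous questionnaire score would call for a different test, and the expected count in every cell should be reasonably large for the chi-square approximation to hold.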
By convention, outcomes are reported in terms of the patients assigned to each group (intention to treat) rather than the number who received treatment or the number who received technically adequate treatment. The effect of technical failures may be analyzed and discussed separately. Other possible confounding factors are identified, analyzed, and discussed. Patient features such as age, duration of symptoms, and sex are tested for differences among the three groups, and their effects on the results are considered. Dropout rates are recorded and their possible effect analyzed. If statistically significant differences are found between groups without evidence of important biases, conclusions may tentatively be drawn concerning the relative efficacy of medical ozone therapy versus steroid therapy alone or a combination of the two.
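The difference between reporting by assignment and reporting by treatment actually received can be shown with a handful of invented records. The sketch below assumes each patient record has hypothetical `assigned`, `received`, and `success` fields, with technical failures recorded as `"none"`; it is an illustration, not data from any trial.

```python
def success_rates(records, group_key):
    """Success rate per group, grouping patients by `group_key`."""
    totals, successes = {}, {}
    for record in records:
        group = record[group_key]
        totals[group] = totals.get(group, 0) + 1
        successes[group] = successes.get(group, 0) + int(record["success"])
    return {group: successes[group] / totals[group] for group in totals}


# Hypothetical trial: two ozone injections failed technically ("none" received)
records = [
    {"assigned": assigned, "received": received, "success": success}
    for assigned, received, success in [
        ("ozone", "ozone", True), ("ozone", "ozone", True),
        ("ozone", "none", False), ("ozone", "none", False),
        ("steroid", "steroid", True), ("steroid", "steroid", False),
        ("steroid", "steroid", True), ("steroid", "steroid", False),
    ]
]

intention_to_treat = success_rates(records, "assigned")
as_treated = success_rates(records, "received")
print("intention to treat:", intention_to_treat)
print("as treated:", as_treated)
```

Grouping by assignment keeps the two technical failures in the ozone arm and gives equal success rates; grouping by treatment received drops them from that arm and makes ozone look perfect. The intention-to-treat figure preserves the comparability created by randomization, while the as-treated figure can still be reported when the effect of technical failures is analyzed.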
References
- 1. Deyo R. Practice variations, treatment fads, rising disability: do we need a new clinical research paradigm? Spine 1993;18:2153–2162
- 2. Hoffman RM, Turner JA, Cherkin DC, Deyo RA, Herron LD. Therapeutic trials for low back pain. Spine 1994;19:2068S–2075S
- 3. Weber H. Lumbar disk herniation: a controlled, prospective study with ten years of observation. Spine 1983;8:131–140
- 4. Keller RB, Rudicel SA, Liang MH. Outcomes research in orthopaedics. Instr Course Lect 1994;43:599–611
- 5. Deyo RA. Measuring functional status of patients with low back pain. Arch Phys Med Rehabil 1988;69:1044–1053
- 6. Revel M, Payan C, Vallee C, et al. Automated percutaneous lumbar discectomy versus chemonucleolysis in the treatment of sciatica: a randomized multicenter trial. Spine 1993;18:1–7
- 7. Weber H. The natural history of disc herniation and the influence of intervention. Spine 1994;19:2234–2238
- 8. Loisel P, Vachon B, Lemaire J, et al. Discriminative and predictive validity assessment of the Quebec Task Force classification. Spine 2002;27:851–857
- 9. [No authors listed]. Scientific approach to the assessment and management of activity-related spinal disorders: a monograph for clinicians: report of the Quebec Task Force on Spinal Disorders. Spine 1987;12[suppl]:S1–S59
- 10. Deyo RA, Diehl AK. Psychosocial predictors of disability in patients with low back pain. J Rheumatol 1988;15:1557–1564
- 11. Keller RB. Pro: outcomes research is cost effective and critical to the specialty. Spine 1995;20:384–386
- 12. Rasmussen FO, Amundsen T, Vandvik B. Lumbar disk prolapses and radiologic spinal intervention: what do the randomized controlled trials say? [in Norwegian] Tidsskr Nor Laegeforen 1998;118:2470–2480
- 13. Burton AK, Tillotson KM, Cleary J. Single-blind randomised controlled trial of chemonucleolysis and manipulation in the treatment of symptomatic lumbar disc herniation. Eur Spine J 2000;9:202–207