Osteoarthritis and Cartilage Open. 2021 Jul 22;3(3):100195. doi: 10.1016/j.ocarto.2021.100195

Exercise therapy with or without other physical therapy interventions versus placebo interventions for osteoarthritis – Systematic review

BJF Dean a,b, J Collins a,∗∗, N Thurley c, I Rombach a, K Bennell d
PMCID: PMC9718284  PMID: 36474820

Abstract

Objective

To evaluate whether exercise therapy, with or without other physical therapy interventions, is superior to placebo intervention for osteoarthritis (OA).

Design

Systematic review and meta-analysis. Data sources: MEDLINE and EMBASE via OVID, CINAHL and SPORTDiscus via EBSCO were searched from inception to February 2021. Study selection: Randomised controlled trials (RCTs) of adults with OA investigating an intervention involving exercise therapy with a placebo comparator. Data extraction and analysis: Data were extracted and checked for accuracy and completeness by pairs of reviewers. Primary outcomes were self-reported pain, function and quality of life (QoL). Comparative treatment effects were analysed using a random-effects model for short- and longer-term follow-up. Methodological quality was evaluated using the Cochrane risk of bias tool, and the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system was used to evaluate the certainty of evidence.

Results

13 RCTs involving 1079 patients were identified and included. Meta-analysis demonstrated improved pain (10 studies (GRADE low certainty), SMD -1.1 (95%CI -1.7 to -0.4)) and function (8 studies (GRADE low certainty), SMD -0.8 (95%CI -1.5 to -0.2)) in the short-term with exercise versus placebo, but no significant difference in the longer-term (pain 3 studies; function 3 studies).

Conclusion

Current evidence demonstrates that exercise therapy is superior to placebo in the short-term for pain and function in OA. The certainty of this evidence is low to very low and further research is very likely to have an important impact on our confidence in the estimate of effects.

Keywords: Osteoarthritis, Exercise, Physical therapy, Placebo, Sham, Systematic review

1. Introduction

Osteoarthritis (OA) affects approximately 7% of the world's population and is a leading cause of disability [1]. Osteoarthritis has significant negative consequences for the individual in terms of pain, impaired function and reduced quality of life (QoL), as well as wider economic and societal impacts [1,2]. The most commonly affected joints include the hip, knee and hand [3,4]. Core treatments recommended by clinical guidelines include education and exercise [5]. Previous systematic reviews and meta-analyses have assessed the effectiveness of exercise therapy in OA for specific joints, demonstrating some short-term benefits of exercise [6,7].

When investigating the effectiveness of a specific intervention, such as exercise therapy, a number of different randomised controlled trial (RCT) designs may be employed. Important aspects of trial design include whether the specific intervention is combined with other interventions and the nature of the control ‘comparator’ arm(s). The placebo or sham control is generally seen as a ‘gold standard’, particularly when the outcomes are subjective patient-reported measures of pain and function, because of the significant influence of the placebo effect [8–10]. Other types of control may include no treatment, usual care, enhanced usual care, attention control, and usual care in combination with another intervention [11]. While there are many systematic reviews investigating the effects of exercise in OA, these have included studies with mixed comparator controls [6,7,12]. To our knowledge, none have specifically investigated the effectiveness of exercise when judged exclusively from placebo-controlled trials.

Given the potential for non-placebo-controlled trials to overestimate the effectiveness of treatment for OA, we aimed to determine the effectiveness of exercise therapy when restricted to placebo-controlled trials. Specifically, we wished to evaluate whether exercise therapy, with or without other physical therapy interventions, is superior to placebo intervention for OA. Primary outcomes were self-reported pain, physical function and QoL.

2. Methods

This systematic review is reported in accordance with the PRISMA statement, using methodology described in the Cochrane Handbook for Systematic Reviews of Interventions (ref). The protocol was developed prospectively and peer reviewed locally before registration on the PROSPERO database (CRD 42019154589).

2.1. Data sources and searches

A comprehensive search strategy was created in collaboration with a research librarian (NT) and was designed to capture all relevant articles; the full strategy is detailed in Appendix 1. The search strategy was applied to the following bibliographic databases from database inception until August 3, 2019 and later repeated until March 4, 2021: MEDLINE and EMBASE via OVID, CINAHL and SPORTDiscus via EBSCO.

2.2. Inclusion/exclusion criteria

The inclusion and exclusion criteria were defined prospectively during the protocol stage. Only RCTs, including cluster RCTs, involving participants with symptomatic joint OA and aged ≥18 years were included. There was no restriction on how OA was diagnosed. The intervention was exercise therapy for OA with or without other physical therapy interventions. Exercise therapy was defined as any land-based non-perioperative therapeutic exercise regimen aimed at relieving the symptoms of OA, regardless of content, duration, frequency or intensity [6]. Other physical therapy interventions included, for example, manual therapy, acupuncture, taping, stretching, heat/cold therapy, electrical stimulation and ultrasound. The RCT had to involve a placebo intervention as a comparator. Placebo interventions were defined as interventions with no plausible therapeutic effect which were described as such by the study, with the aim being to control for study aspects such as attention and expectation of benefit. Specifically, studies involving interventions described as ‘attention control’ were not included.

2.3. Selection of studies

Duplicates were removed and relevant studies identified from the search were imported into Covidence for screening. Studies were independently screened by title and abstract by two authors (B.J.F.D. and J.C.). The references of all included studies and all relevant review articles on the topic were also reviewed to identify other potential studies for inclusion. This was followed by a full-text evaluation of the studies selected in the first step by the same two authors. Disagreement between the two reviewers was resolved by consensus involving a third author (K.B.).

2.4. Data extraction

Two reviewers (J.C. and B.J.F.D.) independently extracted data. Data were extracted using a custom data extraction sheet in Covidence (http://www.covidence.org). The sheet was specifically designed to extract data relating to study design, details of the interventions (exercise therapy and placebo) undertaken, and details of any other treatment received by trial participants alongside the described interventions. We prioritised data from between-group comparisons over data from within-group comparisons, and prioritised change scores over absolute follow-up scores. When data were not directly reported in the article, they were calculated from other available data when possible. Any inconsistencies between the two reviewers’ forms were resolved by consensus discussion. A third reviewer (K.B.) was available for any disagreement that could not be resolved by this initial discussion.

If data were not available from full-text articles or trial registrations, authors were contacted to provide this information. If authors could not be contacted for the additional data, that aspect of the study was excluded from the data synthesis. Contactable authors who did not respond to the initial request were sent two subsequent reminders over a minimum of 6 weeks. If there was still no response, that aspect of the study was excluded from the data synthesis.

2.5. Risk of bias and quality assessment

Included studies were assessed for risk of bias by two independent raters (B.J.F.D. and J.C.) using the Cochrane Collaboration's tool for assessing risk of bias in randomised trials. This followed the description in the Cochrane Handbook for Systematic Reviews of Interventions, version 5.1 (Part 2: 8.5.1). Reporting content was assessed using the CERT checklist for exercise therapy and the TIDieR checklist. Any disagreements between ratings were resolved by discussion between the raters. A third party (K.B.) was available in any case where disagreements persisted after discussion.

The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach was used to rate the overall certainty/quality of the body of evidence in each pooled analysis [13]. The certainty/quality of evidence was defined as follows: (1) high certainty/quality: the authors have a lot of confidence that the true effect is similar to the estimated effect; the Cochrane risk of bias tool identified no risks of bias and all domains in the GRADE classification were fulfilled; (2) moderate certainty/quality: the authors believe that the true effect is probably close to the estimated effect, and one of the domains in the GRADE classification was not fulfilled; (3) low certainty/quality: the true effect might be markedly different from the estimated effect, and two of the domains in the GRADE classification were not fulfilled; and (4) very low certainty/quality: the true effect is probably markedly different from the estimated effect, and three of the domains in the GRADE classification were not fulfilled [14]. Two reviewers (B.J.F.D. and J.C.) assessed these factors for each outcome and agreed by consensus.
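
The mapping from unfulfilled GRADE domains to a certainty level described above can be written as a small lookup. The following Python sketch is illustrative only: the function name `grade_certainty` is hypothetical, it assumes that more than three unfulfilled domains also map to ‘very low’, and it does not model the additional risk-of-bias condition stated for the ‘high’ rating.

```python
def grade_certainty(domains_not_fulfilled: int) -> str:
    """Map the number of unfulfilled GRADE domains to a certainty label.

    Follows the four levels listed in the text: 0 -> high, 1 -> moderate,
    2 -> low, 3 -> very low.
    """
    if domains_not_fulfilled < 0:
        raise ValueError("the number of unfulfilled domains cannot be negative")
    labels = ["high", "moderate", "low", "very low"]
    # Assumption: counts above three are capped at 'very low'.
    return labels[min(domains_not_fulfilled, 3)]


# Example: an outcome downgraded for trial limitations and inconsistency
rating = grade_certainty(2)
```

Under these assumptions, an outcome downgraded in two domains (for instance trial limitations and inconsistency) is labelled ‘low’.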

2.6. Outcomes

Patient-reported pain, physical function and QoL were the primary outcomes of interest. Outcomes were grouped as short-term (<6 months after interventions had been completed) and longer-term (≥6 months, i.e. ≥24 weeks, after interventions had been completed). Where more than one time point existed for either short-term or longer-term outcomes, the outcome nearest to the end of the intervention was used.
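
As an illustration of this grouping rule, the sketch below classifies follow-up times and keeps, for each category, the time point nearest the end of the intervention. It assumes times are expressed in weeks after the intervention ended and uses the 24-week boundary stated above; the function names are hypothetical and are not taken from the review's analysis.

```python
LONGER_TERM_START_WEEKS = 24  # boundary stated in the text (>=24 weeks is longer-term)


def classify_followup(weeks_after_end: float) -> str:
    """Label a follow-up time as 'short-term' or 'longer-term'."""
    if weeks_after_end < 0:
        raise ValueError("follow-up must not precede the end of the intervention")
    return "short-term" if weeks_after_end < LONGER_TERM_START_WEEKS else "longer-term"


def select_time_points(weeks_after_end_list):
    """For each category, keep the follow-up nearest the end of the intervention."""
    chosen = {}
    for weeks in weeks_after_end_list:
        category = classify_followup(weeks)
        # The nearest time point is the smallest number of weeks in the category.
        if category not in chosen or weeks < chosen[category]:
            chosen[category] = weeks
    return chosen


# Hypothetical trial with assessments 0, 8, 24 and 36 weeks after the intervention
selected = select_time_points([0, 8, 24, 36])
```

For the hypothetical trial above, the 0-week assessment would be used as the short-term outcome and the 24-week assessment as the longer-term outcome.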

2.7. Data analysis

Descriptive analysis was performed for all demographic, intervention and outcome data to facilitate narrative interpretation and comparison across studies. We conducted a meta-analysis when more than one study reported on the same outcome at similar time points. For trials that provided data on more than one scale for an outcome, we extracted data from the scale ranked highest in a hierarchy of scales for pain, physical function and QoL. Before the meta-analyses were undertaken, statistical heterogeneity was assessed via the I² statistic, with values above 30%, 50% and 75% considered moderate, substantial and considerable, respectively [15]. Inverse-variance weighted random-effects models using DerSimonian-Laird estimators were used because significant unexplained heterogeneity was found between studies. Standardised mean differences (SMDs) with corresponding 95% confidence intervals were generated to account for the different outcome measures across studies. For the pain and physical function outcomes, negative SMDs indicate superior effects for the exercise interventions versus placebo. For the QoL outcomes, positive SMDs indicate superior effects for the exercise interventions. We performed subgroup analyses of trials (1) in different joints, (2) in which the exercise intervention was not combined with any other physical therapy interventions, and (3) in which the exercise intervention was combined with other physical therapy interventions. Pooled SMDs were presented overall, as well as for the subgroups. Forest plots were generated in Stata IC version 15.
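
For readers unfamiliar with the pooling approach named above, the following Python sketch shows one common way to compute a DerSimonian-Laird random-effects estimate and the I² statistic from per-study SMDs and their variances. It is not the authors' Stata code: the function names, the input values and the ‘low’ label for I² values of 30% or less are assumptions made for illustration, and how each study's SMD and variance were derived is outside its scope.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects (e.g. SMDs) with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                         # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # heterogeneity, in percent
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),   # 95% confidence interval
        "tau2": tau2,
        "i2": i2,
    }


def classify_i2(i2):
    """Apply the heterogeneity thresholds quoted in the text."""
    if i2 > 75:
        return "considerable"
    if i2 > 50:
        return "substantial"
    if i2 > 30:
        return "moderate"
    return "low"  # assumption: the text does not name the band at or below 30%


# Hypothetical SMDs and variances for three trials (not data from this review)
hypothetical = dersimonian_laird([-0.1, -0.5, -2.4], [0.0625, 0.0625, 0.25])
homogeneous = dersimonian_laird([-0.5, -0.5, -0.5], [0.25, 0.5, 0.125])
```

When the study effects agree exactly, the between-study variance is zero and the model reduces to a fixed-effect inverse-variance average; when one study is far from the others, the random-effects weights become more equal and the confidence interval widens.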

3. Patient and public involvement

Patients were not involved in this review.

4. Results

4.1. Study selection

A total of 3130 studies were identified by the search, after duplicates were removed. Following screening by full-text, 13 RCTs were identified as eligible for inclusion (Fig. 1). The number of studies identified and excluded at each stage is detailed in Fig. 1. All included studies were parallel group RCTs.

Fig. 1. PRISMA flow diagram.

4.2. Study characteristics

Study characteristics of the included trials, including the participant demographics, inclusion criteria, nature of the exercise-based intervention, comparators and outcomes, are provided in Tables 1 and 2. The majority (ten trials) involved an exercise intervention which was combined with other physical therapy modalities. The exercise therapy involved a supervised component in nine trials and five trials had no supervision. The specific type, duration and frequency of exercise therapy varied. The nature of the placebo intervention was inactive ultrasound in five trials, an inactive topical cream in three trials, sham exercise in one trial (exercise machine set to no resistance), inactive transcutaneous electrical nerve stimulation (TENS) in two trials, a placebo massage ball in one trial and inactive photobiomodulation (PBM) in one trial. Only one study did not provide outcome data that could be used in meta-analysis, meaning it was only included descriptively [16].

Table 1. Study characteristics.

| Author and year | Total participants (mean age, % female) | Inclusion criteria | Exercise-based intervention arm(s) (combined: yes/no) | Duration of intervention (weeks) | Comparators | Outcomes (primary in bold if specified) | Time points (weeks) |
| Bennell 2014 [20] | 102 (64, 61%) | Hip OA (ACR), ≥50 years old, pain >3 months duration, average pain >40 mm on VAS, moderate difficulty ADLs | Exercise, manual therapy, gait aid, and education/advice (yes) | | Sham US | VAS pain and WOMAC physical function. Impairments, physical performance, global change, psychological status, and quality of life | 0, 13, 36 |
| Bennell 2005 [19] | 140 (69, 67%) | Knee pain, >50 years old, osteophytes, pain/difficulty rising from chair | Exercise, knee taping, soft tissue massage, thoracic spine mobilisation (yes) | 12 | Sham US | VAS movement pain. VAS restriction, WOMAC, KPS, SF-36, AQoL, quads strength, step test | 0, 12, 24 |
| Cheing 2004 [16] | 62 (64, 85%) | Knee OA (Kellgren and Lawrence grade 2 or above), symptoms >6 months, stable on medication for three weeks | i) Exercise and TENS (yes) ii) Exercise (no) | 4 | i) Placebo TENS, ii) TENS | Isometric peak torque, spatiotemporal gait parameters and range of knee movement | 0, 4 |
| Cheing 2002 [24] | 62 (64, 85%) | Knee OA (Kellgren and Lawrence grade 2 or above), symptoms >6 months, stable on medication for three weeks | i) Exercise and TENS (yes) ii) Exercise (no) | 4 | i) Placebo TENS, ii) TENS | VAS pain | 0, 4 |
| Deyle 2000 [21] | 83 (61, 57%) | Knee OA (Altman criteria) | Exercise and manual therapy (yes) | 4 | Placebo US | Distance walked in 6 min and WOMAC score | 0, 4, 8 |
| Foroughi 2011 [29] | 54 (64, 100%) | Knee OA (ACR criteria), >40 years old, in stable health, female | Exercise (no) | 26 | Sham exercise | Dynamic shank and knee adduction angles and knee adduction moment of most symptomatic knee. Muscle strength, gait speed, and osteoarthritis symptoms (WOMAC pain and total score) | 0, 26 |
| Krauss 2014 [30] | 218 (59, 43%) | Hip OA (ACR criteria), age 18–85 years | Exercise and ultrasound (yes) | 12 | i) Placebo US | WOMAC pain, physical function and stiffness, SF-36 health questionnaire domains | 0, 12 |
| Merritt 2012 [31] | 27 (66.9, 96%) | Pain in thumb CMC joint, >30 years old, independent in self-care and positive grind test | Exercise, joint protection and orthoses (yes) | 4 | Sham topical cream | AUSCAN pain, stiffness and function | 0, 4 |
| Rogers 2012 [32] | 44 (70, 61%) | Knee OA (ACR criteria), ≥50 years, knee pain on most days of previous month, minimum disability score of 17 points on WOMAC Physical Function subscale | i) Exercise (no) ii) Exercise (no) | 8 | Sham topical cream | WOMAC, Human Activity Profile, exercise self-efficacy, self-reported knee stability, 15 m get up and go | 0, 4, 8 |
| Rogers 2009 [17] | 76 (75, 87%) | Radiographic OA in at least one hand joint, ≥50 years and an unspecified minimum AUSCAN physical function subscale score | Exercise (no) | 16 | i) Sham topical cream | AUSCAN physical function subscale, AUSCAN pain and stiffness, grip and pinch strength | 0, 16 |
| Stoffer-Marx 2018 [33] | 151 (59.6, 84%) | Hand OA (ACR criteria), hand pain of minimum 3 points on 11-point Likert scale at two time points (baseline and intervention) | Exercise, information, pain management and assistive devices (yes) | 8 | i) Placebo massage ball and routine care | Grip strength after 8 weeks, AUSCAN, VAS pain, satisfaction, health status, Jebsen-Taylor hand function subtests | 0, 8 |
| Vassão 2019 [34] | 62 (63, 100%) | Knee OA (ACR criteria), knee pain in previous 6 months, aged 55–70 years, grade 2/3 (Kellgren–Lawrence), BMI 22–35 kg/m², >2 points on Numeric Rating Pain Scale, and classified as low/irregularly active | Exercise and PBM (yes) | 8 | i) Placebo PBM, ii) Exercise + placebo PBM | VAS pain, lower limb muscle strength, mean distance walked in 6 min, timed up and go test | 0, 8 |
| Villafane 2013 [18] | 60 (82, 90%) | Stage 3/4 thumb CMC joint OA in dominant hand | Exercise and manual therapy (yes) | 4 | Sham US | VAS pain, tip pinch and grip strength, pressure pain thresholds | 0, 4 |

Abbreviations: ACR – American College of Rheumatology, EMG – electromyography, VAS – visual analogue scale, WOMAC – Western Ontario and McMaster Universities Osteoarthritis Index, KPS – knee pain scale, AQoL – assessment of QoL, TENS – transcutaneous electrical nerve stimulation, AUSCAN – Australian/Canadian Osteoarthritis Hand Index, BMI – body mass index, PBM – photobiomodulation, US – ultrasound, CMC – carpometacarpal.

Table 2. Summary of study interventions.

| Author and year | Interventions | Delivery | Provider | Setting | Supervised or unsupervised | Detailed description of exercise and sham intervention | Frequency | Personalisation | Adherence assessment | Adherence actual | Blinding reported |
| Bennell 2014 [20] | Exercise | Individual | PT | Clinic and home | Mixed | Strengthening of hip abductors and quadriceps, stretching and hip range of motion, and functional balance and gait drills. | 4 to 6 exercises, 4 × per week | Semi-standardised and progressed | Patient logbook | 9.6/10∗ | Yes – 66% of placebo group remained blinded |
| | Sham US | Individual | PT | Clinic and home | Mixed | Inactive ultrasound and inert gel lightly applied to the anterior and posterior hip region. Time not specified. | 3 × per week | NA | Patient logbook | 9.4/10∗ | |
| Bennell 2005 [19] | Exercise | Individual | PT | Home | Mixed | Exercises to retrain the quadriceps, hip, and back muscles, and balance exercise. | 3 × daily | Semi-standardised and progressed | Patient logbook | 95% appointments and 72% home exercise | Yes – James blinding index >0.5 at week 13 |
| | Sham US | Individual | PT | Home | Mixed | Sham ultrasound and light application of non-therapeutic gel | 1 × weekly for 1 month and then 1 × fortnightly for 1 month | NA | NA | NI | |
| Cheing 2004 [16] and Cheing 2002 [24] | Exercise | Individual | NI | Clinic | Supervised | Isometric strengthening using a dynamometer for 30 min. This included a warm up and isometric quadriceps and hamstring exercises in a variety of knee positions. | 5 × per week | NI | NA | NA | NI |
| | Placebo TENS | Individual | NI | Clinic | Supervised | Placebo TENS using identical machine | 5 × per week | NA | NA | NA | NI |
| | TENS + exercise | Individual | NI | Clinic | Supervised | Conventional TENS for 60 min followed by 30 min of isometric strengthening exercise using a dynamometer | 5 × per week | As above and below | NA | NA | NI |
| | TENS | Individual | NI | Clinic | Supervised | Conventional TENS for 60 min (continuous trains of 140 μs square pulses at 80 Hz) | 5 × per week | Adjusted relative to sensory threshold | NA | NA | NI |
| Deyle 2000 [21] | Exercise | Individual | PT | Clinic and home | Supervised | Active knee range-of-motion exercises, hip and knee muscle strengthening exercises, lower limb muscle stretching, and stationary cycling. | 2 × per week | Increased as patient tolerated | NA | NA | NI |
| | Placebo US | Individual | PT | Clinic | Supervised | Subtherapeutic ultrasound for 10 min at an intensity of 0.1 W/cm² and 10% pulsed mode | 2 × per week | NA | NA | NA | NI |
| Foroughi 2011 [29] | Exercise | Individual | NI | Clinic | Supervised | Progressive resistance training exercises at 80% of peak muscle strength using Keiser machines, including unilateral knee extension, standing hip abduction and adduction, and bilateral knee flexion, leg press, and plantar-flexion. | 3 × per week | Increased resistance as tolerated | NA | NA | NI |
| | Sham exercise | Individual | NI | Clinic | Supervised | Sham exercises on the same equipment as the intervention group, except without hip adduction and with knee extension performed bilaterally. Minimal resistance was set on the machine. | 3 × per week | No progression | NA | NA | NI |
| Krauss 2014 [30] | Exercise | Group and individual | PT | Clinic and home | Mixed | THüKo exercise therapy: exercises to strengthen the muscles and improve proprioception, balance and flexibility. | 1 × per week group and 2 × per week individual home exercise | NI | Study and patient log | 93% group and 95% home exercise | NI |
| | Placebo US | Individual | PT | Clinic | | US for 15 min at subtherapeutic level | 1 × per week | NA | NA | NA | NI |
| Merritt 2012 [31] | Exercise | Individual | PT (hand therapist) | Home | Unsupervised | Specific exercises for thumb web space, thumb stability exercises and isolated thumb blocking, if indicated | 3 × per day | NI | Patient log | 87% adherence | NI |
| | Sham topical cream | Individual | PT (hand therapist) | Home | Unsupervised | Sham topical cream applied. Time not specified. | 2 × per day | NA | Patient log | 87% adherence | NI |
| Rogers 2012 [32] | Exercise (KBA) | Individual | Therapist | Clinic and home | Mixed (first 3 sessions supervised) | KBA utilized walking agility exercises plus single-leg static and dynamic balancing (wedding march, backwards wedding march, side stepping, semi-tandem walk, tandem walk, cross-over walk, modified grapevine, toe walking, heel walking, static balance, dynamic balance) | 3 × per week | Increased time and repetitions as able | Patient log | 95.3% | NI |
| | Exercise (RT) | Individual | Therapist | Clinic and home | Mixed (first 3 sessions supervised) | Resistance training (RT) involved Thera-Band non-latex elastic resistance bands. Seated: ankle extension, ankle flexion, knee extension, knee flexion, hip abduction, hip adduction, hip internal rotation, hip external rotation, leg press (hip and knee extension). Standing: hip hyper-extension. | 3 × per week | Increased resistance by changing bands | Patient log | 96.4% | NI |
| | Exercise (KBA + RT) | Individual | Therapist | Clinic and home | Mixed (first 3 sessions supervised) | KBA and RT as detailed above. | 3 × per week | As above | Patient log | 98.6% | NI |
| | Sham topical cream | Individual | Therapist | Clinic and home | Unsupervised | Daily inert topical cream to affected area. Time not specified. | 1 × per day | NA | Patient log | 97% | NI |
| Rogers 2009 [17] | Exercise | Individual | Therapist | Home | Unsupervised | Exercise intervention which included nine exercises involving range of motion, gripping and pinching, including use of a non-latex polymer ball for around 10–15 min per session. | 1 × per day | Repetitions increased sequentially as tolerated | NI | NI | NI |
| | Sham topical cream | Individual | Therapist | Home | Unsupervised | Inert hand cream applied without massage. Time not specified. | 1 × per day | NA | NI | NI | NI |
| Stoffer-Marx 2018 [33] | Exercise | Individual | HCPs | Home | Unsupervised | Exercise program consisting of making small fist, lateral pinch, O-sign, spread fingers and therapy putty exercises. | 1 × per day | Repetitions increased sequentially as tolerated | Assessor inspection of putty | 5% judged not to have used putty at all | NI |
| | Placebo massage ball and routine care | Individual | HCPs | Home | Unsupervised | Massage ball rolled gently on palmar and dorsal sides of hand. Time not specified. | 1 × per day | NA | NI | NI | NI |
| Vassão 2019 [34] | Exercise | Individual | PT | Clinic | Supervised | Exercise program which included warming up on treadmill, 6 strength exercises (seated leg raise [SLR], gluteal bridge [hip lift], hip abductors chair, hip adductors chair, knee extensors chair, knee flexors chair), and stretching of major muscle groups | 2 × per week | Load progressed based on 2-weekly assessment | NA | NA | NI |
| | Placebo PBM | Individual | PT | Clinic | Supervised | Turned-off photobiomodulation to medial and lateral region of affected knee for 40 s | 2 × per week | NA | NA | NA | NI |
| | Exercise + placebo PBM | Individual | PT | Clinic | Supervised | Combination of exercise and placebo PBM as described above | 2 × per week | Load progressed based on 2-weekly assessment | NA | NA | NI |
| Villafane 2013 [18] | Exercise | Individual | Therapist | Clinic | Supervised | Hand exercises including range-of-motion, grip and pinch strength exercises, including use of a non-latex polymer ball. | 3 × per week | Progressed based on resistance and ability to increase repetitions | NA | NA | NI |
| | Sham US | Individual | Therapist | Clinic | Supervised | Inactive doses of pulsed ultrasound with an intensity of 0 W/cm² and gentle application of an inert gel for 10 min to the hypothenar area of symptomatic hand | 3 × per week | NA | NA | NA | NI |

Abbreviations: US – ultrasound, NA – not applicable, NI – not indicated, TENS – transcutaneous electrical nerve stimulation, PBM – photobiomodulation.

∗ denotes self-rated adherence measured on an 11-point numeric rating scale.

4.3. Risk of bias and quality assessment

Fig. 2 shows the risk of bias summary and Appendix 2 shows the risk of bias graph. There was a high risk of reporting bias in most studies (nine trials), which was frequently related to a failure to specify a primary outcome. Selection bias was generally low, with only one trial at high risk relating to random sequence generation [17]. Six trials were at high risk of detection bias relating to a failure to adequately blind outcome assessment. Fig. 3 shows the GRADE summary of findings relating to the meta-analyses. The certainty or quality of evidence was either ‘low’ or ‘very low’ in all cases; notably, inconsistency was a recurring reason for downgrading.

Fig. 2. Risk of bias summary. Review authors' judgements about each risk of bias item for each included study. (Note that ‘blinding of participants and personnel’ relates solely to participants.)

Fig. 3. GRADE summary of findings table.

4.4. Results of individual studies and synthesis of results

4.4.1. Pain

Figs. 4 and 5 show the Forest plots for pain in the short- and longer-term respectively. Exercise was superior to placebo/sham in the short-term (828 participants (10 studies), SMD -1.1 (95%CI -1.7 to -0.4)) but not in the longer-term (237 participants (3 studies), SMD -0.1 (95%CI -0.4 to 0.2)). Subgroup analysis demonstrated that pain was improved in the short-term in hand OA (4 studies, SMD -2.4 (95%CI -4.1 to -0.8)) and knee OA (4 studies, SMD -0.9 (95%CI -1.8 to 0.0)), but not hip OA (2 studies, SMD -0.1 (95%CI -0.6 to 0.5)). Appendix 4 shows the Forest plot comparing studies which combined exercise with other physical therapy modalities (combined) to those which did not (non-combined) for pain in the short term. The effect size for combined interventions was larger than that for non-combined interventions (8 studies, SMD -1.3 (95%CI -2.0 to -0.5) vs 3 studies, SMD -0.7 (95%CI -1.3 to -0.1)). The quality/certainty of evidence (GRADE) was rated as ‘low certainty/quality’ due to trial limitations and inconsistency. The statistical heterogeneity was considerable, I² = 93.8% (Fig. 4).
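
As a reading aid for the intervals reported above, the sketch below recovers an approximate standard error from a 95% confidence interval and checks whether the interval excludes zero. It assumes a symmetric normal-based interval (z = 1.96); the helper names are hypothetical, and because the published bounds are rounded to one decimal place the recovered values are approximate.

```python
Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Approximate the standard error from the width of a 95% interval."""
    return (upper - lower) / (2 * Z_95)


def ci_excludes_zero(lower, upper):
    """True when both bounds lie strictly on the same side of zero."""
    return lower > 0 or upper < 0


# Pooled short-term and longer-term pain intervals as reported in the text
short_term_se = se_from_ci(-1.7, -0.4)
short_term_excludes_zero = ci_excludes_zero(-1.7, -0.4)
longer_term_excludes_zero = ci_excludes_zero(-0.4, 0.2)
```

A bound reported as 0.0 is ambiguous after rounding, so this strict check cannot settle cases such as an upper limit of 0.0.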

Fig. 4. Forest plot of pain in short term with subgroups knee vs hip vs hand.

Fig. 5. Forest plot of pain in longer term.

Sensitivity analyses: Appendix 5 shows the Forest plot for pain in the short-term with the Villafane study removed [18]. This sensitivity analysis was performed to assess the robustness of the pooled results, as the Villafane study was a substantial outlier in terms of its SMD. Exercise remained superior to placebo but the effect size was substantially reduced (SMD -0.5 (95%CI -0.8 to -0.1)). Appendix 6 shows the Forest plot for combined versus non-combined studies with the Villafane study removed. This reduced the effect size for combined interventions (SMD -0.5 (95%CI -0.9 to 0.0)). The quality/certainty of evidence (GRADE) was rated as ‘low certainty/quality’ due to trial limitations and inconsistency. The statistical heterogeneity was considerable (I² = 80.7%).
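
The idea behind this kind of sensitivity analysis can be sketched as a leave-one-out loop: re-pool the estimate with each study removed in turn and see which removal shifts it most. The Python example below is a simplified illustration, not the review's analysis. It uses a fixed-effect inverse-variance average as a stand-in for the random-effects model, and the study labels, effects and variances are hypothetical.

```python
def pool_fixed(effects, variances):
    """Fixed-effect inverse-variance pooled estimate (stand-in pooling method)."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_out(labels, effects, variances, pool=pool_fixed):
    """Return the pooled estimate obtained when each study is removed in turn."""
    results = {}
    for i, label in enumerate(labels):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results[label] = pool(kept_effects, kept_variances)
    return results


# Hypothetical data: study D has a much larger effect than the others
labels = ["A", "B", "C", "D"]
effects = [-0.3, -0.5, -0.4, -3.0]
variances = [0.04, 0.05, 0.04, 0.10]

overall = pool_fixed(effects, variances)
loo = leave_one_out(labels, effects, variances)
# The most influential study is the one whose removal moves the estimate furthest.
most_influential = max(loo, key=lambda name: abs(loo[name] - overall))
```

With these made-up numbers, removing study D moves the pooled estimate furthest towards zero, which mirrors the pattern described for the outlying study above. Any other pooling function with the same signature can be passed as `pool`.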

4.4.2. Physical function

Fig. 6 and Appendix 3 show the Forest plots for function in the short- and longer-term respectively. Exercise was superior to placebo/sham for function in the short-term (8 studies, SMD -0.8 (95%CI -1.5 to -0.2)) but not in the longer-term (3 studies, SMD -0.5 (95%CI -1.4 to 0.4)). The quality/certainty of evidence (GRADE) for the short-term was rated as ‘low certainty/quality’ due to trial design and inconsistency, and the statistical heterogeneity was considerable (I² = 93.8%). The quality/certainty of evidence (GRADE) for the longer-term was rated as ‘very low certainty/quality’ due to trial design, imprecision and inconsistency, and the statistical heterogeneity was considerable (I² = 90.3%).

Fig. 6. Forest plot of physical function in short term with subgroups knee vs hip vs hand.

Subgroup analysis demonstrated no obvious difference in effect between different joints with the 95%CIs for knee (3 studies, SMD -2.2 (95%CI -5.0 to 0.6)), hip (2 studies, SMD -0.2 (95%CI -0.7 to 0.3)) and hand (3 studies, SMD -0.1 (95%CI -0.6 to 0.3)) all overlapping zero.

Sensitivity analyses: Appendix 7 shows the Forest plot for function in the short-term with the Deyle study [21] removed because it was a substantial outlier (SMD -0.2 (95%CI -0.5 to 0.0)).

4.4.3. Quality of life

Appendices 8 and 9 show the Forest plots for QoL in the short- and longer-term respectively. There was no difference in effects on QoL comparing exercise and placebo/sham in the short- (4 studies, SMD -0.2 (95%CI -0.5 to 0.1)) or longer-term (2 studies, SMD 0.8 (95%CI -1.5 to 3.1)).

The quality/certainty of evidence (GRADE) for the short-term was rated as ‘very low certainty/quality’ due to trial design, inconsistency and imprecision, while the statistical heterogeneity was substantial (I² = 69.5%). The quality/certainty of evidence (GRADE) for the longer-term was rated as ‘low certainty/quality’ due to inconsistency and imprecision, while the statistical heterogeneity was considerable (I² = 98.2%).

5. Discussion

The key finding of this systematic review is that exercise therapy, with or without the addition of other physical therapy interventions, was superior to placebo in the short-term for pain and function in OA. This was also observed in the knee and hand OA subgroups, but not the hip. There appears to be no exercise effect on pain or function in the longer-term and on QoL at any time point. However, the findings are limited by the small number of studies with the majority being at a high risk of bias in at least one domain. The certainty of the evidence was either low or very low.

Our review found very large beneficial effects of exercise on pain and function in the short-term in knee, hip and hand OA studies combined, although the effect was reduced to moderate when two studies with large outlier effects were removed and when investigating exercise in isolation without other physical therapy interventions. The risk of bias of the studies within our review was generally high, with only two studies showing low risk of bias in all domains [19,20]. Interestingly, these two studies, one in knee OA and one in hip OA, found no effect of exercise combined with other physical therapy modalities on pain or function compared with a placebo intervention involving inactive ultrasound and light application of inert gel. Notably, the two studies with outlying results (Deyle et al. [21] and Villafane et al. [18]) were two of the smallest studies, with both having fewer than 40 participants in each intervention arm. The phenomenon of larger effect sizes with smaller studies, also known as ‘small-study effects’, has been well described previously and relates to many factors including publication bias [22,23]. We felt it appropriate to describe the presence of the outlying studies and their influence, rather than excluding them from the meta-analysis entirely. There was variation with respect to the exercise therapy tested in the included RCTs. All hip and knee OA studies, except Cheing et al. [24], involved supervised exercise to some extent, while only one hand OA study involved supervision [18]. Most studies employed individual rather than group exercise. However, many other aspects of the exercise therapy, including the mode, setting, frequency, duration, intensity, generic versus tailored nature, monitoring of adherence and type of concomitant physical therapy interventions, were somewhat variable. It is not clear to what extent these factors influenced the exercise effects.

It has been well described that the majority of the symptomatic treatment effect observed for different interventions in OA trials is attributable to ‘non-specific’ contextual factors rather than specific effects [25]. Although some studies have described the ‘placebo effect’ as being the effect size relating to just the placebo control group, this is not strictly correct, as the true ‘placebo effect’ is most accurately defined as the difference in effect size between the placebo control group and a ‘no treatment’ control group which does not contain the placebo. Therefore, the calculation of the true ‘placebo effect’ requires a three-arm RCT, which is far less frequently performed than a two-arm RCT. This is relevant to the findings of this review because, by including only studies with a placebo comparator, we attempted to minimise the chances of overestimating the specific treatment effect of exercise therapy in OA.
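The distinction drawn above between the placebo response, the true placebo effect and the specific treatment effect can be made concrete with a small sketch of a three-arm trial. The class name, field names and the pain-change values below are hypothetical and serve only to show the arithmetic; they are not taken from any included study.

```python
from dataclasses import dataclass

@dataclass
class ThreeArmTrial:
    """Mean change in pain from baseline in each arm (negative = improvement)."""
    active_change: float        # e.g. exercise therapy arm
    placebo_change: float       # placebo (sham) arm
    no_treatment_change: float  # untreated control arm

    def specific_effect(self) -> float:
        # Effect of the active treatment over and above placebo
        return self.active_change - self.placebo_change

    def placebo_effect(self) -> float:
        # True placebo effect: placebo arm versus no-treatment arm
        return self.placebo_change - self.no_treatment_change

    def placebo_response(self) -> float:
        # Raw change in the placebo arm, which is often mislabelled
        # as the placebo effect
        return self.placebo_change

# HYPOTHETICAL changes on a 0-100 pain scale, for illustration only
trial = ThreeArmTrial(active_change=-30.0,
                      placebo_change=-22.0,
                      no_treatment_change=-8.0)

print("Specific effect: ", trial.specific_effect())
print("Placebo effect:  ", trial.placebo_effect())
print("Placebo response:", trial.placebo_response())
```

In a two-arm placebo-controlled trial only the specific effect can be estimated. Without the no-treatment arm, the placebo response cannot be separated into the true placebo effect and changes that would have occurred anyway, such as natural fluctuation in symptoms.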

It has also been demonstrated that the size of the effect attributable to contextual factors is influenced by a number of factors including the strength of the active treatment, the baseline disease severity, the route of delivery for drugs, and the study sample size [22]. Placebos for drug therapies have greater treatment effects than those for non-drug therapies, which is likely due to greater patient expectation of benefit as well as potentially more successful blinding. It is difficult to create adequate placebos outside of drug studies, as mimicking more complex interactive interventions poses far more challenges. This is highlighted by the range of placebos used in the studies in our review, including sham topical creams, sham electrotherapy modalities such as inactive ultrasound, and sham exercise. As only two studies reported and confirmed the success of blinding, it is possible that blinding failure may have led to our results overestimating the effects of exercise therapy compared with placebo [19,20]. Smaller sample sizes are also associated with smaller placebo effect sizes. This will increase the likelihood of finding a positive treatment effect and increase the likelihood of publication, given the bias towards publishing positive rather than negative findings. Many studies in our review were small and, notably, the study with the largest effect size for short-term pain was also the smallest study [18]. Of the two markedly outlying studies, Deyle et al. was at a high risk of bias in four domains, including both blinding domains, and did not report on the success of blinding, while the Villafane et al. study also failed to report on the success of blinding [18,21].
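One reason small trials can produce extreme effect sizes is simply imprecision: the standard error of an SMD grows as the arms get smaller. The sketch below uses the usual large-sample approximation for the variance of a standardised mean difference; the arm sizes and the SMD of -0.5 are hypothetical and are not taken from the included studies.

```python
import math

def smd_standard_error(n1: int, n2: int, d: float) -> float:
    """Approximate standard error of a standardised mean difference d
    from two arms of size n1 and n2 (large-sample approximation)."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))

def smd_ci95(n1: int, n2: int, d: float) -> tuple[float, float]:
    """Approximate 95% confidence interval for d."""
    se = smd_standard_error(n1, n2, d)
    return d - 1.96 * se, d + 1.96 * se

# HYPOTHETICAL scenario: the same observed SMD with different arm sizes
observed_smd = -0.5
for n_per_arm in (20, 40, 200):
    lower, upper = smd_ci95(n_per_arm, n_per_arm, observed_smd)
    print(f"n = {n_per_arm:>3} per arm: 95% CI {lower:.2f} to {upper:.2f}")
```

With 20 participants per arm the interval is wide enough to include both no effect and a very large effect, so an individual small trial can land on an extreme estimate by chance. If such trials are more likely to be published when the estimate is favourable, the pooled result is biased upwards, which is consistent with the small-study effects discussed above.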

To our knowledge, there are no other systematic reviews of exercise in OA that confine analyses to placebo-controlled trials to allow direct comparison with our results. However, we can indirectly compare our joint-specific results to those of other systematic reviews that combine studies with varying comparators such as usual care, no treatment, placebo, or other non-exercise treatment. A Cochrane review for knee OA included a large number of trials and demonstrated a significant benefit of exercise in the short-term on pain (44 trials, SMD 0.49 (95% CI 0.39 to 0.59)) and on physical function (44 trials, SMD 0.52 (95% CI 0.39 to 0.64)) [26]. Our effect sizes were larger, although they were comparable once the results of the outlier studies were removed. We also found very large exercise effects in hand OA but again, once the results of the outlier study were removed, the effects were comparable to those reported in a Cochrane review of exercise therapy for hand OA [7]. The lack of longer-term effects of exercise in our review is consistent with the findings of other systematic reviews, where benefits were reduced or lost over time [27]. It is somewhat surprising that our effect sizes were not smaller than those reported in these reviews, given their inclusion of non-placebo comparators, which can overestimate treatment effects. However, this may relate to the limited number of placebo-controlled studies, the generally high risk of bias and the low to very low certainty of evidence. Our assumption that non-placebo-controlled trials tend to overestimate the effect of treatments may therefore only apply provided that other aspects of the trials, such as size and risk of bias, are similar.

A limitation of this review relates to its scope. We operationally defined a placebo intervention as one in which the study defined the intervention as having no plausible therapeutic effect and with the aim being to control for study aspects such as attention and expectation of benefit. This was felt to be the best way of making this distinction, although no method is without flaws. For example, it can be argued that certain placebo interventions may have contained an element of therapeutic exercise. Whether an intervention is deemed to have a ‘plausible therapeutic benefit’ is also inevitably subjective, as the evidence relating to the effectiveness of specific interventions is often very much open to different interpretations. It is inherently difficult to design a placebo treatment for exercise trials that is realistic and does not have any therapeutic effect, as there is a trade-off between creating a convincing placebo intervention to which patients can be adequately blinded and the potential therapeutic effect of sham interventions that involve aspects such as touch or low intensity exercise. It can be argued that a very low dose exercise control group is practically the same as a placebo control group, and a learning point from this review may be that the semantics are not as important as the context surrounding the control intervention. Our definition also resulted in the exclusion of studies with non-placebo ‘attention’ control groups that may have incorporated elements such as education. Our decision to exclude such education control groups was based on research showing that patient education can have beneficial effects [28]. We also included exercise therapy combined with other physical therapy interventions. As such, we cannot isolate the independent effects of exercise in these studies.

Another limitation is the degree of confidence we have in both the superiority of exercise therapy over placebo and the size of this specific treatment effect. The number of studies was limited and the certainty/quality of evidence based on GRADE was rated low to very low. Furthermore, while the sensitivity analyses demonstrated that exercise therapy was superior to placebo for both pain and function in the short-term, the effect size was reduced considerably, to small-to-moderate, once the two outlying studies were removed from the analyses. Problematically, the only studies which reported blinding success and which had low risk of bias did not demonstrate superiority of exercise therapy over placebo in the short-term [19,20]. In this context, the results should be interpreted with caution.

6. Conclusions

Analysis of a limited number of studies, with most at high risk of bias in at least one domain, showed that exercise therapy, with or without other physical therapy interventions, was superior to placebo in the short-term, but not longer-term, for pain and function in OA. This effect was observed for knee and hand OA subgroups, but not hip OA. No exercise effects were seen for quality-of-life outcomes. However, the estimates of effects were substantially inflated by two study outliers and the certainty of the evidence was rated low to very low. Further research is therefore very likely to have an important impact on our confidence in the results and is likely to change the estimated effect sizes.

7. Contributorship

BJFD and JC are lead authors for this review and have led the project from the start. BJFD and JC designed the review, wrote and submitted the review protocol to PROSPERO, communicated with the research librarian who carried out the searches, carried out the screening, data extraction and data analysis, and wrote the manuscript. JC and BJFD carried out the screening and data extraction. KB resolved any conflicts between BJFD and JC in terms of screening and data extraction. KB and IR have been involved in the development of the study and in writing the manuscript, and have also reviewed the final manuscript. NT has been involved in the development of the study, carrying out the searches, as well as writing and reviewing the final manuscript.

Data availability

All data underlying the results are available on request by emailing benjamin.dean@ndorms.ox.ac.uk.

Grant information

BJFD is the recipient of the BMA's Doris Hillier Arthritis and Rheumatism research grant.

Funding statement

This work was supported by BMA's Doris Hillier Arthritis and Rheumatism grant.

Declaration of competing interest

All authors have completed the Unified Competing Interest form (available on request from the corresponding author) and declare: no financial relationships with any organisations that might have an interest in the submitted work in the previous three years, and no other relationships or activities that could appear to have influenced the submitted work.

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ocarto.2021.100195.

Contributor Information

B.J.F. Dean, Email: benjamin.dean@ndorms.ox.ac.uk.

J. Collins, Email: jeccollins1@gmail.com.

Appendix A. Supplementary data

The following are the Supplementary data to this article:

Appendix 1 – Full search histories

mmc1.doc (440.5KB, doc)

Appendix 2 – Risk of bias graph. Review authors' judgements about each risk of bias item presented as percentages across all included studies.

Appendix 3 - Forest plot of physical function in longer term

mmc2.pdf (105KB, pdf)

Appendix 4 – Forest plot of pain in short term for subgroups of combined exercise versus non-combined exercise

mmc3.pdf (115.2KB, pdf)

Appendix 5 – Sensitivity analysis - Forest plot of pain in short term without Villafane

mmc4.pdf (112.5KB, pdf)

Appendix 6 – Sensitivity analysis – Forest plot of pain in short term combined versus non-combined exercise without Villafane

mmc5.pdf (115.1KB, pdf)

Appendix 7- Sensitivity analysis - Forest plot of physical function in short term without Deyle

mmc6.pdf (112KB, pdf)

Appendix 8 – Forest plot of QoL in short term

mmc7.pdf (107.4KB, pdf)

Appendix 9 – Forest plot of QoL in longer term

mmc8.pdf (105.2KB, pdf)

References

Articles from Osteoarthritis and Cartilage Open are provided here courtesy of Elsevier
