2010 Jan 20;2010(1):CD001977. doi: 10.1002/14651858.CD001977.pub2

Summary of findings for the main comparison.

Acupuncture compared with sham acupuncture for peripheral joint osteoarthritis
Patient or population: Patients with peripheral joint osteoarthritis
Settings:
Intervention: Acupuncture
Comparison: Sham acupuncture
Outcomes	Illustrative comparative risks* (95% CI)		Relative percent change	No. of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	Control	Acupuncture
Pain (short term) WOMAC scale from 0 to 20 points (higher is worse pain). Follow up: 8 weeks	The mean pain (short term) in the control groups was ‐2.66 points¹	The mean pain (short term) in the intervention groups was 0.92 lower (1.48 to 0.36 lower)²	‐10.32%³	1835 (9 studies)	++OO low⁴,⁵	SMD ‐0.28 (‐0.45 to ‐0.11) Absolute percent difference: ‐4.59% (0.92 point lower on a 0‐20 point scale)⁶
Function (short term) WOMAC scale from 0 to 68 points (higher is worse function). Follow up: 8 weeks	The mean function (short term) in the control groups was ‐7.86 points¹	The mean function (short term) in the intervention groups was 2.70 lower (4.44 to 0.87 lower)²	‐8.63%³	1767 (8 studies)	++OO low⁴	SMD ‐0.28 (‐0.46 to ‐0.09) Absolute percent difference: ‐3.97% (2.70 points lower on a 0‐68 point scale)⁶
Pain (long term) WOMAC scale from 0 to 20 points (higher is worse pain). Follow up: 26 weeks	The mean pain (long term) in the control groups was ‐2.92 points¹	The mean pain (long term) in the intervention groups was 0.36 lower (0.75 lower to 0.04 higher)²	‐4.06%³	1399 (4 studies)	++++ high	SMD ‐0.10 (‐0.21 to 0.01) Absolute percent difference: ‐1.81% (0.36 point lower on a 0‐20 point scale)⁶
Function (long term) WOMAC scale from 0 to 68 points (higher is worse function). Follow up: 26 weeks	The mean function (long term) in the control groups was ‐9.94 points¹	The mean function (long term) in the intervention groups was 1.21 lower (2.43 lower to 0 higher)²	‐3.89%³	1398 (4 studies)	++++ high	SMD ‐0.11 (‐0.22 to 0) Absolute percent difference: ‐1.79% (1.22 points lower on a 0‐68 point scale)⁶
Adverse events	See comment	See comment	Not estimable	‐	See comment	Eight RCTs described adverse events across groups, and they found that the frequency of adverse events was similar between the acupuncture and control groups. The frequency of adverse events in the acupuncture group ranged from 0% (Sangdee 2002) to 7% (Berman 2004). Pooling of adverse events across these RCTs was not possible because of limited reporting and heterogeneous methods. No serious adverse events were reported to be associated with acupuncture.
Side effects of acupuncture: bruising and bleeding at needle insertion site	See comment	See comment	Not estimable	‐	See comment	The frequency of minor side effects of acupuncture, primarily minor bruising and bleeding at needle insertion sites, ranged from 0% (Foster 2007) to 45% (Sangdee 2002). These frequencies varied widely because of heterogeneous and scanty reporting and different definitions of what constitutes a side effect of acupuncture versus what is an inherent part of treatment (for example, occasional bruising at needle insertion site).
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval
GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

¹ The representative trial selected for calculating the percent changes from baseline was the Berman 2004 trial because this trial was sufficiently large, and because the patient characteristics and the baseline mean and SD of the control group for this trial were most similar to, and thus most representative of, the other trials.

² We calculated the mean difference by choosing the Berman 2004 trial as a representative study, and then multiplying the SMD by the SD (standard deviation) of the mean change in the control group in this study.

³ We calculated the relative percent change by multiplying the SMD by the standard deviation of change in the control group of the Berman 2004 trial, dividing the result by the baseline mean in the control group of the Berman 2004 trial, and multiplying by 100 to obtain the percent.

⁴ We could not be certain that the shams used in three of the sham‐controlled trials (Sangdee 2002*; Vas 2004; Berman 2004) were sufficiently credible in fully blinding participants to the treatment being evaluated.

⁵ There was statistically significant heterogeneity of effect estimates between the two substrata for the following four variables for the pain outcome: success of blinding (Yes/Not sure); likely physiological activity of sham control (Yes/No); use of electrical stimulation of needles (Yes/No); and adequate number of acupuncture sessions (Yes/No).

⁶ We calculated the absolute percent change by multiplying the SMD by the standard deviation of change in the control group of the Berman 2004 trial, dividing the result by the number of units in the scale, and multiplying by 100 to obtain the percent.

⁷ There was statistically significant heterogeneity of effect estimates between the trials (I² = 73%).
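
The back-transformations described in footnotes 2, 3 and 6 can be sketched in a few lines of Python. This is only an illustrative sketch of the arithmetic, not the review authors' own code: the function name `back_transform`, the `BackTransformed` container, and all input values in the usage line are hypothetical. The control-group SD of change and the baseline mean from the Berman 2004 trial are not reported in this table, so round placeholder numbers are used instead.

```python
from typing import NamedTuple


class BackTransformed(NamedTuple):
    """Results of re-expressing an SMD on the original scale (hypothetical container)."""
    mean_difference: float              # footnote 2: in scale points
    relative_percent_change: float      # footnote 3: percent of baseline mean
    absolute_percent_difference: float  # footnote 6: percent of scale length


def back_transform(smd: float,
                   sd_change_control: float,
                   baseline_mean_control: float,
                   scale_points: float) -> BackTransformed:
    """Re-express a standardized mean difference using one representative trial.

    smd                   -- pooled standardized mean difference
    sd_change_control     -- SD of change in the representative trial's control group
    baseline_mean_control -- baseline mean in that control group
    scale_points          -- number of units in the scale (e.g. 20 for WOMAC pain)
    """
    # Footnote 2: mean difference = SMD x SD of change in the control group
    mean_difference = smd * sd_change_control
    # Footnote 3: relative percent change = mean difference / baseline mean x 100
    relative = mean_difference / baseline_mean_control * 100
    # Footnote 6: absolute percent difference = mean difference / scale units x 100
    absolute = mean_difference / scale_points * 100
    return BackTransformed(mean_difference, relative, absolute)


if __name__ == "__main__":
    # Placeholder inputs only; these are NOT values from the review.
    print(back_transform(smd=-0.5, sd_change_control=4.0,
                         baseline_mean_control=10.0, scale_points=20))
```

With the placeholder inputs, a SMD of -0.5 and an SD of 4 points give a mean difference of -2 points, which is -20% of a baseline mean of 10 and -10% of a 20-point scale. The same three steps, applied with the Berman 2004 control-group values, would produce the point-scale and percent columns of the table.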