2020 Aug 13;2020(8):CD005552. doi: 10.1002/14651858.CD005552.pub3

Summary of findings 1. Metformin compared to oral contraceptive pill (OCP) for hirsutism, acne, and menstrual pattern in adult women with polycystic ovary syndrome (PCOS).

Metformin compared to OCP for hirsutism, acne, and menstrual pattern in adult women with PCOS
Patient or population: adult women with PCOS
Setting: Hospital or University Clinics
Intervention: metformin
Comparison: OCP
Outcomes | Anticipated absolute effects* (95% CI): risk with OCP | Anticipated absolute effects* (95% CI): risk with metformin | Relative effect (95% CI) | № of participants (studies) | Quality of the evidence (GRADE) | Comments
Hirsutism ‐ Clinical F‐G score, BMI ≤ 25 kg/m2 | The mean hirsutism ‐ Clinical F‐G score was 7.5 | MD 0.38 higher (0.44 lower to 1.19 higher) | ‐ | 134 (3 RCTs) | ⊕⊝⊝⊝ VERY LOW 1,2,3 | ‐
Hirsutism ‐ Clinical F‐G score, BMI > 25 kg/m2 < 30 kg/m2 | The mean hirsutism ‐ Clinical F‐G score was 6.44 | MD 1.92 higher (1.21 higher to 2.64 higher) | ‐ | 254 (5 RCTs) | ⊕⊕⊝⊝ LOW 1,4 | ‐
Hirsutism ‐ Clinical F‐G score, BMI ≥ 30 kg/m2 | The mean hirsutism ‐ Clinical F‐G score was 6.05 | MD 0.38 lower (1.93 lower to 1.17 higher) | ‐ | 85 (2 RCTs) | ⊕⊕⊝⊝ LOW 1,5 | ‐
Adverse events ‐ Severe, Gastro‐intestinal | 3 per 1000 | 21 per 1000 (10 to 45) | OR 6.42 (2.98 to 13.84) | 602 (11 RCTs) | ⊕⊕⊝⊝ LOW 6,7 | ‐
Adverse events ‐ Severe, Others | 122 per 1000 | 27 per 1000 (12 to 57) | OR 0.20 (0.09 to 0.44) | 363 (8 RCTs) | ⊕⊕⊝⊝ LOW 6,7 | ‐
Adverse events ‐ Minor, Gastro‐intestinal | No trials reported on outcome "Adverse events ‐ Minor ‐ Gastro‐intestinal"
Adverse events ‐ Minor, Others | No trials reported on outcome "Adverse events ‐ Minor ‐ Others"
Improved menstrual pattern, Shortening of intermenstrual days | The mean improved menstrual pattern (i.e. shortening of intermenstrual days) was 32.4 | MD 6.05 higher (2.37 higher to 9.74 higher) | ‐ | 153 (2 RCTs) | ⊕⊕⊝⊝ LOW 4,8 | ‐
Improved menstrual pattern, Initiation of menses or cycle regularity ‐ BMI ≤ 25 kg/m2 | 1000 per 1000 | 1000 per 1000 (1000 to 1000) | OR 0.07 (0.01 to 0.65) | 17 (1 RCT) | ⊕⊕⊝⊝ LOW 6,7 | ‐
Improved menstrual pattern, Initiation of menses or cycle regularity ‐ BMI > 25 kg/m2 < 30 kg/m2 | 931 per 1000 | 669 per 1000 (486 to 817) | OR 0.15 (0.07 to 0.33) | 129 (3 RCTs) | ⊕⊝⊝⊝ VERY LOW 7,8,9 | ‐
Improved menstrual pattern, Initiation of menses or cycle regularity ‐ BMI ≥ 30 kg/m2 | 1000 per 1000 | 1000 per 1000 (1000 to 1000) | OR 0.09 (0.01 to 1.62) | 18 (1 RCT) | ⊕⊝⊝⊝ VERY LOW 6,10 | ‐
Improved menstrual pattern, Initiation of menses or cycle regularity ‐ BMI not stated | 500 per 1000 | 661 per 1000 (281 to 906) | OR 1.95 (0.39 to 9.65) | 25 (1 RCT) | ⊕⊝⊝⊝ VERY LOW 8,10 | ‐
Acne ‐ Visual analogue scale | The mean acne ‐ Visual analogue scale was 1 | MD 0.90 higher (0.40 lower to 2.20 higher) | ‐ | 34 (1 RCT) | ⊕⊕⊝⊝ LOW 11 | ‐
BMI (kg/m2), BMI ≤ 25 kg/m2 | The mean BMI (kg/m2) was 22.7 | MD 0.59 lower (1.02 lower to 0.17 lower) | ‐ | 451 (9 RCTs) | ⊕⊝⊝⊝ VERY LOW 1,12,13 | ‐
BMI (kg/m2), BMI > 25 kg/m2 < 30 kg/m2 | The mean BMI (kg/m2) was 27.4 | MD 0.11 higher (0.48 lower to 0.7 higher) | ‐ | 353 (8 RCTs) | ⊕⊝⊝⊝ VERY LOW 1,14,15 | ‐
BMI (kg/m2), BMI ≥ 30 kg/m2 | The mean BMI (kg/m2) was 35.1 | MD 2.31 lower (4.40 lower to 0.21 lower) | ‐ | 119 (3 RCTs) | ⊕⊝⊝⊝ VERY LOW 1,15,16 | ‐

*The risk in the intervention group (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
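
The note above describes an arithmetic step that can be sketched in a few lines. The following is a minimal illustration, assuming the usual conversion from an assumed comparison-group risk and an odds ratio to an intervention-group risk via odds; the function name `risk_with_intervention` is hypothetical and is not taken from the review. A comparison-group risk of 1000 per 1000 has infinite odds, so the sketch returns 1000 per 1000 for any odds ratio in that case.

```python
def risk_with_intervention(assumed_risk_per_1000: float, odds_ratio: float) -> float:
    """Return the intervention-group risk per 1000, given the assumed
    comparison-group risk per 1000 and an odds ratio (hypothetical helper)."""
    p0 = assumed_risk_per_1000 / 1000.0
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("assumed risk must lie between 0 and 1000 per 1000")
    if p0 == 1.0:
        # Infinite baseline odds: any finite, positive OR leaves the risk at 1000.
        return 1000.0
    odds_comparison = p0 / (1.0 - p0)          # risk -> odds
    odds_intervention = odds_comparison * odds_ratio
    return 1000.0 * odds_intervention / (1.0 + odds_intervention)  # odds -> risk


if __name__ == "__main__":
    # Row "Initiation of menses or cycle regularity, BMI > 25 kg/m2 < 30 kg/m2":
    # assumed risk 931 per 1000, OR 0.15 (0.07 to 0.33).
    for odds_ratio in (0.15, 0.07, 0.33):
        print(round(risk_with_intervention(931, odds_ratio)))
    # prints 669, 486, 817
```

Applied to the row with an assumed risk of 931 per 1000, the point estimate and the two confidence limits of the odds ratio give the tabulated 669 per 1000 (486 to 817). Other rows may differ by a unit or so because the tabulated assumed risks are themselves rounded.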

BMI: body mass index; CI: Confidence interval; F‐G: Ferriman‐Gallwey score; MD: Mean difference; OR: Odds ratio; RCT: Randomised controlled trial.
GRADE Working Group grades of evidence
High quality: further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: we are very uncertain about the estimate.

1 Evidence downgraded by one level for serious risk of bias ‐ the majority of the RCTs have unclear or high risk of bias
2 Evidence downgraded by one level for serious inconsistency (I2 = 50%) as unexplained heterogeneity (i.e. heterogeneity not explained by subgrouping of data according to mean study BMI)

3 Evidence downgraded by one level for serious imprecision – 95% CI includes both appreciable effect and little or no effect and low number of participants (total number of participants < 400)

4 Evidence downgraded by one level for serious imprecision – low number of participants (total number of participants < 400)

5 Evidence downgraded by one level for serious imprecision ‐ low number of participants (total number of participants < 400) and 95% CI includes both appreciable benefit and harm

6 Evidence downgraded by one level for serious risk of bias ‐ the majority of the RCTs have unclear risk of bias

7 Evidence downgraded by one level for serious imprecision – low number of events (total number of events < 300)

8 Evidence downgraded by one level for serious risk of bias ‐ the majority of the RCTs have high risk of bias

9 Evidence downgraded by one level for serious inconsistency (I2 = 51%) as unexplained heterogeneity (i.e. heterogeneity not explained by subgrouping of data according to mean study BMI)

10 Evidence downgraded by two levels for very serious imprecision – 95% CI includes both appreciable benefit and harm or no effect and very low number of events (total number of events < 300)

11 Evidence downgraded by two levels for very serious imprecision – 95% CI includes both appreciable benefit and harm or no effect and low number of participants (total number of participants < 400)

12 Evidence downgraded by one level for serious inconsistency (I2 = 76%) as unexplained heterogeneity (i.e. heterogeneity not explained by subgrouping of data according to mean study BMI)

13 Evidence downgraded by one level for serious imprecision – 95% CI includes both appreciable effect and little or no effect

14 Evidence downgraded by one level for serious inconsistency (I2 = 72%) as unexplained heterogeneity (i.e. heterogeneity not explained by subgrouping of data according to mean study BMI)

15 Evidence downgraded by one level for serious imprecision ‐ low number of participants (total number of participants < 400) and 95% CI includes both appreciable effect and little or no effect

16 Evidence downgraded by one level for serious inconsistency (I2 = 52%) as unexplained heterogeneity (i.e. heterogeneity not explained by subgrouping of data according to mean study BMI)
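
The way the footnotes combine into a final rating can be sketched as follows. This is a minimal illustration, assuming the standard GRADE convention that evidence from randomised trials starts at high quality and loses one level per "serious" concern and two per "very serious" concern, with very low as the floor; upgrading is not modelled, and the function name `grade_rating` is hypothetical.

```python
from typing import Sequence

# Ordered from lowest to highest quality.
GRADES = ["VERY LOW", "LOW", "MODERATE", "HIGH"]


def grade_rating(downgrades: Sequence[int], start: str = "HIGH") -> str:
    """Return the GRADE rating after applying each downgrade (in levels)
    to the starting rating; the result never drops below VERY LOW."""
    level = GRADES.index(start) - sum(downgrades)
    return GRADES[max(level, 0)]


if __name__ == "__main__":
    # Footnotes 1, 2, 3: three one-level downgrades.
    print(grade_rating([1, 1, 1]))  # VERY LOW
    # Footnotes 1, 4: two one-level downgrades.
    print(grade_rating([1, 1]))     # LOW
    # Footnotes 6, 10: one level plus two levels.
    print(grade_rating([1, 2]))     # VERY LOW
    # Footnote 11 alone: a single two-level downgrade.
    print(grade_rating([2]))        # LOW
```

The four calls reproduce the ratings shown in the table for the rows citing those footnote combinations.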