. 2025 Jan 24;110(6):e327921. doi: 10.1136/archdischild-2024-327921

Table 3. Gender affirming hormone therapy versus no gender affirming hormone therapy: evidence from before–after studies.

Outcomes	Anticipated absolute effects^* (95% CI): risk with no GAHT	Anticipated absolute effects^* (95% CI): risk with GAHT	No of participants (studies)	Certainty of evidence (GRADE)	What happens
GD, short term follow-up assessed with: participant reported Gender Preoccupation and Stability Questionnaire, higher scores indicate higher levels of GD, scale 14–70, follow-up mean 6 months^†	–	Standardised mean change 0.26 lower (1.64 lower to 1.13 higher)	36 (1 non-randomised study)^{19 21}	⨁◯◯◯ Very low^‡^§	The evidence is very uncertain about the effect of GAHT on GD at short term follow-up in natal females
Global function, short term follow-up assessed with: participant reported various scales (RAND Short Form-36 Health Survey, Symptom Checklist-90 Revised, Global Severity Index), higher scores indicate better global function, scale 0–100, follow-up mean 6 months^†	–	Standardised mean change 0.25 higher (0.09 higher to 0.4 higher)	73 (2 non-randomised studies)^{21 25}	⨁◯◯◯ Very low^§^¶^**^††	The evidence is very uncertain about the effect of GAHT on global function at short term follow-up in natal females
Depression, long term follow-up assessed with: participant reported various scales (Beck Depression Inventory, Hospital Anxiety and Depression Scale), higher scores indicate worse depression, follow-up 18–24 months^‡‡	–	Standardised mean change 0.41 lower (0.65 lower to 0.17 lower)	389 (2 non-randomised studies)^{18 20}	⨁◯◯◯ Very low^§§^¶¶	The evidence is very uncertain about the effect of GAHT on depression at long term follow-up in natal males and females
Sexual dysfunction (ie, vaginal dryness or itch), long term follow-up assessed with: participant report of symptoms, follow-up mean 12 months^‡‡	In 193 participants, a linear regression analysis showed that there was no change from baseline in symptoms of vaginal dryness or itch after receiving GAHT (b=0.053, 95% CI −0.03 to 0.13)^***		193 (1 non-randomised study)^{26}	⨁◯◯◯ Very low^†††	The evidence is very uncertain about the effect of GAHT on sexual dysfunction (ie, vaginal dryness or itch) at long term follow-up in natal females
Sexual dysfunction (ie, vaginal dryness or itch), short term follow-up assessed with: participant report of symptoms, follow-up mean 6 months^†	In 193 participants (ie, natal females), a linear regression analysis showed that there was no change from baseline in symptoms of vaginal dryness or itch after receiving GAHT (b=−0.01, 95% CI −0.09 to 0.8)^***		193 (1 non-randomised study)^{26}	⨁◯◯◯ Very low^†††	The evidence is very uncertain about the effect of GAHT on sexual dysfunction (ie, vaginal dryness or itch) at short term follow-up in natal females
Bone mineral density, femoral neck, long term follow-up assessed with: DXA, z-scores, scale −3 to 3, follow-up mean 12 months^‡‡	Mean bone mineral density, femoral neck, long term follow-up was 0.84	Mean change 0 (0.01 lower to 0)	199 (1 non-randomised study)^{30}	⨁◯◯◯ Very low^‡‡‡	The evidence is very uncertain about the effect of GAHT on bone mineral density, femoral neck, at long term follow-up in natal females
Bone mineral density, hip, long term follow-up assessed with: DXA g/cm², follow-up 12–36 months^‡‡	Mean bone mineral density, hip, long term follow-up was 0.95	Mean change 0.01 higher (0.01 higher to 0.01 higher)	199 (1 non-randomised study)^{30}	⨁◯◯◯ Very low^‡‡‡	The evidence is very uncertain about the effect of GAHT on bone mineral density, hip, at long term follow-up in natal females
Bone mineral density, lumbar spine, long term follow-up assessed with: DXA g/cm², follow-up 12–36 months^‡‡	Mean bone mineral density, lumbar spine, long term follow-up was 1.04	Mean change 0.01 higher (0 to 0.01 higher)	234 (2 non-randomised studies)^{27 30}	⨁◯◯◯ Very low^‡‡‡	The evidence is very uncertain about the effect of GAHT on bone mineral density, lumbar spine, at long term follow-up in natal females
Other outcomes, not measured^§§§	–	–	–	–	–

Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group grades of evidence. High certainty=we are very confident that the true effect lies close to that of the estimate of the effect. Moderate certainty=we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. Low certainty=our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect. Very low certainty=we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

^* The risk in the intervention group (with 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (with 95% CI).

^† Short term follow-up: outcome measured at ≤6 months of follow-up.

^‡ Rated down three levels due to risk of bias stemming from prognostic imbalance associated with the observational study design and critical risk of bias due to missing data (ie, 46.75% provided outcome data).

^§ Rated down one level for imprecision because the optimal information size (200) was not met. Low sample size importantly increases the risk of random error.

^¶ Rated down two levels due to risk of bias stemming from prognostic imbalance associated with the observational study design and critical risk of bias due to missing data in one of the two included studies (ie, 46.75% provided outcome data).

^** Statistically, there was considerable heterogeneity with I²=94% and p<0.01. However, we did not rate down for inconsistency as the overall effect estimate was not importantly affected by the studies contributing to statistical heterogeneity.

^†† Rated down one level for indirectness because one of the two included studies reported the outcome only for natal females.

^‡‡ Long term follow-up: outcome measured at ≥12 months of follow-up.

^§§ Rated down three levels due to risk of bias stemming from prognostic imbalance associated with the observational study design, as well as critical and serious risk of bias due to missing data in the two included studies (ie, 20% and 69% of participants, respectively, provided outcome data).

^¶¶ Statistically, there was considerable heterogeneity with I²=100% and p<0.01. However, we did not rate down for inconsistency as the overall effect estimate was not importantly affected by the studies contributing to statistical heterogeneity.

^*** In the linear mixed model, time was an added categorical variable to detect changes in symptom scores between 0 and 3 months, 0 and 6 months and 0 and 12 months of GAHT. Differences in changes in symptom scores between different administration forms were corrected for baseline differences to avoid regression to the mean. An increase or decrease in symptom scores of 0.2 was considered clinically relevant.

^††† Rated down three levels due to risk of bias stemming from prognostic imbalance associated with the observational study design and critical risk of bias due to concerns with measurement of the outcome (ie, subjective and self-reported outcome).

^‡‡‡ Rated down three levels due to risk of bias stemming from prognostic imbalance associated with the observational study design and critical risk of bias due to missing data (ie, 48% of participants provided outcome data).

^§§§ Other outcomes: gender dysphoria, sexual dysfunction from physiological perspective (ie, lack of erection, dyspareunia and anorgasmia) and cardiovascular events.

DXA, dual energy x-ray absorptiometry; GAHT, gender affirming hormone therapy; GD, gender dysphoria.
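
The footnotes describe each certainty rating as a number of levels "rated down" per GRADE domain. As an illustration only, the arithmetic might be sketched as below. The helper `grade_certainty` and the starting level of "High" are assumptions made for this sketch; the table does not state the starting level, and this is not the review authors' procedure.

```python
# Illustrative sketch only: a hypothetical helper, not taken from the review.
# Assumption: rating starts at "High" and cannot fall below "Very low".
LEVELS = ["Very low", "Low", "Moderate", "High"]


def grade_certainty(downgrades, start="High"):
    """Return the certainty label after rating down from `start`.

    `downgrades` maps a GRADE domain name to the number of levels
    rated down for that domain (0 means no downgrade).
    """
    total = sum(downgrades.values())
    # Floor the result at index 0, ie, "Very low".
    index = max(LEVELS.index(start) - total, 0)
    return LEVELS[index]


# GD, short term follow-up: risk of bias (three levels) and imprecision (one level).
gd_short_term = grade_certainty({"risk of bias": 3, "imprecision": 1})

# Global function, short term follow-up: risk of bias (two levels),
# imprecision (one level), indirectness (one level); inconsistency was
# noted in the footnotes but not rated down.
global_function = grade_certainty(
    {"risk of bias": 2, "imprecision": 1, "inconsistency": 0, "indirectness": 1}
)

print(gd_short_term, "|", global_function)
```

Under these assumptions both rows come out as "Very low", matching the table; a single three-level downgrade for risk of bias, as in the bone mineral density rows, is enough on its own to reach the floor.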