2025 Feb 18;172(5):1521–1538. doi: 10.1002/ohn.1173

Efficacy and Patient Satisfaction in Voice Feminization Procedures: A Systematic Review and Meta‐Analysis

Kristopher Lanham 1,2, Bradley A Melnick 1,3, Madeline J O'Connor 1, Angelica Bartler 1, Rolando J Casas Fuentes 1, Kelly C Ho 1, Robert D Galiano 1
PMCID: PMC12035520  PMID: 39963873

Abstract

Objective

To evaluate the efficacy and quality of life impact of voice feminization interventions in transgender women.

Data Sources

We searched PubMed, EMBASE, Scopus, and Web of Science for RCTs and retrospective studies published between 2008 and 2023 that quantitatively evaluated any voice feminization procedure.

Review Methods

Studies in English reporting quantitative measures of efficacy were included. Studies using qualitative methodology were excluded. Risk of bias was assessed using Cochrane tools, and a random‐effects model was applied using inverse‐variance pooling. Primary outcomes were fundamental frequency (F0)/speaking fundamental frequency (SF0) and patient‐reported outcomes (PROs) (PROSPERO: CRD42023476192).

Results

Twenty‐four studies involving 893 participants showed significant SF0 improvements for voice therapy (VT, g = 0.86, [0.46, 1.26], P < .0001), Wendler glottoplasty (WG, g = 1.21, [0.65, 1.77], P < .0001), feminization laryngoplasty (FL, g = 3.05, [2.24, 3.86], P < .0001), and laser reduction glottoplasty (LRG, g = 12.28, [8.19, 15.64], P < .0001). PROs indicated enhanced quality of life for VT (g = 1.32, [0.68, 1.96], P < .0001), WG (g = 1.82, [1.07, 2.57], P < .0001), and LRG (g = 1.90, [0.72, 3.07], P = .0015), with a strong correlation between pitch alteration and QoL (r² = 0.83, P = .0001).

Conclusion

Our results indicate that both VT and surgical procedures enhance vocal pitch, and that pitch gains correlate strongly with improved QoL. Our findings validate the role of voice feminization in gender‐affirming care, though methodological limitations prevent evaluation of newer interventions.

Keywords: fundamental frequency, gender dysphoria, patient‐reported outcomes, pitch alteration, transgender healthcare, voice feminization, voice therapy, Wendler glottoplasty


Gender care involves a range of medical interventions designed to align appearance and identity in individuals with gender dysphoria. Purely esthetic feminizing procedures generally yield outcomes that both meet transition expectations and can be easily quantified. 1 However, success in voice feminization is a multifactorial concept that can be difficult to quantify.

Voice therapy (VT) is a successful approach with a history of positive outcomes, 2 though its techniques for producing feminine vocal characteristics require conscious effort to maintain. These techniques may falter during spontaneous speech, coughing, laughing, or reacting in surprise. 3,4 More permanent solutions require surgical intervention.

While vocal femininity is influenced by several factors, most surgical interventions target pitch elevation, which has been established as the most significant marker of a feminine voice. 5 Older techniques like cricothyroid approximation (CTA) have mixed results that often require correction, 6 while more contemporary methods like Wendler glottoplasty (WG) consistently increase pitch and enhance quality‐of‐life metrics. 7 Novel approaches, including feminization laryngoplasty (FL) 8 and vocal fold shortening with retrodisplacement of the anterior commissure, 9 aim for improved outcomes, but these approaches lack comprehensive evaluation.

Further still, there is controversy over the correlation between pitch and quality of life. A recent meta‐analysis found no correlation between these variables 10 despite individual studies consistently reporting improvements in both fundamental frequency (F0) and patient‐reported outcomes (PROs). This issue is exacerbated by the lack of research utilizing validated, gender care‐specific PROs. 1

To address these gaps in knowledge, this meta‐analysis aims to evaluate the effectiveness of various voice feminization procedures on improving vocal pitch and quality of life in transgender women, comparing surgical approaches against each other and to non‐surgical VT. We seek to quantify the efficacy of various interventions and hypothesize that surgical interventions will result in greater improvements in both vocal pitch and patient‐reported quality of life outcomes than VT alone, and that changes in pitch correlate positively with enhanced quality of life.

Interventions

VT

VT involves non‐surgical methods where therapists guide patients using vocal techniques to achieve feminine vocal characteristics. It is often used alongside surgical approaches but can also serve as a standalone option for those opting against surgery. This study evaluates VT as both an independent intervention and a baseline for surgical comparisons.

WG

Widely regarded as the standard for vocal feminization surgery, WG involves removing anterior vocal fold tissue and suturing ends to shorten the vocal folds. 11 It is the most extensively researched technique in this review.

CTA

CTA increases vocal cord tension by approximating the cricoid and thyroid cartilages, thereby lengthening the vocal folds and raising the fundamental frequency. 6

FL

FL is a comprehensive surgical method involving the removal of anterior thyroid cartilage and reduction of the vocal folds as in WG, shifting of the anterior commissure, and adjustments to the larynx and supraglottis. This approach aims to feminize both the voice and the appearance of the neck. 12

Vocal Fold Shortening and Retrodisplacement of the Anterior Commissure (VFSRAC)

Building on WG, VFSRAC involves removing the anterior third of the vocal fold membrane completely, forming a web with underlying structures, which retrodisplaces the anterior commissure while maintaining normal laryngeal physiology. 9

Radiesse Injection Augmentation (RADIESSE)

A simple variation of WG, RADIESSE uses Radiesse Voice Gel for approximating vocal folds instead of sutures. 13

Laser Reduction Glottoplasty (LRG)

LRG, along with Laser Assisted Voice Adjustment (LAVA), employs CO2 lasers for reducing vocal fold mass in a longitudinal section while preserving the medial border. LAVA focuses on superficial layer removal to increase pitch through scarring, while LRG involves complete removal of the section and suturing for mass reduction and tightening. Both methods share similar mechanisms and are analyzed together. 6

Methods

Ethical Considerations

This review adheres to the Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) guidelines, and articles were selected as described in Figure 1. 14 The protocol for this systematic review was registered in PROSPERO under registration number CRD42023476192. As this study only included previously published data, no new ethical approval was required for this analysis.

Figure 1. PRISMA diagram.

Search Strategy

We performed a standardized search on 08/10/2023 in PubMed, EMBASE, Scopus, and Web of Science using the following search string: (“vocal feminization” OR “voice feminization” OR “voice therapy” OR “voice surgery” OR “voice training” OR “glottoplasty” OR “web formation” OR “cricothyroid approximation” OR “feminization laryngoplasty” OR “vocal fold shortening” OR “VFSRAC” OR “endoscopic shortening” OR “laser reduction glottoplasty” OR “laser assisted voice adjustment” OR “retrodisplacement of the anterior commissure”) AND (“outcomes” OR “efficacy” OR “durability” OR “satisfaction” OR “Transsexual voice questionnaire” OR “TVQ” OR “TVQ‐MtF” OR “Voice‐related quality of life” OR “VRQOL”) AND (“transgender” OR “transgendered” OR “transsexual” OR “person, transgendered” OR “persons, transgendered” OR “transgendered person” OR “transgender persons” OR “person, transgender” OR “persons, transgender” OR “transsexual” OR “transgender person” OR “transgenders” OR “transsexual persons” OR “person, transsexual” OR “persons, transsexual” OR “transsexual person” OR “gender identities” OR “identity, gender” OR “gender dysphoria” OR “gender affirming” OR “gender affirmation”).

Inclusion and Exclusion Criteria

Studies were reviewed if they met the following criteria: English‐language retrospective reviews or randomized controlled trials published between 2008 and 2023 that reported quantitative primary data on objective or subjective measures of efficacy. Studies with other designs, those published in other languages, and those using only qualitative methodology were excluded. Abstracts were screened by two independent reviewers, assigned at random, using Rayyan (a systematic review organization platform), and conflicts were resolved by a third reviewer. A full‐text screen was then performed by two independent reviewers, and a third independent reviewer resolved any inconsistencies.

Data Abstraction

Studies that met inclusion criteria were randomly assigned to reviewers via Rayyan for data abstraction. The following information was extracted: author, title, year, journal, study type, number of subjects, intervention, mean age, mean duration of hormone therapy, postoperative follow‐up time, pre‐ and postoperative F0, pre‐ and postoperative SF0, PRO scale used, preoperative PRO measure, postoperative PRO measure, complications, and limitations.

Risk of Bias

Two reviewers independently evaluated risk of bias using Cochrane risk of bias tools. Twenty‐three retrospective studies were evaluated using Risk Of Bias In Non‐randomized Studies of Interventions (ROBINS‐I), and 1 RCT was evaluated using Risk of Bias (RoB).

Intervention Categorization

Surgical techniques within the literature were described using varied terminology; the techniques outlined in each paper were closely scrutinized to determine their appropriate category. Many papers included patients undergoing multiple interventions within the same procedure, such as LRG being performed concurrently with WG. To address each intervention separately, dummy variables for each intervention were assigned for meta‐regression analyses.
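The dummy-variable coding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `dummy_code` and the comma-separated label format (as used in the Interventions column of Tables 1 and 3) are assumptions made for the example.

```python
# Hypothetical helper: convert a combined intervention label into 0/1 indicators,
# one per intervention category named in this review.
INTERVENTIONS = ["VT", "WG", "CTA", "FL", "VFSRAC", "RADIESSE", "LRG"]


def dummy_code(label: str) -> dict[str, int]:
    """Return an indicator (1 = performed, 0 = not performed) for each intervention."""
    present = {part.strip() for part in label.split(",")}
    unknown = present - set(INTERVENTIONS)
    if unknown:
        raise ValueError(f"unrecognized intervention(s): {sorted(unknown)}")
    return {name: int(name in present) for name in INTERVENTIONS}


# A study in which LRG was performed concurrently with WG.
print(dummy_code("WG, LRG"))
```

Each study row then contributes one indicator column per intervention, so a regression can attribute part of the observed effect to each procedure even when procedures were combined.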

Objective Measures

Studies variously reported pre‐ and postintervention SF0 and F0 measurements. SF0 is defined as the habitual frequency at which the vocal folds vibrate when speech sounds are produced and is measured as the median frequency recorded during speech. 15 F0 is defined as the average number of oscillations per second at which the vocal folds vibrate and is measured during the sustained vowel sounds /a/ or /e/. 16,17

Reporting of recording methodology between studies varied. In one paper by Casado et al, F0 was undefined but an educated assumption was made based on other work by the same author. 18 Studies reported F0, SF0, both, or neither.

Only studies that reported their standard deviations or raw patient data were kept. Means and standard deviations were calculated manually in cases where studies gave raw data but no means or standard deviations. 18 , 19 , 20 , 21 Study populations were strictly isolated to MtF transgender patients. Again, means and standard deviations of transgender patient groups were calculated manually in cases where non‐transgender patients were excluded. 6 All measurements were reported in Hz.

Given the differences between F0 and SF0, with patients typically registering a higher pitch on sustained vowel sounds than in normal speech, 22 we analyzed both raw changes in SF0 and F0 in Hz and standardized effect sizes for combined analysis of all objective measures. We used Hedges' g to measure effect size; it is well suited to small sample sizes because it offers a bias‐corrected alternative to Cohen's d that avoids overestimation. 23
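As a rough illustration of the bias correction, the sketch below computes a pre/post Hedges' g. The exact formula used in this review is not stated, so two choices here are assumptions: the standardizer is the root mean of the pre‐ and postintervention variances, and the small‐sample correction uses df = n − 1. The function name `hedges_g` is hypothetical.

```python
import math


def hedges_g(pre_mean: float, pre_sd: float, post_mean: float, post_sd: float, n: int) -> float:
    """Bias-corrected standardized mean change (one common form of Hedges' g)."""
    # Assumption: pool the two SDs as the root mean of the two variances.
    sd_pooled = math.sqrt((pre_sd ** 2 + post_sd ** 2) / 2)
    d = (post_mean - pre_mean) / sd_pooled  # Cohen's d, uncorrected
    # Assumption: approximate small-sample correction J = 1 - 3 / (4*df - 1), df = n - 1.
    df = n - 1
    j = 1 - 3 / (4 * df - 1)
    return j * d


# Chadwick et al row of Table 1: SF0 136.30 (SD 12.60) to 162.80 (SD 30.20), n = 13.
print(round(hedges_g(136.30, 12.60, 162.80, 30.20, 13), 2))  # 1.07
```

Under these assumptions the Chadwick et al row evaluates to about 1.07, in line with the tabulated effect size. Other rows may not reproduce exactly, so this should be read as one plausible form of the calculation rather than the authors' confirmed method.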

Table 1 contains summary statistics for all calculated objective measures.

Table 1. Summary of Studies for Objective Measures

| ID | Author | Measure | Interventions | Preop mean (Hz) | Preop SD (Hz) | Postop mean (Hz) | Postop SD (Hz) | Effect size | Standard error | Variance | N patients |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Chadwick et al 2 | SF0 | VT | 136.30 | 12.60 | 162.80 | 30.20 | 1.07 | 6.43 | 41.39 | 13 |
| 6 | Koçak et al 6 | F0 | CTA, LRG | 165.67 | 7.77 | 209.33 | 13.66 | 2.52 | 5.73 | 32.79 | 6 |
| 7 | Mastronikolis et al 7 | F0 | WG | 135.80 | 41.50 | 206.30 | 43.90 | 2.07 | 5.95 | 35.45 | 31 |
| 9 | Kim 9 | F0 | VFSRAC | 134.60 | 25.20 | 208.20 | 37.30 | 2.75 | 1.51 | 2.27 | 313 |
| 12 | Yılmaz et al 24 | F0 | WG, LRG | 150.00 | 7.00 | 219.00 | 7.00 | 12.44 | 0.92 | 0.84 | 35 |
| 12 | Yılmaz et al 24 | SF0 | WG, LRG | 146.00 | 5.00 | 215.00 | 7.00 | 13.49 | 0.85 | 0.71 | 35 |
| 13 | Casado et al 18 | F0 | VT, WG, LRG | 137.00 | 9.80 | 243.00 | 18.35 | 7.20 | 4.25 | 18.10 | 10 |
| 14 | Casado et al 19 | F0 | WG, LRG | 134.05 | 8.10 | 196.24 | 17.37 | 4.24 | 4.61 | 21.29 | 18 |
| 15 | Rapoport et al 20 | F0 | VT, WG | 151.47 | 33.60 | 212.87 | 42.86 | 1.90 | 7.26 | 52.77 | 18 |
| 15 | Rapoport et al 20 | SF0 | VT, WG | 130.20 | 32.30 | 162.60 | 28.70 | 1.30 | 5.62 | 31.62 | 18 |
| 16 | Husain et al 21 | F0 | WG, LRG | 145.00 | 30.70 | 245.20 | 60.00 | 1.81 | 19.82 | 392.74 | 5 |
| 19 | Aires et al 25 | F0 | VT, WG | 145.00 | 15.50 | 169.60 | 29.90 | 0.97 | 8.33 | 69.35 | 7 |
| 19 | Aires et al 25 | SF0 | VT, WG | 137.70 | 24.10 | 185.60 | 43.80 | 1.30 | 12.08 | 145.92 | 7 |
| 20 | D'haeseleer et al 26 | F0 | WG | 140.49 | 30.86 | 181.34 | 33.16 | 1.61 | 4.21 | 17.69 | 35 |
| 20 | D'haeseleer et al 26 | SF0 | WG | 128.02 | 17.43 | 167.21 | 31.05 | 1.70 | 3.82 | 14.58 | 35 |
| 21 | Mora et al 27 | F0 | CTA | 145.00 | 24.00 | 160.00 | 23.00 | 0.80 | 3.38 | 11.46 | 53 |
| 23 | Nerurkar et al 28 | F0 | VT, WG | 153.00 | 7.10 | 223.00 | 29.77 | 2.40 | 9.57 | 91.54 | 7 |
| 24 | Yılmaz et al 11 | F0 | WG | 152.00 | 12.00 | 195.00 | 14.00 | 4.08 | 1.97 | 3.88 | 27 |
| 24 | Yılmaz et al 11 | SF0 | WG | 158.00 | 11.00 | 200.00 | 15.00 | 3.80 | 2.06 | 4.26 | 27 |
| 25 | Brown et al 29 | F0 | VT | 148.00 | 39.00 | 175.00 | 35.00 | 0.91 | 5.56 | 30.93 | 48 |
| 25 | Brown et al 29 | SF0 | VT | 133.00 | 19.00 | 148.00 | 21.00 | 0.93 | 3.00 | 9.01 | 48 |
| 34 | Anderson 13 | F0 | WG, RADIESSE | 127.78 | 21.67 | 238.00 | 59.78 | 2.13 | 14.93 | 222.96 | 10 |
| 35 | Merrick et al 30 | F0 | VT | 135.90 | 28.30 | 161.50 | 24.80 | 1.14 | 6.28 | 39.40 | 19 |
| 36 | Nuyen et al 31 | SF0 | FL | 128.40 | 22.90 | 196.80 | 26.00 | 3.47 | 3.69 | 13.59 | 27 |
| 37 | Nuyen et al 12 | SF0 | FL | 134.00 | 18.00 | 191.00 | 28.00 | 2.83 | 1.58 | 2.48 | 162 |
| 38 | Leyns et al 32 | F0 | VT | 122.40 | 31.40 | 170.00 | 38.56 | 1.61 | 7.20 | 51.85 | 30 |
| 38 | Leyns et al 32 | SF0 | VT | 119.90 | 15.52 | 148.30 | 21.23 | 1.77 | 3.92 | 15.35 | 30 |
| 39 | Kelly et al 33 | SF0 | VT | 118.60 | 11.30 | 146.90 | 21.00 | 1.72 | 4.26 | 18.19 | 24 |
| 40 | Chang et al 34 | F0 | VT, WG | 175.00 | 28.00 | 194.00 | 52.20 | 0.48 | 7.23 | 52.24 | 28 |
| 40 | Chang et al 34 | SF0 | VT, WG | 143.00 | 23.00 | 163.00 | 26.00 | 1.01 | 3.62 | 13.14 | 28 |
| 41 | Hancock et al 35 | F0 | VT | 136.00 | 46.00 | 184.00 | 34.00 | 1.41 | 6.58 | 43.30 | 25 |
| 41 | Hancock et al 35 | SF0 | VT | 122.00 | 23.00 | 150.00 | 29.00 | 1.30 | 4.18 | 17.45 | 25 |

Studies evaluating effect sizes for SF0 and F0 across different interventions, including pre‐ and postoperative means, standard deviations (SD), effect size, standard error, variance, and patient counts.

Subjective Measures

As outlined in Table 2, studies utilized various patient reported outcomes, ranging from validated PROs such as the Trans Woman Voice Questionnaire (TWVQ) to unvalidated in‐house surveys based on the Visual Analogue Scale (VAS). Validated PROs in our study are defined as validated QoL surveys specific to transgender women undergoing voice feminization procedures. The unvalidated PROs in our study also measure voice‐related QoL in the context of the patient's perceived vocal femininity and satisfaction, but have not been thoroughly tested to confirm reliability, validity, and responsiveness. 36 Some utilized a VAS with endpoints that described a masculine versus feminine voice, 25 , 26 , 27 , 37 while others focused on general satisfaction with their procedure. 28

Table 2. Patient‐Reported Outcome (PRO) Measures

| PRO Name | Description | Scale | Validated/unvalidated |
|---|---|---|---|
| TWVQ | Trans Woman Voice Questionnaire for individuals who transition from male to female, Likert scale, 30 items, 0–4 scale. Current gold standard. Lineage: TSEQ → TVQ(MtF) → TWVQ | 0 (best possible outcome) to 120 (worst possible outcome) | Validated |
| SpFv, SelfFem (VAS) | Self‐perceived Femininity of voice, Visual Analogue Scale | 1 (most masculine) to 10 (most feminine) | Unvalidated |
| VHI‐10 | A shortened, 10‐item version of the Voice Handicap Index | 0 (no handicap) to 40 (maximum handicap) | Unvalidated |
| TSEQ | Transgender Self‐Evaluation Questionnaire, Likert scale, 30 items, 0–4 for each item. TVQ(MtF) and TWVQ are based on this scale. Lineage: TSEQ → TVQ(MtF) → TWVQ | 0 (best possible outcome) to 120 (worst possible outcome) | Validated |
| PA (VAS) | Patient Assessment Visual Analogue Scale | 1 (feminine voice) to 5 (masculine voice) | Unvalidated |
| VAS | Visual Analogue Scale, a measure of subjective characteristics or attitude | 1 (masculine) to 100 (feminine) | Unvalidated |
| VHI | Voice Handicap Index, 30‐item scale that measures perceived voice‐related handicap, 0–4 for each item | 0 (no handicap) to 120 (maximum handicap) | Unvalidated |
| V‐RQOL | Voice‐Related Quality of Life, assesses the impact of voice disorder, 20 items on Likert scale, 1–5 for each item | 0 (best possible outcome) to 100 (worst possible outcome) | Unvalidated |
| TVQ (MtF) | Transsexual Voice Questionnaire Male to Female, a variation of TSEQ that predates TWVQ specifically for MtF transition, 30 items based on 5 subscales, 0–4 for each item. Older version of TWVQ. Lineage: TSEQ → TVQ(MtF) → TWVQ | 0 (best possible outcome) to 120 (worst possible outcome) | Validated |
| Patient Subjective Satisfaction Score (PSSS) | Patient's own assessment of their satisfaction with their voice, 1–10 scale | 1 (unsatisfied) to 10 (most satisfied) | Unvalidated |

PRO measures summary, including descriptions, scales, validation status, and categorization.

Some studies used validated PROs meant to address general voice pathology, such as the Voice Handicap Index (VHI) and Voice Related Quality of Life (V‐RQOL), to measure voice feminization procedure outcomes. These were often used in place of a validated PRO appropriate for gender transition. 6 , 9 , 11 , 24 , 27 By using a general PRO to measure the quality of life of a transitioning patient, the perceived femininity of the voice was never directly addressed. As this is the primary reason for seeking a voice feminization procedure, validated general PROs such as the VHI were categorized with unvalidated PROs for this study due to their incomplete reflection of the patient population. The validation categorization of all PROs is outlined in Table 2.

While two studies accurately utilized the VHI to address general voice health as an adjunct to other validated gender care PROs, 26 , 29 to maintain consistency in analysis and a more conservative approach we included them in the unvalidated PRO category.

As with the frequency measures, all PRO measures were converted to standardized effect sizes using Hedges' g. Many PROs, such as the TWVQ, use an inverted Likert scale where lower scores reflect positive outcomes, while others use more traditional scales where higher scores indicate positive outcomes. This directionality, outlined for all PROs in Table 2, was accounted for during effect size calculation: effect sizes were oriented so that a lower Hedges' g reflects negative outcomes (ie, perceived masculinity) and a higher Hedges' g reflects positive outcomes (ie, perceived femininity).
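The direction alignment described above might look like the following sketch. The function name `oriented_change` and the `lower_is_better` flag are hypothetical; the point is only that the sign of the pre/post change is flipped for inverted scales before standardization.

```python
def oriented_change(pre_mean: float, post_mean: float, lower_is_better: bool) -> float:
    """Return the pre/post change signed so that positive always means improvement."""
    change = post_mean - pre_mean
    # Inverted scales (eg, TWVQ, where 0 is the best outcome) improve by decreasing.
    return -change if lower_is_better else change


# TWVQ row for Chadwick et al in Table 3: 74.90 to 54.80 on a lower-is-better scale.
print(oriented_change(74.90, 54.80, lower_is_better=True))
# SpFv row for Aires et al in Table 3: 2.80 to 7.70 on a higher-is-better scale.
print(oriented_change(2.80, 7.70, lower_is_better=False))
```

Both calls return a positive change, so the two studies can be pooled on a common scale where larger values indicate a better outcome.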

Table 3 contains summary statistics for all calculated subjective measures.

Table 3. Summary of Studies for Subjective Measures

| ID | Author | Measure | Interventions | Preop mean | Preop SD | Postop mean | Postop SD | Effect size | Standard error | Variance | N patients |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Chadwick et al 2 | TWVQ, validated | VT | 74.90 | 10.90 | 54.80 | 18.60 | 1.40 | 3.73 | 13.92 | 13 |
| 6 | Koçak et al 6 | V‐RQOL, unvalidated | CTA, LRG | 60.00 | 4.33 | 84.17 | 7.64 | 2.49 | 3.20 | 10.27 | 3 |
| 9 | Kim 9 | VHI, unvalidated | VFSRAC | 57.90 | 16.31 | 48.70 | 16.59 | 0.72 | 0.72 | 0.52 | 313 |
| 12 | Yılmaz et al 24 | VHI, unvalidated | WG, LRG | 45.90 | 2.40 | 27.50 | 3.10 | 8.08 | 0.38 | 0.14 | 35 |
| 12 | Yılmaz et al 24 | TVQ, validated | WG, LRG | 96.80 | 17.40 | 35.60 | 8.30 | 4.60 | 2.20 | 4.84 | 35 |
| 13 | Casado et al 18 | TSEQ, validated | VT, WG, LRG | 70.30 | 8.52 | 40.00 | 6.70 | 4.52 | 1.94 | 3.76 | 10 |
| 14 | Casado et al 19 | TSEQ, validated | WG, LRG | 49.50 | 4.99 | 20.75 | 1.75 | 6.44 | 1.40 | 1.97 | 18 |
| 19 | Aires et al 25 | SpFv, unvalidated | VT, WG | 2.80 | 1.80 | 7.70 | 2.40 | 2.48 | 0.65 | 0.42 | 7 |
| 19 | Aires et al 25 | TWVQ, validated | VT, WG | 98.30 | 9.20 | 54.10 | 25.00 | 1.95 | 7.44 | 55.38 | 7 |
| 20 | D'haeseleer et al 26 | VHI, unvalidated | WG | 56.00 | 22.00 | 36.00 | 22.00 | 1.15 | 2.88 | 8.30 | 70 |
| 20 | D'haeseleer et al 26 | TWVQ, validated | WG | 84.00 | 17.00 | 51.00 | 17.00 | 2.45 | 2.23 | 4.95 | 35 |
| 21 | Mora et al 27 | VHI, unvalidated | CTA | 19.00 | 9.00 | 15.00 | 9.00 | 0.56 | 1.29 | 1.68 | 106 |
| 22 | Quinn et al 37 | VAS, unvalidated | VT | 71.97 | 18.62 | 29.13 | 17.24 | 3.00 | 2.39 | 5.72 | 102 |
| 23 | Nerurkar et al 28 | PSSS, unvalidated | VT, WG | 4.00 | 0.48 | 8.00 | 1.70 | 2.47 | 0.53 | 0.28 | 7 |
| 24 | Yılmaz et al 11 | VHI, unvalidated | WG | 38.00 | 5.64 | 24.00 | 2.65 | 3.21 | 0.81 | 0.66 | 27 |
| 25 | Brown et al 29 | VHI, unvalidated | VT | 19.80 | 10.50 | 15.00 | 9.60 | 0.60 | 1.51 | 2.27 | 48 |
| 25 | Brown et al 29 | TWVQ, validated | VT | 86.10 | 24.20 | 67.90 | 29.00 | 0.84 | 4.06 | 16.45 | 48 |
| 35 | Merrick et al 30 | TWVQ, validated | VT | 90.70 | 20.10 | 50.10 | 11.00 | 2.55 | 4.43 | 19.59 | 19 |
| 36 | Nuyen et al 31 | SelfFem, unvalidated | FL | 36.00 | 41.00 | 85.00 | 22.00 | 1.58 | 5.78 | 33.41 | 27 |
| 41 | Hancock et al 35 | TSEQ, validated | VT | 87.50 | 13.40 | 58.75 | 28.09 | 1.32 | 4.20 | 17.67 | 25 |

Studies assessing effect sizes for validated and unvalidated PROs across different interventions, including pre‐ and postoperative means, standard deviations (SD), effect size, standard error, variance, and patient counts.

Data Analysis

Meta‐Analyses

We conducted a series of meta‐analyses to evaluate the overall effects of all interventions in our data set on both objective and subjective measures, utilizing the metafor package within R. Our analysis covered three main components: raw changes in fundamental frequency for both SF0 and F0, effect size analysis for the objective SF0/F0, and effect size analysis for subjective PROs.

Objective Measures

For objective measures, we first analyzed the raw mean differences in Hz for both SF0 and F0. For each, standard errors were calculated, and weights were assigned based on inverse variance. The data were pooled using a random‐effects model with the Restricted Maximum Likelihood (REML) method to address heterogeneity. Forest plots were generated for both SF0 and F0 to display the mean changes in Hz, weights, and heterogeneity statistics (τ², I², and Q) alongside total patient counts.
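To make the pooling step concrete, here is a small sketch of inverse‐variance random‐effects pooling with the heterogeneity statistics named above. It deliberately swaps in the closed‐form DerSimonian‐Laird estimator of τ² in place of the iterative REML estimator used in the review (via metafor in R), so its results will generally differ somewhat from a REML fit. The function name `random_effects_pool` is hypothetical, and treating the Variance column of Table 1 as the sampling variance of the mean difference is an assumption.

```python
import math


def random_effects_pool(effects: list[float], variances: list[float]) -> dict[str, float]:
    """Inverse-variance random-effects pooling (DerSimonian-Laird tau^2, not REML)."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % of variation due to heterogeneity
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "tau2": tau2,
        "i2": i2,
        "q": q,
    }


# SF0 mean changes in Hz (postop minus preop) for three VT-only rows of Table 1:
# Chadwick et al, Brown et al, Kelly et al, with the tabulated Variance column.
result = random_effects_pool([26.50, 15.00, 28.30], [41.39, 9.01, 18.19])
print({name: round(value, 2) for name, value in result.items()})
```

The three‐study subset is only for demonstration; it is not one of the analyses reported in this review.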

We then conducted a similar meta‐analysis utilizing the standardized effect sizes for F0 and SF0 to allow for analysis as a single measure.

Subjective Measures

For the subjective measures, as before, standard errors were calculated and weights were determined based on inverse variance. The same random‐effects model with the REML method was employed to combine the data. A forest plot was generated with weight percentages, heterogeneity statistics, and patient counts.

Meta‐Regressions

Meta‐regressions were conducted to explore the effects of individual interventions on both the objective and subjective outcomes as previously described.

Objective Measures

We first performed a meta‐regression on the mean differences in Hz for both SF0 and F0. We adjusted for multiple interventions, including VT, WG, CTA, FL, VFSRAC, RADIESSE, and LRG. Standard errors were calculated and used to weigh the effect sizes and forest plots were generated. We then repeated the regression using effect sizes for SF0 and F0 using the same methodology.
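A meta‐regression on intervention indicators can be sketched as a weighted least‐squares fit, with each study weighted by its inverse variance. The sketch below is a simplification under stated assumptions: it uses fixed inverse‐variance weights with no between‐study variance term and no intercept, and the toy data are invented purely for illustration. The helper names `solve` and `wls_meta_regression` are hypothetical.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def wls_meta_regression(y: list[float], variances: list[float], x: list[list[int]]) -> list[float]:
    """Estimate beta = (X'WX)^-1 X'Wy with W = diag(1 / variance)."""
    p = len(x[0])
    w = [1.0 / v for v in variances]
    xtwx = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, x)) for b in range(p)] for a in range(p)]
    xtwy = [sum(wi * xi[a] * yi for wi, xi, yi in zip(w, x, y)) for a in range(p)]
    return solve(xtwx, xtwy)


# Illustrative toy data (not from this review): indicator columns are [VT, WG].
design = [[1, 0], [0, 1], [1, 1]]  # VT only, WG only, VT combined with WG
effects = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
print(wls_meta_regression(effects, variances, design))
```

In this toy case the combined study's effect is exactly the sum of the two single‐intervention effects, so the fit recovers coefficients of 1 for VT and 2 for WG. Real data would not fit exactly, and a random‐effects meta‐regression would add an estimated τ² to each study's variance before weighting.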

Subjective Measures

Finally, similar meta‐regression was performed using effect sizes with validated, unvalidated, and all PROs.

Heterogeneity Testing

Due to significant variations expected between surgical techniques, patient populations, and the necessary inclusiveness of our study design, we expected high heterogeneity even after analyzing meta‐regressions. To explore the sources of this heterogeneity further, we devised an additional test.

We calculated the dispersion of changes in F0, SF0, combined SF0/F0, validated PROs, unvalidated PROs, and any PROs within each procedure to analyze the variability of surgical outcomes. We used the standard deviation of the mean difference of each procedure as a measure of variability. In this analysis, a high standard deviation indicates larger dispersion, less predictability, less stability, and greater variability in surgical outcomes, suggesting heterogeneity due to factors such as differences in surgical technique, individual patient physiology, and post‐operative care.
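One way to read the dispersion measure above is as the standard deviation of study‐level effect sizes within each intervention. The sketch below takes that reading and uses the sample standard deviation; whether the review used the sample or population form, and how studies with combined interventions were grouped, is not stated, so both are assumptions. The function name `dispersion_by_intervention` is hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def dispersion_by_intervention(records: list[tuple[str, float]]) -> dict[str, float]:
    """Sample SD of effect sizes per intervention; groups with one study are skipped."""
    groups: dict[str, list[float]] = defaultdict(list)
    for intervention, effect in records:
        groups[intervention].append(effect)
    return {name: stdev(values) for name, values in groups.items() if len(values) >= 2}


# SF0 effect sizes from Table 1, restricted to single-intervention rows (an assumption).
records = [
    ("VT", 1.07), ("VT", 0.93), ("VT", 1.77), ("VT", 1.72), ("VT", 1.30),
    ("WG", 1.70), ("WG", 3.80),
]
print(dispersion_by_intervention(records))
```

Because this subset excludes combined procedures, its output is not expected to match the SDg values reported in the Results.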

Statistically Significant Surgical Interventions Versus VT

After determining which interventions showed statistical significance in our meta‐regression analyses, we performed another pair of meta‐regressions to compare the efficacy of statistically significant surgical interventions against VT using both objective and subjective measures.

Correlation Analysis

As increasing vocal pitch to the feminine range is the primary reason transitioning patients seek out voice feminization procedures, we analyzed the correlations between changes in SF0/F0 and validated, unvalidated, and any PROs. We ran Pearson's product‐moment correlation analysis of validated PROs versus SF0/F0, unvalidated PROs versus SF0/F0, and any PRO versus SF0/F0.
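For reference, Pearson's product‐moment correlation can be computed directly from paired effect sizes, as in the sketch below. The input values are invented for illustration and are not data from this review; the function name `pearson_r` is hypothetical. Note that r and r² are different quantities, so a report should state which one is given.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient between paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Illustrative paired values only: objective effect size vs PRO effect size.
objective = [0.9, 1.5, 2.1, 3.0, 4.2]
subjective = [1.1, 1.4, 2.5, 2.8, 4.0]
r = pearson_r(objective, subjective)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```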

Risk of Bias Analysis

To assess the impact of overall bias, we excluded studies identified with a severe risk of bias and repeated all meta‐analyses, meta‐regressions, and correlation analyses to evaluate any shifts in effect sizes and heterogeneity metrics. Additionally, a trim and fill analysis was conducted to further examine the potential for publication bias.

Results

Twenty‐four studies were included in the meta‐analysis, as listed in Tables 1 and 3. Twenty‐three were retrospective analyses and 1 was a randomized controlled trial. There were a total of 893 unique patients across studies. The numbers of patients for each intervention and PRO are outlined in the summary statistics in Supplemental Table 1, available online.

Meta‐Analyses

Objective Measures

As shown in Figure 2, the analysis of SF0 in Hz showed a pooled mean change of 36.02 Hz (95% confidence interval [CI], 27.16 to 44.88), with significant heterogeneity (τ² = 305.53; I² = 97.6%; Q = 951.94). For F0, the pooled mean change was 53.90 Hz (95% CI, 43.29 to 64.51), with high heterogeneity (τ² = 686.33; I² = 98.30%; Q = 935.71).

Figure 2. Forest plot of SF0 and F0 changes. Mean change in Speaking Fundamental Frequency (SF0) and Fundamental Frequency (F0) in Hertz (Hz) with 95% confidence intervals across studies. Significant improvements are observed with high heterogeneity (SF0: I² = 97.6%; F0: I² = 98.30%). Total N: SF0 = 479, F0 = 725.

Figure 3 shows that the pooled effect size for all objective measures was 2.39 (95% CI, 1.76 to 3.02), again with notable variability (τ² = 3.81; I² = 97.20%; Q = 421.89).

Figure 3. Forest plot of all objective measures using Hedges' g. Bias‐corrected effect sizes (Hedges' g) for SF0/F0 across studies, with 95% confidence intervals. Significant variability is noted (I² = 97.20%). Total N: 1204. The pooled effect size indicates overall vocal pitch improvement post‐intervention.

Subjective Measures

The analysis of all subjective measures shown in Figure 4 yielded a pooled effect size of 2.35 (95% CI, 1.84 to 2.86). The heterogeneity statistics indicated substantial variation among the studies (τ² = 2.13; I² = 96.20%; Q = 384.35).

Figure 4. Forest plot of all subjective measures using Hedges' g (All PROs). Bias‐corrected effect sizes (Hedges' g) for patient‐reported outcomes (PROs) across studies, with 95% confidence intervals. Significant variability is evident (I² = 96.20%). Total N: 1089. The pooled effect size highlights improvements in quality of life following interventions.

Meta‐Regressions

Objective Measures

As shown in Figure 5, when measuring SF0 in raw Hz, significant improvements were observed for VT (mean difference [MD], 17.27 Hz; 95% CI, 11.66 to 22.87 Hz; P < .0001), WG (MD, 26.02 Hz; 95% CI, 18.52 to 33.53 Hz; P < .0001), FL (MD, 62.14 Hz; 95% CI, 51.74 to 72.55 Hz; P < .0001), and LRG (MD, 42.98 Hz; 95% CI, 27.27 to 58.69 Hz; P < .0001). For F0 in Hz, improvements were seen for VT (MD, 26.46 Hz; 95% CI, 13.87 to 39.05 Hz; P < .0001), WG (MD, 35.57 Hz; 95% CI, 21.48 to 49.66 Hz; P < .0001), VFSRAC (MD, 73.60 Hz; 95% CI, 36.91 to 110.29 Hz; P < .0001), RADIESSE (MD, 74.65 Hz; 95% CI, 25.74 to 123.57 Hz; P = .0028), and LRG (MD, 37.19 Hz; 95% CI, 17.43 to 56.95 Hz; P = .0002). Changes in F0 for CTA were not statistically significant. Heterogeneity measures for the SF0 regression (τ² = 48.89; I² = 76.70%; Q = 52.66) were much smaller than for F0 (τ² = 348.14; I² = 94.10%; Q = 301.84).

Figure 5. Mean change in hertz (Hz) by Intervention. Mean change in Hz for various interventions, with 95% confidence intervals. Interventions are grouped by Speaking Fundamental Frequency (SF0) and Fundamental Frequency (F0). Significant improvements are noted for most interventions, with high heterogeneity (SF0: I² = 76.70%; F0: I² = 94.10%).

Figure 6 shows pooled effect sizes for objective measures. SF0 showed statistically significant increases for VT (g, 0.86; 95% CI, 0.46 to 1.26; P < .0001), WG (g, 1.21; 95% CI, 0.65 to 1.77; P < .0001), FL (g, 3.05; 95% CI, 2.24 to 3.86; P < .0001), and LRG (g, 12.28; 95% CI, 8.19 to 15.64; P < .0001). F0 effect sizes were significantly increased for WG (g, 1.68; 95% CI, 0.54 to 2.82; P = .0039) and LRG (g, 3.57; 95% CI, 1.72 to 5.42; P = .0002). Heterogeneity was again markedly lower for SF0 (τ² = 0.23; I² = 67.50%; Q = 34.68) than for F0 (τ² = 2.26; I² = 93.20%; Q = 110.78).

Figure 6. Objective (SF0/F0) effect size by intervention. Effect sizes (Hedges' g) for different interventions targeting SF0 and F0, with 95% confidence intervals. Most interventions demonstrate significant improvements, with notable heterogeneity (SF0: I² = 67.50%; F0: I² = 93.20%).

Subjective Measures

The meta‐regression for subjective measures is shown in Figure 7. Validated PROs showed a significant positive effect for both VT (g, 1.47; 95% CI, 0.19 to 2.76; P = .024) and WG (g, 2.73; 95% CI, 0.60 to 4.86; P = .012), though it is notable that LRG was the only other intervention with validated PRO data. Unvalidated PROs showed significant positive effects for VT (g, 1.56; 95% CI, 0.69 to 2.44; P = .0005), WG (g, 1.61; 95% CI, 0.81 to 2.41; P < .0001), and LRG (g, 4.50; 95% CI, 2.32 to 6.67; P < .0001), with LRG showing the largest effect size. Meta‐regression calculated using all PROs showed similar trends, with VT (g, 1.32; 95% CI, 0.68 to 1.96; P < .0001), WG (g, 1.82; 95% CI, 1.07 to 2.57; P < .0001), and LRG (g, 1.90; 95% CI, 0.72 to 3.07; P = .0015) showing statistical significance. Heterogeneity was still very high for the all‐PRO meta‐regression, though lower than that observed for the all‐PRO meta‐analysis (τ² = 1.43; I² = 91.70%; Q = 248.03 vs τ² = 2.26; I² = 96.25%; Q = 384.35, respectively).

Figure 7. Subjective (PROs) effect size by intervention. Effect sizes (Hedges' g) for various interventions based on validated, unvalidated, and all patient‐reported outcomes (PROs), with 95% confidence intervals. Significant subjective improvements are seen, with high heterogeneity noted (validated PROs: I² = 91.60%; unvalidated PROs: I² = 90.00%; All PROs: I² = 91.70%).

Heterogeneity Testing

The variability, or dispersion, of changes in all measures by intervention is shown in Figure 8. We found higher standard deviations for WG than for VT for SF0 (WG standard deviation of Hedges' g [SDg] = 6.29; VT SDg = 0.66), F0 (WG SDg = 3.64; VT SDg = 2.22), combined SF0/F0 (WG SDg = 4.49; VT SDg = 1.72), unvalidated PROs (WG SDg = 3.37; VT SDg = 0.99), and any PRO (WG SDg = 2.43; VT SDg = 1.91). Validated PROs were the only measure for which VT showed a higher standard deviation than WG (WG SDg = 2.00; VT SDg = 2.40). Although CTA and FL exhibited lower outcome variability overall than WG or VT, these two interventions were not represented thoroughly enough in our data set to draw conclusions from this analysis. Overall, these models suggest higher variability in outcomes for surgical interventions such as WG than for VT alone.

Figure 8. Standard deviation of effect sizes (Hedges' g) by Intervention. Depicts the variability (standard deviation) in effect sizes for F0, SF0, combined SF0/F0, and both validated and unvalidated PROs across different interventions. Higher variability is noted for interventions like WG, suggesting a source of heterogeneity.

Statistically Significant Surgical Interventions Versus VT

The meta‐regression findings comparing only statistically significant surgical interventions to VT are shown in Figure 9. For objective measures, VT had an effect size of 0.74 (95% CI: −0.04 to 1.52; P = .0624), while WG, FL, and LRG collectively showed a significant increase in effect size (g, 2.61; 95% CI: 1.84 to 3.38; P < .0001). For subjective PRO measures, VT showed an effect size of 1.37 (95% CI: 0.60 to 2.14; P = .0005), whereas WG and LRG showed a larger effect size (g, 2.51; 95% CI: 1.75 to 3.27; P < .0001), indicating stronger effects with surgical interventions. Heterogeneity was high for SF0 or F0 (τ² = 1.85; I² = 94.50%; Q = 210.95) and for all PROs (τ² = 2.24; I² = 97.10%; Q = 1340.04), again indicating substantial variability across studies.

Figure 9.

Statistically significant surgical interventions versus voice therapy. Effect sizes (Hedges' g) comparing statistically significant surgical interventions (WG, LRG, FL) to voice therapy (VT) for both SF0/F0 and PROs, with 95% confidence intervals. Surgical interventions show greater improvements. High heterogeneity is observed (SF0/F0: I² = 94.50%; all PROs: I² = 97.10%).

Correlation Analysis

As shown in Figure 10, Pearson's product‐moment correlation comparing mean SF0/F0 with validated PROs showed a moderately strong association (r² = .6592; P = .05). SF0/F0 versus unvalidated PROs showed a very strong association (r² = .9289; P = .0001). Overall correlation between SF0/F0 and all PROs again shows a strong association (r² = .8297; P = .0001).

Figure 10.

Correlation Between Mean Objective and Subjective Effect Sizes. This scatter plot shows the correlation between mean objective effect sizes (SF0, F0) and subjective effect sizes (PROs). Strong correlations are observed for unvalidated (r = .93) and all PROs (r = .83), with moderate correlation for validated PROs (r = .66). Lines represent linear regressions for each PRO type.
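The correlation analysis above pairs a mean objective effect size with a mean subjective effect size for each intervention. A minimal sketch of Pearson's product‐moment correlation is given below; the function name and the paired values are hypothetical and are not the data plotted in Figure 10.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean effect sizes per intervention (objective vs subjective)
objective_g = [0.9, 1.2, 3.0, 3.6]
subjective_g = [1.3, 1.8, 1.9, 2.6]

r = pearson_r(objective_g, subjective_g)
r_squared = r ** 2  # share of variance in one measure explained by the other
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Note that r is the correlation coefficient and r² is the proportion of variance explained, so the two quantities are not interchangeable when comparing reported values.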

Risk of Bias Analysis

After excluding studies with overall severe risk of bias, a statistically negligible uniform leftward shift in effect sizes was observed. Overall heterogeneity metrics showed negligible mixed responses as well, with some measures experiencing increased variability while others decreased. Notably, the meta‐regression for validated PROs for VT was the only metric that displayed a significant change, with a pre‐corrected effect size of 1.56 (95% CI; 0.69 to 2.44; P = .0005) shifting to 0.13 (95% CI; −1.15 to 1.40; P = .8462). As shown in Supplemental Figure 1 (available online), the trim and fill analysis did not suggest any missing studies but still indicated high heterogeneity despite a lack of publication bias, which is in line with the rest of our results.

A GRADE summary of the most relevant findings is outlined in Table 4.

Table 4.

GRADE Summary of Findings Table for Voice Feminization Interventions

| Outcome | Intervention | N patients (studies) | Effect size (95% CI) | Certainty of evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- | --- |
| Statistically Significant Objective Measures | | | | | |
| Speaking Fundamental Frequency (SF0) | Voice Therapy (VT) | 193 (8 studies) | 0.86 [0.46, 1.26] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 67.5%). No publication bias detected. Significant improvement in SF0 with VT. |
| Speaking Fundamental Frequency (SF0) | Wendler Glottoplasty (WG) | 171 (6 studies) | 1.21 [0.65, 1.77] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 67.5%). No publication bias detected. Significant improvement in SF0 with WG. |
| Speaking Fundamental Frequency (SF0) | Feminization Laryngoplasty (FL) | 189 (2 studies) | 3.05 [2.24, 3.86] | ⊕⊕⊝⊝ low | Low certainty due to high heterogeneity (I² = 67.5%) and limited study numbers (2 studies). No publication bias detected. Significant improvement in SF0 with FL. |
| Speaking Fundamental Frequency (SF0) | Laser Reduction Glottoplasty (LRG) | 35 (1 study) | 12.28 [8.92, 15.64] | ⊕⊝⊝⊝ very low | Very low certainty due to single study data, potential bias, and high heterogeneity (I² = 67.5%). No publication bias detected. Highly significant improvement in SF0 with LRG. |
| Fundamental Frequency (F0) | WG | 276 (12 studies) | 1.68 [0.54, 2.82] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 93.2%). No publication bias detected. Significant improvement in F0 with WG. |
| Fundamental Frequency (F0) | LRG | 35 (5 studies) | 3.57 [1.72, 5.42] | ⊕⊕⊝⊝ low | Low certainty due to high heterogeneity (I² = 93.2%) and limited study numbers. No publication bias detected. Notable improvement in F0 with LRG. |
| Statistically Significant Subjective Measures | | | | | |
| Validated PROs | VT | 132 (6 studies) | 1.47 [0.19, 2.76] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 91.6%). No publication bias detected. Positive improvement in validated PROs with VT. |
| Validated PROs | WG | 126 (5 studies) | 2.73 [0.59, 4.86] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 91.6%). No publication bias detected. Strong improvement in validated PROs with WG. |
| Unvalidated PROs | VT | 164 (3 studies) | 1.56 [0.69, 2.44] | ⊕⊕⊝⊝ low | Low certainty due to high heterogeneity (I² = 90.0%) and limited study numbers. No publication bias detected. Positive improvement in unvalidated PROs with VT. |
| Unvalidated PROs | WG | 215 (5 studies) | 1.61 [0.81, 2.41] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 90.0%). No publication bias detected. Consistent improvement in unvalidated PROs with WG. |
| Unvalidated PROs | LRG | 38 (2 studies) | 4.50 [2.32, 6.67] | ⊕⊕⊝⊝ low | Low certainty due to high heterogeneity (I² = 90.0%) and few studies. No publication bias detected. Notable improvement in unvalidated PROs with LRG. |
| All PROs | VT | 296 (9 studies) | 1.32 [0.68, 1.95] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 91.7%). No publication bias detected. Positive improvement in overall PROs with VT. |
| All PROs | WG | 341 (10 studies) | 1.82 [1.07, 2.57] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 91.7%). No publication bias detected. Strong improvement in overall PROs with WG. |
| All PROs | LRG | 101 (4 studies) | 1.90 [0.72, 3.07] | ⊕⊕⊝⊝ low | Low certainty due to high heterogeneity (I² = 91.7%) and limited study numbers. No publication bias detected. Positive improvement in overall PROs with LRG. |

Effect sizes and GRADE levels for statistically significant interventions. Certainty levels range from very low (LRG) to moderate (VT, WG) across measures of SF0, F0, and PROs, and reflect significant improvements despite high heterogeneity.

Discussion

All of our meta‐analyses showed statistically significant improvement in both objective and subjective measures for all interventions, though there was very high heterogeneity across all studies. While this was expected given the overall inclusiveness of our study design and the inherent variability in surgical procedures and other factors that were too granular to be accounted for in this type of analysis, we believe this top‐level trend of consistent improvement in all measures among all studies is notable.

Our meta‐regression findings show enhancement in vocal pitch by raw Hz across all interventions when assessed by either F0 or SF0, except for CT. The observed effect size analysis largely corroborates the trends shown in the raw Hz findings, though VFSRAC and RADIESSE are no longer statistically significant when bias‐corrected. Taken with the subjective measure results discussed below, this is not surprising given that VFSRAC, RADIESSE, and FL are novel procedures that are poorly represented in the literature and do not yet have the statistical power to support firm conclusions about efficacy.

Our analysis also showed that surgical approaches to voice feminization result in larger gains in frequency alteration and patient satisfaction compared to VT alone. This suggests that while VT is a valuable tool for vocal feminization, surgical interventions may offer a greater increase in pitch and ultimate satisfaction by allowing transitioning individuals to attain better gender‐congruent expression.

Our analysis of subjective PRO measures demonstrates positive effect sizes for WG, VT, and LRG. However, it is notable that the differences in QoL between VT and WG are minimal, suggesting comparable patient satisfaction in both non‐surgical and at least certain surgical approaches.

Contrary to a recent study that posited a lack of correlation between F0 and validated PROs, 10 our analysis including both validated and unvalidated PROs offers a different conclusion. Our findings reveal a statistically significant strong correlation between subjective and objective measures in two of our analyses, and a moderately strong correlation in the most stringent validated PRO category. Although there is an expected disparity between validated and unvalidated PROs due to the differences in their methodological rigor, our data firmly establish that improvements in vocal pitch are significantly associated with enhanced quality of life irrespective of the PRO used. Nonetheless, we emphasize that while all interventions primarily target pitch elevation, achieving a feminine voice encompasses a wider array of vocal characteristics 5 than frequency alone.

That said, as illustrated by the lack of comprehensive quality PRO data for novel procedures like FL and VFSRAC, the enduring use of unvalidated instruments highlights a significant gap in standardization of outcome reporting. The ongoing utilization of such instruments, even with the existence of the TWVQ, suggests a collective inertia that warrants attention from clinicians and researchers. This calls for a concerted effort to standardize outcome measurement.

While our analysis shows correlation between objective and subjective measures, we recognize that patient satisfaction is multi‐faceted and complex. Current subjective PRO measures fail to address the discord that might arise between a patient's self‐perceived vocal femininity and external social validation from third parties. 38 , 39 , 40 Such incongruence can cause significant emotional distress to individuals who seek social congruence in their gender expression. 38 Therefore, the authors advocate for an improved assessment framework by working toward standardized third‐party measurement systems. This would complete the triad of what in our view constitutes a truly comprehensive assessment: objective measures, subjective measures, and patient‐independent third‐party measures. While third‐party tools like the telephone test 40 or machine‐learning algorithms 39 have been proposed, a fully robust approach has yet to be achieved and deserves further attention from the research community.

Limitations and Future Directions

While our meta‐analysis provides some important validation of efficacy for certain voice feminization procedures, several limitations must be addressed. Comprehensive data were available for VT and WG, but the paucity of data on other interventions posed challenges. In particular, PROs for novel techniques such as VFSRAC and FL could not be thoroughly evaluated due to insufficient evidence. The lack of extensive research on these procedures limits our ability to confirm their purported benefits and warrants further investigation as more data become available.

Our analysis faced the inherent limitation of high heterogeneity, as expected in meta‐analyses involving surgical procedures. Attempts to identify underlying causes for this heterogeneity were constrained by the absence of comprehensive information on potentially influential factors such as age, race, or comorbidities. While differences in patient physiology are expected to contribute to variability, we hypothesize that a significant portion of the heterogeneity is ascribable to nuances within surgical techniques—specifically within the WG category as alluded to in our dispersion heterogeneity test. For instance, we observed considerable variability in the ratio of vocal fold de‐epithelialization during glottoplasty from 33% to 50%, 11 , 18 , 19 , 21 , 25 , 41 the use of laser versus microlaryngeal scissors for de‐epithelialization, 29 and in postoperative care protocols including the adjunctive use of VT. It should also be emphasized that WG was the only surgical intervention with enough data to allow analysis of heterogeneity; exploration of other interventions will require more thorough data sets.

Variability in follow‐up durations added complexity to our analysis, which constrained our ability to assess long‐term outcomes. The evaluation of combined therapies was also challenging, as multiple interventions in the same patient population prevented truly isolating individual contributions given the potential additive effects of adjunctive interventions despite the use of comprehensive meta‐regressions. Additionally, the use of unvalidated PROs, although necessary due to the scarcity of validated PROs, introduced uncertainty in effect size comparisons with standardized instruments like the TWVQ. Our findings underscore the need for standardized measures but do not endorse the continued use of unvalidated PROs. Lastly, LRG showed an exceptionally high SF0 effect size due to an unusually low standard deviation recorded by Yilmaz et al, 24 which could skew results, though the overall trend still suggests effectiveness.
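The sensitivity of a standardized effect size to the reported standard deviation, noted above for LRG, can be illustrated with a short sketch. It assumes the common two‐group form of Hedges' g with a pooled standard deviation and the small‐sample correction J = 1 − 3/(4·df − 1); the exact variant used in this review is not stated here, and all numeric inputs below are hypothetical rather than values from Yilmaz et al.

```python
import math


def hedges_g(mean_pre, mean_post, sd_pre, sd_post, n_pre, n_post):
    """Hedges' g for a pre/post comparison treated as two groups.

    Assumption: pooled-SD form with the approximate small-sample
    correction J = 1 - 3 / (4 * df - 1).
    """
    df = n_pre + n_post - 2
    pooled_sd = math.sqrt(
        ((n_pre - 1) * sd_pre ** 2 + (n_post - 1) * sd_post ** 2) / df
    )
    d = (mean_post - mean_pre) / pooled_sd  # Cohen's d
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return correction * d


# Same hypothetical 60 Hz pitch gain, two different reported SDs
g_wide = hedges_g(100.0, 160.0, 20.0, 20.0, 35, 35)   # SD = 20 Hz
g_narrow = hedges_g(100.0, 160.0, 5.0, 5.0, 35, 35)   # SD = 5 Hz
print(f"g with SD 20 Hz: {g_wide:.2f}; g with SD 5 Hz: {g_narrow:.2f}")
```

With an identical raw change in Hz, shrinking the standard deviation by a factor of four multiplies g by four, which is why a single study reporting an unusually small standard deviation can yield an effect size far larger than the others.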

Conclusion

Our meta‐analysis provides a comprehensive evaluation of voice feminization procedures at this point in time, demonstrating that surgical interventions yield more significant improvements in vocal pitch than VT, and such pitch enhancements are positively associated with improved quality of life. Despite these findings, inadequate data on novel techniques and reliance on unvalidated PROs indicate a pressing need for standardized outcome measures and the incorporation of third‐party assessments.

Author Contributions

Kristopher Lanham, concept and design, data acquisition, manuscript composition, critical review, statistical analysis, risk of bias analysis; Bradley A. Melnick, data acquisition, manuscript composition, critical review, risk of bias analysis; Madeline J. O'Connor, data acquisition, manuscript composition, critical review, PRISMA review; Angelica Bartler, data acquisition, manuscript composition, critical review; Kelly C. Ho, data acquisition, critical review; Rolando J. Casas Fuentes, data acquisition, statistical analysis; Robert D. Galiano, concept and design, critical review, supervision.

Disclosures

Competing interests

None of the authors have any conflicts of interest to disclose.

Funding

None.

Supporting information

Supplemental Figure 1. Risk of bias and trim and fill analysis. Risk of bias assessment for retrospective studies using ROBINS‐I and for RCTs using RoB. Domains are color‐coded to indicate judgment levels. Trim and fill analysis funnel plot shows no missing studies, with high heterogeneity for both objective (I² = 97.56%) and subjective measures (I² = 94.73%). This suggests robustness without significant publication bias but high heterogeneity.

OHN-172-1521-s001.pdf (237.7KB, pdf)

Supplemental Table 1. Summary of Study Metrics. Summary of study metrics, including total number of unique patients and distribution of patients across interventions.

OHN-172-1521-s002.docx (213.7KB, docx)

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request. The data include the specific analysis tools and code used in the study along with all relevant data sets.

References

  • 1. Oles N, Darrach H, Landford W, et al. Gender affirming surgery: a comprehensive, systematic review of all peer‐reviewed literature and methods of assessing patient‐centered outcomes (part 1: breast/chest, face, and voice). Ann Surg. 2022;275(1):e52‐e66. doi: 10.1097/SLA.0000000000004728
  • 2. Chadwick KA, Coleman R, Andreadis K, Pitti M, Rameau A. Outcomes of gender‐affirming voice and communication modification for transgender individuals. Laryngoscope. 2022;132(8):1615‐1621. doi: 10.1002/lary.29946
  • 3. Yılmaz T, Özer F, Aydınlı FE. Laser reduction glottoplasty for voice feminization: experience on 28 patients. Ann Otol Rhinol Laryngol. 2021;130(9):1057‐1063. doi: 10.1177/0003489421993728
  • 4. Gross M. Pitch‐raising surgery in male‐to‐female transsexuals. J Voice. 1999;13(2):246‐250. doi: 10.1016/S0892-1997(99)80028-9
  • 5. Simpson AP. Phonetic differences between male and female speech. Lang Linguistics Compass. 2009;3(2):621‐640. doi: 10.1111/j.1749-818X.2009.00125.x
  • 6. Koçak I, Akpınar ME, Çakır ZA, Doğan M, Bengisu S, Çelikoyar MM. Laser reduction glottoplasty for managing androphonia after failed cricothyroid approximation surgery. J Voice. 2010;24(6):758‐764. doi: 10.1016/j.jvoice.2009.06.004
  • 7. Mastronikolis NS, Remacle M, Biagini M, Kiagiadaki D, Lawson G. Wendler glottoplasty: an effective pitch raising surgery in male‐to‐female transsexuals. J Voice. 2013;27(4):516‐522. doi: 10.1016/j.jvoice.2013.04.004
  • 8. Kunachak S, Prakunhungsit S, Sujjalak K. Thyroid cartilage and vocal fold reduction: a new phonosurgical method for male‐to‐female transsexuals. Ann Otol Rhinol Laryngol. 2000;109(11):1082‐1086. doi: 10.1177/000348940010901116
  • 9. Kim HT. A new conceptual approach for voice feminization: 12 years of experience. Laryngoscope. 2017;127(5):1102‐1108. doi: 10.1002/lary.26127
  • 10. Hao Y, Trilles J, Brydges HT, et al. Meta‐analysis of validated quality of life outcomes following voice feminization in transwomen. J Craniofac Surg. 2024;35(1):53‐58. doi: 10.1097/SCS.0000000000009742
  • 11. Yılmaz T, Kuşçu O, Sözen T, Süslü AE. Anterior glottic web formation for voice feminization: experience of 27 patients. J Voice. 2017;31(6):757‐762. doi: 10.1016/j.jvoice.2017.03.006
  • 12. Nuyen BA, Qian ZJ, Campbell RD, Erickson‐DiRenzo E, Thomas J, Sung CK. Feminization laryngoplasty: 17‐year review on long‐term outcomes, safety, and technique. Otolaryngol Head Neck Surg. 2022;167(1):112‐117. doi: 10.1177/01945998211036870
  • 13. Anderson JA. Pitch elevation in trangendered patients: anterior glottic web formation assisted by temporary injection augmentation. J Voice. 2014;28(6):816‐821. doi: 10.1016/j.jvoice.2014.05.002
  • 14. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi: 10.1136/bmj.n71
  • 15. Hee Lee J, Humes LE. Effect of fundamental‐frequency and sentence‐onset differences on speech‐identification performance of young and older adults in a competing‐talker background. J Acoust Soc Am. 2012;132(3):1700‐1717. doi: 10.1121/1.4740482
  • 16. Zhu S, Chong S, Chen Y, Wang T, Ng ML. Effect of language on voice quality: an acoustic study of bilingual speakers of Mandarin Chinese and English. Folia Phoniatr Logop. 2022;74(6):421‐430. doi: 10.1159/000525649
  • 17. Gerhard D. Pitch extraction and fundamental frequency: history and current techniques. Technical Report TR‐CS 2003‐06; 2003.
  • 18. Casado JC, O'Connor C, Angulo MS, Adrián JA. Glotoplastia de Wendler y tratamiento logopédico en la feminización de la voz en transexuales: resultados de la valoración pre‐ vs. poscirugía. Acta Otorrinolaringol Esp. 2016;67(2):83‐92. doi: 10.1016/j.otoeng.2015.02.003
  • 19. Casado JC, Rodríguez‐Parra MJ, Adrián JA. Voice feminization in male‐to‐female transgendered clients after Wendler's glottoplasty with vs. without voice therapy support. Eur Arch Otorhinolaryngol. 2017;274(4):2049‐2058. doi: 10.1007/s00405-016-4420-8
  • 20. Rapoport SK, Park C, Varelas EA, et al. 1‐year results of combined modified Wendler glottoplasty with voice therapy in transgender women. Laryngoscope. 2023;133(3):615‐620. doi: 10.1002/lary.30225
  • 21. Husain S, Campe L, Mirza N. Modification of Wendler glottoplasty for male to female gender transition. J Voice. 2023:S0892199723000279. In press. doi: 10.1016/j.jvoice.2023.01.028
  • 22. Iwarsson J, Hollen Nielsen R, Næs J. Mean fundamental frequency in connected speech and sustained vowel with and without a sentence‐frame. Logoped Phoniatr Vocol. 2020;45(2):91‐96. doi: 10.1080/14015439.2019.1637455
  • 23. Enzmann D. Notes on effect size measures for the difference of means from two independent groups: the case of Cohen's d and Hedges' g. University of Hamburg, Institute of Criminal Sciences; 2015. doi: 10.13140/2.1.1578.2725
  • 24. Yılmaz T. Sequential Wendler glottoplasty and laser reduction glottoplasty for voice feminization. Laryngoscope. 2023;134:1133‐1138. doi: 10.1002/lary.30958
  • 25. Aires MM, De Vasconcelos D, Lucena JA, Gomes AOC, Moraes BT. Effect of Wendler glottoplasty on voice and quality of life of transgender women. Braz J Otorhinolaryngol. 2023;89(1):22‐29. doi: 10.1016/j.bjorl.2021.06.010
  • 26. D'haeseleer E, Papeleu T, Leyns C, Adriaansen A, Meerschman I, Tomassen P. Voice outcome of glottoplasty in trans women. J Voice. 2023:S0892199723000152. In press. doi: 10.1016/j.jvoice.2023.01.013
  • 27. Mora E, Cobeta I, Becerra A, Lucio MJ. Comparison of cricothyroid approximation and glottoplasty for surgical voice feminization in male‐to‐female transsexuals. Laryngoscope. 2018;128(9):2101‐2109. doi: 10.1002/lary.27172
  • 28. Nerurkar NK, Nagree Z, Malik E, Jahnavi. Vocal outcomes following pitch alteration surgeries. Indian J Otolaryngol Head Neck Surg. 2023;75(4):2741‐2746. doi: 10.1007/s12070-023-03837-8
  • 29. Brown SK, Chang J, Hu S, et al. Addition of Wendler glottoplasty to voice therapy improves trans female voice outcomes. Laryngoscope. 2021;131(7):1588‐1593. doi: 10.1002/lary.29050
  • 30. Merrick G, Figol A, Anderson J, Lin RJ. Outcomes of gender affirming voice training: a comparison of hybrid and individual training modules. J Speech Lang Hear Res. 2022;65(2):501‐507. doi: 10.1044/2021_JSLHR-21-00056
  • 31. Nuyen B, Qian ZJ, Rakkar M, Thomas JP, Erickson‐DiRenzo E, Sung CK. Diagnosis and management of vocal complications after chondrolaryngoplasty. Laryngoscope. 2023;133(9):2301‐2307. doi: 10.1002/lary.30518
  • 32. Leyns C, Daelman J, Adriaansen A, et al. Short‐term acoustic effects of speech therapy in transgender women: a randomized controlled trial. Am J Speech Lang Pathol. 2023;32(1):145‐168. doi: 10.1044/2022_AJSLP-22-00135
  • 33. Kelly V, Hertegård S, Eriksson J, Nygren U, Södersten M. Effects of gender‐confirming pitch‐raising surgery in transgender women a long‐term follow‐up study of acoustic and patient‐reported data. J Voice. 2019;33(5):781‐791. doi: 10.1016/j.jvoice.2018.03.005
  • 34. Chang J, Brown SK, Hu S, et al. Effect of Wendler glottoplasty on acoustic measures of voice. Laryngoscope. 2021;131(3):583‐586. doi: 10.1002/lary.28764
  • 35. Hancock AB, Garabedian LM. Transgender voice and communication treatment: a retrospective chart review of 25 cases. Int J Lang Commun Disord. 2013;48(1):54‐65. doi: 10.1111/j.1460-6984.2012.00185.x
  • 36. Dacakis G, Davies S, Oates JM, Douglas JM, Johnston JR. Development and preliminary evaluation of the transsexual voice questionnaire for male‐to‐female transsexuals. J Voice. 2013;27(3):312‐320. doi: 10.1016/j.jvoice.2012.11.005
  • 37. Quinn S, Oates J, Dacakis G. Perceived gender and client satisfaction in transgender voice work: comparing self and listener rating scales across a training program. Folia Phoniatr Logop. 2022;74(5):364‐379. doi: 10.1159/000521226
  • 38. T'Sjoen G, Moerman M, Van Borsel J, et al. Impact of voice in transsexuals. Int J Transgend. 2006;9(1):1‐7. doi: 10.1300/J485v09n01_01
  • 39. Bensoussan Y, Park C, Johns M, et al. A comparison of an artificial intelligence tool to fundamental frequency as an outcome measure in people seeking a more feminine voice. Laryngoscope. 2021;131(11):2567‐2571. doi: 10.1002/lary.29605
  • 40. Meister J, Kühn H, Shehata‐Dieler W, Hagen R, Kleinsasser N. Perceptual analysis of the male‐to‐female transgender voice after glottoplasty‐the telephone test: perceptual analysis of the transgender voice. Laryngoscope. 2017;127(4):875‐881. doi: 10.1002/lary.26110
  • 41. Meister J, Hagen R, Shehata‐Dieler W, Kühn H, Kraus F, Kleinsasser N. Pitch elevation in male‐to‐female transgender persons—the Würzburg approach. J Voice. 2017;31(2):244.e7‐244.e15. doi: 10.1016/j.jvoice.2016.07.018

Articles from Otolaryngology--Head and Neck Surgery are provided here courtesy of Wiley
