Abstract
Objective
To evaluate the efficacy and quality of life impact of voice feminization interventions in transgender women.
Data Sources
We searched PubMed, EMBASE, Scopus, and Web of Science for RCTs and retrospective studies published between 2008 and 2023 that quantitatively evaluated any voice feminization procedure.
Review Methods
Studies in English reporting quantitative measures of efficacy were included. Studies using qualitative methodology were excluded. Risk of bias was assessed using Cochrane tools, and a random‐effects model was applied using inverse‐variance pooling. Primary outcomes were fundamental frequency (F0)/speaking fundamental frequency (SF0) and patient‐reported outcomes (PROs) (PROSPERO: CRD42023476192).
Results
Twenty‐four studies involving 893 participants showed significant SF0 improvements for voice therapy (VT, g = 0.86, [0.46, 1.26], P < .0001), Wendler glottoplasty (WG, g = 1.21, [0.65, 1.77], P < .0001), feminization laryngoplasty (FL, g = 3.05, [2.24, 3.86], P < .0001), and laser reduction glottoplasty (LRG, g = 12.28, [8.19, 15.64], P < .0001). PROs indicated enhanced quality of life for VT (g = 1.32, [0.68, 1.96], P < .0001), WG (g = 1.82, [1.07, 2.57], P < .0001), and LRG (g = 1.90, [0.72, 3.07], P = .0015), with a strong correlation between pitch alteration and QoL (r² = 0.83, P = .0001).
Conclusion
Our results indicate that both VT and surgical procedures enhance vocal pitch and strongly correlate with improved QoL. Our findings validate the role of voice feminization in gender‐affirming care, though methodological limitations prevent evaluation of newer interventions.
Keywords: fundamental frequency, gender dysphoria, patient‐reported outcomes, pitch alteration, transgender healthcare, voice feminization, voice therapy, Wendler glottoplasty
Gender care involves a range of medical interventions designed to align appearance and identity in individuals with gender dysphoria. Purely esthetic feminizing procedures generally yield outcomes that both meet transition expectations and can be easily quantified. 1 However, success in voice feminization is a multifactorial concept that can be difficult to quantify.
Voice therapy (VT) is a successful approach with a history of positive outcomes, 2 though the techniques it teaches require conscious effort to sustain feminine vocal characteristics. These techniques may falter during spontaneous speech, coughing, laughing, or reacting in surprise. 3 , 4 More permanent solutions require surgical intervention.
While vocal femininity is influenced by several factors, most surgical interventions target pitch elevation, which has been established as the most significant marker of a feminine voice. 5 Older techniques like cricothyroid approximation (CTA) have mixed results that often require correction, 6 while more contemporary methods like Wendler glottoplasty (WG) consistently increase pitch and enhance quality‐of‐life metrics. 7 Novel approaches, including feminization laryngoplasty (FL) 8 and vocal fold shortening with retrodisplacement of the anterior commissure, 9 aim for improved outcomes, but these approaches lack comprehensive evaluation.
Furthermore, there is controversy over the correlation between pitch and quality of life. A recent meta‐analysis found no correlation between these variables 10 despite individual studies consistently reporting improvements in both fundamental frequency (F0) and patient‐reported outcomes (PROs). This issue is exacerbated by the lack of research utilizing validated, gender care‐specific PROs. 1
To address these gaps in knowledge, this meta‐analysis aims to evaluate the effectiveness of various voice feminization procedures on improving vocal pitch and quality of life in transgender women, comparing surgical approaches against each other and to non‐surgical VT. We seek to quantify the efficacy of various interventions and hypothesize that surgical interventions will result in greater improvements in both vocal pitch and patient‐reported quality of life outcomes than VT alone, and that changes in pitch correlate positively with enhanced quality of life.
Interventions
VT
VT involves non‐surgical methods where therapists guide patients using vocal techniques to achieve feminine vocal characteristics. It is often used alongside surgical approaches but can also serve as a standalone option for those opting against surgery. This study evaluates VT as both an independent intervention and a baseline for surgical comparisons.
WG
Widely regarded as the standard for vocal feminization surgery, WG involves removing anterior vocal fold tissue and suturing ends to shorten the vocal folds. 11 It is the most extensively researched technique in this review.
CTA
CTA increases vocal cord tension by approximating the cricoid and thyroid cartilages, thereby lengthening the vocal folds and raising the fundamental frequency. 6
FL
FL is a comprehensive surgical method involving the removal of anterior thyroid cartilage and reduction of the vocal folds as in WG, shifting of the anterior commissure, and adjustments to the larynx and supraglottis. This approach aims to feminize both the voice and the appearance of the neck. 12
Vocal Fold Shortening and Retrodisplacement of the Anterior Commissure (VFSRAC)
Building on WG, VFSRAC involves removing the anterior third of the vocal fold membrane completely, forming a web with underlying structures, which retrodisplaces the anterior commissure while maintaining normal laryngeal physiology. 9
Radiesse Injection Augmentation (RADIESSE)
A simple variation of WG, RADIESSE uses Radiesse Voice Gel for approximating vocal folds instead of sutures. 13
Laser Reduction Glottoplasty (LRG)
LRG, along with Laser Assisted Voice Adjustment (LAVA), employs CO2 lasers for reducing vocal fold mass in a longitudinal section while preserving the medial border. LAVA focuses on superficial layer removal to increase pitch through scarring, while LRG involves complete removal of the section and suturing for mass reduction and tightening. Both methods share similar mechanisms and are analyzed together. 6
Methods
Ethical Considerations
This review adheres to the Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) guidelines, and articles were selected as described in Figure 1. 14 The protocol for this systematic review was registered in PROSPERO under registration number CRD42023476192. As this study only included previously published data, no new ethical approval was required for this analysis.
Figure 1.

PRISMA diagram.
Search Strategy
We performed a standardized search on 08/10/2023 in PubMed, EMBASE, Scopus, and Web of Science using the following search string: (“vocal feminization” OR “voice feminization” OR “voice therapy” OR “voice surgery” OR “voice training” OR “glottoplasty” OR “web formation” OR “cricothyroid approximation” OR “feminization laryngoplasty” OR “vocal fold shortening” OR “VFSRAC” OR “endoscopic shortening” OR “laser reduction glottoplasty” OR “laser assisted voice adjustment” OR “retrodisplacement of the anterior commissure”) AND (“outcomes” OR “efficacy” OR “durability” OR “satisfaction” OR “Transsexual voice questionnaire” OR “TVQ” OR “TVQ‐MtF” OR “Voice‐related quality of life” OR “VRQOL”) AND (“transgender” OR “transgendered” OR “transsexual” OR “person, transgendered” OR “persons, transgendered” OR “transgendered person” OR “transgender persons” OR “person, transgender” OR “persons, transgender” OR “transsexual” OR “transgender person” OR “transgenders” OR “transsexual persons” OR “person, transsexual” OR “persons, transsexual” OR “transsexual person” OR “gender identities” OR “identity, gender” OR “gender dysphoria” OR “gender affirming” OR “gender affirmation”).
Inclusion and Exclusion Criteria
Studies were eligible if they met the following criteria: English‐language retrospective reviews or randomized controlled trials published between 2008 and 2023 that reported quantitative primary data on objective or subjective measures of efficacy. Other study designs, studies published in other languages, and studies using only qualitative methodology were excluded. Abstracts were randomly assigned to two independent reviewers using Rayyan (a systematic review organization platform), and conflicts were resolved by a third reviewer. A full‐text screen was then performed by two independent reviewers, and a third independent reviewer resolved any inconsistencies.
Data Abstraction
Studies that met inclusion criteria were randomly assigned to reviewers via Rayyan for data abstraction. The following information was extracted: author, title, year, journal, study type, number of subjects, intervention, mean age, mean duration of hormone therapy, postoperative follow up time, pre‐ and postoperative F0, pre‐ and postoperative SF0, PRO scale used, pre operative PRO measure, postoperative PRO measure, complications, and limitations.
Risk of Bias
Two reviewers independently evaluated risk of bias using Cochrane risk of bias tools. Twenty‐three retrospective studies were evaluated using Risk Of Bias In Non‐randomized Studies - of Interventions (ROBINS‐I), and 1 RCT was evaluated using the Risk of Bias (RoB) tool.
Intervention Categorization
Surgical techniques within the literature were described using varied terminology; the techniques outlined in each paper were closely scrutinized to determine their appropriate category. Many papers included patients undergoing multiple interventions within the same procedure, such as LRG being performed concurrently with WG. To address each intervention separately, dummy variables for each intervention were assigned for meta‐regression analyses.
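As an illustration of the dummy‐variable coding described above, the following minimal Python sketch converts an intervention label from Table 1 into 0/1 indicators. The function name `dummy_code` and the data layout are hypothetical and are not taken from the authors' analysis, which was performed in R.

```python
# Intervention abbreviations follow the article; the helper itself is hypothetical.
INTERVENTIONS = ["VT", "WG", "CTA", "FL", "VFSRAC", "RADIESSE", "LRG"]


def dummy_code(label: str) -> dict:
    """Turn a label such as 'WG, LRG' into 0/1 indicators, one per intervention."""
    used = {part.strip() for part in label.split(",")}
    unknown = used - set(INTERVENTIONS)
    if unknown:
        raise ValueError(f"unrecognized intervention(s): {sorted(unknown)}")
    return {name: int(name in used) for name in INTERVENTIONS}


# Intervention labels as they appear in the Interventions column of Table 1.
labels = ["CTA, LRG", "VT, WG, LRG", "CTA"]
for label in labels:
    print(label, "->", dummy_code(label))
```

Each study then contributes one row of indicators, so a study that combined WG and LRG informs the estimates for both interventions.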
Objective Measures
Studies variously reported pre‐ and postintervention SF0 and F0 measurements. SF0 is defined as the habitual frequency at which vocal folds vibrate when speech sounds are produced and is measured as the median frequency recorded during speech. 15 F0 is defined as the average number of oscillations per second at which vocal folds vibrate and is measured during sustained vowel sounds /a/ or /e/. 16 , 17
Reporting of recording methodology between studies varied. In one paper by Casado et al, F0 was undefined but an educated assumption was made based on other work by the same author. 18 Studies reported F0, SF0, both, or neither.
Only studies that reported their standard deviations or raw patient data were kept. Means and standard deviations were calculated manually in cases where studies gave raw data but no means or standard deviations. 18 , 19 , 20 , 21 Study populations were strictly isolated to MtF transgender patients. Again, means and standard deviations of transgender patient groups were calculated manually in cases where non‐transgender patients were excluded. 6 All measurements were reported in Hz.
Given the differences between F0 and SF0, with patients typically registering a higher pitch on sustained vowel sounds than in normal speech, 22 we analyzed both raw changes in SF0 and F0 in Hz and standardized effect sizes for combined analysis of all objective measures. We used Hedges' g to measure effect size, which is well suited to small sample sizes because it offers a bias‐corrected alternative to Cohen's d that avoids overestimation. 23
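The bias correction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' computation: it treats the pre‐ and postintervention measurements as two independent groups, pools their variances, and applies the common approximation J = 1 − 3/(4df − 1). The article does not state which variant of the formula was used, so results need not match the effect sizes reported in Table 1. The input values are hypothetical.

```python
import math


def hedges_g(mean_pre, sd_pre, mean_post, sd_post, n_pre, n_post):
    """Bias-corrected standardized mean difference (post minus pre)."""
    df = n_pre + n_post - 2
    # Pooled variance, assuming two independent groups (an assumption of this sketch).
    pooled_var = ((n_pre - 1) * sd_pre**2 + (n_post - 1) * sd_post**2) / df
    d = (mean_post - mean_pre) / math.sqrt(pooled_var)  # Cohen's d
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor, always below 1
    return j * d


# Hypothetical values in Hz, not taken from any included study.
g = hedges_g(130.0, 20.0, 160.0, 20.0, 20, 20)
print(round(g, 3))
```

Because the correction factor is below 1, g is always slightly smaller in magnitude than the uncorrected d, and the gap narrows as sample size grows.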
Table 1 contains summary statistics for all calculated objective measures.
Table 1.
Summary of Studies for Objective Measures
| ID | Author | Measure | Interventions | Preop mean | SD | Postop mean | SD | Effect size | Standard error | Variance | N patients |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Chadwick et al 2 | SF0 | VT | 136.30 | 12.60 | 162.80 | 30.20 | 1.07 | 6.43 | 41.39 | 13 |
| 6 | Koçak et al 6 | F0 | CTA, LRG | 165.67 | 7.77 | 209.33 | 13.66 | 2.52 | 5.73 | 32.79 | 6 |
| 7 | Mastronikolis et al 7 | F0 | WG | 135.80 | 41.50 | 206.30 | 43.90 | 2.07 | 5.95 | 35.45 | 31 |
| 9 | Kim 9 | F0 | VFSRAC | 134.60 | 25.20 | 208.20 | 37.30 | 2.75 | 1.51 | 2.27 | 313 |
| 12 | Yılmaz et al 24 | F0 | WG, LRG | 150.00 | 7.00 | 219.00 | 7.00 | 12.44 | 0.92 | 0.84 | 35 |
| 12 | Yılmaz et al 24 | SF0 | WG, LRG | 146.00 | 5.00 | 215.00 | 7.00 | 13.49 | 0.85 | 0.71 | 35 |
| 13 | Casado et al 18 | F0 | VT, WG, LRG | 137.00 | 9.80 | 243.00 | 18.35 | 7.20 | 4.25 | 18.10 | 10 |
| 14 | Casado et al 19 | F0 | WG, LRG | 134.05 | 8.10 | 196.24 | 17.37 | 4.24 | 4.61 | 21.29 | 18 |
| 15 | Rapoport et al 20 | F0 | VT, WG | 151.47 | 33.60 | 212.87 | 42.86 | 1.90 | 7.26 | 52.77 | 18 |
| 15 | Rapoport et al 20 | SF0 | VT, WG | 130.20 | 32.30 | 162.60 | 28.70 | 1.30 | 5.62 | 31.62 | 18 |
| 16 | Husain et al 21 | F0 | WG, LRG | 145.00 | 30.70 | 245.20 | 60.00 | 1.81 | 19.82 | 392.74 | 5 |
| 19 | Aires et al 25 | F0 | VT, WG | 145.00 | 15.50 | 169.60 | 29.90 | 0.97 | 8.33 | 69.35 | 7 |
| 19 | Aires et al 25 | SF0 | VT, WG | 137.70 | 24.10 | 185.60 | 43.80 | 1.30 | 12.08 | 145.92 | 7 |
| 20 | D'haeseleer et al 26 | F0 | WG | 140.49 | 30.86 | 181.34 | 33.16 | 1.61 | 4.21 | 17.69 | 35 |
| 20 | D'haeseleer et al 26 | SF0 | WG | 128.02 | 17.43 | 167.21 | 31.05 | 1.70 | 3.82 | 14.58 | 35 |
| 21 | Mora et al 27 | F0 | CTA | 145.00 | 24.00 | 160.00 | 23.00 | 0.80 | 3.38 | 11.46 | 53 |
| 23 | Nerurkar et al 28 | F0 | VT, WG | 153.00 | 7.10 | 223.00 | 29.77 | 2.40 | 9.57 | 91.54 | 7 |
| 24 | Yılmaz et al 11 | F0 | WG | 152.00 | 12.00 | 195.00 | 14.00 | 4.08 | 1.97 | 3.88 | 27 |
| 24 | Yılmaz et al 11 | SF0 | WG | 158.00 | 11.00 | 200.00 | 15.00 | 3.80 | 2.06 | 4.26 | 27 |
| 25 | Brown et al 29 | F0 | VT | 148.00 | 39.00 | 175.00 | 35.00 | 0.91 | 5.56 | 30.93 | 48 |
| 25 | Brown et al 29 | SF0 | VT | 133.00 | 19.00 | 148.00 | 21.00 | 0.93 | 3.00 | 9.01 | 48 |
| 34 | Anderson 13 | F0 | WG, RADIESSE | 127.78 | 21.67 | 238.00 | 59.78 | 2.13 | 14.93 | 222.96 | 10 |
| 35 | Merrick et al 30 | F0 | VT | 135.90 | 28.30 | 161.50 | 24.80 | 1.14 | 6.28 | 39.40 | 19 |
| 36 | Nuyen et al 31 | SF0 | FL | 128.40 | 22.90 | 196.80 | 26.00 | 3.47 | 3.69 | 13.59 | 27 |
| 37 | Nuyen et al 12 | SF0 | FL | 134.00 | 18.00 | 191.00 | 28.00 | 2.83 | 1.58 | 2.48 | 162 |
| 38 | Leyns et al 32 | F0 | VT | 122.40 | 31.40 | 170.00 | 38.56 | 1.61 | 7.20 | 51.85 | 30 |
| 38 | Leyns et al 32 | SF0 | VT | 119.90 | 15.52 | 148.30 | 21.23 | 1.77 | 3.92 | 15.35 | 30 |
| 39 | Kelly et al 33 | SF0 | VT | 118.60 | 11.30 | 146.90 | 21.00 | 1.72 | 4.26 | 18.19 | 24 |
| 40 | Chang et al 34 | F0 | VT, WG | 175.00 | 28.00 | 194.00 | 52.20 | 0.48 | 7.23 | 52.24 | 28 |
| 40 | Chang et al 34 | SF0 | VT, WG | 143.00 | 23.00 | 163.00 | 26.00 | 1.01 | 3.62 | 13.14 | 28 |
| 41 | Hancock et al 35 | F0 | VT | 136.00 | 46.00 | 184.00 | 34.00 | 1.41 | 6.58 | 43.30 | 25 |
| 41 | Hancock et al 35 | SF0 | VT | 122.00 | 23.00 | 150.00 | 29.00 | 1.30 | 4.18 | 17.45 | 25 |
Studies evaluating effect sizes for SF0 and F0 across different interventions, including pre‐ and postoperative means, standard deviations (SD), effect size, standard error, variance, and patient counts.
Subjective Measures
As outlined in Table 2, studies utilized various patient‐reported outcomes, ranging from validated PROs such as the Trans Woman Voice Questionnaire (TWVQ) to unvalidated in‐house surveys based on the Visual Analogue Scale (VAS). Validated PROs in our study are defined as validated QoL surveys specific to transgender women undergoing voice feminization procedures. The unvalidated PROs in our study also measure voice‐related QoL in the context of the patient's perceived vocal femininity and satisfaction, but have not been thoroughly tested to confirm reliability, validity, and responsiveness. 36 Some utilized a VAS with endpoints that described a masculine versus feminine voice, 25 , 26 , 27 , 37 while others focused on general satisfaction with the procedure. 28
Table 2.
Patient‐Reported Outcome (PRO) Measures
| PRO Name | Description | Scale | Validated/unvalidated |
|---|---|---|---|
| TWVQ | Trans Woman Voice Questionnaire for individuals who transition from male to female; Likert scale, 30 items, 0–4 scale. Current gold standard. Lineage: TSEQ → TVQ(MtF) → TWVQ | 0 (best possible outcome) to 120 (worst possible outcome) | Validated |
| SpFv, SelfFem (VAS) | Self‐perceived Femininity of voice, Visual Analogue Scale | 1 (most masculine) to 10 (most feminine) | Unvalidated |
| VHI‐10 | A shortened, 10‐item version of the Voice Handicap Index | 0 (no handicap) to 40 (maximum handicap) | Unvalidated |
| TSEQ | Transgender Self‐Evaluation Questionnaire; Likert scale, 30 items, 0–4 for each item. TVQ(MtF) and TWVQ are based on this scale. Lineage: TSEQ → TVQ(MtF) → TWVQ | 0 (best possible outcome) to 120 (worst possible outcome) | Validated |
| PA (VAS) | Patient Assessment Visual Analogue Scale | 1 (feminine voice) to 5 (masculine voice) | Unvalidated |
| VAS | Visual Analogue Scale, a measure of subjective characteristics or attitude | 1 (masculine) to 100 (feminine) | Unvalidated |
| VHI | Voice Handicap Index, 30‐item scale that measures perceived voice‐related handicap, scale from 0‐4 for each item | 0 (no handicap) to 120 (maximum handicap) | Unvalidated |
| V‐RQOL | Voice‐Related Quality of Life, assesses the impact of voice disorder, 20 items on Likert scale, 1‐5 for each item | 0 (best possible outcome) to 100 (worst possible outcome) | Unvalidated |
| TVQ (MtF) | Transsexual Voice Questionnaire Male to Female, a variation of TSEQ that predates TWVQ, specifically for MtF transition; 30 items based on 5 subscales, 0‐4 for each item. Older version of TWVQ. Lineage: TSEQ → TVQ(MtF) → TWVQ | 0 (best possible outcome) to 120 (worst possible outcome) | Validated |
| Patient Subjective Satisfaction Score (PSSS) | Patient's own assessment of their satisfaction with their voice, 1‐10 scale | 1 (unsatisfied) to 10 (most satisfied) | Unvalidated |
PRO measures summary, including descriptions, scales, validation status, and categorization.
Some studies used validated PROs meant to address general voice pathology, such as the Voice Handicap Index (VHI) and Voice Related Quality of Life (V‐RQOL), to measure voice feminization procedure outcomes. These were often used in place of a validated PRO appropriate for gender transition. 6 , 9 , 11 , 24 , 27 By using a general PRO to measure the quality of life of a transitioning patient, the perceived femininity of the voice was never directly addressed. As this is the primary reason for seeking a voice feminization procedure, validated general PROs such as the VHI were categorized with unvalidated PROs for this study due to their incomplete reflection of the patient population. The validation categorization of all PROs is outlined in Table 2.
While two studies accurately utilized the VHI to address general voice health as an adjunct to other validated gender care PROs, 26 , 29 to maintain consistency in analysis and a more conservative approach we included them in the unvalidated PRO category.
As with the frequency measures, all PRO measures were converted to standardized effect sizes using Hedges' g. Many PROs, such as the TWVQ, use an inverted Likert scale where lower scores reflect positive outcomes, while others utilize more traditional scales where higher scores indicate positive outcomes. This directionality was accounted for during effect size calculation and standardized so that a lower Hedges' g reflects negative outcomes (ie, perceived masculinity) and a higher Hedges' g reflects positive outcomes (ie, perceived femininity). The direction of each PRO scale is outlined in Table 2.
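The direction adjustment can be illustrated with a short Python sketch. The helper `oriented_change` and the set `LOWER_IS_BETTER` are hypothetical names introduced here; the two example rows use the pre‐ and postintervention means reported in Table 3 for a TWVQ study (lower is better) and an SpFv study (higher is more feminine).

```python
# Scales on which a lower score is the better outcome, per Table 2.
# This set is illustrative and deliberately not exhaustive.
LOWER_IS_BETTER = {"TWVQ", "TVQ (MtF)", "TSEQ", "VHI", "VHI-10"}


def oriented_change(scale, pre_mean, post_mean):
    """Return the pre-to-post change signed so that positive means improvement."""
    raw = post_mean - pre_mean
    return -raw if scale in LOWER_IS_BETTER else raw


# Means from Table 3: Chadwick et al (TWVQ) and Aires et al (SpFv).
twvq_change = oriented_change("TWVQ", 74.90, 54.80)
spfv_change = oriented_change("SpFv", 2.80, 7.70)
print(round(twvq_change, 2), round(spfv_change, 2))
```

Both changes come out positive, so an effect size computed from either one points in the same direction even though the raw TWVQ score fell while the raw SpFv score rose.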
Table 3 contains summary statistics for all calculated subjective measures.
Table 3.
Summary of Studies for Subjective Measures
| ID | Author | Measure | Interventions | Preop mean | SD | Postop mean | SD | Effect size | Standard error | Variance | N patients |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Chadwick et al 2 | TWVQ, validated | VT | 74.90 | 10.90 | 54.80 | 18.60 | 1.40 | 3.73 | 13.92 | 13 |
| 6 | Koçak et al 6 | V‐RQOL, unvalidated | CTA, LRG | 60.00 | 4.33 | 84.17 | 7.64 | 2.49 | 3.20 | 10.27 | 3 |
| 9 | Kim 9 | VHI, unvalidated | VFSRAC | 57.90 | 16.31 | 48.70 | 16.59 | 0.72 | 0.72 | 0.52 | 313 |
| 12 | Yılmaz et al 24 | VHI, unvalidated | WG, LRG | 45.90 | 2.40 | 27.50 | 3.10 | 8.08 | 0.38 | 0.14 | 35 |
| 12 | Yılmaz et al 24 | TVQ, validated | WG, LRG | 96.80 | 17.40 | 35.60 | 8.30 | 4.60 | 2.20 | 4.84 | 35 |
| 13 | Casado et al 18 | TSEQ, validated | VT, WG, LRG | 70.30 | 8.52 | 40.00 | 6.70 | 4.52 | 1.94 | 3.76 | 10 |
| 14 | Casado et al 19 | TSEQ, validated | WG, LRG | 49.50 | 4.99 | 20.75 | 1.75 | 6.44 | 1.40 | 1.97 | 18 |
| 19 | Aires et al 25 | SpFv, unvalidated | VT, WG | 2.80 | 1.80 | 7.70 | 2.40 | 2.48 | 0.65 | 0.42 | 7 |
| 19 | Aires et al 25 | TWVQ, validated | VT, WG | 98.30 | 9.20 | 54.10 | 25.00 | 1.95 | 7.44 | 55.38 | 7 |
| 20 | D'haeseleer et al 26 | VHI, unvalidated | WG | 56.00 | 22.00 | 36.00 | 22.00 | 1.15 | 2.88 | 8.30 | 70 |
| 20 | D'haeseleer et al 26 | TWVQ, validated | WG | 84.00 | 17.00 | 51.00 | 17.00 | 2.45 | 2.23 | 4.95 | 35 |
| 21 | Mora et al 27 | VHI, unvalidated | CTA | 19.00 | 9.00 | 15.00 | 9.00 | 0.56 | 1.29 | 1.68 | 106 |
| 22 | Quinn et al 37 | VAS, unvalidated | VT | 71.97 | 18.62 | 29.13 | 17.24 | 3.00 | 2.39 | 5.72 | 102 |
| 23 | Nerurkar et al 28 | PSSS, unvalidated | VT, WG | 4.00 | 0.48 | 8.00 | 1.70 | 2.47 | 0.53 | 0.28 | 7 |
| 24 | Yılmaz et al 11 | VHI, unvalidated | WG | 38.00 | 5.64 | 24.00 | 2.65 | 3.21 | 0.81 | 0.66 | 27 |
| 25 | Brown et al 29 | VHI, unvalidated | VT | 19.80 | 10.50 | 15.00 | 9.60 | 0.60 | 1.51 | 2.27 | 48 |
| 25 | Brown et al 29 | TWVQ, validated | VT | 86.10 | 24.20 | 67.90 | 29.00 | 0.84 | 4.06 | 16.45 | 48 |
| 35 | Merrick et al 30 | TWVQ, validated | VT | 90.70 | 20.10 | 50.10 | 11.00 | 2.55 | 4.43 | 19.59 | 19 |
| 36 | Nuyen et al 31 | SelfFem, unvalidated | FL | 36.00 | 41.00 | 85.00 | 22.00 | 1.58 | 5.78 | 33.41 | 27 |
| 41 | Hancock et al 35 | TSEQ, validated | VT | 87.50 | 13.40 | 58.75 | 28.09 | 1.32 | 4.20 | 17.67 | 25 |
Studies assessing effect sizes for validated and unvalidated PROs across different interventions, including pre‐ and postoperative means, standard deviations (SD), effect size, standard error, variance, and patient counts.
Data Analysis
Meta‐Analyses
We conducted a series of meta‐analyses to evaluate the overall effects of all interventions in our data set on both objective and subjective measures, utilizing the metafor package within R. Our analysis covered three main components: raw changes in fundamental frequency for both SF0 and F0, effect size analysis for the objective SF0/F0, and effect size analysis for subjective PROs.
Objective Measures
For objective measures, we first analyzed the raw mean differences in Hz for both SF0 and F0. For each, standard errors were calculated, and weights were assigned based on inverse variance. The data were pooled using a random‐effects model with the Restricted Maximum Likelihood (REML) method to address heterogeneity. Forest plots were generated for both SF0 and F0 to display the mean changes in Hz, weights, and heterogeneity statistics (τ², I², and Q) alongside total patient counts.
We then conducted a similar meta‐analysis utilizing the standardized effect sizes for F0 and SF0 to allow for analysis as a single measure.
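The pooling step and the heterogeneity statistics can be sketched as follows. Note the substitution: this sketch estimates τ² with the closed‐form DerSimonian‐Laird method rather than REML, because REML requires iterative fitting (as metafor does internally). The function name and the input values are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Inverse-variance random-effects pooling with a DerSimonian-Laird tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
        "q": q,
    }


# Hypothetical mean changes in Hz with their sampling variances.
result = pool_random_effects([10.0, 20.0, 30.0], [4.0, 4.0, 4.0])
print(result)
```

When τ² is large relative to the sampling variances, as in this example, the random‐effects weights become nearly equal across studies, which is why highly heterogeneous analyses give small studies almost as much influence as large ones.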
Subjective Measures
For the subjective measures, as before, standard errors were calculated and weights were determined based on inverse variance. The same random‐effects model with the REML method was employed to combine the data. A forest plot was generated with weight percentages, heterogeneity statistics, and patient counts.
Meta‐Regressions
Meta‐regressions were conducted to explore the effects of individual interventions on both the objective and subjective outcomes as previously described.
Objective Measures
We first performed a meta‐regression on the mean differences in Hz for both SF0 and F0. We adjusted for multiple interventions, including VT, WG, CTA, FL, VFSRAC, RADIESSE, and LRG. Standard errors were calculated and used to weigh the effect sizes and forest plots were generated. We then repeated the regression using effect sizes for SF0 and F0 using the same methodology.
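A simplified version of such a regression is sketched below in Python. It is a fixed‐effect weighted least squares fit on intervention indicators with no intercept and no τ² term, so it is only an approximation of a mixed‐effects meta‐regression; whether the authors' model included an intercept is not stated. All names and data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: interventions cannot be separated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_meta_regression(x, y, variances):
    """Minimize sum of w_i * (y_i - x_i . b)^2 with weights w_i = 1 / variance_i."""
    w = [1.0 / v for v in variances]
    p = len(x[0])
    xtwx = [[sum(wi * xi[i] * xi[j] for wi, xi in zip(w, x)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(wi * xi[i] * yi for wi, xi, yi in zip(w, x, y)) for i in range(p)]
    return solve(xtwx, xtwy)


# Hypothetical studies: indicator columns are [VT, WG]; outcomes are changes in Hz.
x = [[1, 0], [0, 1], [1, 1]]
y = [20.0, 30.0, 50.0]
v = [4.0, 4.0, 4.0]
coef = weighted_meta_regression(x, y, v)
print([round(b, 2) for b in coef])
```

The third hypothetical study combined both interventions, and the fit attributes its change to the two coefficients additively. If two interventions always appear together, the design becomes singular and their effects cannot be separated.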
Subjective Measures
Finally, similar meta‐regression was performed using effect sizes with validated, unvalidated, and all PROs.
Heterogeneity Testing
Due to significant variations expected between surgical techniques, patient populations, and the necessary inclusiveness of our study design, we expected high heterogeneity even after analyzing meta‐regressions. To explore the sources of this heterogeneity further, we devised an additional test.
We calculated the dispersion of changes in F0, SF0, combined SF0/F0, validated PROs, unvalidated PROs, and any PROs within each procedure to analyze the variability of surgical outcomes. We used the standard deviation of the mean difference of each procedure as a measure of variability. In this analysis, a high standard deviation indicates larger dispersion, less predictability, less stability, and greater variability in surgical outcomes, suggesting heterogeneity due to factors such as differences in surgical technique, individual patient physiology, and post‐operative care.
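The dispersion measure can be sketched as a per‐intervention sample standard deviation of study‐level effect sizes. The sketch below assumes that a study combining several interventions counts toward each of them, which the article does not specify; the function name and the effect sizes are hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def dispersion_by_intervention(records):
    """Sample SD of effect sizes per intervention; needs at least two studies."""
    groups = defaultdict(list)
    for label, effect in records:
        for name in label.split(","):
            groups[name.strip()].append(effect)
    return {name: stdev(vals) for name, vals in groups.items() if len(vals) >= 2}


# Hypothetical (intervention label, effect size) pairs.
records = [("VT", 1.0), ("VT", 2.0), ("VT", 3.0),
           ("WG", 1.0), ("WG", 5.0), ("FL", 3.0)]
sd = dispersion_by_intervention(records)
print(sd)
```

An intervention represented by a single study has no defined standard deviation and is omitted, which mirrors the caution needed when interpreting dispersion for sparsely represented procedures.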
Statistically Significant Surgical Interventions Versus VT
After determining which interventions showed statistical significance in our meta‐regression analyses, we performed another pair of meta‐regressions to compare the efficacy of statistically significant surgical interventions against VT using both objective and subjective measures.
Correlation Analysis
As increasing vocal pitch to the feminine range is the primary reason transitioning patients seek out voice feminization procedures, we analyzed the correlations between changes in SF0/F0 and validated, unvalidated, and any PROs. We ran Pearson's product‐moment correlation analysis of validated PROs versus SF0/F0, unvalidated PROs versus SF0/F0, and any PRO versus SF0/F0.
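For reference, Pearson's product‐moment correlation can be computed as in the following Python sketch. The paired values are hypothetical; in the analysis described above, each pair would be an objective and a subjective effect size.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired effect sizes (objective, subjective).
objective = [1.0, 2.0, 3.0, 4.0]
subjective = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(objective, subjective)
print(round(r, 2), round(r**2, 2))
```

The coefficient r measures the strength and direction of the linear association, while r² is the proportion of variance in one measure explained by the other, so the two should not be reported interchangeably.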
Risk of Bias Analysis
To assess the impact of overall bias, we excluded studies identified with a severe risk of bias and repeated all meta‐analyses, meta‐regressions, and correlation analyses to evaluate any shifts in effect sizes and heterogeneity metrics. Additionally, a trim and fill analysis was conducted to further examine the potential for publication bias.
Results
Twenty‐four studies were included in the meta‐analysis as listed in Tables 1 and 3. Twenty‐three studies were retrospective analyses and 1 study was a randomized controlled trial. There were a total of 893 unique patients across studies. The number of patients utilizing each intervention and PRO are outlined further in the summary statistics in Supplemental Table 1, available online.
Meta‐Analyses
Objective Measures
As shown in Figure 2, the analysis of SF0 in Hz showed a pooled mean change of 36.02 Hz (95% confidence interval [CI]; 27.16 to 44.88), with significant heterogeneity (τ² = 305.53; I² = 97.6%; Q = 951.94). For F0, the mean change was 53.90 Hz (95% CI; 43.29 to 64.51), with high heterogeneity (τ² = 686.33; I² = 98.30%; Q = 935.71).
Figure 2.

Forest plot of SF0 and F0 changes. Mean change in Speaking Fundamental Frequency (SF0) and Fundamental Frequency (F0) in Hertz (Hz) with 95% confidence intervals across studies. Significant improvements are observed with high heterogeneity (SF0: I² = 97.6%; F0: I² = 98.30%). Total N: SF0 = 479, F0 = 725.
Figure 3 shows the effect size for all objective measures was 2.39 (95% CI; 1.76 to 3.02), also indicating notable variability (τ² = 3.81; I² = 97.20%; Q = 421.89).
Figure 3.

Forest plot of all objective measures using Hedges' g. Bias‐corrected effect sizes (Hedges' g) for SF0/F0 across studies, with 95% confidence intervals. Significant variability is noted (I² = 97.20%). Total N: 1204. The pooled effect size indicates overall vocal pitch improvement post‐intervention.
Subjective Measures
The analysis of all subjective measures shown in Figure 4 yielded a pooled effect size of 2.35 (95% CI; 1.84 to 2.86). The heterogeneity statistics indicated substantial variation among the studies (τ² = 2.13; I² = 96.20%; Q = 384.35).
Figure 4.

Forest plot of all subjective measures using Hedges' g (All PROs). Bias‐corrected effect sizes (Hedges' g) for patient‐reported outcomes (PROs) across studies, with 95% confidence intervals. Significant variability is evident (I² = 96.20%). Total N: 1089. The pooled effect size highlights improvements in quality of life following interventions.
Meta‐Regressions
Objective Measures
As shown in Figure 5, when measuring SF0 in raw Hz, significant improvements were observed for VT (mean difference [MD], 17.27 Hz; 95% CI; 11.66 to 22.87 Hz; P < .0001), WG (MD, 26.02 Hz; 95% CI; 18.52‐33.53 Hz; P < .0001), FL (MD, 62.14 Hz; 95% CI; 51.74‐72.55 Hz; P < .0001), and LRG (MD, 42.98 Hz; 95% CI; 27.27‐58.69 Hz; P < .0001). For F0 in Hz, improvements were seen for VT (MD, 26.46 Hz; 95% CI; 13.87 to 39.05 Hz; P < .0001), WG (MD, 35.57 Hz; 95% CI; 21.48‐49.66 Hz; P < .0001), VFSRAC (MD, 73.60 Hz; 95% CI; 36.91‐110.29 Hz; P < .0001), RADIESSE (MD, 74.65 Hz; 95% CI; 25.74‐123.57 Hz; P = .0028), and LRG (MD, 37.19 Hz; 95% CI; 17.43 to 56.95 Hz; P = .0002). Changes in F0 for CTA were not statistically significant. Heterogeneity measures for the SF0 regression (τ² = 48.89; I² = 76.70%; Q = 52.66) were much smaller than for F0 (τ² = 348.14; I² = 94.10%; Q = 301.84).
Figure 5.

Mean change in hertz (Hz) by Intervention. Mean change in Hz for various interventions, with 95% confidence intervals. Interventions are grouped by Speaking Fundamental Frequency (SF0) and Fundamental Frequency (F0). Significant improvements are noted for most interventions, with high heterogeneity (SF0: I² = 76.70%; F0: I² = 94.10%).
Figure 6 shows pooled effect sizes for objective measures. SF0 showed statistically significant increases for VT (g, 0.86; 95% CI; 0.46 to 1.26; P < .0001), WG (g, 1.21; 95% CI; 0.65 to 1.77; P < .0001), FL (g, 3.05; 95% CI; 2.24 to 3.86; P < .0001), and LRG (g, 12.28; 95% CI; 8.19 to 15.64; P < .0001). F0 effect sizes were increased for WG (g, 1.68; 95% CI; 0.54 to 2.82; P = .0039) and LRG (g, 3.57; 95% CI; 1.72 to 5.42; P = .0002). Heterogeneity was again markedly less for SF0 (τ² = 0.23; I² = 67.50%; Q = 34.68) compared to F0 (τ² = 2.26; I² = 93.20%; Q = 110.78).
Figure 6.

Objective (SF0/F0) effect size by intervention. Effect sizes (Hedges' g) for different interventions targeting SF0 and F0, with 95% confidence intervals. Most interventions demonstrate significant improvements, with notable heterogeneity (SF0: I² = 67.50%; F0: I² = 93.20%).
Subjective Measures
The meta‐regression for subjective measures is shown in Figure 7. Validated PROs showed significant positive effect for both VT (g, 1.47; 95% CI; 0.19 to 2.76; P = .024) and WG (g, 2.73; 95% CI; 0.60 to 4.86; P = .012), though it is notable that LRG was the only other intervention that included validated PROs. Unvalidated PROs showed significant positive effects for VT (g, 1.56; 95% CI; 0.69 to 2.44; P = .0005), WG (g, 1.61; 95% CI; 0.81 to 2.41; P < .0001), and LRG (g, 4.50; 95% CI; 2.32 to 6.67; P < .0001), with LRG showing the largest effect size. Meta‐regression calculated using all PROs showed similar trends, with VT (g, 1.32; 95% CI; 0.68 to 1.96; P < .0001), WG (g, 1.82; 95% CI; 1.07 to 2.57; P < .0001), and LRG (g, 1.90; 95% CI; 0.72 to 3.07; P = .0015) showing statistical significance. Heterogeneity was still very high for the all PRO subjective regression, though lower than the heterogeneity observed for the all PRO subjective meta‐analysis (τ² = 1.43; I² = 91.70%; Q = 248.03 vs τ² = 2.26; I² = 96.25%; Q = 384.35, respectively).
Figure 7.

Subjective (PROs) effect size by intervention. Effect sizes (Hedges' g) for various interventions based on validated, unvalidated, and all patient‐reported outcomes (PROs), with 95% confidence intervals. Significant subjective improvements are seen, with high heterogeneity noted (validated PROs: I² = 91.60%; unvalidated PROs: I² = 90.00%; All PROs: I² = 91.70%).
Heterogeneity Testing
The combined variability or dispersion of changes of all measures by intervention is shown in Figure 8. Standard deviations were higher for WG than for VT for SF0 (WG standard deviation of Hedges' g [SDg] = 6.29; VT SDg = 0.66), F0 (WG SDg = 3.64; VT SDg = 2.22), combined SF0/F0 (WG SDg = 4.49; VT SDg = 1.72), unvalidated PROs (WG SDg = 3.37; VT SDg = 0.99), and any PRO (WG SDg = 2.43; VT SDg = 1.91). Validated PROs were the only measure where VT showed a higher standard deviation than WG (WG SDg = 2.00; VT SDg = 2.40). While CTA and FL exhibited lower outcome variability overall compared to WG or VT, these two interventions were not represented thoroughly enough in our data set to draw conclusions using this analysis. Overall, these models suggest higher variability in outcomes for surgical interventions such as WG than for VT alone.
Figure 8.

Standard deviation of effect sizes (Hedges' g) by intervention. Depicts the variability (standard deviation) in effect sizes for F0, SF0, combined SF0/F0, and both validated and unvalidated PROs across different interventions. Higher variability is noted for interventions like WG, suggesting a source of heterogeneity.
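The dispersion measure above can be sketched as a per‐intervention standard deviation of study‐level effect sizes. The example below is a minimal illustration with invented values; it assumes the sample (n − 1) standard deviation, which may differ from the convention used to produce Figure 8, and the name `sd_by_intervention` is hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def sd_by_intervention(records):
    """Group (intervention, g) pairs and return the sample SD of g per group.

    Groups with fewer than two effect sizes are skipped because a sample
    standard deviation is undefined for a single value.
    """
    groups = defaultdict(list)
    for intervention, g in records:
        groups[intervention].append(g)
    return {name: stdev(values) for name, values in groups.items() if len(values) >= 2}


# Hypothetical study-level Hedges' g values (not data from this review)
records = [
    ("VT", 0.6), ("VT", 1.0), ("VT", 1.3),
    ("WG", 0.4), ("WG", 2.9), ("WG", 7.5),
    ("LRG", 12.3),  # a single study: no SD can be computed
]
for name, sd in sd_by_intervention(records).items():
    print(f"{name}: SDg = {sd:.2f}")
```

A single‐study intervention drops out of such a comparison entirely, which is consistent with the caution above about interventions that are sparsely represented in the data set.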
Statistically Significant Surgical Interventions Versus VT
The meta‐regression findings comparing only statistically significant surgical interventions to VT are shown in Figure 9. For objective measures, VT had an effect size of 0.74 (95% CI: −0.04 to 1.52; P = .0624), while WG, FL, and LRG collectively showed a significant increase in effect size (g, 2.61; 95% CI: 1.84 to 3.38; P < .0001). For subjective PRO measures, VT showed an effect size of 1.37 (95% CI: 0.60 to 2.14; P = .0005), whereas WG and LRG showed a larger effect size (g, 2.51; 95% CI: 1.75 to 3.27; P < .0001), indicating stronger effects with surgical interventions. The heterogeneity for SF0 or F0 was high (τ² = 1.85; I² = 94.50%; Q = 210.95), as were all PROs (τ² = 2.24; I² = 97.10%; Q = 1340.04), again indicating substantial variability across studies.
Figure 9.

Statistically significant surgical interventions versus voice therapy. Effect sizes (Hedges' g) comparing statistically significant surgical interventions (WG, LRG, FL) to voice therapy (VT) for both SF0/F0 and PROs, with 95% confidence intervals. Surgical interventions show greater improvements. High heterogeneity is observed (SF0/F0: I² = 94.50%; all PROs: I² = 97.10%).
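The relationship between the confidence intervals and P values reported for this comparison can be checked with a short calculation. The sketch below assumes a normal (Wald) approximation with a symmetric 95% interval, which may not match the exact test used by the authors' software; the function name `wald_from_ci` is hypothetical. The inputs are the VT estimates quoted in the text above.

```python
import math


def wald_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover the standard error, z statistic, and two-sided P value
    from a point estimate and its symmetric 95% confidence interval.

    Assumption: the interval was built as estimate +/- z_crit * SE.
    """
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    # Two-sided P value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# VT, objective measures: g = 0.74 (95% CI: -0.04 to 1.52), reported P = .0624
se_obj, z_obj, p_obj = wald_from_ci(0.74, -0.04, 1.52)
print(f"objective:  SE = {se_obj:.3f}, z = {z_obj:.2f}, P = {p_obj:.4f}")

# VT, subjective measures: g = 1.37 (95% CI: 0.60 to 2.14), reported P = .0005
se_sub, z_sub, p_sub = wald_from_ci(1.37, 0.60, 2.14)
print(f"subjective: SE = {se_sub:.3f}, z = {z_sub:.2f}, P = {p_sub:.4f}")
```

Under this approximation the recovered P values fall close to those reported, with small differences expected from rounding of the interval bounds.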
Correlation Analysis
As shown in Figure 10, Pearson's product‐moment correlation comparing mean SF0/F0 with validated PROs showed a moderately strong association (r² = .6592; P = .05). SF0/F0 versus unvalidated PROs showed a very strong association (r² = .9289; P = .0001). Overall correlation between SF0/F0 and all PROs again showed a strong association (r² = .8297; P = .0001).
Figure 10.

Correlation between mean objective and subjective effect sizes. This scatter plot shows the correlation between mean objective effect sizes (SF0, F0) and subjective effect sizes (PROs). Strong correlations are observed for unvalidated (r² = .93) and all PROs (r² = .83), with moderate correlation for validated PROs (r² = .66). Lines represent linear regressions for each PRO type.
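Because the association strengths in this analysis are expressed as r², it may help to see how Pearson's r and r² relate. The following is a minimal sketch using invented pairs of mean objective and subjective effect sizes; the data and the function name `pearson_r` are hypothetical and are not taken from the included studies.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical mean effect sizes per intervention (objective vs subjective)
objective = [0.8, 1.4, 2.1, 3.0, 3.6]
subjective = [1.2, 1.5, 2.4, 2.6, 3.9]

r = pearson_r(objective, subjective)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")

# Converting the r^2 values reported in the text back to r
for label, r2 in [("validated", 0.6592), ("unvalidated", 0.9289), ("all PROs", 0.8297)]:
    print(f"{label}: r^2 = {r2:.4f} -> |r| = {math.sqrt(r2):.3f}")
```

Since r² is always smaller in magnitude than r for values between 0 and 1, it matters which of the two is being compared against conventional thresholds for "moderate" or "strong" association.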
Risk of Bias Analysis
After excluding studies with overall severe risk of bias, a statistically negligible uniform leftward shift in effect sizes was observed. Overall heterogeneity metrics also showed negligible, mixed responses, with some measures experiencing increased variability while others decreased. Notably, the meta‐regression for validated PROs for VT was the only metric that displayed a significant change, with a pre‐corrected effect size of 1.56 (95% CI; 0.69 to 2.44; P = .0005) shifting to 0.13 (95% CI; −1.15 to 1.40; P = .8462). As shown in Supplemental Figure 1 (available online), the trim and fill analysis did not suggest any missing studies but still indicated high heterogeneity despite a lack of publication bias, which is in line with the rest of our results.
A GRADE summary of the most relevant findings is outlined in Table 4.
Table 4.
GRADE Summary of Findings Table for Voice Feminization Interventions
| Outcome | Intervention | N patients (studies) | Effect size (95% CI) | Certainty of evidence (GRADE) | Comments |
|---|---|---|---|---|---|
| Statistically Significant Objective Measures | | | | | |
| Speaking Fundamental Frequency (SF0) | Voice Therapy (VT) | 193 (8 studies) | 0.86 [0.46, 1.26] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 67.5%). No publication bias detected. Significant improvement in SF0 with VT. |
| | Wendler Glottoplasty (WG) | 171 (6 studies) | 1.21 [0.65, 1.77] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 67.5%). No publication bias detected. Significant improvement in SF0 with WG. |
| | Feminization Laryngoplasty (FL) | 189 (2 studies) | 3.05 [2.24, 3.86] | ⊕⊕⊝⊝ low | Low certainty due to high heterogeneity (I² = 67.5%) and limited study numbers (2 studies). No publication bias detected. Significant improvement in SF0 with FL. |
| | Laser Reduction Glottoplasty (LRG) | 35 (1 study) | 12.28 [8.92, 15.64] | ⊕⊝⊝⊝ very low | Very low certainty due to single study data, potential bias, and high heterogeneity (I² = 67.5%). No publication bias detected. Highly significant improvement in SF0 with LRG. |
| Fundamental Frequency (F0) | WG | 276 (12 studies) | 1.68 [0.54, 2.82] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 93.2%). No publication bias detected. Significant improvement in F0 with WG. |
| | LRG | 35 (5 studies) | 3.57 [1.72, 5.42] | ⊕⊕⊝⊝ low | Low certainty due to high heterogeneity (I² = 93.2%) and limited study numbers. No publication bias detected. Notable improvement in F0 with LRG. |
| Statistically Significant Subjective Measures | | | | | |
| Validated PROs | VT | 132 (6 studies) | 1.47 [0.19, 2.76] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 91.6%). No publication bias detected. Positive improvement in validated PROs with VT. |
| | WG | 126 (5 studies) | 2.73 [0.59, 4.86] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 91.6%). No publication bias detected. Strong improvement in validated PROs with WG. |
| Unvalidated PROs | VT | 164 (3 studies) | 1.56 [0.69, 2.44] | ⊕⊕⊝⊝ low | Low certainty due to high heterogeneity (I² = 90.0%) and limited study numbers. No publication bias detected. Positive improvement in unvalidated PROs with VT. |
| | WG | 215 (5 studies) | 1.61 [0.81, 2.41] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 90.0%). No publication bias detected. Consistent improvement in unvalidated PROs with WG. |
| | LRG | 38 (2 studies) | 4.50 [2.32, 6.67] | ⊕⊕⊝⊝ low | Low certainty due to high heterogeneity (I² = 90.0%) and few studies. No publication bias detected. Notable improvement in unvalidated PROs with LRG. |
| All PROs | VT | 296 (9 studies) | 1.32 [0.68, 1.95] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 91.7%). No publication bias detected. Positive improvement in overall PROs with VT. |
| | WG | 341 (10 studies) | 1.82 [1.07, 2.57] | ⊕⊕⊕⊝ moderate | Moderate certainty due to high heterogeneity (I² = 91.7%). No publication bias detected. Strong improvement in overall PROs with WG. |
| | LRG | 101 (4 studies) | 1.90 [0.72, 3.07] | ⊕⊕⊝⊝ low | Low certainty due to high heterogeneity (I² = 91.7%) and limited study numbers. No publication bias detected. Positive improvement in overall PROs with LRG. |
Effect sizes and GRADE levels for statistically significant interventions. Certainty levels range from very low (LRG) to moderate (VT, WG) across measures of SF0, F0, and PROs, and reflect significant improvements despite high heterogeneity.
Discussion
All of our meta‐analyses showed statistically significant improvement in both objective and subjective measures for all interventions, though there was very high heterogeneity across all studies. While this was expected given the overall inclusiveness of our study design and the inherent variability in surgical procedures and other factors that were too granular to be accounted for in this type of analysis, we believe this top‐level trend of consistent improvement in all measures among all studies is notable.
Our meta‐regression findings show enhancement in vocal pitch by raw Hz across all interventions when assessed by either F0 or SF0, except for CTA. The observed effect size analysis largely corroborates the trends shown in the raw Hz findings, though VFSRAC and RADIESSE are no longer statistically significant when bias‐corrected. Taken with the subjective measure results discussed below, this is not surprising given that VFSRAC, RADIESSE, and FL are novel procedures that are poorly represented in the literature and do not yet have the statistical power to support firm conclusions about efficacy.
Our analysis also showed that surgical approaches to voice feminization result in larger gains in frequency alteration and patient satisfaction compared to VT alone. This suggests that while VT is a valuable tool for vocal feminization, surgical interventions may offer a greater increase in pitch and ultimate satisfaction by allowing transitioning individuals to attain better gender‐congruent expression.
Our analysis of subjective PRO measures demonstrates positive effect sizes for WG, VT, and LRG. However, it is notable that the differences in QoL between VT and WG are minimal, suggesting comparable patient satisfaction in both non‐surgical and at least certain surgical approaches.
Contrary to a recent study that posited a lack of correlation between F0 and validated PROs, 10 our analysis including both validated and unvalidated PROs offers a different conclusion. Our findings reveal a statistically significant strong correlation between subjective and objective measures in two of our analyses, and a moderately strong correlation in the most stringent validated PRO category. Although there is an expected disparity between validated and unvalidated PROs due to the differences in their methodological rigor, our data firmly establish that improvements in vocal pitch are significantly associated with enhanced quality of life irrespective of the PRO used. Nonetheless, we emphasize that while all interventions primarily target pitch elevation, achieving a feminine voice encompasses a wider array of vocal characteristics 5 than frequency alone.
That said, as illustrated by the lack of comprehensive quality PRO data for novel procedures like FL and VFSRAC, the enduring use of unvalidated instruments highlights a significant gap in standardization of outcome reporting. The ongoing utilization of such instruments, even with the existence of the TWVQ, suggests a collective inertia that warrants attention from clinicians and researchers. This calls for a concerted effort to standardize outcome measurement.
While our analysis shows correlation between objective and subjective measures, we recognize that patient satisfaction is multifaceted and complex. Current subjective PRO measures fail to address the discord that might arise between a patient's self‐perceived vocal femininity and external social validation from third parties. 38 , 39 , 40 Such incongruence can cause significant emotional distress in individuals who seek social congruence in their gender expression. 38 Therefore, the authors advocate for an improved assessment framework by working toward standardized third‐party measurement systems. This would complete the triad of what in our view constitutes a truly comprehensive assessment: objective measures, subjective measures, and patient‐independent third‐party measures. While third‐party tools like the telephone test 40 or machine‐learning algorithms 39 have been proposed, a fully robust approach has yet to be achieved and deserves further attention from the research community.
Limitations and Future Directions
While our meta‐analysis provides some important validation of efficacy for certain voice feminization procedures, several limitations must be addressed. Comprehensive data were available for VT and WG, but the paucity of data on other interventions posed challenges. In particular, PROs for novel techniques such as VFSRAC and FL could not be thoroughly evaluated due to insufficient evidence. The lack of extensive research on these procedures limits our ability to confirm their purported benefits and warrants further investigation as more data become available.
Our analysis faced the inherent limitation of high heterogeneity, as expected in meta‐analyses involving surgical procedures. Attempts to identify underlying causes for this heterogeneity were constrained by the absence of comprehensive information on potentially influential factors such as age, race, or comorbidities. While differences in patient physiology are expected to contribute to variability, we hypothesize that a significant portion of the heterogeneity is ascribable to nuances within surgical techniques—specifically within the WG category as alluded to in our dispersion heterogeneity test. For instance, we observed considerable variability in the ratio of vocal fold de‐epithelialization during glottoplasty from 33% to 50%, 11 , 18 , 19 , 21 , 25 , 41 the use of laser versus microlaryngeal scissors for de‐epithelialization, 29 and in postoperative care protocols including the adjunctive use of VT. It should also be emphasized that WG was the only surgical intervention with enough data to allow analysis of heterogeneity; exploration of other interventions will require more thorough data sets.
Variability in follow‐up durations added complexity to our analysis, which constrained our ability to assess long‐term outcomes. The evaluation of combined therapies was also challenging, as multiple interventions in the same patient population prevented truly isolating individual contributions given the potential additive effects of adjunctive interventions despite the use of comprehensive meta‐regressions. Additionally, the use of unvalidated PROs, although necessary due to the scarcity of validated PROs, introduced uncertainty in effect size comparisons with standardized instruments like the TWVQ. Our findings underscore the need for standardized measures but do not endorse the continued use of unvalidated PROs. Lastly, LRG showed an exceptionally high SF0 effect size due to an unusually low standard deviation recorded by Yilmaz et al, 24 which could skew results, though the overall trend still suggests effectiveness.
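To illustrate why an unusually low standard deviation can inflate a standardized effect size, the following minimal sketch computes Hedges' g for two hypothetical scenarios with the same mean pitch change but different spreads. It assumes the independent‐groups form of Hedges' g with the usual small‐sample correction; all numbers are invented and are not those reported by Yilmaz et al, and the function name `hedges_g` is hypothetical.

```python
import math


def hedges_g(mean_pre, sd_pre, n_pre, mean_post, sd_post, n_post):
    """Hedges' g for a difference of means using a pooled standard deviation.

    Assumption: independent-groups formulation with the approximate
    small-sample correction J = 1 - 3 / (4 * df - 1).
    """
    df = n_pre + n_post - 2
    pooled_sd = math.sqrt(((n_pre - 1) * sd_pre ** 2 + (n_post - 1) * sd_post ** 2) / df)
    d = (mean_post - mean_pre) / pooled_sd  # Cohen's d
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return correction * d


# Same hypothetical 60 Hz mean increase, two different spreads
typical = hedges_g(120.0, 20.0, 10, 180.0, 20.0, 10)  # SD of 20 Hz
narrow = hedges_g(120.0, 5.0, 10, 180.0, 5.0, 10)     # SD of 5 Hz

print(f"g with SD = 20 Hz: {typical:.2f}")
print(f"g with SD = 5 Hz:  {narrow:.2f}")
```

Because the pooled standard deviation sits in the denominator, quartering the spread quadruples g even though the raw change in Hz is identical, so a very large g from a single study is better read alongside the raw Hz change than on its own.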
Conclusion
Our meta‐analysis provides a comprehensive evaluation of voice feminization procedures at this point in time, demonstrating that surgical interventions yield more significant improvements in vocal pitch than VT, and such pitch enhancements are positively associated with improved quality of life. Despite these findings, inadequate data on novel techniques and reliance on unvalidated PROs indicate a pressing need for standardized outcome measures and the incorporation of third‐party assessments.
Author Contributions
Kristopher Lanham, concept and design, data acquisition, manuscript composition, critical review, statistical analysis, risk of bias analysis; Bradley A. Melnick, data acquisition, manuscript composition, critical review, risk of bias analysis; Madeline J. O'Connor, data acquisition, manuscript composition, critical review, PRISMA review; Angelica Bartler, data acquisition, manuscript composition, critical review; Kelly C. Ho, data acquisition, critical review; Rolando J. Casas Fuentes, data acquisition, statistical analysis; Robert D. Galiano, concept and design, critical review, supervision.
Disclosures
Competing interests
None of the authors have any conflicts of interest to disclose.
Funding
None.
Supporting information
Supplemental Figure 1. Risk of bias and trim and fill analysis. Risk of bias assessment for retrospective studies using ROBINS‐I and for RCTs using RoB. Domains are color‐coded to indicate judgment levels. Trim and fill analysis funnel plot shows no missing studies, with high heterogeneity for both objective (I² = 97.56%) and subjective measures (I² = 94.73%). This suggests robustness without significant publication bias but high heterogeneity.
Supplemental Table 1. Summary of Study Metrics. Summary of study metrics, including total number of unique patients and distribution of patients across interventions.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request. The data include the specific analysis tools and code used in the study along with all relevant data sets.
References
- 1. Oles N, Darrach H, Landford W, et al. Gender affirming surgery: a comprehensive, systematic review of all peer‐reviewed literature and methods of assessing patient‐centered outcomes (part 1: breast/chest, face, and voice). Ann Surg. 2022;275(1):e52‐e66. doi:10.1097/SLA.0000000000004728
- 2. Chadwick KA, Coleman R, Andreadis K, Pitti M, Rameau A. Outcomes of gender‐affirming voice and communication modification for transgender individuals. Laryngoscope. 2022;132(8):1615‐1621. doi:10.1002/lary.29946
- 3. Yılmaz T, Özer F, Aydınlı FE. Laser reduction glottoplasty for voice feminization: experience on 28 patients. Ann Otol Rhinol Laryngol. 2021;130(9):1057‐1063. doi:10.1177/0003489421993728
- 4. Gross M. Pitch‐raising surgery in male‐to‐female transsexuals. J Voice. 1999;13(2):246‐250. doi:10.1016/S0892-1997(99)80028-9
- 5. Simpson AP. Phonetic differences between male and female speech. Lang Linguistics Compass. 2009;3(2):621‐640. doi:10.1111/j.1749-818X.2009.00125.x
- 6. Koçak I, Akpınar ME, Çakır ZA, Doğan M, Bengisu S, Çelikoyar MM. Laser reduction glottoplasty for managing androphonia after failed cricothyroid approximation surgery. J Voice. 2010;24(6):758‐764. doi:10.1016/j.jvoice.2009.06.004
- 7. Mastronikolis NS, Remacle M, Biagini M, Kiagiadaki D, Lawson G. Wendler glottoplasty: an effective pitch raising surgery in male‐to‐female transsexuals. J Voice. 2013;27(4):516‐522. doi:10.1016/j.jvoice.2013.04.004
- 8. Kunachak S, Prakunhungsit S, Sujjalak K. Thyroid cartilage and vocal fold reduction: a new phonosurgical method for male‐to‐female transsexuals. Ann Otol Rhinol Laryngol. 2000;109(11):1082‐1086. doi:10.1177/000348940010901116
- 9. Kim HT. A new conceptual approach for voice feminization: 12 years of experience. Laryngoscope. 2017;127(5):1102‐1108. doi:10.1002/lary.26127
- 10. Hao Y, Trilles J, Brydges HT, et al. Meta‐analysis of validated quality of life outcomes following voice feminization in transwomen. J Craniofac Surg. 2024;35(1):53‐58. doi:10.1097/SCS.0000000000009742
- 11. Yılmaz T, Kuşçu O, Sözen T, Süslü AE. Anterior glottic web formation for voice feminization: experience of 27 patients. J Voice. 2017;31(6):757‐762. doi:10.1016/j.jvoice.2017.03.006
- 12. Nuyen BA, Qian ZJ, Campbell RD, Erickson‐DiRenzo E, Thomas J, Sung CK. Feminization laryngoplasty: 17‐year review on long‐term outcomes, safety, and technique. Otolaryngol Head Neck Surg. 2022;167(1):112‐117. doi:10.1177/01945998211036870
- 13. Anderson JA. Pitch elevation in trangendered patients: anterior glottic web formation assisted by temporary injection augmentation. J Voice. 2014;28(6):816‐821. doi:10.1016/j.jvoice.2014.05.002
- 14. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi:10.1136/bmj.n71
- 15. Hee Lee J, Humes LE. Effect of fundamental‐frequency and sentence‐onset differences on speech‐identification performance of young and older adults in a competing‐talker background. J Acoust Soc Am. 2012;132(3):1700‐1717. doi:10.1121/1.4740482
- 16. Zhu S, Chong S, Chen Y, Wang T, Ng ML. Effect of language on voice quality: an acoustic study of bilingual speakers of Mandarin Chinese and English. Folia Phoniatr Logop. 2022;74(6):421‐430. doi:10.1159/000525649
- 17. Gerhard D. Pitch extraction and fundamental frequency: history and current techniques. Technical Report TR‐CS 2003‐06; 2003.
- 18. Casado JC, O'Connor C, Angulo MS, Adrián JA. Glotoplastia de Wendler y tratamiento logopédico en la feminización de la voz en transexuales: resultados de la valoración pre‐ vs. poscirugía. Acta Otorrinolaringol Esp. 2016;67(2):83‐92. doi:10.1016/j.otoeng.2015.02.003
- 19. Casado JC, Rodríguez‐Parra MJ, Adrián JA. Voice feminization in male‐to‐female transgendered clients after Wendler's glottoplasty with vs. without voice therapy support. Eur Arch Otorhinolaryngol. 2017;274(4):2049‐2058. doi:10.1007/s00405-016-4420-8
- 20. Rapoport SK, Park C, Varelas EA, et al. 1‐year results of combined modified Wendler glottoplasty with voice therapy in transgender women. Laryngoscope. 2023;133(3):615‐620. doi:10.1002/lary.30225
- 21. Husain S, Campe L, Mirza N. Modification of Wendler glottoplasty for male to female gender transition. J Voice. 2023:S0892199723000279. In press. doi:10.1016/j.jvoice.2023.01.028
- 22. Iwarsson J, Hollen Nielsen R, Næs J. Mean fundamental frequency in connected speech and sustained vowel with and without a sentence‐frame. Logoped Phoniatr Vocol. 2020;45(2):91‐96. doi:10.1080/14015439.2019.1637455
- 23. Enzmann D. Notes on effect size measures for the difference of means from two independent groups: the case of Cohen's d and Hedges' g. University of Hamburg, Institute of Criminal Sciences; 2015. doi:10.13140/2.1.1578.2725
- 24. Yılmaz T. Sequential Wendler glottoplasty and laser reduction glottoplasty for voice feminization. Laryngoscope. 2023;134:1133‐1138. doi:10.1002/lary.30958
- 25. Aires MM, De Vasconcelos D, Lucena JA, Gomes AOC, Moraes BT. Effect of Wendler glottoplasty on voice and quality of life of transgender women. Braz J Otorhinolaryngol. 2023;89(1):22‐29. doi:10.1016/j.bjorl.2021.06.010
- 26. D'haeseleer E, Papeleu T, Leyns C, Adriaansen A, Meerschman I, Tomassen P. Voice outcome of glottoplasty in trans women. J Voice. 2023:S0892199723000152. In press. doi:10.1016/j.jvoice.2023.01.013
- 27. Mora E, Cobeta I, Becerra A, Lucio MJ. Comparison of cricothyroid approximation and glottoplasty for surgical voice feminization in male‐to‐female transsexuals. Laryngoscope. 2018;128(9):2101‐2109. doi:10.1002/lary.27172
- 28. Nerurkar NK, Nagree Z, Malik E, Jahnavi. Vocal outcomes following pitch alteration surgeries. Indian J Otolaryngol Head Neck Surg. 2023;75(4):2741‐2746. doi:10.1007/s12070-023-03837-8
- 29. Brown SK, Chang J, Hu S, et al. Addition of Wendler glottoplasty to voice therapy improves trans female voice outcomes. Laryngoscope. 2021;131(7):1588‐1593. doi:10.1002/lary.29050
- 30. Merrick G, Figol A, Anderson J, Lin RJ. Outcomes of gender affirming voice training: a comparison of hybrid and individual training modules. J Speech Lang Hear Res. 2022;65(2):501‐507. doi:10.1044/2021_JSLHR-21-00056
- 31. Nuyen B, Qian ZJ, Rakkar M, Thomas JP, Erickson‐DiRenzo E, Sung CK. Diagnosis and management of vocal complications after chondrolaryngoplasty. Laryngoscope. 2023;133(9):2301‐2307. doi:10.1002/lary.30518
- 32. Leyns C, Daelman J, Adriaansen A, et al. Short‐term acoustic effects of speech therapy in transgender women: a randomized controlled trial. Am J Speech Lang Pathol. 2023;32(1):145‐168. doi:10.1044/2022_AJSLP-22-00135
- 33. Kelly V, Hertegård S, Eriksson J, Nygren U, Södersten M. Effects of gender‐confirming pitch‐raising surgery in transgender women a long‐term follow‐up study of acoustic and patient‐reported data. J Voice. 2019;33(5):781‐791. doi:10.1016/j.jvoice.2018.03.005
- 34. Chang J, Brown SK, Hu S, et al. Effect of Wendler glottoplasty on acoustic measures of voice. Laryngoscope. 2021;131(3):583‐586. doi:10.1002/lary.28764
- 35. Hancock AB, Garabedian LM. Transgender voice and communication treatment: a retrospective chart review of 25 cases. Int J Lang Commun Disord. 2013;48(1):54‐65. doi:10.1111/j.1460-6984.2012.00185.x
- 36. Dacakis G, Davies S, Oates JM, Douglas JM, Johnston JR. Development and preliminary evaluation of the transsexual voice questionnaire for male‐to‐female transsexuals. J Voice. 2013;27(3):312‐320. doi:10.1016/j.jvoice.2012.11.005
- 37. Quinn S, Oates J, Dacakis G. Perceived gender and client satisfaction in transgender voice work: comparing self and listener rating scales across a training program. Folia Phoniatr Logop. 2022;74(5):364‐379. doi:10.1159/000521226
- 38. T'Sjoen G, Moerman M, Van Borsel J, et al. Impact of voice in transsexuals. Int J Transgend. 2006;9(1):1‐7. doi:10.1300/J485v09n01_01
- 39. Bensoussan Y, Park C, Johns M, et al. A comparison of an artificial intelligence tool to fundamental frequency as an outcome measure in people seeking a more feminine voice. Laryngoscope. 2021;131(11):2567‐2571. doi:10.1002/lary.29605
- 40. Meister J, Kühn H, Shehata‐Dieler W, Hagen R, Kleinsasser N. Perceptual analysis of the male‐to‐female transgender voice after glottoplasty‐the telephone test: perceptual analysis of the transgender voice. Laryngoscope. 2017;127(4):875‐881. doi:10.1002/lary.26110
- 41. Meister J, Hagen R, Shehata‐Dieler W, Kühn H, Kraus F, Kleinsasser N. Pitch elevation in male‐to‐female transgender persons‐the Würzburg approach. J Voice. 2017;31(2):244.e7‐244.e15. doi:10.1016/j.jvoice.2016.07.018