Author manuscript; available in PMC: 2021 Sep 1.
Published in final edited form as: J Voice. 2019 Apr 13;34(5):748–762. doi: 10.1016/j.jvoice.2019.03.006

Longitudinal Case Study of Transgender Voice Changes under Testosterone Hormone Therapy

Gabriel J Cler 1,2, Victoria S McKenna 2, Kimberly L Dahl 2, Cara E Stepp 1,2,3,4
PMCID: PMC6790287  NIHMSID: NIHMS1524429  PMID: 30987859

Abstract

The purpose of this study was to comprehensively evaluate voice and speech changes in one healthy 30-year-old transgender male undergoing testosterone therapy for transition. Testing occurred at three timepoints before cross-sex hormone therapy and every two weeks thereafter for one year. Data collected included measures of acoustics, aerodynamics, and laryngeal structure and function via flexible laryngoscopy. Analysis included acoustic correlates of pitch, loudness, voice quality, and vocal tract length, as well as perceptual measures of voice quality and gender. Speaking fundamental frequency (fo) lowered from 183 Hz to 134 Hz. Phonatory frequency range (i.e., minimum and maximum singing range) shifted from a range of D#3 to E6 to a range of A2 to A5. Perceptual measures of voice quality indicated no negative changes. Naive listeners reliably rated the participant’s speech samples as male after 37 weeks on testosterone. Few studies document in detail the variety of voice changes that occur during cross-sex hormone therapy, focusing instead on fo alone. This study adds to the literature a comprehensive case study of speech and voice changes experienced by one transmasculine participant undergoing testosterone therapy.

Keywords: transgender, transmasculine, endoscopy, aerodynamics, acoustic, HRT, testosterone

1. Introduction

Transgender voice has become highly relevant in speech-language pathology clinical practice, but research in this area is still lacking.1 Of the transgender voice studies available, most focus on transfeminine voice, with fewer studies evaluating transmasculine voice.1,2 This focus has been explained by the fact that transmasculine people typically are not seen clinically for voice masculinization. The relative absence of transmasculine individuals from voice clinics is likely a multifactorial issue, involving psychosocial and socio-political factors.3 Within the speech-language pathology field, many clinicians report a lingering belief that masculinization is accomplished solely by the effects of testosterone on the voice.2,4 In contrast, estrogen therapy does not feminize the voice, so transfeminine clients are often seen in the voice clinic.4 The effects of testosterone therapy (e.g., hormone replacement therapy or HRT) on all aspects of transmasculine voice are not yet characterized, and in fact testosterone therapy alone may not result in voice satisfaction.5–7 Transmasculine speakers experience a range of gender-related voice problems,8 which may or may not be resolved with hormone therapy and speech therapy.

There are limited studies investigating transmasculine voice,5–7,9–17 many of which only consider one or two domains of voice – typically focusing on fundamental frequency (fo), which is perceived as pitch. Some studies suggest that speakers may experience changes and restrictions in a variety of voice domains, including pitch range/variability, vocal control/stability, glottal function, and voice quality.16,18 Voice quality in particular has been suggested to be reduced following testosterone therapy,9,12,16 but some evidence suggests that, while variable, it usually remains within normal limits.16 An even smaller set of studies has tracked longitudinal voice changes in trans men undergoing testosterone therapy (summarized in Table 1).5,7,12–14,17 These studies typically characterize speaking fo5,7,12–14,17 during sustained vowels or reading, and occasionally assess phonatory frequency range5,7,12,16 (that is, the entire range of singing fo that can be produced) and acoustic measures including jitter,7,16 shimmer,7,12,16 and noise-to-harmonic ratio.12,16 Some studies have also measured self-perception of voice.5,16

Table 1.

Results of longitudinal studies in trans men before and after testosterone therapy.

Study | N | Task | Speaking fo change | Steepest change | Other measures
--- | --- | --- | --- | --- | ---
Van Borsel [7] | 2 | /ɑ/ and Reading | 200–220 Hz to 130–160 Hz | 4 mo | Jitter =; Shimmer =; PFR =*
Damrose [12] | 1 | /ɑ/ | 228 to 113 Hz | 3–4 mo | Shimmer ^; NHR ^; PFR =*
Deuster [13] | 11 | Reading | 79.4 Hz (8.78 ST) | 2–3 mo |
Nygren [5] | 50 | Reading | 192 to 125 Hz | | PFR =
Irwig [14] & Hancock [16] | 7 | /ɑ/, Reading | 135–183 to 87–128 Hz (mean: −6.4 ST) | | Jitter v; Shimmer v; NHR ~; PFR =
Zimman [17] | 3 | Reading | 167–177 Hz to 122–132 Hz | | First three formants (F1–3) ~

PFR: phonatory frequency range (maximum and minimum fo that the speaker can produce)

NHR: noise-to-harmonic ratio

*: Authors suggest that PFR reduced, but recalculating results in ST rather than Hz shows a stable PFR (<10% change)

Key: = (average participant shows) no change; ^ increased; v decreased; ~ inconsistent between participants

The scarcity of research on transmasculine voice, and particularly the scarcity of evidence tracking speakers longitudinally, prevents endocrinologists and speech-language pathologists from providing transmasculine speakers with accurate prognoses of voice changes under testosterone (“T”). Understanding which voice features may be affected, over what period, and to what degree informs decisions regarding whether and when other approaches to voice masculinization, such as voice therapy, may be warranted. Further, given the diversity of voice-related goals among transmasculine speakers,8 a fuller understanding of expected changes is necessary to best serve this population with an evidence-based approach.

Here we present, following CARE guidelines,19 an in-depth case study of one transmasculine speaker’s voice changes under testosterone. Analyses follow one speaker from pre-testosterone therapy (baseline) to twelve months on testosterone, with twice-monthly comprehensive data collection enabling videoendoscopic assessment, aerodynamic measures, acoustic measures, clinician perception of voice changes, and listener perception of speaker gender.

2. Methods

2.1. Participant, study design, and dosage

The participant was a 30-year-old transmasculine individual (assigned female at birth) who worked as a researcher. As such, he actively contributed to the design of the study (e.g., suggesting measures of interest and negotiating the schedule of data collection), in an example of participatory action research.20 He was a heavy voice user at work as well as an avocational but high-intensity (6 hrs/week) choral singer. He had no history of speech, language, or voice issues and had not completed any voice therapy either prior to or during the course of the study. His gender identity was self-reported as male and he used he/him/his pronouns. He was receiving regular medical care from a gender-affirming medical clinic, and was in the process of medical testing before starting testosterone when he first participated in this study. He had begun socially transitioning five months before and had not yet begun the legal processes involved with changing his name and gender on governmental identification. He completed his name change around week 6 of this study and aligned other identification over the following months. He began pursuing gender-affirming surgery around week 20 of the study and had that surgery during week 32. Accordingly, he wore a chest binder for all recording sessions in the baselines through week 35, after which he was healed sufficiently to not require a binder or dressings.

He completed weekly intramuscular injections of testosterone cypionate for the purpose of masculinization. Dosage was set at 40 mg per injection (0.2 mL of 200 mg/ml solution) for weeks 1-42 and was increased to 60 mg per injection (0.3 mL of 200 mg/ml) for the remaining time period by his medical providers.

2.2. Data collection

Baseline data were collected three times in the two weeks prior to testosterone therapy onset. Data were then collected approximately every two weeks thereafter for one year. Data included videos of laryngeal structure and function collected via flexible laryngoscopic imaging, acoustic recordings, perceptual measures, and aerodynamic measures collected using clinically-available tools.21 Data were collected by study staff who were not blinded to the purpose of the study, but who did not have particular hypotheses about each measure.

2.2.1. Flexible laryngoscopy

Laryngoscopic video and acoustic signals were recorded with the Digital Stroboscopy System (Kay Elemetrics, Lincoln Park, NJ) with both halogen and strobe light source via a distal imaging chip (light source and video processor EPK-1000; pediatric endoscope, VNL-1070STK, 3.3 mm width; both Pentax, Tokyo, Japan). Video was digitized at 30 frames/s with a frame size of 480 × 360 pixels. Tasks are delineated in the appendix and included sustained /i/ at comfortable, high, and low pitch, and at normal, soft, and loud loudness; pitch and loudness glides; vocal diadochokinesis (DDK) maneuvers; and singing.

2.2.2. Acoustic recordings

Acoustic recordings generally occurred on the same day as the flexible laryngoscopy recordings and took place in sound-treated rooms (standard acoustic booths produced by IAC Acoustics). Acoustic recordings were made with a standard headset microphone (WH20; Shure, Niles, IL) placed approximately 6–10 cm from the mouth at a 45° angle from the midline. Signals were preamplified by an RME Quadmic II (RME, Haimhausen, Germany) and sampled at 44100 Hz with 16-bit resolution using a MOTU UltraLite-mk3 Hybrid (model UltraLite3Hy; MOTU, Cambridge, MA). Recordings were made using SONAR software (Cakewalk, Boston, MA). Tasks are delineated in the appendix and included producing isolated vowels, spontaneous speech, reading, and singing.

2.2.3. Aerodynamic recordings

Aerodynamic recordings were completed using the Phonatory Aerodynamic System (PAS; KayPentax, Lincoln Park, NJ). Tasks are delineated in the appendix and included /a-pa-pa-pa-pa-pa/ and /i-pi-pi-pi/ trains for intraoral subglottal pressure estimates and /pa-pa-pa-pa/ trains for phonation threshold pressure estimates. For the subglottal pressure estimates, the participant was instructed to produce /pa/ tokens at a comfortable pitch and loudness at a rate of approximately 90 bpm (demonstrated with a metronome during the first session). For the phonation threshold pressure measures, the participant was instructed to begin at a comfortable pitch and loudness and decrease his vocal volume until he was unable to phonate.

2.2.4. Habitual recordings

Habitual fo recordings were made using a hand-held recorder (H4n Handy Recorder, Zoom, Hauppauge, NY) and a neck-surface mounted accelerometer (Hot Spot accelerometer; K&K Sound, Coos Bay, OR). Accelerometer data were recorded at 44100 Hz. The participant applied the accelerometer with double-sided tape and carried the recorder in his pocket, recording his voice use for 3–4 hours during the work day. Recordings were made approximately monthly; the participant chose when to apply the sensor and was instructed to go about his day as normal. He did not log his activities separately from the acoustic recordings themselves.

2.2.5. Other data

The participant recorded his testosterone dosage schedule, his testosterone blood levels (only assessed when medically advised, so there is no baseline testosterone level prior to HRT), and his subjective impressions of his voice quality and ease of use, during both daily life and in particular when singing in a choral setting. At each acoustic recording session, the participant recorded his self-rating of difficulty and fatigue of singing in a high-pitched, whispered voice, on a range of 1 (easy, no fatigue) to 10 (most difficult, highest fatigue). The participant also completed the Voice-Related Quality of Life (V-RQOL) scale at each acoustic recording session.

2.2.6. Listener perception of gender

Eight young adult listeners (4 cisgender females, 4 cisgender males; M: 20.9 years, SD: 2.8 years) provided ratings of the participant’s gender during a single visit. All listeners were native speakers of American English, passed a hearing screening at a minimum threshold of 25 dB at octaves from 125 to 8000 Hz, and reported no history of speech, language, or hearing disorders. No listener had any known prior interactions with the primary study participant. Listeners provided written consent in accordance with the Boston University Institutional Review Board.a

The listener perception study was conducted in a quiet room at the study site. Recordings were presented via circumaural headphones (Sennheiser HD280 Pro). Listeners were first presented with two sample recordings from speakers other than the study participant (one cisgender female, one cisgender male), which they used to adjust the computer volume to a comfortable level. Presentation order of sample recordings was counter-balanced across listeners.

Experimental recordings were then presented to listeners, who were instructed to indicate the speaker’s gender and their confidence in that decision by sliding a marker along a 100 mm visual analog scale (VAS). Listeners were told that some recordings may sound alike, but that each recording should be rated separately and carefully. Listeners based their gender perception ratings on excerpts from the Rainbow Passage (sentences 2-4) read by the study participant at each timepoint over the course of the study. Excerpts were approximately 13 seconds (s) long (M: 13.33s, range: 12.6–14.6 s) and normalized for peak intensity using MATLAB. Listeners rated each sample twice; they were first presented with all 26 recordings in a random order followed by a second presentation of the 26 recordings in a different random order, for a total of 52 recordings. Listeners played each sample once and recorded their ratings on the VAS using a custom-designed interface developed in MATLAB. The VAS ranged from “definitely male” (0 mm) to “definitely female” (100 mm) with intermediate anchors of “probably male/female” (27 mm/73 mm) and “guessing male/female” (49 mm/51 mm).b Listeners could place the marker at any point along the VAS except the midpoint (50 mm) so that a choice between male and female genders was required. The listening task lasted approximately 20 minutes.

2.3. Data analysis

All data analyses, source (i.e., acoustic, laryngoscopy video, etc), and the speech or voice task over which the analysis was performed are summarized in Table 2. Details are below. Subjective measures are reported with reliability measures. Acoustic and aerodynamic measures were not reanalyzed for reliability, given that they are objective measures. When data analysts could be blinded, they were. In other cases, analysts were study staff and thus were not blinded. In all cases, there were no specific hypotheses about the amount or direction of change (or stability).

Table 2.

Overview of measures

Category | Measure (units) | Source | Token
--- | --- | --- | ---
Pitch and loudness | fo – min/max/mean (Hz, ST) | Acoustic | Sentences 2–4 of Rainbow Passage
 | fo – habitual (Hz) | Accelerometer | Daily conversation
 | fo variability (Hz, ST) | Acoustic | Sentences 2–4 of Rainbow Passage
 | Loudness and variability (dB SPL) | Acoustic | Sentences 2–4 of Rainbow Passage
Physiological measures | Formants (Hz) and laryngeal height (cm) | Acoustic | Steady-state /ɑ/, /i/, /æ/, /eɪ/
 | Subglottal pressure estimates (cmH2O) | Aerodynamic | ɑ-pɑ-pɑ-pɑ-pɑ-pɑ, i-pi-pi-pi-pi-pi
 | Phonation threshold pressure (cmH2O) | Aerodynamic | pɑ-pɑ-pɑ-pɑ
 | Vibratory ratings | Laryngoscopy video | Comfortable modal sustained vowel
Voice quality | Voice quality | Expert listener ratings (CAPE-V) | CAPE-V sentences
 | CPP (dB) | Acoustic | Sentences 2–4 of Rainbow Passage
 | LH ratio (dB) | Acoustic | Sentences 2–4 of Rainbow Passage
 | Airflow (L/s) | Aerodynamic | /ɑ/ during ɑ-pɑ-pɑ-pɑ-pɑ-pɑ
 | Jitter (%) | Acoustic | Steady-state /ɑ/
 | Shimmer (%) | Acoustic | Steady-state /ɑ/
 | Harmonics-to-noise ratio (dB) | Acoustic | Steady-state /ɑ/
Perception of gender | Gender perceptual rating | Listener ratings | Sentences 2–4 of Rainbow Passage

2.3.1. Pitch and loudness correlates

All fo traces were exported from Praat22 after manual inspection, with adjustments made to Praat settings as needed. Custom scripts in MATLAB (Mathworks, Natick, MA) were used to convert fo traces to semitones for further processing. Because of the logarithmic relationship between fo in Hz and the perception of pitch, measures of differences in fo (e.g., standard deviation as a correlate of intonation, phonatory frequency range) in Hz will be misleading. This issue is resolved by converting all frequencies to semitones (ST) as in Eq. 1, which represents each frequency as a change in frequency (f) from a reference frequency (fref).

$$\mathrm{ST} = 12 \times \log_2\left(\frac{f}{f_{\mathrm{ref}}}\right) \qquad \text{(Eq. 1)}$$

2.3.1.1. Mean fo (correlate of pitch) and fo variability (correlate of intonation)

Mean fo was calculated in Hz and in ST, using the mean fo from the first baseline session (190.1 Hz) as fref. For measuring variability, fo traces were converted to semitones (ST) via the formula in Eq 1, in which f is the extracted frequency in Hz and fref is the mean fo of that session. Variability is then calculated as the standard deviation of fo traces in ST.
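The conversion in Eq. 1 and the variability measure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' MATLAB scripts: the function names and the example trace are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics

def hz_to_semitones(f_hz, f_ref_hz):
    """Eq. 1: express f as semitones relative to a reference frequency."""
    return 12.0 * math.log2(f_hz / f_ref_hz)

def fo_variability_st(fo_trace_hz):
    """Standard deviation of an fo trace in ST.

    The reference is the mean fo of the same trace (i.e., of that session),
    as described in the text. Sample SD (n - 1) is an assumption.
    """
    f_ref = statistics.mean(fo_trace_hz)
    trace_st = [hz_to_semitones(f, f_ref) for f in fo_trace_hz]
    return statistics.stdev(trace_st)

# Hypothetical fo trace (Hz) for one session; not data from the study.
trace = [170.0, 180.0, 190.0, 200.0, 185.0]

# Mean fo in ST relative to the first-baseline mean fo (190.1 Hz, from the text).
mean_fo_st = hz_to_semitones(statistics.mean(trace), 190.1)
variability_st = fo_variability_st(trace)
print(round(mean_fo_st, 2), round(variability_st, 2))
```

Because each session is referenced to its own mean, the variability measure is unaffected by an overall downward shift in fo, which is the property the text relies on.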

2.3.1.2. Phonatory frequency range (PFR) and pitch breaks

Phonatory frequency range (PFR) is calculated from the minimum modal frequency and maximum falsetto frequency that an individual can produce.23 PFR is used widely in the voice clinic;24 the average adult PFR is 38 ST for cisgender men and 37 ST for cisgender women.23 PFR was calculated in ST using Eq. 1, with the maximum fo as f and the minimum fo as fref; maximum and minimum fo were extracted in Praat from a glissando or discrete half-steps up and down the scale, whichever produced more extreme fo.25
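As a hedged illustration of why PFR is expressed in ST rather than Hz, the sketch below applies Eq. 1 with the maximum fo as f and the minimum fo as fref. The function name and the example frequencies are hypothetical and are not values from this study.

```python
import math

def phonatory_frequency_range_st(fo_min_hz, fo_max_hz):
    """PFR in semitones: Eq. 1 with f = maximum fo and fref = minimum fo."""
    if fo_min_hz <= 0 or fo_max_hz < fo_min_hz:
        raise ValueError("expected 0 < fo_min_hz <= fo_max_hz")
    return 12.0 * math.log2(fo_max_hz / fo_min_hz)

# Hypothetical ranges: the second is the first shifted down by one octave.
before = (220.0, 1760.0)   # Hz
after = (110.0, 880.0)     # Hz

for lo, hi in (before, after):
    print(hi - lo, phonatory_frequency_range_st(lo, hi))
```

In this example the range in Hz halves (1540 Hz versus 770 Hz) while the range in ST is identical (three octaves), so a purely downward shift can look like a reduction in range if it is measured in Hz.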

Pitch breaks during a descending glissando on /i/ during the laryngoscopy examinations were hand-notated during vibratory analysis (see 2.3.2.4 Vibratory ratings for more details). The time-locked acoustic signal was examined in Praat and the last fo before the break and the first fo after it were extracted. These are used as a marker of stability and control (i.e., as a trained singer, the participant should have no pitch breaks during a glissando) and as a marker of the location of the register shift from falsetto to modal.

2.3.1.3. Habitual fo

Accelerometer recordings were filtered between 50 and 450 Hz; pitch contours were extracted from Praat manually and evaluated using custom scripts in MATLAB.

2.3.1.4. Sound pressure level

Sound pressure level was calculated as the amplitude of the microphone signal normalized by a calibration procedure. During each recording session, an electrolarynx was placed at the lips and a sound level meter (CM-150; Galaxy Audio, Wichita, KS) was placed at the microphone. The root-mean-square (RMS) values of the microphone amplitude at three different electrolarynx levels were regressed against the known dB SPL measured with the sound level meter.26 The resulting scaling factor and intercept were used to convert the RMS of the microphone signal during the Rainbow Passage to dB SPL. The RMS of the microphone amplitude was calculated over 50 ms windows and averaged across the entire sample.
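The calibration procedure can be sketched as a simple linear regression. The sketch below makes several assumptions that the text does not confirm: the RMS values are converted to dB (20·log10) before regression, the windowed RMS values are averaged before conversion to dB SPL, and windows do not overlap. All names and calibration numbers are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def rms_to_db(value):
    """Uncalibrated level in dB (assumed regression predictor)."""
    return 20.0 * math.log10(value)

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_spl(signal, fs, slope, intercept, win_s=0.050):
    """Average the RMS over non-overlapping 50 ms windows, then calibrate."""
    win = int(round(fs * win_s))
    levels = [rms(signal[i:i + win])
              for i in range(0, len(signal) - win + 1, win)]
    mean_rms = sum(levels) / len(levels)  # assumes a non-silent recording
    return slope * rms_to_db(mean_rms) + intercept

# Hypothetical calibration: microphone RMS at three electrolarynx levels
# and the matching sound level meter readings (dB SPL).
cal_rms = [0.01, 0.02, 0.04]
cal_spl = [70.0, 76.0, 82.0]
slope, intercept = fit_line([rms_to_db(r) for r in cal_rms], cal_spl)

# Hypothetical 100 ms constant-amplitude signal sampled at 44100 Hz.
signal = [0.02] * 4410
print(round(mean_spl(signal, 44100, slope, intercept), 1))
```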

2.3.2. Physiological changes

2.3.2.1. Formants and estimated vocal tract length

Formants were extracted from prolonged productions of vowels /ɑ, i, æ, eɪ / by a trained technician, blinded to study hypotheses, using Praat. Formants were calculated over a steady portion of the vowel for a total of 12–20 estimates per timepoint. Estimates were averaged within vowel type and date (e.g., all repetitions of /i/ were averaged together) and then across all four vowel types. The fourth formant was used to estimate vocal tract length in cm using Eq. 2, in which n is the formant number (4), c is the speed of sound in air (34,300 cm/s), and Fn is the measured formant location in Hz (approximating the fourth vocal tract resonance).

$$\text{vocal tract length} = \frac{(2n - 1)\,c}{4 F_n} \qquad \text{(Eq. 2)}$$
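Eq. 2 can be checked directly with a short sketch. The function name is hypothetical; the constants (n = 4, c = 34,300 cm/s) are those stated in the text, and the two formant values are the baseline and final fourth-formant means from Table 3.

```python
def vocal_tract_length_cm(formant_hz, n=4, c_cm_per_s=34300.0):
    """Eq. 2: vocal tract length estimated from the n-th formant."""
    return (2 * n - 1) * c_cm_per_s / (4.0 * formant_hz)

baseline_cm = vocal_tract_length_cm(3694.0)  # baseline mean F4 (Table 3)
final_cm = vocal_tract_length_cm(3554.0)     # final mean F4 (Table 3)
print(round(baseline_cm, 2), round(final_cm, 2))
```

Applied to the tabulated means, this gives roughly 16.25 cm and 16.89 cm, close to the reported vocal tract lengths; small differences are expected if the authors computed length per session before averaging.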
2.3.2.2. Subglottal pressure estimates

Raw data were extracted from PAS software and processed further using custom MATLAB scripts. A semi-automated algorithm identified the point of maximum intraoral pressure of the /p/ productions preceding each vowel (i.e., /pɑ/); the peak pressures were each inspected by a trained technician who was blinded to the study hypotheses. Only the middle three productions in each string are used in order to avoid known effects of initial and final utterance positions on intraoral subglottal pressure estimates.27 Estimates were averaged over each utterance to get an average maximum subglottal pressure per timepoint.

2.3.2.3. Phonation threshold pressure

Raw data were extracted from PAS software and processed further using custom MATLAB scripts. These scripts allowed a trained technician, blinded to study hypotheses, to identify the final two peaks of pressure of the /p/ production in each /pɑ-pɑ-pɑ/ train (i.e., the pressure during the final two /p/ productions of the train). The script extracted the maximum pressure during the final two peaks of each /pɑ-pɑ-pɑ/ train and averaged them. Phonation threshold pressure (PTP) for each timepoint was calculated as the average of that mean value over all /pɑ-pɑ-pɑ/ trains produced.

2.3.2.4. Vibratory ratings

Ratings of the laryngoscopy videos were made by a certified speech-language pathologist specializing in voice using the Voice-Vibratory Assessment With Laryngeal Imaging (VALI) form,28 which elicits ratings of a variety of factors including glottal closure, amplitude of vocal fold movement, magnitude of mucosal wave, phase closure, and regularity. All ratings were made based on video of comfortable sustained modal phonation. The rater was study staff and as such was not blinded to the purpose of the study; however, the videos were presented in pseudorandomized order so that the timepoint of each video was unknown to the rater.

2.3.3. Voice quality measures

Voice quality changes were assessed perceptually and via acoustic and aerodynamic correlates.

2.3.3.1. Expert listener ratings

A certified speech-language pathologist specializing in voice completed ratings via the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V)29 as implemented in a custom MATLAB graphical user interface. The speech-language pathologist listened to amplitude-normalized acoustic recordings of all six CAPE-V sentences and provided ratings for overall severity, roughness, breathiness, and strain. Samples were presented in pseudorandomized order, and twenty percent of the samples were repeated for intrarater reliability assessment. The rater was study staff and as such was not blinded to the purpose of the study.

2.3.3.2. Acoustic measures

Acoustic measures of voice quality were quantified using Analysis of Dysphonia in Speech and Voice software (ADSV)30 or Praat22. ADSV was used to calculate cepstral peak prominence (CPP) and low-high spectral ratio (LH ratio). Samples were sentences 2–4 of the Rainbow Passage and thus match the perceptual study (see 2.3.4 Listener perception of gender). Periods of silence between words were removed from each sample using a custom MATLAB script to obtain the most accurate values from the ADSV analysis. Analysis was conducted using the Rainbow Passage profile with the cepstral peak extraction range set to encompass the participant’s fo range during connected speech (90-300 Hz).

Praat’s “voice report” function was used to estimate jitter, shimmer, and harmonics-to-noise ratio (HNR). Although these measures are no longer preferred as measurements of voice quality,21 they are reported in the previous transmasculine literature and thus are presented here. A center portion of each vowel was identified by a trained technician and used for both formant calculation and voice quality measures (see 2.3.2.1 Formants and estimated vocal tract length). These were automatically extracted using custom MATLAB scripts to save extracted vowels as separate files and run a voice report using a custom Praat script. Although Praat utilizes several different algorithms to calculate jitter and shimmer, we report jitter (ppq5, %) and shimmer (apq5, %).

2.3.3.3. Airflow during vowels

Trains of /ɑ-pɑ-pɑ-pɑ/ were recorded and visualized using PAS software. Oral airflow (L/sec) was measured by manually selecting a stable portion in the center of each vowel. Only the middle three productions in each string were used to avoid possible effects of initial and final utterance positions. Airflow measurements were not taken from any production of /pɑ/ for which airflow did not return to 0 L/sec between syllables; four such productions were excluded. Airflow was averaged across productions for each timepoint.

2.3.4. Listener perception of gender

Intrarater reliability was assessed via repetition of each voice sample during the listener perception study. Listeners thus rated each sample twice, and Pearson’s correlation coefficients were calculated for each listener. Interrater reliability was assessed by calculating the intraclass correlation coefficient (ICC[C,k]) for consistency.31 All listener ratings were averaged to generate a single gender rating of the primary participant’s voice at each timepoint.
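The two reliability statistics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ICC(C,k) is computed from two-way ANOVA mean squares as (MS_rows − MS_error) / MS_rows, with samples as rows and listeners as columns, and all ratings shown are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def icc_consistency_k(ratings):
    """ICC(C,k): consistency of the mean of k raters.

    `ratings` is a list of rows (one per voice sample), each holding one
    rating per listener.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows

# Hypothetical VAS ratings (mm) from one listener on two presentations.
first_pass = [95.0, 90.0, 60.0, 35.0, 30.0]
second_pass = [97.0, 88.0, 55.0, 40.0, 28.0]
print(pearson_r(first_pass, second_pass))

# Hypothetical ratings: rows are samples, columns are three listeners.
ratings = [[95.0, 90.0, 98.0], [60.0, 70.0, 65.0], [30.0, 25.0, 40.0]]
print(icc_consistency_k(ratings))
```

Because the consistency form ignores constant offsets between listeners, a listener who rates every sample a fixed amount higher than another does not lower the coefficient.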

2.4. Statistical approach

Given both the large number of measures and the limitations inherent in a single-subject case study, we present and discuss full time courses of only a few measures of interest (fo of reading and habitual use, self-ratings of voice issues, and listener perception of gender). The remaining 30 measures are presented in full in the supplemental material for this paper. For each measure, we also performed two-sample t-tests between baseline and final sessions (three baseline sessions and three final timepoints) to determine significant changes, as well as effect sizes (Cohen’s d) to indicate the magnitude of the changes. The p values are all uncorrected, as we wish to provide a summary statistic of which measures have likely changed given that all/many may be expected to change; that is, we have accepted a higher risk of Type I errors in order to reduce the risk of Type II errors. This is a conservative position in this case, such that if we conclude that something has not changed, it has likely not changed permanently. Although we are prioritizing long-term (permanent) changes with this data analysis method, all short-term changes (e.g., fo range instability) can be discerned from the graphs in the supplemental material and are discussed individually.
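The baseline-versus-final comparison can be sketched from summary statistics alone. This sketch assumes a pooled-standard-deviation Cohen's d and an equal-variance (pooled) t statistic; the text does not say whether a pooled or Welch test was used, and the function names are hypothetical. The example values are the reading fo means and SDs from Table 3, with three sessions per group.

```python
import math

def pooled_variance(sd1, n1, sd2, n2):
    """Pooled variance of two groups from their SDs and sizes."""
    return ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d (group 2 minus group 1) using the pooled SD."""
    return (mean2 - mean1) / math.sqrt(pooled_variance(sd1, n1, sd2, n2))

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    se = math.sqrt(pooled_variance(sd1, n1, sd2, n2) * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / se, n1 + n2 - 2

# Reading fo (Hz) from Table 3: baseline 183.49 (SD 5.84), final 134.04 (SD 0.83).
d = cohens_d(183.49, 5.84, 3, 134.04, 0.83, 3)
t_stat, df = pooled_t(183.49, 5.84, 3, 134.04, 0.83, 3)
print(round(d, 2), round(t_stat, 2), df)
```

Under these assumptions the effect size comes out at about −11.86, in line with the value tabulated for reading fo. Obtaining a p value additionally requires the t distribution with the stated degrees of freedom, which is omitted here to keep the sketch dependency-free.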

3. Results

Means and ranges are described below for all measures in detail, including notes of significant differences. Table 3 shows the mean and standard deviation of all measures over the three baseline sessions and the final three sessions, as well as effect sizes (Cohen’s d) and p values for changes between these baseline and final session averages (two-sample t-tests between three baseline timepoints and final three timepoints).

Table 3.

Effect sizes (Cohen’s d) for a variety of measures

Category | Measure | Baseline mean | Baseline SD | Final mean | Final SD | Effect size | P
--- | --- | --- | --- | --- | --- | --- | ---
Pitch and loudness | fo (Hz) - reading | 183.49 | 5.84 | 134.04 | 0.83 | −11.86 | <0.001
 | foSD (Hz) - reading | 21.34 | 2.95 | 15.92 | 1.44 | −2.34 | 0.046
 | fo (ST) - reading | −0.74 | 0.58 | −6.17 | 0.11 | −13.03 | <0.001
 | foSD (ST) - reading | 1.94 | 0.31 | 2.01 | 0.20 | 0.28 | 0.75
 | Minimum pitch (Hz) | 149 | 8.56 | 114 | 5.40 | −4.86 | 0.004
 | Maximum pitch (Hz) | 1294 | 20.9 | 848 | 37.65 | −14.66 | <0.001
 | Phonatory frequency range (ST) | 37.46 | 1.27 | 34.73 | 1.42 | −2.03 | 0.07
 | Loudness (dB SPL) | 74.64 | 1.48 | 75.08 | 0.80 | 0.37 | 0.67
 | Loudness variability (dB SPL) | 12.23 | 2.07 | 10.41 | 1.49 | −1.01 | 0.28
Physiological | 4th formant location (Hz) | 3694 | 33.3 | 3554 | 13.2 | −5.52 | 0.002
 | Vocal tract length (cm) | 16.3 | 0.14 | 16.9 | 0.06 | 5.74 | 0.002
 | Subglottal pressure (cm H2O) | 8.17 | 0.98 | 7.23 | 0.89 | −1.01 | 0.28
 | Phonation threshold pressure (cm H2O) | 4.66 | 0.23 | 3.37 | 0.15 | −6.60 | <0.001
 | VALI AP Compression | 1.00 | 0.00 | 0.67 | 0.58 | −0.82 | 0.37
 | VALI Glottal closure | 2.33 | 0.58 | 1.67 | 0.58 | −1.15 | 0.23
 | VALI Phase closure | 0.80 | 0.18 | 1.19 | 0.10 | 2.74 | 0.03
 | VALI Regularity | 96.67 | 5.77 | 100.00 | 0.00 | 0.82 | 0.37
Voice quality | CAPE-V Overall Severity (%) | 3.70 | 0.99 | 1.04 | 0.44 | −3.49 | 0.01
 | CAPE-V Roughness (%) | 1.23 | 2.14 | 1.23 | 0.33 | 0 | 1
 | CAPE-V Breathiness (%) | 3.32 | 1.15 | 0.00 | 0.00 | −4.08 | 0.007
 | CAPE-V Strain (%) | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 1
 | CPP (dB) - reading | 8.03 | 0.44 | 8.75 | 0.29 | 1.91 | 0.08
 | LH (dB) | 35.65 | 0.66 | 29.38 | 1.40 | −5.71 | 0.002
 | Airflow (L/s) | 0.21 | 0.01 | 0.23 | 0.01 | 1.64 | 0.11
 | Jitter (ppq5) % | 0.16 | 0.04 | 0.22 | 0.01 | 2.17 | 0.06
 | Shimmer (apq5) % | 1.46 | 0.31 | 3.30 | 0.35 | 5.60 | 0.002
 | Harmonics-to-noise ratio (dB) | 23.46 | 0.88 | 16.57 | 0.63 | −8.97 | <0.001
Perception of gender | Rating (100 = definitely female; 0 = definitely male) | 96.2 | 1.24 | 30.03 | 2.83 | −30.27 | <0.001

Baseline values are over first three timepoints; final are over final three timepoints. Effect sizes are Cohen’s d between baseline and final sessions; p values are derived from two-sample t-tests over these six total timepoints (p < .05 are in bold).

3.1. Medical reports

The participant’s total testosterone levels were measured via blood test as medically indicated and are as follows: 408 ng/dL at week 13; 506 ng/dL at week 27; 413 ng/dL at week 41; 622 ng/dL at week 57. Typical adult cisgender male ranges are 250–827 ng/dL as per the clinical reference for the given immunoassay, while typical adult cisgender female ranges are undetectable. During cisgender male puberty, levels increase gradually from undetectable before the onset of puberty and then from 19 ng/dL to 482 ng/dL.32

3.2. Voice and singing self-report

The participant sang with a high-intensity choir that performs weekly sacred repertoire for a radio audience and a concert series of masterworks. He continued singing throughout this period of voice change (at least 6 hrs/week). He moved from second alto to tenor before voice change onset, continued as a tenor throughout the period of this study, and then sang as a baritone/bass as it became clear that both tenor parts sat unnaturally in his changed voice. Comfortable singing ranges changed from F3–D5 to G2–E4. Phonatory frequency range changes are discussed below. His V-RQOL scores did not waver from least-impacted scores (10/50 raw or 100/100 adjusted), indicating that, despite voice changes, he did not note any impact on his quality of life.

3.3. Pitch, pitch variability, and loudness

3.3.1. Speaking (reading and habitual)

During the three baseline sessions, the speaker’s mean fo during reading was 183 Hz. During the final three sessions, his mean fo was 134 Hz. Mean reading fo thus changed by −5.4 ST. When tracking his habitual pitch, however, the participant’s fo was 211 Hz (SD: 53 Hz) during a baseline measure and 137 Hz (SD: 28 Hz) during the final habitual data collection session, suggesting a change of −7.5 ST. Both of these are significant changes with large effect sizes.

The course of fo changes for both reading and habitual speaking tasks is shown in Figure 1A. Means and effect sizes between baseline (three baselines) and final (final three recordings) sessions are shown in Table 3. When binning the recordings into two-month groupings, the largest changes were seen between months 1–2 and 3–4 (−1.8 ST) and months 3–4 and 5–6 (−1.9 ST).

Figure 1.

Panel A: Speaking fo (Hz) during a reading task (red circles) and during habitual speaking throughout the workday (grey open circles). Error bars are standard deviation. Panel B: Participant ratings of fatigue (purple, open circles) and difficulty (blue, closed circles). Panel C: Listener ratings of gender (red, closed circles), from 0 (definitely male) to 100 (definitely female), averaged across listeners. Error bars are standard error.

3.3.2. Speaking fo variability

In the first three baselines, the average speaking variability was 21.3 Hz; by the final three sessions, the variability was 15.9 Hz, which at first suggests a significant change. However, these apparent differences are an effect of the mean fo changing. When measured as the standard deviation in ST, speaking variability was 1.94 ST in baseline sessions and 2.01 ST in final sessions and not significantly different.

3.3.3. Phonatory Frequency Range (PFR)

Figure 2 shows the time course of maximum and minimum fo changes. In addition, pitch breaks during a descending glissando on /i/ are marked with vertical black lines as an indication of vocal instability and the note(s) of register change. The participant’s lowest producible fo lowered from D#3 to A2 (156 to 109 Hz), and highest fo changed from E6 to A5 (1318 to 890 Hz), both of which are significant changes with large effect sizes. If measured in Hz, one might interpret these changes as representing a reduction in range. However, when measured in semitones (ST) to account for the logarithmic relationship between Hz and perceived pitch, the highest note decreased by 6.3 ST and the lowest note decreased by 6.2 ST, representing a stable pitch range that shifted down. Accordingly, the average PFR was 37.46 ST during the baseline sessions and 34.73 ST during the final sessions, which was not a significant change.
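For readers who want to relate the note names to frequencies, the sketch below converts scientific pitch notation to nominal equal-tempered frequencies and expresses a range in semitones. It assumes A4 = 440 Hz and twelve-tone equal temperament; the helper names are illustrative. Nominal note frequencies are only approximations of the measured values, and ranges computed from note names will not exactly match the session-averaged PFR values reported above.

```python
import math

# Semitone offsets of the natural notes within an octave (C = 0)
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(note: str, a4_hz: float = 440.0) -> float:
    """Nominal equal-tempered frequency for names such as 'A2' or 'D#3'."""
    letter, rest = note[0].upper(), note[1:]
    semis = NOTE_OFFSETS[letter]
    if rest.startswith("#"):
        semis, rest = semis + 1, rest[1:]
    elif rest.startswith("b"):
        semis, rest = semis - 1, rest[1:]
    midi = 12 * (int(rest) + 1) + semis  # MIDI numbering, where A4 = 69
    return a4_hz * 2.0 ** ((midi - 69) / 12.0)


def range_in_st(low_hz: float, high_hz: float) -> float:
    """Width of a frequency range in semitones."""
    return 12.0 * math.log2(high_hz / low_hz)


for name in ("D#3", "E6", "A2", "A5"):
    print(f"{name}: {note_to_hz(name):.1f} Hz")

print(f"A2-A5 spans {range_in_st(note_to_hz('A2'), note_to_hz('A5')):.1f} ST")
```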

Figure 2.

Maximum (open red triangles) and minimum (closed blue triangles) singing fo. Pitch breaks occurring during a descending glissando, indicative of a change from falsetto to modal, are indicated in vertical black bars. Frequency in Hz of octave Cs are noted on left, with middle C marked with dotted grey line. Musical notes marked with light grey lines with right labels.

3.3.4. Speaking amplitude

Loudness was approximated from microphone signal amplitude in dB SPL while the participant read sentences 2–4 of the Rainbow Passage. Mean amplitudes were unchanged when comparing the three baselines to the final three sessions: 74.6 dB SPL in baseline and 75.1 dB SPL in the final sessions. Amplitude variability, another correlate of intonation, was 12.2 dB SPL in baseline sessions and ended at 10.4 dB SPL during the final sessions (not significantly different).

3.4. Physiological measures

3.4.1. Formants and estimated vocal tract length

Formant location steadily decreased throughout the course of the experiment, resulting in an increasing estimated vocal tract length. The fourth formant had a mean value of 3693 Hz during the first three sessions and 3554 Hz during the final three sessions. These correspond to an estimated vocal tract length of 16.26 cm in the first sessions and 16.90 cm in the final three sessions, both of which are significant changes with large effect sizes.
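The estimation procedure itself is described in the Methods, which are outside this section. As a hedged illustration only, the sketch below uses the textbook quarter-wavelength model of a uniform tube closed at the glottis and open at the lips, in which the nth resonance is F_n = (2n − 1)c / (4L). The speed of sound (343 m/s) and the function name are assumptions of this sketch, not values confirmed by the manuscript.

```python
# Assumed speed of sound in cm/s (about 343 m/s); not stated in this section
SPEED_OF_SOUND_CM_S = 34300.0


def vtl_from_formant(formant_hz: float, n: int = 4,
                     c_cm_s: float = SPEED_OF_SOUND_CM_S) -> float:
    """Estimate vocal tract length (cm) from the nth formant of a uniform tube.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L),
    so L = (2n - 1) * c / (4 * F_n).
    """
    return (2 * n - 1) * c_cm_s / (4.0 * formant_hz)


baseline_vtl = vtl_from_formant(3693.0)  # mean F4, first three sessions
final_vtl = vtl_from_formant(3554.0)     # mean F4, final three sessions
print(f"{baseline_vtl:.2f} cm -> {final_vtl:.2f} cm")
```

Under these assumptions the estimates fall within about 0.01 cm of the reported 16.26 cm and 16.90 cm; the small differences are expected if the reported values were averaged per session rather than computed from the mean F4.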

3.4.2. Subglottal pressure estimates

Intraoral estimates of subglottal pressure suggested that it remained within normal ranges (5.40 cmH2O [SD 1.37] for cisgender women and 6.65 cmH2O [SD 1.98] for cisgender men33,34) throughout the year. The mean subglottal pressure was 8.17 cmH2O during baseline sessions and 7.23 cmH2O during the final sessions, which was not a significant change.

3.4.3. Phonation threshold pressure

PTP measures gradually declined throughout the year from an average of 4.66 during the three baselines to 3.37 in the final three sessions. This change was significant with a large effect size.

3.4.4. Vibratory ratings

All vibratory ratings from the VALI remained consistent throughout the course of the experiment,c with the exception of free edge contour. The rater (as had the SLP performing the flexible laryngoscopy) noted that there was a granuloma on the right vocal process for weeks 34–44. Follow-up with an otolaryngologist suggested that this finding was an intubation granuloma acquired when the participant underwent top surgery during week 32. As it was posterior to the vocal process and the participant typically had a posterior gap, this lesion likely did not affect his voice. The participant noted no differences in his voice during that time.

Glottal closure was typically rated with a small posterior gap or very small posterior gap (23/26 timepoints), with a rating of “posterior gap” at the first timepoint and “complete closure” at weeks 31 and 34. Anteroposterior compression remained low, between 0–1 out of 5 for all but weeks 4 and 29, which were rated 2. Mediolateral constriction was 0 at all timepoints except for the third baseline, which was rated as a 1. Regularity was rated 90–100% for all timepoints except weeks 6 and 12, which were rated at 70%.

The ratio of closed phase to open phase varied somewhat over the course of the experiment, with a mean ratio of 0.8:1 during the first three sessions and 1.19:1 during the final three sessions. The maximum ratio was 2:1 during week 27. The minimum was 0.66:1 during the baseline. This measure could not be calculated during week 12 because the strobe system during laryngoscopy did not reach a steady-state to track cycles. Phase symmetry was 100% across all timepoints except week 12, again because the stroboscopy did not track. Mucosal wave was rated an 8/10 at all timepoints. Amplitude was rated as a 4 for all timepoints except for week 20, which was rated a 2.

3.4.5. Self-assessment of fatigue and difficulty

Figure 1B shows the participant’s ratings of fatigue and difficulty while using his voice during the biweekly recordings. His ratings were initially low (1–3) in the baseline sessions and for several timepoints thereafter, eventually increasing to steady 4–8 ratings for the bulk of the experimental period, before returning to lower ratings (3–5) for the final sessions.

3.5. Voice quality

3.5.1. Expert perception of voice quality

All voice quality measures from the CAPE-V remained very low (indicating normal voice quality), with a maximum score of 5.1/100 for any percept by both the main rater and one additional rater who rated 20% of the samples for reliability. Statistically, some changes were significant. Overall severity decreased significantly from 3.7 to 1.04, and breathiness decreased from 3.32 over the three baselines to 0.0 over the final three sessions.

3.5.2. Acoustic and aerodynamic measures of voice quality

CPP increased from the start to end of the experimental period, with an average measure of 8.03 dB in the first three sessions and 8.75 dB in the final three sessions; these differences were not statistically significant and indicate a non-dysphonic voice quality throughout.35 CPP ranged from a minimum of 7.51 dB (first baseline) to a maximum of 10.11 dB (week 43), with an overall mean of 8.81 dB.

LH ratio steadily decreased throughout the course of the experiment. The mean LH value was 35.6 dB over the three baselines and 29.4 dB over the final three sessions, a significant change. LH ranged from 28.0 dB (week 48) to 36.1 dB (second baseline), with an overall mean of 31.6 dB; these are all within normal, non-dysphonic ranges (within M ± 1 SD from Lowell and colleagues35).

Airflow over the vowel /ɑ/ in a /ɑ-pɑ-pɑ-pɑ/ train had a mean of 0.21 L/s during the baselines (SD: 0.01) and 0.23 (SD: 0.01) during the final sessions, values that are within normal ranges and did not change significantly.27 Measures varied throughout the course of the year, with a range of 0.19–0.32 L/s and overall mean of 0.24 L/s.

Three common measures of voice stability and health have been previously reported as aberrant in the literature on trans men. Jitter, representing cycle-to-cycle variability in fo, was 0.16% during baseline sessions and 0.22% during the final sessions, which was not a significant change. Shimmer also increased from 1.49% to 3.30% from the baseline to final sessions, and harmonics-to-noise ratio decreased from 23.4 dB to 16.6 dB, both representing significant changes.

3.6. Listener perception of gender

Intrarater reliability ranged from 0.53 to 0.95 (M=0.82) for all listeners, and the ICC for interrater reliability was 0.96. Thus all listeners were considered sufficiently reliable and consistency across listeners was high. Figure 1C shows the listener ratings of the participant’s gender. The mean gender perceptual rating of the participant’s voice was 96.2 (SD: 6.3; 0=definitely male; 100=definitely female) during the baseline sessions and 30.0 (SD: 26.0) during the final sessions. The participant was reliably identified as female (≥65) through the first 15 weeks of testosterone therapy and reliably identified as male (≤35) after 37 weeks on testosterone. The mean gender perceptual rating between these timepoints was 42.7 (SD: 9.2), suggesting some ambiguity in the participant’s vocal presentation during this period. Listener perception of the participant’s gender showed the largest changes between weeks 15 and 17 (38.4) and weeks 25 and 28 (24.1).
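The cutoffs described above (ratings of 35 or below treated as reliably male, 65 or above as reliably female, and anything in between as ambiguous) can be written as a small helper. This is a sketch of the thresholds as stated in this section; the function name and the "ambiguous" label are illustrative.

```python
def classify_rating(mean_rating: float,
                    male_max: float = 35.0,
                    female_min: float = 65.0) -> str:
    """Label a mean listener rating on the 0 (definitely male) to
    100 (definitely female) scale using the cutoffs given in the text."""
    if not 0.0 <= mean_rating <= 100.0:
        raise ValueError("rating must lie on the 0-100 scale")
    if mean_rating <= male_max:
        return "male"
    if mean_rating >= female_min:
        return "female"
    return "ambiguous"


# Mean ratings reported in this section
for label, rating in [("baseline", 96.2), ("weeks 15-37", 42.7), ("final", 30.0)]:
    print(f"{label}: {rating} -> {classify_rating(rating)}")
```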

4. Discussion

The purpose of this study was to comprehensively track one speaker’s voice during his first year of testosterone therapy for the purpose of masculinization.

4.1. Comparisons to previous transmasculine literature

Overall fo and the time course of the changes (see full details in supplemental material) observed in this study were consistent with the existing literature. The participant’s mean fo during reading ended at 134 Hz, resulting in a change of −5.4 ST. This change is as expected based on final fo values from the literature (shown in Table 1): Van Borsel7 (130–160 Hz), Nygren5 (125 Hz), Irwig14 (87–128 Hz), and Deuster13 (96–140 Hz). This fo change is also consistent with reference values for a typical cisgender male range (85–155 Hz);36 however, some papers have suggested a more stringent cutoff of 131 Hz.2,6 Interestingly, the participant’s final speaking fo and modal singing range are nearly identical to those of his cisgender brother;d although all conclusions here must by nature be speculative, this finding suggests that changes may be driven by genetics and that future research may thus be able to predict what changes a transmasculine speaker could expect under testosterone. Further, while the participant’s voice changed perhaps moderately compared to other reports (−5.4 ST) and below the 8–12 ST37 change during cisgender male puberty, this degree of change may have been as expected for this speaker, rather than indicating a failure to drop further due to changes in laryngeal structure38,39 due to his age upon starting testosterone. Research does in fact suggest that the mechanism of fo-lowering induced by exogenous testosterone therapy is not identical to that which occurs during typical cisgender male puberty; specifically, the vocal folds are thought to increase mainly in mass rather than in mass and length, because the laryngeal cartilage grows minimally or not at all. This may help to explain the magnitude of fo change here.40

4.2. Vocal tract length

Cisgender men typically have lower formants due to body size (longer vocal tracts correspond with lower formants) and due to a lowering of the larynx during puberty, further lengthening the vocal tract. The average adult vocal tract length is 16.9 cm for cisgender men and 14.1 cm for cisgender women, with the large discrepancy emerging after approximately age 15 due to an increase in pharynx length rather than an increase in oral cavity length.41 The participant’s lowered formants suggest that his vocal tract length increased to be consistent with the mean value for cisgender men (change from 16.26 cm to 16.90 cm). Studies of speech often focus on the first three formants (F1, F2, F3),17 as these can be volitionally modulated by tongue position (e.g., the strategy in voice feminization of “fronting” all sounds).42 However, higher formants, such as F4, examined here, are not as affected by articulation, and instead are influenced primarily by the length of the vocal tract from the glottis to the lips.43 To the listener, then, the lowered F4 values may be perceived as suggesting a more masculine voice regardless of F1–F3. Physiologically, these lower F4 values may have been caused by the participant’s larynx tilting and lowering on testosterone; functionally, this change could also be caused by the participant speaking with a consistently lowered laryngeal posture. Regardless, these results suggest that the participant’s formant resonance (F4) approximated that of cisgender men.

4.3. Voice quality

Previous reports have noted detrimental changes in voice quality following testosterone therapy in trans men. The results of our examinations of voice quality are not straightforward, but examination of the full traces (supplemental material) may provide a clearer picture.

Across the entire time period, LH ratio, jitter, shimmer, and HNR appeared to go towards a worsened voice qualitye and stabilize after week 20; of those, jitter had non-significant changes from baseline to final and the rest were significantly worse (see Table 3). CPP was variable throughout without a clear pattern, and did not significantly change from baseline to final sessions. Airflow increased during weeks 25 to 40, but appeared to return to baseline levels (non-significant change). Of these quantitative measures, only CPP is recommended as an acoustic marker for the assessment of voice quality.21

Expert listener perception noted a reduction in overall severity and breathiness, and all measures remained very low throughout (i.e., indicating a typical voice; maximally 5.1/100). Perceptual ratings of breathiness significantly decreased (see Table 3), indicating improved voice quality. Although we did not elicit self-perceived voice quality ratings, the participant did complete the V-RQOL, which remained at the least-impacted level, suggesting that any voice changes did not impact his day-to-day functioning. As a whole, it is unclear whether voice quality changed, and it appears to have not changed in a way that was perceptually meaningful.

Neither the participant nor the expert clinician noted a perception of “entrapment”, a voice quality change thought to be a consequence of exogenous testosterone therapy in transmasculine individuals and/or cisgender women, in which the vocal folds are thought to increase in mass but not in length and be trapped in a restrictive, ossified larynx.15,40,44 Perceptually, entrapment has been said to result in a voice that is weak, permanently hoarse, and devoid of the “right” harmonics.15 While our clinical protocols did not include a specific rating of “entrapment”, these percepts are also consistent with those experienced by individuals with hyperfunctional voice disorders, which are assessed with the protocols used here (e.g., the CAPE-V ratings). Even so, future research may wish to examine this specific percept in more detail.

It is unknown whether this participant’s experience represents a typical transmasculine experience. There have been reports of long-term changes in voice quality, both perceptually and those captured by some acoustic measures (albeit those that are generally no longer recommended), as well as the expected short-term vocal instability. What we can conclude, however, is that it is not inevitable that an individual’s voice quality will be negatively impacted. Transmasculine individuals are routinely warned (by medical staff, the trans community, and others) that their voice quality will suffer, which is of concern particularly to singers and other heavy voice users. Further careful research should examine self- and expert-listener perception as well as validated/recommended acoustic measures to determine the likelihood of long-term voice impacts.

4.4. Listener perception of gender

Very few studies9,10 have assessed how listeners perceive the gender of transmasculine speakers, and none have measured how these perceptions may change as speakers undergo testosterone therapy. The present case demonstrated that listener perception of the participant’s gender shifted over the course of the study. The shift in gender perception correlated strongly (r = 0.908, p < 0.001) with decreases in the participant’s fo. This association differs from the findings of Van Borsel and colleagues,10 who reported no correlation between fo and listener ratings of transmasculine speakers’ “maleness.” Note, however, that these were cross-speaker studies rather than within-speaker longitudinal measures as in the present study; thus the speakers had potentially all crossed the perceptual threshold. The point at which the shift in gender perception occurred in this case also differs from past studies. Here, the participant was perceived as male with an fo < 140 Hz, well below Spencer’s45 reported 160 Hz ceiling for male gender perception.
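The reported association is a Pearson correlation between fo and the mean listener rating across sessions. As a reminder of how that coefficient is computed, the sketch below implements it directly. The paired values are hypothetical placeholders chosen only to show the calculation; they are not the study's per-session data and will not reproduce r = 0.908.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-session values (NOT study data): fo in Hz and
# mean gender rating on the 0 (male) to 100 (female) scale
fo_hz = [183, 175, 160, 150, 140, 134]
ratings = [96, 90, 70, 45, 32, 30]

print(f"r = {pearson_r(fo_hz, ratings):.3f}")
```

A positive r here means that lower fo goes with ratings closer to "definitely male", matching the direction of the association described above.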

The speaker’s fo decreased fairly linearly from weeks 0–30, after which it remained steady (Figure 1). Interestingly, there was a non-linearity in the perception of the participant’s gender between an fo of 158 Hz and 150 Hz. This finding corresponds well to previous reports of typical cisgender male ranges (85–155 Hz) and typical cisgender female ranges (165–255 Hz).36

A potential factor in these different findings is listener characteristics. In a framework that regards listeners as active participants in gender attribution,8 each listener’s individual experiences with gender inform their assumptions about a speaker’s identity. These experiences are vast, changeable, and infinitely varied across listeners. No attempt was made to characterize the complex experiences with gender for the listeners in this study beyond collecting gender identity data. These data show that the listener perceptions reported here are from the perspectives of cisgender individuals. Those of individuals of other genders might differ in meaningful ways, as may those of listeners with experience with transmasculine friends or family. There is, of course, heterogeneity among the perceptions of cisgender individuals, given their own experiences with gender and with gender-expansive individuals. For example, while the gender ratings of most of our listeners followed a similar pattern, two listeners stood out for their unique responses. One submitted ratings that were almost all 0 (definitely male) or 100 (definitely female); the other always rated the participant as female but also showed the least intrarater reliability (all ratings ≥65; reliability = 0.53). Without further information on how these individuals define and interpret gender-based norms of communication, these results are seemingly anomalous, but they do call for further research into the influence of listener characteristics and experiences on gender perception. We also wish to note that we are not attempting to say that naive cisgender listeners are, or should be, the final arbiters of a speaker’s perceived gender. Instead, we wished to capture some aspect of how the participant’s voice might be perceived in the world, such as when he makes a phone call; this is a common source of misgendering and thus a concern that is frequently used as a measure of voice satisfaction/concern.5,8

4.5. Other factors

4.5.1. Hz versus ST

Some previous reports suggest a reduction in singing range, which is of concern to transmasculine speakers, particularly those who sing. However, our results match those of others, which suggest that, when measured in semitones, the participant’s phonatory frequency range shifted down by approximately 6 ST.

4.5.2. Intubation granuloma

The participant underwent surgery around week 32 after beginning testosterone. Following that surgery, the SLP performing the laryngoscopy noted an intubation granuloma (confirmed via an otolaryngologist). It was posterior to the vocal fold processes and did not appear to interfere with function; thus we considered this lesion an incidental finding, but note that clinicians should recall that transgender individuals are likely to have surgeries periodically and may suffer intubation trauma that should be considered in the case of sudden voice changes.

4.5.3. Chest binding

Transmasculine individuals often wear a restrictive chest binder and may use a hunched posture to mask feminine chest contours; both of these may affect proper respiratory support for voice production.1 The participant wore a chest binder for recording sessions in the baselines through week 35, after which he was healed enough from surgery to wear no binder or dressings. For this participant, there were no apparent changes between weeks 35 and 40 that might be directly attributable to the binder. However, this is an important consideration for clinicians and future studies in transmasculine individuals.

4.6. Baseline/Final sessions evaluation versus intermediate changes

For simplicity and due to the large number of measures and data types examined, we have presented primarily baseline and final session measures and indicated whether those measures changed during the course of testosterone therapy (see Table 3 for effect sizes and paired t-test results). Although Figures 1 and 2 show timecourses for several important measures, the timecourses of all measures may be of interest and are thus shown in the Supplemental materials. Although most measures show a pattern consistent with the t-test results in Table 3 (i.e., those that showed non-significant changes remained stationary throughout, and those that showed significant changes had a consistent trajectory throughout), a few outliers may be worth considering. For example, pitch range was consistent from baseline to the final sessions, but was markedly reduced (below 30 ST) during weeks 18–21. Similarly, airflow was unchanged between baseline and the final sessions, but was elevated during weeks 27–30. This change preceded the granuloma, and thus may represent some other instability in the vocal motor control system.

4.7. Clinical recommendations

Clinical recommendations drawn from a single case study, even when comprehensive and faithful to reporting guidelines,19 are offered here with caution regarding their generalizability. Nevertheless, the present case supports a departure from common advice to transmasculine speakers regarding the potential effects of testosterone on pitch range. Concerns about a reduction in pitch range were not borne out in this case. When measured in semitones, the participant’s range remained stable while shifting downward. This result will be of particular interest to transmasculine singers and their clinicians. Further, the different methods of eliciting fo (e.g., habitual speech during the workday versus elicited tasks of producing isolated vowels, reading, or spontaneous speech under recording conditions) resulted in different mean fo values. While those results are not unexpected, they do indicate that a stringent cutoff drawn from the literature is not recommended as a specific goal or measure of acceptably masculine voice. Similarly, pitch and pitch contours deemed masculine and feminine vary, sometimes widely, between cultures. Thus any goals must be set within the client’s cultural norms.

4.8. Limitations and future directions

This case study involved only one participant. In addition to the limitations inherent in any case study, this particular participant may not represent a typical transmasculine speaker beginning testosterone. During the course of the study, he worked in a research laboratory focused on voice and voice therapy and sang in a choir with professional singers and voice pedagogues. He was aware of the purpose of the recordings. Although he made no conscious effort to masculinize his voice except via testosterone, these results may not be representative of other transmasculine people. His age is also relevant (30–31), as many transmasculine people are now receiving hormone therapy at younger ages. As speakers age, their laryngeal cartilages gradually ossify.38,39 Thus younger speakers may see larger or different changes.

In addition, other acoustic measures have been suggested as markers of masculinity (e.g., center of gravity of /s/); we have focused on voice, specifically, but further work on articulatory and phonetic measures is warranted.46,47 The participant did not enroll in voice therapy or have voice lessons at any point before or during this research study, although he continued as an active choral singer. Future studies should examine the possible effects of voice therapy either concurrently or following voice lowering on testosterone. Evidence-based practice is vital in this area, especially given the social and economic barriers to voice therapy for transmasculine individuals.17

This study is an example of participatory action research in which the participant contributed to research design and research question formulation.20 This approach aims to ensure that concerns about outcomes and disparities held by members of the community being studied are not overlooked, as may occur when relying solely on outside researchers’ perspectives.17 As a single-subject case study, however, such contributions reflect the concerns of an individual member of the transmasculine community and so may not be representative of the larger group. Future work on transmasculine voice should consider the participatory action approach as a means of ensuring research is meaningful to the transmasculine community.

5. Conclusions

This paper presents a previously-unavailable in-depth assessment of one transmasculine individual’s speech and voice changes during the first year of hormone therapy with testosterone. Throughout the course of one year, the participant’s speaking fundamental frequency reduced from a typical cisgender female range to a typical cisgender male range. His singing range shifted approximately 6 ST, from an alto range to a baritone range. Physiological and perceptual measures indicated a healthy voice production throughout the course of the experiment. Interestingly, acoustic measures indicated that the participant’s vocal tract increased in length by 0.6 cm, suggesting that his larynx may have dropped or tilted as is expected in cisgender male puberty. Finally, listeners consistently attributed a male gender to the participant after approximately 37 weeks on testosterone. The data in this case study add to the scant literature on transmasculine voice. Future studies can assess a standard time-course for changes, which changes relate most to listener perception, and which factors could be targets of speech therapy for speakers who do not find their voices to be satisfactory.

Supplementary Material

1
2

Figure 3.

View of the intubation granuloma on the right vocal process. Image is from week 36, one month following (unrelated) surgery. Granuloma was no longer visible during week 46.

6. Acknowledgments

The authors wish to thank Daniel Buckley for his help with laryngoscopy, perceptual evaluation, and discussions about this manuscript. We wish to further thank Jake Noordzij, Zach Morgan, Lauren Maclellan, and Lin Zhang for their help with data processing. Finally, we wish to thank Graham Grail and Carolyn Hodges-Simeon for their discussions about existing transmasculine literature and Jessica Kramer for discussions of participatory action research. This work was supported by the National Institutes of Health – National Institute on Deafness and Other Communication Disorders under grants F31 DC014872 (GJC), R01 DC015570 (CES), T32 DC013017 (Christopher A. Moore), and F32 DC017637 (GJC).

8. Appendix

8.1. Laryngoscopy protocol

Light source: Strobe
  • Sustain /i/ at comfortable pitch/loudness
  • Soft /i/
  • Loud /i/
  • High pitch /i/
  • Low pitch /i/
  • Loudness glides (soft→loud→soft)
  • Vocal diadochokinesis (DDK) maneuvers: /ihi ihi ihi ihi/, /isi isi isi isi/, /i/-sniff-/i/-sniff-/i/-sniff
  • Sing “Happy Birthday”
  • Sing warmups with melisma (i.e., do-mi-re-fa-mi-so etc., all on /i/)
  • Pitch glides (low→high and high→low) x 2

Light source: Halogen
  • Loudness glides (soft→loud→soft)
  • Vocal diadochokinesis (DDK) maneuvers: /ihi ihi ihi ihi/, /isi isi isi isi/, /i/-sniff-/i/-sniff-/i/-sniff
  • Sing “Happy Birthday”
  • Sing warmups with melisma (i.e., do-mi-re-fa-mi-so etc., all on /i/)
  • Pitch glides (low→high and high→low) x 2

8.2. Acoustic & aerodynamic recordings protocol

Isolated vowels
  • Token: /ɑ/, /i/, /u/, /æ/, /eɪ/, /oʊ/
  • Instructions: Repeat 5 times each; ~3 seconds in duration

Spontaneous speech
  • Suggested prompt: What did you do last weekend?

Reading
  • Token: Rainbow Passage; CAPE-V sentences; reading passage loaded with corner vowels; SIT sentences
  • Instructions: Normal pitch/loudness

Singing
  • Token: /i/
  • Instructions: Habitual pitch; low pitch; high pitch; low, glide to high; high, glide to low; low, scale to high (step); high, scale to low (step); sing warmups with melismas at different ranges (i.e., do-mi-re-fa-mi-so etc., all on /i/)

Subglottal pressure estimates
  • Token: ɑ-pɑ-pɑ-pɑ-pɑ-pɑ, i-pi-pi-pi-pi-pi
  • Instructions: Record with mask and straw via PAS; simultaneous mic & accel recordings (metronome at approximately 90 bpm); 2 recordings x 8 strings per recording; normal pitch and loudness

Phonation threshold pressure
  • Token: pɑ-pɑ-pɑ-pɑ
  • Instructions: Record with mask and straw via PAS; simultaneous mic & accel recordings; 2 recordings; begin at typical pitch and loudness; decrease vocal volume until unable to phonate any longer; no breaths between /pɑ/s

Footnotes

a

While the main study participant did actively consent to all procedures herein, his written consent was not required by the BU IRB, as a case study does not fall under the official definition of research.

b

Throughout this study, the terms “male” and “female” are used not as categories of biological sex, but as descriptors of gender. This usage is consistent with norms present in both colloquial speech and existing literature on voice and gender perception.

c

Although there are no published clinical norms for the VALI, it is validated and the rating clinician indicated that all ratings were within clinically acceptable ranges and thus not indicative of hyperfunction or other voice issues.

d

Rainbow Passage sentences 2–4: 130 Hz (SD: 26.0 Hz); Phonatory frequency range: 100 Hz – 598 Hz; comfortable modal singing range: A2–D4

e

“Worse” is indicated by a lower LH ratio, HNR, and CPP, and higher jitter and shimmer.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

7. References

  • 1. Davies S, Papp VG, Antoni C. Voice and communication change for gender nonconforming individuals: Giving voice to the person inside. Int J Transgenderism. 2015;16(3):117–159. doi: 10.1080/15532739.2015.1075931
  • 2. Azul D. Transmasculine people’s vocal situations: a critical review of gender-related discourses and empirical data. Int J Lang Commun Disord. 2015;50(1):31–47. doi: 10.1111/1460-6984.12121
  • 3. Zimman L. Transgender voices: Insights on identity, embodiment, and the gender of the voice. Lang Linguist Compass. 2018;12(8):e12284. doi: 10.1111/lnc3.12284
  • 4. McNeill EJM, Wilson JA, Clark S, Deakin J. Perception of Voice in the Transgender Client. J Voice. 2008;22(6):727–733. doi: 10.1016/J.JVOICE.2006.12.010
  • 5. Nygren U, Nordenskjöld A, Arver S, Södersten M. Effects on voice fundamental frequency and satisfaction with voice in trans men during testosterone treatment—A longitudinal study. J Voice. 2016;30(6):766.e23–766.e34.
  • 6. Ziegler A, Henke T, Wiedrick J, Helou LB. Effectiveness of testosterone therapy for masculinizing voice in transgender patients: A meta-analytic review. Int J Transgenderism. 2018;19(1):25–45. doi: 10.1080/15532739.2017.1411857
  • 7. Van Borsel J, De Cuypere G, Rubens R, Destaerke B. Voice problems in female-to-male transsexuals. Int J Lang Commun Disord. 2000;35(3):427–442. doi: 10.1080/136828200410672
  • 8. Azul D, Arnold A, Neuschaefer-Rube C. Do Transmasculine Speakers Present With Gender-Related Voice Problems? Insights From a Participant-Centered Mixed-Methods Study. J Speech Lang Hear Res. 2018;61(1):25. doi: 10.1044/2017_JSLHR-S-16-0410
  • 9. Scheidt D, Kob M, Willmes K, Neuschaefer-Rube C. Do we need voice therapy for female-to-male transgenders? In: 2004 IALP Congress Proceedings. Brisbane, Australia; 2004.
  • 10. Van Borsel J, de Pot K, De Cuypere G. Voice and Physical Appearance in Female-to-Male Transsexuals. J Voice. 2009;23(4):494–497. doi: 10.1016/j.jvoice.2007.10.018
  • 11. Cosyns M, Van Borsel J, Wierckx K, et al. Voice in female-to-male transsexual persons after long-term androgen therapy. Laryngoscope. 2014;124(6):1409–1414. doi: 10.1002/lary.24480
  • 12. Damrose EJ. Quantifying the impact of androgen therapy on the female larynx. Auris Nasus Larynx. 2009;36(1):110–112. doi: 10.1016/J.ANL.2008.03.002
  • 13. Deuster D, Matulat P, Knief A, et al. Voice deepening under testosterone treatment in female-to-male gender dysphoric individuals. Eur Arch Oto-Rhino-Laryngology. 2016;273(4):959–965. doi: 10.1007/s00405-015-3846-8
  • 14. Irwig MS, Childs K, Hancock AB. Effects of testosterone on the transgender male voice. Andrology. 2017;5(1):107–112. doi: 10.1111/andr.12278
  • 15. Constansis AN. The Changing Female-to-Male (FTM) Voice. Radic Musicol. 2008;3.
  • 16. Hancock AB, Childs KD, Irwig MS. Trans Male Voice in the First Year of Testosterone Therapy: Make No Assumptions. J Speech Lang Hear Res. 2017;60(9):2472. doi: 10.1044/2017_JSLHR-S-16-0320
  • 17. Zimman L. Voices in transition: Testosterone, transmasculinity, and the gendered voice among female-to-male transgender people. Linguist Grad Theses Diss. January 2012.
  • 18. Azul D, Nygren U, Södersten M, Neuschaefer-Rube C. Transmasculine People’s Voice Function: A Review of the Currently Available Evidence. J Voice. 2017;31(2):261.e9–261.e23. doi: 10.1016/J.JVOICE.2016.05.005
  • 19. Gagnier JJ, Kienle G, Altman DG, Moher D, Sox H, Riley D. The CARE guidelines: consensus-based clinical case reporting guideline development. J Med Case Rep. 2013;7(1):223. doi: 10.1186/1752-1947-7-223
  • 20. Minkler M, Wallerstein N, eds. Community-Based Participatory Research for Health: From Process to Outcomes. 2nd ed. San Francisco, CA: Jossey-Bass; 2013.
  • 21. Patel RR, Awan SN, Barkmeier-Kraemer J, et al. Recommended Protocols for Instrumental Assessment of Voice: American Speech-Language-Hearing Association Expert Panel to Develop a Protocol for Instrumental Assessment of Vocal Function. Am J Speech-Language Pathol. June 2018:1. doi: 10.1044/2018_AJSLP-17-0009
  • 22. Boersma P, Weenink D. Praat: doing phonetics by computer. 2015.
  • 23. Hollien H, Dew D, Philips P. Phonational Frequency Ranges of Adults. J Speech Lang Hear Res. 1971;14(4):755. doi: 10.1044/jshr.1404.755
  • 24. Zraick RI, Nelson JL, Montague JC, Monoson PK. The effect of task on determination of maximum phonational frequency range. J Voice. 2000;14(2):154–160. doi: 10.1016/S0892-1997(00)80022-3
  • 25. Barrett EA, Lam W, Yiu EML. Elicitation of minimum and maximum fundamental frequency and vocal intensity: Discrete half steps versus glissando. J Voice. 2018;0(0). doi: 10.1016/j.jvoice.2018.09.023
  • 26. McKenna VS, Stepp CE. The relationship between acoustical and perceptual measures of vocal effort. J Acoust Soc Am. 2018;144(3):1643–1658. doi: 10.1121/1.5055234
  • 27. Holmberg EB, Hillman RE, Perkell JS. Glottal airflow and transglottal air pressure measurements for male and female speakers in soft, normal, and loud voice. J Acoust Soc Am. 1988;84(2):511–529. doi: 10.1121/1.396829
  • 28. Poburka BJ, Patel RR, Bless DM. Voice-Vibratory Assessment With Laryngeal Imaging (VALI) Form: Reliability of Rating Stroboscopy and High-speed Videoendoscopy. J Voice. 2017;31(4):513.e1–513.e14. doi: 10.1016/J.JVOICE.2016.12.003
  • 29. Kempster GB, Gerratt BR, Verdolini Abbott K, Barkmeier-Kraemer J, Hillman RE. Consensus Auditory-Perceptual Evaluation of Voice: Development of a Standardized Clinical Protocol. Am J Speech-Language Pathol. 2009;18(2):124. doi: 10.1044/1058-0360(2008/08-0017)
  • 30. Awan SN. Analysis of dysphonia in speech and voice: An application guide. 2011.
  • 31. McGraw KO, Wong SP. Forming Inferences about Some Intraclass Correlation Coefficients. Psychol Methods. 1996. doi: 10.1037/1082-989X.1.1.30
  • 32. August GP, Grumbach MM, Kaplan SL. Hormonal Changes in Puberty: III. Correlation of Plasma Testosterone, LH, FSH, Testicular Size, and Bone Age with Male Pubertal Development. J Clin Endocrinol Metab. 1972;34(2):319–326. doi: 10.1210/jcem-34-2-319
  • 33. Zraick RI, Smith-Olinde L, Shotts LL. Adult Normative Data for the KayPENTAX Phonatory Aerodynamic System Model 6600. J Voice. 2012;26(2):164–176. doi: 10.1016/J.JVOICE.2011.01.006
  • 34. Zraick RI, Smith-Olinde L, Shotts LL. Erratum: “Adult Normative Data for the KayPENTAX Phonatory Aerodynamic System Model 6600.” Journal of Voice. 2012;26:164–176. J Voice. 2013;27:2–4. doi: 10.1016/j.jvoice.2011.01.006
  • 35. Lowell SY, Colton RH, Kelley RT, Mizia SA. Predictive value and discriminant capacity of cepstral- and spectral-based measures during continuous speech. J Voice. 2013;27(4):393–400. doi: 10.1016/J.JVOICE.2013.02.005
  • 36. Fitch JL, Holbrook A. Modal Vocal Fundamental Frequency of Young Adults. Arch Otolaryngol - Head Neck Surg. 1970;92(4):379–382. doi: 10.1001/archotol.1970.04310040067012
  • 37. Hollien H, Green R, Massey K. Longitudinal research on adolescent voice change in males. J Acoust Soc Am. 1994;96(5):2646–2654. doi: 10.1121/1.411275
  • 38. Hately W, Evison G, Samuel E. The Pattern of Ossification in the Laryngeal Cartilages: A Radiological Study. Vol 38; 1965.
  • 39. Garvin HM. Ossification of Laryngeal Structures as Indicators of Age. J Forensic Sci. 2008;53(5):1023–1027. doi: 10.1111/j.1556-4029.2008.00793.x
  • 40. Romano T. The Singing Voice During the First Two Years of Testosterone Therapy: Working with the Trans or Gender Queer Voice. Voice Opera Grad Theses Diss. May 2018.
  • 41. Goldstein UG. An articulatory model for the vocal tracts of growing children. 1977.
  • 42. Adler RK, Hirsch S, Pickering J. Voice and Communication Therapy for the Transgender/Gender Diverse Client: A Comprehensive Clinical Guide. 3rd ed.; 2018.
  • 43. Lammert AC, Narayanan SS. On Short-Time Estimation of Vocal Tract Length from Formant Frequencies. Larson CR, ed. PLoS One. 2015;10(7):e0132193. doi: 10.1371/journal.pone.0132193
  • 44. Baker J. A report on alterations to the speaking and singing voices of four women following hormonal therapy with virilizing agents. J Voice. 1999;13(4):496–507. doi: 10.1016/S0892-1997(99)80005-8
  • 45. Spencer LE. Speech Characteristics of Male-to-Female Transsexuals: A Perceptual and Acoustic Study. Folia Phoniatr Logop. 1988;40(1):31–42. doi: 10.1159/000265881
  • 46. Zimman L. Gender as stylistic bricolage: Transmasculine voices and the relationship between fundamental frequency and /s/. Lang Soc. 2017. doi: 10.1017/s0047404517000070
  • 47. Zimman L. Variability in /s/ among transgender speakers: Evidence for a socially grounded account of gender and sibilants. Linguistics. 2017. doi: 10.1515/ling-2017-0018
