Abstract
Context
Trinucleotide repeats in the androgen receptor have been proposed to influence testosterone signaling in men, but the clinical relevance of these trinucleotide repeats remains controversial.
Objective
To examine how androgen receptor trinucleotide repeat lengths affect androgen-related traits and disease risks and whether they influence the clinical importance of circulating testosterone levels.
Methods
We quantified CAG and GGC repeat lengths in the androgen receptor (AR) gene of European-ancestry male participants in the UK Biobank from whole-genome and whole-exome sequence data using ExpansionHunter and tested associations with androgen-related traits and diseases. We also examined whether the associations between testosterone levels and these outcomes were affected by adjustment for the repeat lengths.
Results
We successfully quantified the repeat lengths from whole-genome and/or whole-exome sequence data in 181 217 males. Both repeat lengths were positively associated with circulating total testosterone level and bone mineral density, whereas CAG repeat length was negatively associated with male-pattern baldness. These effects were relatively small, however, and the repeat lengths were not associated with most of the other outcomes. Circulating total testosterone level was associated with various outcomes, but these associations were not affected by adjustment for the repeat lengths.
Conclusion
In this large-scale study, we found that longer CAG and GGC repeats in the AR gene influence androgen resistance, elevate circulating testosterone level via a feedback loop, and play a role in some androgen-targeted tissues. Generally, however, circulating testosterone level is a more important determinant of androgen action in males than repeat lengths.
Keywords: androgen receptor, testosterone, trinucleotide repeat, whole-genome sequence, whole-exome sequence
Male sex hormones, testosterone and its metabolites, are called androgens and positively regulate cell growth, proliferation, and anabolism. Androgen signaling plays pivotal roles not only in the development and function of the reproductive system in males but also in carcinogenesis, body composition, neurodegenerative disease, hair loss, erythropoiesis, psychiatric disorders, and metabolism, among others, by exerting various effects in various tissues (1-3). Circulating testosterone levels are subject to complicated regulation, such as binding to SHBG and albumin and an LH-mediated feedback loop (4-6). Recently, genetic determinants of circulating testosterone levels and their associations with diseases were revealed in more than 400 000 UK Biobank participants (5, 7). Despite this complexity, most biological effects of androgen signaling are ultimately dependent upon the androgen receptor (AR).
The AR is a ligand-activated nuclear receptor whose gene is located on chromosome X, and its receptor activity is known to be affected by a CAN trinucleotide repeat [(CAG)22 CAA] in the first exon encoding a polyglutamine chain (1, 2). CAG repeats are associated with a polymorphism (rs3032358) where the reference allele has 22 CAG repeats (or 23 GCA repeats) (8). It has been shown, mainly based on data in vitro, that a long CAG repeat suppresses the AR activity as a transcription factor (1, 2), and an extremely long CAG repeat (38 CAG repeats or more) causes a rare X-linked adult-onset neurodegenerative disease, spinal and bulbar muscular atrophy, or Kennedy's disease, which is the first coding repeat expansion disorder discovered in humans (9, 10). However, it remains controversial as to whether CAG repeat length affects other androgen-related traits or diseases in humans (1, 2). This may be because most previous studies had small sample sizes and the effect of CAG repeat length on circulating testosterone levels was not fully considered.
The first exon of the human AR gene also has a GGN trinucleotide repeat [(GGT)3 GGG (GGT)2 (GGC)17] encoding a polyglycine chain (1). The GGC trinucleotide repeat is associated with a polymorphism (rs746853821) where the reference allele has 17 GGC repeats (11). A long GGC repeat also appears to suppress AR activity, although the evidence is weaker than for the CAG repeat (1, 12), and little is known about its clinical importance.
In the field of genetics, growing attention has been focused on short tandem repeats, including trinucleotide repeats, and novel methods and databases have been developed, contributing to the identification of novel short tandem repeats in humans (13-18). ExpansionHunter is one such recently developed tool (19). A recent report validated ExpansionHunter-based quantification of CAG repeat length in the AR gene from whole-genome sequence (WGS) data against conventional PCR fragment sizing and used it to quantify CAG repeat length in more than 70 000 males and females (20).
Here we show that CAG and GGC repeat lengths in the AR gene of more than 180 000 male participants in the UK Biobank were successfully quantified using ExpansionHunter, enabling us to examine how they affect androgen-related traits and diseases at scale. We also examined whether prediction of androgen-related outcomes using circulating total testosterone level could be improved by taking CAG and GGC repeat lengths into account. We reasoned that if the effect of testosterone was dependent upon the length of these repeats, then the association between testosterone levels and androgen-mediated outcomes would strengthen once the length of the repeats was accounted for. Overall, these results shed light on the role of common trinucleotide repeats in diseases.
Materials and Methods
Study Cohort
We used the UK Biobank for this study (21). As was summarized previously (22), between 2006 and 2010, approximately 500 000 participants aged 40 to 69 years were enrolled in the UK Biobank at multiple recruitment centers in the United Kingdom. After enrollment, the participants underwent anthropometric measurements, responded to questionnaires about lifestyle and medical history, and consented to health care record access. During follow-up, health care records were retrieved from the Hospital Episode Statistics database and the national death and cancer registries. Blood samples were collected and were used for genome-wide or exome-wide genotyping. Ethics approval for the UK Biobank study was obtained from the North West Centre for Research Ethics Committee (11/NW/0382). All UK Biobank participants provided informed consent at recruitment, and data of those who withdrew consent were excluded.
Data Collection and Definitions
European-ancestry male participants in the UK Biobank were identified based on sex (data field 31; numbers in parentheses after variable names in this section are UK Biobank data field IDs) and genetic ethnic grouping (22006), and their WGS and whole-exome sequence (WES) CRAM files were accessed in the DNAnexus platform. Clinical data were collected in and downloaded from the DNAnexus platform.
Collected data include age at recruitment (21022), genetic principal components (22009), UK Biobank assessment center (54), total testosterone (30850), SHBG (30830), self-reported hair/balding pattern (2395), self-reported date N46 first reported (male infertility) (132084), self-reported age first had sexual intercourse (2139), self-reported lifetime number of sexual partners (2149), standing height (50), weight (21002), self-reported usual walking pace (924), hand grip strength (46, 47), red blood cell count (30010), hematocrit (30030), neuroticism score (20127), and glycated hemoglobin (30750) at baseline. Total fat mass (23278) and total lean mass (23280) were assessed by dual-energy X-ray absorptiometry at the imaging visit (7). Diagnoses with International Classification of Diseases, Tenth Revision codes (41270) and age at death (40007) were collected in August 2023.
Fat mass and lean mass were adjusted for standing height (divided by square of standing height), and the mean of both hand grip strengths was adjusted for body weight (divided by body weight).
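The two adjustments described above can be written out as a minimal sketch. The function names and the example participant are hypothetical and are not taken from the study; the conversion from centimeters to meters is assumed because Table 2 reports height in cm and the adjusted masses in kg/m².

```python
def height_adjusted_mass(mass_kg: float, height_cm: float) -> float:
    """Fat or lean mass divided by the square of standing height (kg/m^2)."""
    height_m = height_cm / 100.0  # assumed unit conversion: cm -> m
    return mass_kg / height_m ** 2


def weight_adjusted_grip(left_kg: float, right_kg: float, weight_kg: float) -> float:
    """Mean of both hand grip strengths divided by body weight (kg/kg)."""
    return ((left_kg + right_kg) / 2.0) / weight_kg


# Hypothetical participant; the values are illustrative only.
fat_index = height_adjusted_mass(mass_kg=24.0, height_cm=176.0)
grip_ratio = weight_adjusted_grip(left_kg=38.0, right_kg=42.0, weight_kg=85.0)
print(round(fat_index, 2), round(grip_ratio, 3))
```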
Total testosterone was quantified by chemiluminescent immunoassay (competitive binding) on Beckman Coulter DXI 800 from serum samples (23). Estimated bone mineral density (eBMD) of the heel was derived from quantitative ultrasound speed of sound and broadband ultrasound attenuation (78, 3144, 3146, 3147, 3148, 4101, 4103, 4104, 4105, 4106, 4120, 4122, 4123, 4124, and 4125), as described previously (24). Continuous traits were normalized by mean and SD, if needed.
In self-reporting questionnaires, those answering “Do not know,” “Prefer not to answer,” or “None of the above” were excluded from analyses. Lifetime number of sexual partners was set to 0 if self-reported age first had sexual intercourse was “Never had sex”; otherwise self-reported lifetime number of sexual partners was used. Male-pattern baldness was defined as “Pattern 2,” “Pattern 3,” or “Pattern 4” (with the control of “Pattern 1”) in self-reported hair/balding pattern. Male infertility was defined as the presence of any date (with the control of “NA”) in self-reported date N46 first reported. Slow walking pace was defined as “Steady average pace” or “Slow pace” (with the control of “Brisk pace”) in self-reported usual walking pace.
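The questionnaire coding rules above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: answers are passed as the text labels quoted in the paragraph (an actual UK Biobank export may use numeric codes instead), and all function names are hypothetical.

```python
from typing import Optional

# Answers that lead to exclusion from the analysis, as listed in the text.
EXCLUDED = {"Do not know", "Prefer not to answer", "None of the above"}


def baldness_case(pattern: str) -> Optional[bool]:
    """True for Pattern 2-4 (case), False for Pattern 1 (control), None if excluded."""
    if pattern in {"Pattern 2", "Pattern 3", "Pattern 4"}:
        return True
    if pattern == "Pattern 1":
        return False
    return None  # excluded or unrecognized answer


def slow_walking_case(pace: str) -> Optional[bool]:
    """True for steady or slow pace (case), False for brisk pace (control)."""
    if pace in {"Steady average pace", "Slow pace"}:
        return True
    if pace == "Brisk pace":
        return False
    return None


def infertility_case(date_first_reported: Optional[str]) -> bool:
    """Case if any date is present, control if the field is missing (NA)."""
    return date_first_reported is not None


def lifetime_partners(age_first_sex: str, reported: Optional[int]) -> Optional[int]:
    """0 if the participant answered 'Never had sex', else the reported number."""
    if age_first_sex == "Never had sex":
        return 0
    return reported
```

Returning `None` for excluded answers keeps exclusions distinct from controls, which matters because the case and control counts in the binary-trait summary do not add up to the full cohort for the questionnaire-based traits.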
Other diseases were defined by International Classification of Diseases, Tenth Revision codes; prostate cancer was defined by C61 or D07; benign prostatic hyperplasia by N40; testicular cancer by C62; osteoporotic fracture by S22.0, S22.1, S32.0, S32.7, S42.2, S42.3, S52.5, S52.6, S72.0, S72.1, M48.4, M48.5, M49.5, or T08 (25); polycythemia by D45 or D75.1; depression by F32 or F33 (26); and type 2 diabetes by E11.
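A minimal sketch of how such code lists can be turned into case definitions is given below. The code lists are copied from the paragraph above; the prefix-matching rule (for example, treating E11.9 as a match for E11) and the removal of dots before comparison are assumptions made for illustration, and the function names are hypothetical.

```python
# ICD-10 code lists as given in the text.
OUTCOME_CODES = {
    "prostate cancer": ("C61", "D07"),
    "benign prostatic hyperplasia": ("N40",),
    "testicular cancer": ("C62",),
    "osteoporotic fracture": (
        "S22.0", "S22.1", "S32.0", "S32.7", "S42.2", "S42.3", "S52.5",
        "S52.6", "S72.0", "S72.1", "M48.4", "M48.5", "M49.5", "T08",
    ),
    "polycythemia": ("D45", "D75.1"),
    "depression": ("F32", "F33"),
    "type 2 diabetes": ("E11",),
}


def normalize(code: str) -> str:
    """Upper-case a code and drop the dot so 'E11.9' and 'E119' compare equal."""
    return code.replace(".", "").strip().upper()


def has_outcome(diagnoses: list[str], outcome: str) -> bool:
    """True if any recorded diagnosis starts with one of the outcome's codes."""
    prefixes = tuple(normalize(c) for c in OUTCOME_CODES[outcome])
    return any(normalize(d).startswith(prefixes) for d in diagnoses)


# Hypothetical diagnosis list for one participant.
print(has_outcome(["I10", "E11.9"], "type 2 diabetes"))
```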
AR Genotyping
ExpansionHunter (Illumina Inc.) software version 5.0.0 was used to quantify the length of the GCA repeat starting at chrX:67545317 (GRCh38.p14; all coordinates refer to this assembly) and that of the GGC repeat starting at chrX:67546515 in the AR gene from WGS and WES CRAM files. Reads were classified into 3 types: SPANNING, FLANKING, or INREPEAT (Fig. 1) (19). CAG repeat length was calculated by subtracting 1 from GCA repeat length. An extremely long CAG repeat was defined as 38 CAG repeats or more (20).
Figure 1.
Schematic description of types of reads. SPANNING reads extend across the entire repeat and into both flanking sequences, FLANKING reads cover one flanking sequence and only part of the repeat, and INREPEAT reads lie entirely within the repeat (19).
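The conversion from the GCA count to CAG repeat length, and the definition of an extremely long repeat, amount to the short sketch below. The function names are hypothetical; the offset of 1 and the threshold of 38 are the values stated in the text.

```python
EXTREME_CAG_THRESHOLD = 38  # "38 CAG repeats or more", as defined in the text


def cag_from_gca(gca_repeats: int) -> int:
    """CAG repeat length is the quantified GCA repeat length minus 1."""
    return gca_repeats - 1


def is_extremely_long(cag_repeats: int) -> bool:
    """True if the CAG repeat length meets the extreme-length definition."""
    return cag_repeats >= EXTREME_CAG_THRESHOLD


# The reference allele has 23 GCA repeats, that is, 22 CAG repeats.
reference_cag = cag_from_gca(23)
print(reference_cag, is_extremely_long(reference_cag))
```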
Correlation and Regression Analysis
Pearson correlation was used to examine linear correlations between trinucleotide repeat lengths. The effects of CAG and GGC repeat lengths on androgen-related traits or diseases were estimated by linear or logistic regression with adjustment for age at recruitment, 10 ancestry principal components, and assessment center in those whose CAG and GGC repeat lengths and other covariates were available. The effects of circulating total testosterone level were estimated by linear or logistic regression with adjustment for age at recruitment, 10 ancestry principal components, assessment center, and SHBG, with or without CAG and GGC repeat lengths, in those whose CAG and GGC repeat lengths, circulating total testosterone level, and other covariates were available. The Bonferroni method was used for multiple-comparison correction. All analyses were conducted and all plots were created using R Studio version 4.2.2.
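The building blocks named above (normalization by mean and SD, Pearson correlation, and the Bonferroni correction) are sketched below in plain Python. This is not the authors' code, which was written in R, and it leaves out the covariate-adjusted regression models. The number of tests passed to `bonferroni_threshold` is a placeholder, since the text does not state the number used.

```python
import math
from typing import Sequence


def zscore(values: Sequence[float]) -> list[float]:
    """Normalize values by their mean and sample SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, computed as the mean product of z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni method."""
    return alpha / n_tests


# Hypothetical repeat lengths for four males, illustrative only.
cag = [19.0, 21.0, 22.0, 26.0]
ggc = [17.0, 18.0, 16.0, 17.0]
print(round(pearson_r(cag, ggc), 3), bonferroni_threshold(0.05, 10))
```

In an unadjusted linear regression of one normalized variable on another, the slope equals the Pearson correlation, which is one way to read effect sizes expressed in SD per SD.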
Results
Quantification of CAG and GGC Trinucleotide Repeat Lengths in the AR Gene
First, we quantified CAG repeat lengths in the AR gene of European-ancestry male participants in the UK Biobank from WGS data and WES data using the ExpansionHunter software. WGS data were available in 75 449 European-ancestry males, and CAG repeat length was successfully quantified in 75 269 of them. In over 99% of them, the type of reads was SPANNING, in which the repeat was shorter than reads and spanned by them (Fig. 1). WES data were available in 181 107 European-ancestry males, and CAG repeat length was successfully quantified in all of them, with a type of reads of SPANNING in over 60% of them and with a type of INREPEAT in which reads were shorter than the repeat and fully contained in it in about 20% of them (Fig. 1, Table 1).
Table 1.
Summary of CAG and GGC trinucleotide repeat length quantification
| Repeat | Sequence data | Statistic | Total | Type of reads: SPANNING | Type of reads: FLANKING | Type of reads: INREPEAT |
|---|---|---|---|---|---|---|
| CAG | Whole-genome | n (%) | 75 269 (100.0) | 75 081 (99.8) | 187 (0.2) | 1 (0.0) |
| CAG | Whole-genome | Mean (SD) | 21.9 (3.0) | 21.9 (3.0) | 23.7 (4.9) | 54.0 (−) |
| CAG | Whole-exome | n (%) | 181 107 (100.0) | 114 904 (63.4) | 26 864 (14.8) | 39 339 (21.7) |
| CAG | Whole-exome | Mean (SD) | 22.1 (3.1) | 20.2 (1.9) | 23.7 (0.7) | 26.3 (1.9) |
| GGC | Whole-genome | n (%) | 75 365 (100.0) | 75 356 (100.0) | 9 (0.0) | 0 (0.0) |
| GGC | Whole-genome | Mean (SD) | 17.0 (2.0) | 17.0 (2.0) | 16.6 (1.6) | − |
| GGC | Whole-exome | n (%) | 181 107 (100.0) | 179 667 (99.2) | 1350 (0.7) | 90 (0.0) |
| GGC | Whole-exome | Mean (SD) | 17.0 (2.0) | 16.9 (2.0) | 19.5 (2.1) | 24.1 (2.9) |
Summary of CAG and GGC trinucleotide repeat lengths in the AR gene quantified from whole-genome sequence data or whole-exome sequence data using ExpansionHunter.
CAG repeat length was distributed almost normally ranging from 5 to 54 (WGS) and from 4 to 43 (WES). The mean was 21.9 (WGS) and 22.1 (WES) (Fig. 2A, Table 1), equivalent to that in the reference allele of rs3032358 (23 GCA repeats or 22 CAG repeats). An extremely long CAG of 38 repeats or more was observed in 22 males (WGS; 0.029%) and 46 males (WES; 0.025%), and the prevalence was considered to be roughly equivalent to that reported in a previous study, about 0.03% (20). The mean CAG repeat length was the shortest in those with a type of SPANNING and the longest in those with a type of INREPEAT from both sequence data (Table 1).
Figure 2.
Quantification of CAG and GGC trinucleotide repeat lengths in the AR gene. (A) Distribution of CAG trinucleotide repeat lengths of males quantified from WGS data (n = 75 269) or from WES data (n = 181 107). (B) Association of CAG trinucleotide repeat length quantified from WGS data and that quantified from WES data (n = 75 158). (C) Distribution of GGC trinucleotide repeat lengths of males quantified from WGS data (n = 75 365) or from WES data (n = 181 107). (D) Association of GGC trinucleotide repeat length quantified from WGS data and that quantified from WES data (n = 75 254). (E) Association of CAG trinucleotide repeat length and GGC trinucleotide repeat length quantified from WGS (n = 75 207). (F) Association of CAG trinucleotide repeat length and GGC trinucleotide repeat length quantified from WES (n = 181 107).
Abbreviations: WES, whole-exome sequence; WGS, whole-genome sequence.
In 75 158 males whose CAG repeat length was successfully quantified both from WGS and WES data, CAG repeat length from the latter was shown to be strongly correlated with that from the former (r = 0.95) (Fig. 2B). Given the usefulness of ExpansionHunter in quantifying CAG repeat length from WGS shown in a previous report (20), our results suggest that CAG repeat length quantified from WES data using ExpansionHunter would be well correlated with that quantified using the conventional PCR fragment sizing.
Similarly, we quantified GGC repeat length using ExpansionHunter from WGS data in 75 365 males and from WES data in 181 107 males. The type of reads was SPANNING in over 99% of them (Table 1).
GGC repeat length was distributed almost normally ranging from 3 to 30 (WGS) and from 3 to 29 (WES). The mean was 17.0 (WGS and WES) (Fig. 2C, Table 1), equivalent to that in the reference allele of rs746853821 (17 repeats). In 75 254 males whose GGC repeat length was successfully quantified both from WGS and WES data, GGC repeat length from the latter was shown to be strongly correlated with that from the former (r = 0.99) (Fig. 2D).
Both CAG and GGC trinucleotide repeat lengths were successfully quantified from either WGS or WES data in 181 217 males. In 75 207 males in whom both CAG and GGC repeat lengths were quantified from WGS data, the 2 repeat lengths were not correlated with each other (r = −0.063) (Fig. 2E). Similarly, in 181 107 males in whom both CAG and GGC repeat lengths were quantified from WES data, the 2 repeat lengths were not correlated with each other (r = −0.052) (Fig. 2F).
Associations Between CAG and GGC Repeat Lengths and Androgen-related Traits and Diseases
We hypothesized that CAG and GGC trinucleotide repeat lengths could be associated with androgen-related traits (Table 2) and diseases (Table 3).
Table 2.
Summary of continuous traits of males for regression analyses
| Trait | Whole-exome, n | Whole-exome, mean (SD) | Whole-genome, n | Whole-genome, mean (SD) |
|---|---|---|---|---|
| Age at recruitment (year) | 181 107 | 57.1 (8.1) | 75 207 | 57.1 (8.1) |
| Total testosterone (nM) | 171 263 | 12.0 (3.7) | 71 060 | 12.0 (3.7) |
| SHBG (nM) | 158 444 | 39.9 (16.8) | 65 883 | 39.9 (16.7) |
| Lifetime number of sexual partners (person) | 150 125 | 10.2 (92.5) | 62 418 | 10.4 (110.0) |
| eBMD (g/cm²) | 173 781 | 0.565 (0.122) | 71 982 | 0.566 (0.122) |
| Body height (cm) | 180 686 | 176.0 (6.8) | 75 009 | 176.0 (6.7) |
| Fat mass adjusted for body height (kg/m²) | 15 561 | 8.0 (2.8) | 7635 | 8.0 (2.8) |
| Lean mass adjusted for body height (kg/m²) | 15 561 | 17.8 (1.6) | 7635 | 17.9 (1.6) |
| Grip strength adjusted for body weight (kg/kg) | 179 827 | 0.468 (0.114) | 74 636 | 0.468 (0.113) |
| RBC (10⁶ cells/µL) | 176 280 | 4.73 (0.37) | 73 215 | 4.74 (0.37) |
| Hct (%) | 176 280 | 43.3 (3.0) | 73 215 | 43.4 (3.0) |
| Neuroticism score | 149 840 | 3.6 (3.2) | 62 241 | 3.6 (3.2) |
| HbA1c (%) | 172 890 | 5.47 (0.67) | 71 818 | 5.46 (0.65) |
| Age at death (year) | 21 076 | 71.3 (7.4) | 8223 | 71.3 (7.3) |
Summary of continuous traits of males of European ancestry for regression analyses in whom CAG and GGC repeat lengths quantified from whole-genome or whole-exome sequence data and other covariates were available. Lifetime number of sexual partners and neuroticism score were based on self-reported questionnaire. Fat mass and lean mass were assessed by dual-energy X-ray absorptiometry.
Abbreviations: eBMD, estimated bone mineral density; HbA1c, glycated hemoglobin; Hct, hematocrit; RBC, red blood cell count.
Table 3.
Summary of binary traits of males for regression analyses
| Trait | Whole-exome, case (n) | Whole-exome, control (n) | Whole-genome, case (n) | Whole-genome, control (n) |
|---|---|---|---|---|
| Male-pattern baldness | 122 667 | 57 234 | 50 951 | 23 761 |
| Prostate cancer | 12 458 | 168 649 | 5167 | 70 040 |
| BPH | 22 785 | 158 322 | 9439 | 65 768 |
| Testicular cancer | 296 | 180 811 | 125 | 75 082 |
| Male infertility | 435 | 180 672 | 194 | 75 013 |
| Osteoporotic fracture | 5520 | 175 587 | 2281 | 72 926 |
| Slow walking pace | 107 149 | 72 934 | 44 002 | 30 805 |
| Polycythemia | 671 | 180 436 | 269 | 74 938 |
| Depression | 9482 | 171 625 | 3889 | 71 318 |
| Type 2 diabetes | 19 097 | 162 010 | 7630 | 67 577 |
Summary of binary traits of males of European ancestry for regression analyses in whom CAG and GGC repeat lengths quantified from whole-genome or whole-exome sequence data and other covariates were available. Male-pattern baldness, male infertility, and slow walking pace were based on self-reported questionnaire, and the others were defined by International Classification of Diseases, Tenth Revision code(s).
Abbreviation: BPH, benign prostatic hyperplasia.
First, we focused on WES data given its larger sample size, and interestingly, CAG trinucleotide repeat length was shown to be positively associated with circulating total testosterone level (Fig. 3A). Given that a longer CAG repeat is considered to suppress AR transcription factor activity (1, 2), it has been suggested that a longer CAG repeat could induce androgen resistance and it could in turn elevate circulating testosterone levels via a feedback loop. Moreover, GGC trinucleotide repeat length was also shown to be positively associated with circulating total testosterone level (Fig. 3A), suggesting that a longer GGC repeat length could also cause androgen resistance and subsequent elevated circulating testosterone levels. It was also shown that CAG repeat length was slightly but positively associated with circulating SHBG level (Fig. 3A), although the association was not observed after adjustment for circulating total testosterone level.
Figure 3.
Associations between CAG and GGC trinucleotide repeat lengths in the AR gene and androgen-related traits and diseases (WES). (A) Results of linear regression to estimate associations of a 1 SD change in androgen-related continuous traits per 1 SD in CAG and GGC trinucleotide repeat lengths quantified from WES data, adjusted for age, 10 ancestry PCs, and assessment center (n = 181 107). Lifetime number of sexual partners (shown as just “Sexual partners”) was based on self-reported questionnaire. Fat mass and lean mass assessed by dual-energy X-ray absorptiometry were adjusted for body height, and grip strength was adjusted for body weight. (B) Results of logistic regression to estimate odds ratio in androgen-related diseases per 1 SD in CAG and GGC trinucleotide repeat lengths quantified from WES data, adjusted for age, 10 ancestry PCs, and assessment center (n = 181 107). Male-pattern baldness, male infertility, and slow walking pace were based on self-reported questionnaire, and the others were defined by International Classification of Diseases, Tenth Revision code(s).
Abbreviations: BPH, benign prostatic hyperplasia; eBMD, estimated bone mineral density; Hct, hematocrit; PC, principal component; RBC, red blood cell count; T, testosterone; WES, whole-exome sequence.
We therefore next tested whether the trinucleotide repeat lengths could affect diseases specific to males. Both CAG and GGC repeat lengths were negatively associated with male-pattern baldness (Fig. 3B), suggesting that suppressed androgen signaling could be protective against male-pattern baldness. Further, GGC repeat length, but not CAG repeat length, was negatively associated with prostate cancer (Fig. 3B). Overall, however, the repeat lengths showed almost no association with other diseases specific to males or with a related trait (Fig. 3A and 3B).
We also focused on anthropometric traits, finding that both CAG and GGC repeat lengths were positively, not negatively, associated with eBMD (Fig. 3A). CAG repeat length was negatively associated with body height, and GGC repeat length was positively associated with grip strength, both with marginal significance, but neither CAG nor GGC repeat was associated with fat mass, lean mass, or other muscle-related outcomes (Fig. 3A and 3B). Next, we examined associations with the other outcomes related to erythropoiesis, depression, metabolism, and longevity, but CAG and GGC repeat lengths showed almost no association with them (Fig. 3A and 3B).
Similarly, in males in whom both repeat lengths quantified from WGS data and other covariates for regression analyses were available, we replicated the findings wherein CAG and GGC repeat lengths were positively associated with circulating total testosterone level and eBMD and CAG repeat length was negatively associated with male-pattern baldness. GGC repeat length was shown to be positively associated with grip strength, and it was also found that CAG repeat length was negatively associated with hematocrit with marginal statistical significance (Fig. 4A and 4B).
Figure 4.
Associations between CAG and GGC trinucleotide repeat lengths in the AR gene and androgen-related traits and diseases (WGS). (A) Results of linear regression to estimate associations of a 1 SD change in androgen-related continuous traits per 1 SD in CAG and GGC trinucleotide repeat lengths quantified from WGS data, adjusted for age, 10 ancestry PCs, and assessment center (n = 75 207). Lifetime number of sexual partners (shown as just “Sexual partners”) was based on self-reported questionnaire. Fat mass and lean mass assessed by dual-energy X-ray absorptiometry were adjusted for body height, and grip strength was adjusted for body weight. (B) Results of logistic regression to estimate odds ratio in androgen-related diseases per 1 SD in CAG and GGC trinucleotide repeat lengths quantified from WGS data, adjusted for age, 10 ancestry PCs, and assessment center (n = 75 207). Male-pattern baldness, male infertility, and slow walking pace were based on self-reported questionnaire, and the others were defined by International Classification of Diseases, Tenth Revision code(s).
Abbreviations: BPH, benign prostatic hyperplasia; eBMD, estimated bone mineral density; Hct, hematocrit; PC, principal component; RBC, red blood cell count; T, testosterone; WGS, whole-genome sequence.
Associations Between Circulating Testosterone Level and Outcomes With and Without Adjustment for CAG and GGC Repeat Lengths
We further hypothesized that CAG and GGC trinucleotide repeat lengths could affect associations between circulating total testosterone level and androgen-related traits and diseases. We first focused on 157 420 males in whom both CAG and GGC repeat lengths quantified from WES data, total testosterone level, and other covariates, including SHBG, a protein known to be correlated with circulating total testosterone level (7) and an antagonist of androgen action (4, 5), were available.
Total testosterone level was shown to be associated with most of the androgen-related traits and diseases of interest, including type 2 diabetes, the number of lifetime sexual partners, and longevity, as was previously reported (7, 27, 28), as well as eBMD and male-pattern baldness, which were shown to be associated with the repeat length(s) from both WES and WGS data in our study (Fig. 5A and 5B).
Figure 5.
Associations between circulating total T level and androgen-related traits and diseases (WES). (A) Results of linear regression to estimate associations of a 1 SD change in androgen-related continuous traits per 1 SD in circulating total T level, adjusted for age, 10 ancestry PCs, assessment center, SHBG, and with (Total T [adj. for repeat lengths]) or without (Total T) normalized CAG and GGC trinucleotide repeat lengths quantified from WES data (n = 157 420). Lifetime number of sexual partners (shown as just “Sexual partners”) was based on self-reported questionnaire. Fat mass and lean mass assessed by dual-energy X-ray absorptiometry were adjusted for body height, and grip strength was adjusted for body weight. (B) Results of logistic regression to estimate odds ratio in androgen-related diseases per 1 SD in circulating total T level, adjusted for age, 10 ancestry PCs, assessment center, SHBG, and with (Total T [adj. for repeat lengths]) or without (Total T) normalized CAG and GGC trinucleotide repeat lengths quantified from WES data (n = 157 420). Male-pattern baldness, male infertility, and slow walking pace were based on self-reported questionnaire, and the others were defined by International Classification of Diseases, Tenth Revision code(s).
Abbreviations: BPH, benign prostatic hyperplasia; eBMD, estimated bone mineral density; Hct, hematocrit; PC, principal component; RBC, red blood cell count; T, testosterone; WES, whole-exome sequence.
Then we examined the associations between circulating total testosterone level and the outcomes after adjustment for CAG and GGC repeat lengths, finding that the effect of total testosterone level was unaffected by the presence or absence of adjustment for the repeat lengths (Fig. 5A and 5B). This suggests that taking CAG and GGC repeat lengths into account would not improve the clinical interpretation of the importance of androgen levels on androgen-associated diseases and traits.
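The comparison of models with and without adjustment can be illustrated with a deliberately simplified sketch. It uses toy data, one exposure, and one added covariate, and it omits age, ancestry principal components, assessment center, and SHBG, so it is not a reproduction of the analysis. All names and values are hypothetical. It relies on a general property of ordinary least squares: the coefficient of the exposure shifts after adjustment only to the extent that the added covariate is correlated with both the exposure and the outcome.

```python
from typing import Sequence


def centered(values: Sequence[float]) -> list[float]:
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def slope_unadjusted(y: Sequence[float], x: Sequence[float]) -> float:
    """Coefficient of x in the linear model y ~ x."""
    yc, xc = centered(y), centered(x)
    return dot(xc, yc) / dot(xc, xc)


def slope_adjusted(y: Sequence[float], x: Sequence[float], cov: Sequence[float]) -> float:
    """Coefficient of x in y ~ x + cov (closed form for two predictors)."""
    yc, xc, cc = centered(y), centered(x), centered(cov)
    sxx, scc, sxc = dot(xc, xc), dot(cc, cc), dot(xc, cc)
    sxy, scy = dot(xc, yc), dot(cc, yc)
    return (scc * sxy - sxc * scy) / (sxx * scc - sxc ** 2)


# Toy data: t stands in for testosterone, r for a repeat length.
t = [-1.0, 1.0, -1.0, 1.0]
r_uncorrelated = [-1.0, -1.0, 1.0, 1.0]  # orthogonal to t
r_correlated = [-1.0, 1.0, 1.0, 1.0]     # correlated with t

y_a = [0.10 * a + 0.03 * b for a, b in zip(t, r_uncorrelated)]
y_b = [0.10 * a + 0.03 * b for a, b in zip(t, r_correlated)]

print(slope_unadjusted(y_a, t), slope_adjusted(y_a, t, r_uncorrelated))
print(slope_unadjusted(y_b, t), slope_adjusted(y_b, t, r_correlated))
```

In the first toy case the covariate is uncorrelated with the exposure, so both models return the same coefficient; in the second, the unadjusted coefficient absorbs part of the covariate's effect. This property holds for linear regression; odds ratios from logistic regression can behave differently.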
Further, associations between CAG and GGC repeat lengths and the outcomes were unaffected by adjustment for circulating total testosterone level and SHBG. Of note, CAG and GGC repeat lengths were positively associated with eBMD and CAG repeat length was negatively associated with male-pattern baldness, even after adjustment for circulating total testosterone level and SHBG.
Similar results were obtained in 65 459 males in whom both CAG and GGC repeat lengths quantified from WGS data, circulating total testosterone level, and other covariates were available (Fig. 6A and 6B).
Figure 6.
Associations between circulating total T level and androgen-related traits and diseases (WGS). (A) Results of linear regression to estimate associations of a 1 SD change in androgen-related continuous traits per 1 SD in circulating total T level, adjusted for age, 10 ancestry PCs, assessment center, SHBG, and with (Total T [adj. for repeat lengths]) or without (Total T) normalized CAG and GGC trinucleotide repeat lengths quantified from WGS data (n = 65 459). Lifetime number of sexual partners (shown as just “Sexual partners”) was based on self-reported questionnaire. Fat mass and lean mass assessed by dual-energy X-ray absorptiometry were adjusted for body height, and grip strength was adjusted for body weight. (B) Results of logistic regression to estimate odds ratio in androgen-related diseases per 1 SD in circulating total T level, adjusted for age, 10 ancestry PCs, assessment center, SHBG, and with (Total T [adj. for repeat lengths]) or without (Total T) normalized CAG and GGC trinucleotide repeat lengths quantified from WGS data (n = 65 459). Male-pattern baldness, male infertility, and slow walking pace were based on self-reported questionnaire, and the others were defined by International Classification of Diseases, Tenth Revision code(s).
Abbreviations: BPH, benign prostatic hyperplasia; eBMD, estimated bone mineral density; Hct, hematocrit; PC, principal component; RBC, red blood cell count; T, testosterone; WGS, whole-genome sequence.
Discussion
It has been unclear whether CAG and GGC trinucleotide repeat lengths in the first exon of the human AR gene are clinically important in the prediction of androgen-related outcomes. Here we show that CAG and GGC repeat lengths are associated with circulating testosterone level and some androgen-related traits and diseases, but they have little influence on the association of circulating testosterone level with androgen-related traits and disease risks.
Importantly, CAG and GGC repeat lengths were positively associated with circulating total testosterone level. It was suggested that genetically determined androgen resistance due to a longer CAG repeat could elevate testosterone secretion and induce hyperandrogenemia, because the LH-mediated feedback loop is kept activated in a compensatory manner. Further, not only a longer CAG repeat but also a longer GGC repeat whose role or clinical impact remained less clear could cause androgen resistance. One of the possible mechanisms could be that the encoded longer polyglycine chain could impair the AR activity by inducing a conformational change of the AR protein, just as the longer polyglutamine chain encoded by a longer CAG repeat does (1, 2). Given that the effects of CAG repeat on some of the outcomes were not consistent with those of GGC repeats, the 2 trinucleotide repeats might affect the AR activity in a different manner.
CAG and GGC repeat lengths did show associations with some of the outcomes, but the effects were generally small. For example, a CAG repeat length longer by 1 SD (3.1 repeats in WES data; Table 1) was associated, with marginal significance, with a body height shorter by only 0.0062 SD (Fig. 3A). Because 1 SD of body height is 6.8 cm, this corresponds to 0.42 mm. Even if the CAG repeat were longer by 15 repeats (eg, an extremely long 38 repeats vs the reference 23 repeats), body height would be expected to be shorter by only 2.0 mm. In addition, a CAG repeat length longer by 1 SD was associated with a higher eBMD by 0.027 SD (Fig. 3A), whereas a total testosterone level higher by 1 SD was associated with a higher eBMD by 0.10 SD (Fig. 5A); the effect of the former was thus much smaller than that of the latter. These findings emphasize the minimal effect of CAG and GGC repeat lengths on clinical outcomes.
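The arithmetic behind these effect sizes can be checked with the short sketch below. The constants are the values quoted in the text and tables; the variable and function names are hypothetical, and the calculation assumes that the per-SD effect scales linearly with repeat length.

```python
HEIGHT_SD_CM = 6.8             # SD of body height, whole-exome sample
CAG_SD_REPEATS = 3.1           # SD of CAG repeat length, whole-exome sample
HEIGHT_EFFECT_SD = 0.0062      # height decrease (in SD) per 1 SD of CAG length
EBMD_EFFECT_CAG_SD = 0.027     # eBMD increase (in SD) per 1 SD of CAG length
EBMD_EFFECT_T_SD = 0.10        # eBMD increase (in SD) per 1 SD of testosterone

# Height difference in millimeters per 1 SD of CAG repeat length.
mm_per_cag_sd = HEIGHT_EFFECT_SD * HEIGHT_SD_CM * 10.0


def height_difference_mm(extra_repeats: float) -> float:
    """Expected height difference (mm) for a given number of extra repeats."""
    return extra_repeats / CAG_SD_REPEATS * mm_per_cag_sd


diff_15_repeats_mm = height_difference_mm(15)
ebmd_effect_ratio = EBMD_EFFECT_CAG_SD / EBMD_EFFECT_T_SD

print(round(mm_per_cag_sd, 2), round(diff_15_repeats_mm, 1), round(ebmd_effect_ratio, 2))
```

Under these inputs the per-SD difference rounds to 0.42 mm and the 15-repeat difference to 2.0 mm, and the eBMD effect per SD of CAG repeat length is about a quarter of that per SD of testosterone.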
These data suggest that, in most androgen-targeted tissues, androgen resistance conditioned on CAG and GGC repeat lengths could be compensated for by increased androgen levels, and that circulating testosterone level should thus remain a clinically relevant parameter for predicting androgen-related outcomes. However, in hair follicles and possibly in the prostate (at least in the context of tumor development), both hyperandrogenemia and genetically determined androgen resistance matter. In bone, or in some specific cell types in bone such as osteoblasts, and possibly in skeletal muscle (at least in the context of spontaneous power), hyperandrogenemia as a consequence of genetically determined androgen resistance matters rather than androgen resistance itself.
A similar heterogeneity is observed in patients with type 2 diabetes, in that insulin action is impaired in some tissues such as skeletal muscle and the liver due to insulin resistance, whereas it is exaggerated in others such as vascular endothelial cells due to hyperinsulinemia (29). What causes the observed mixed effects of hyperandrogenemia and androgen resistance in different tissues remains to be elucidated. Possible explanations include tissue-dependent expression or activity of 5α-reductase, which converts testosterone intracellularly into dihydrotestosterone with higher ligand activity, or of aromatase, which converts testosterone into estradiol (3); membrane-bound receptors that have not been clearly identified (2, 30); androgen receptor coregulators; other transcription factors; or other downstream signal transducer proteins.
Insulin/IGF-1 signaling and androgen signaling are considered to interact with each other and to share some anabolic effects, at least in some contexts (2, 29, 31-33). Given that insulin/IGF-1 signaling is well known to be involved in the regulation of aging and longevity (34), and that the association between circulating testosterone levels and longevity was shown in this study and in previous reports (27, 28), it would be of much interest to see, with a longer follow-up, how androgen resistance genetically determined by CAG and GGC repeat lengths affects longevity in humans.
A strength of our study is that we used one of the largest datasets, with over 180 000 European-ancestry males in whom CAG and GGC trinucleotide repeat lengths could be quantified and circulating total testosterone and SHBG levels were measured. The large sample size and the attention paid not only to repeat lengths but also to circulating testosterone levels may have helped to overcome the limitations of previous studies, in which the clinical impact of AR repeat lengths was not conclusive.
Still, several limitations should be noted. First, although free testosterone is considered to be associated with biologically active testosterone (4, 5), circulating total testosterone level measured by chemiluminescent immunoassay in the UK Biobank is not suitable for estimating circulating free testosterone level, and free testosterone was not directly measured in the cohort (23). Next, we mainly used CAG and GGC trinucleotide repeat lengths quantified from WES data, and quantification of very long CAG repeats was difficult in comparison with results from WGS data. This is probably because each read from the WES data of this cohort is 76 base pairs long and too short to span a long CAG repeat; quantification on a larger scale from WGS data, which have a longer read length of 151 base pairs, would be helpful. Moreover, CAG repeat length is known to depend on ancestry (35), but we focused on participants of European ancestry in this cohort, and the clinical impact of CAG and GGC repeat lengths in other ancestries was not tested. Lastly, although somatic expansion of trinucleotide repeats is often observed in trinucleotide expansion diseases (10), our sequence data come from DNA extracted from circulating blood cells, and the effects of CAG and GGC repeat lengths, especially in androgen-targeted tissues in the elderly, remain unclear.
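The read-length limitation described above can be illustrated with a back-of-the-envelope calculation. The following Python sketch is an illustration only and does not describe how ExpansionHunter estimates repeat lengths, since genotyping tools may draw on evidence other than fully spanning reads. The function name and the `flank_bp` parameter are hypothetical; the sketch simply asks how many 3-base-pair repeat units fit entirely within one read once a chosen number of flanking bases is reserved on each side.

```python
def max_spanned_repeats(read_length_bp: int, flank_bp: int = 0, unit_bp: int = 3) -> int:
    """Largest number of repeat units that a single read can fully contain
    while keeping `flank_bp` bases of flanking sequence on each side.

    `flank_bp` is a hypothetical anchoring requirement chosen for illustration.
    """
    usable_bp = read_length_bp - 2 * flank_bp
    return max(usable_bp // unit_bp, 0)


WES_READ_BP = 76    # read length of the WES data in this cohort (from the text)
WGS_READ_BP = 151   # read length of the WGS data (from the text)
LONG_CAG = 38       # the "extremely long" CAG example used earlier in the text

for label, read_bp in (("WES", WES_READ_BP), ("WGS", WGS_READ_BP)):
    limit = max_spanned_repeats(read_bp)
    spans = LONG_CAG <= limit
    print(f"{label}: {read_bp} bp read spans at most {limit} CAG units; "
          f"spans a {LONG_CAG}-repeat allele ({LONG_CAG * 3} bp): {spans}")
```

Even with no flanking sequence at all, a 76-base-pair read can contain at most 25 CAG units, so a 38-repeat allele (114 base pairs) cannot be spanned by a single WES read, whereas a 151-base-pair WGS read can contain up to 50 units.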
Nevertheless, taken together, our data clearly show that CAG and/or GGC trinucleotide repeat lengths are associated with circulating total testosterone level, male-pattern baldness, and eBMD. However, circulating testosterone levels, which are frequently used in clinical practice, are sufficient for predicting most androgen-related outcomes without accounting for trinucleotide repeat length.
Acknowledgments
We thank Prof. Sumito Ogawa (The University of Tokyo) for giving us advice on the molecular biology of androgen signaling.
Abbreviations
- AR: androgen receptor
- eBMD: estimated bone mineral density
- WES: whole-exome sequence
- WGS: whole-genome sequence
Contributor Information
Takayoshi Sasako, McGill University, Montréal, Québec H3T 1E2, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec H3T 1E2, Canada; Tanaka Diabetes Clinic Omiya, Saitama 330-0846, Japan; Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo 113-0033, Japan.
Yann Ilboudo, Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec H3T 1E2, Canada.
Kevin Y H Liang, Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec H3T 1E2, Canada; Quantitative Life Sciences Program, McGill University, Montréal, Québec H3T 1E2, Canada.
Yiheng Chen, Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec H3T 1E2, Canada; Department of Human Genetics, McGill University, Montréal, Québec H3T 1E2, Canada.
Satoshi Yoshiji, Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec H3T 1E2, Canada; Department of Human Genetics, McGill University, Montréal, Québec H3T 1E2, Canada; Kyoto-McGill International Collaborative Program in Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto 606-8501, Japan; Japan Society for the Promotion of Science, Tokyo 102-0083, Japan.
J Brent Richards, Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec H3T 1E2, Canada; Department of Human Genetics, McGill University, Montréal, Québec H3T 1E2, Canada; Five Prime Sciences Inc, Montréal, Québec H3Y 2W4, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec H3T 1E2, Canada; Department of Twin Research, King's College London, London WC2R 2LS, UK.
Funding
This research has been conducted using the UK Biobank resource under application number 27449. The J.B.R. Research Group is supported by the Canadian Institutes of Health Research (365825, 409511, 100558, 169303), the McGill Interdisciplinary Initiative in Infection and Immunity (MI4), the Lady Davis Institute of the Jewish General Hospital, the Jewish General Hospital Foundation, the Canadian Foundation for Innovation, the NIH Foundation, Genome Québec, the Public Health Agency of Canada, McGill University, Cancer Research UK (grant no. C18281/A29019), and the Fonds de Recherche Québec Santé (FRQS). J.B.R. is supported by an FRQS Mérite Clinical Research Scholarship. Support from Calcul Québec and Compute Canada is acknowledged. TwinsUK is funded by the Wellcome Trust, Medical Research Council, European Union, the National Institute for Health Research-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London. T.S. is supported by the Medical Research Encouragement Prize of The Japan Medical Association and the Fund for the Promotion of Joint International Research (Fostering Joint International Research; 23KK0301) of the Japan Society for the Promotion of Science (JSPS). S.Y. is supported by the JSPS Overseas Research Fellowship. The aforementioned funding agencies had no role in the design, implementation, or interpretation of this study.
Author Contributions
T.S. designed this study, acquired data, performed analyses, and wrote the manuscript. Y.I. supported data acquisition and analyses. K.Y.H.L. and Y.C. supported data acquisition. S.Y. supported data analyses. J.B.R. designed this study and reviewed and edited the manuscript. All authors contributed to the interpretation of results, critically revised the manuscript, and approved the final version. All authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. T.S. and J.B.R. are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Disclosures
T.S. has received an endowment unrelated to this research from Eli Lilly and personal fees unrelated to this research from Boehringer Ingelheim, Daiichi Sankyo, Eli Lilly, Kowa, Novo Nordisk, Ono, and Sumitomo. J.B.R. is the founder and CEO of Five Prime Sciences, which provides research services for biotech, pharma, and venture capital companies for projects unrelated to this research. J.B.R. has served as an advisor to GlaxoSmithKline and Deerfield Capital. J.B.R.'s institution has received investigator-initiated grant funding from Eli Lilly, GlaxoSmithKline, and Biogen for projects unrelated to this research. The other authors have nothing to disclose.
Data Availability
The data that support the findings of this study are available from the UK Biobank but restrictions apply to the availability of these data, which were used under license for the present study and therefore are not publicly available. Data are, however, available from the authors on reasonable request and with permission from the UK Biobank Research Committee. Computational scripts used to conduct the present study are available from the corresponding authors on reasonable request.
References
- 1. Tirabassi G, Cignarelli A, Perrini S, et al. Influence of CAG repeat polymorphism on the targets of testosterone action. Int J Endocrinol. 2015;2015:298107.
- 2. Davey RA, Grossmann M. Androgen receptor structure, function and biology: from bench to bedside. Clin Biochem Rev. 2016;37(1):3‐15.
- 3. Swerdloff RS, Dudley RE, Page ST, Wang C, Salameh WA. Dihydrotestosterone: biochemistry, physiology, and clinical implications of elevated blood levels. Endocr Rev. 2017;38(3):220‐254.
- 4. Selby C. Sex hormone binding globulin: origin, function and clinical significance. Ann Clin Biochem. 1990;27(Pt 6):532‐541.
- 5. Simons P, Valkenburg O, Stehouwer CDA, Brouwers M. Sex hormone-binding globulin: biomarker and hepatokine? Trends Endocrinol Metab. 2021;32(8):544‐553.
- 6. Anawalt BD, Matsumoto AM. Aging and androgens: physiology and clinical implications. Rev Endocr Metab Disord. 2022;23(6):1123‐1137.
- 7. Ruth KS, Day FR, Tyrrell J, et al. Using human genetics to understand the disease impacts of testosterone in men and women. Nat Med. 2020;26(2):252‐258.
- 8. National Center for Biotechnology Information . rs3032358 RefSNP Report—dbSNP—NCBI. Accessed April 24, 2024. https://www.ncbi.nlm.nih.gov/snp/rs3032358
- 9. La Spada AR, Wilson EM, Lubahn DB, Harding AE, Fischbeck KH. Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature. 1991;352(6330):77‐79.
- 10. Depienne C, Mandel JL. 30 years of repeat expansion disorders: what have we learned and what are the remaining challenges? Am J Hum Genet. 2021;108(5):764‐785.
- 11. National Center for Biotechnology Information. rs746853821 RefSNP Report—dbSNP—NCBI. Accessed April 24, 2024. https://www.ncbi.nlm.nih.gov/snp/rs746853821
- 12. Lundin KB, Giwercman A, Dizeyi N, Giwercman YL. Functional in vitro characterisation of the androgen receptor GGN polymorphism. Mol Cell Endocrinol. 2007;264(1-2):184‐187.
- 13. Rocca MS, Ferrarini M, Msaki A, et al. Comparison of NGS panel and Sanger sequencing for genotyping CAG repeats in the AR gene. Mol Genet Genomic Med. 2020;8(6):e1207.
- 14. Mukamel RE, Handsaker RE, Sherman MA, et al. Protein-coding repeat polymorphisms strongly shape diverse human phenotypes. Science. 2021;373(6562):1499‐1505.
- 15. Fearnley LG, Bennett MF, Bahlo M. Detection of repeat expansions in large next generation DNA and RNA sequencing data without alignment. Sci Rep. 2022;12(1):13124.
- 16. Fitzgerald T, Birney E. CNest: a novel copy number association discovery method uncovers 862 new associations from 200,629 whole-exome sequence datasets in the UK Biobank. Cell Genom. 2022;2(8):100167.
- 17. Hujoel MLA, Sherman MA, Barton AR, et al. Influences of rare copy-number variation on human complex traits. Cell. 2022;185(22):4233‐4248.e27.
- 18. Lundstrom OS, Adriaan Verbiest M, Xia F, et al. WebSTR: a population-wide database of short tandem repeat variation in humans. J Mol Biol. 2023;435(20):168260.
- 19. Dolzhenko E, van Vugt J, Shaw RJ, et al. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res. 2017;27(11):1895‐1903.
- 20. Zanovello M, Ibanez K, Brown AL, et al. Unexpected frequency of the pathogenic AR CAG repeat expansion in the general population. Brain. 2023;146(7):2723‐2729.
- 21. Bycroft C, Freeman C, Petkova D, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203‐209.
- 22. Lu T, Nakanishi T, Yoshiji S, Butler-Laporte G, Greenwood CMT, Richards JB. Dose-dependent association of alcohol consumption with obesity and type 2 diabetes: Mendelian randomization analyses. J Clin Endocrinol Metab. 2023;108(12):3320‐3329.
- 23. UK Biobank. UK biobank biomarker project companion document to accompany serum biomarker data. Accessed April 24, 2024. https://biobank.ndph.ox.ac.uk/ukb/ukb/docs/serum_biochemistry.pdf
- 24. Zhou S, Sosina OA, Bovijn J, et al. Converging evidence from exome sequencing and common variants implicates target genes for osteoporosis. Nat Genet. 2023;55(8):1277‐1287.
- 25. Ahn SH, Park SM, Park SY, et al. Osteoporosis and osteoporotic fracture fact sheet in Korea. J Bone Metab. 2020;27(4):281‐290.
- 26. Kuan V, Denaxas S, Gonzalez-Izquierdo A, et al. A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service. Lancet Digit Health. 2019;1(2):e63‐e77.
- 27. Yeap BB, Marriott RJ, Antonio L, et al. Serum testosterone is inversely and sex hormone-binding globulin is directly associated with all-cause mortality in men. J Clin Endocrinol Metab. 2021;106(2):e625‐e637.
- 28. Wang J, Fan X, Yang M, et al. Sex-specific associations of circulating testosterone levels with all-cause and cause-specific mortality. Eur J Endocrinol. 2021;184(5):723‐732.
- 29. Kubota T, Kubota N, Kadowaki T. Imbalanced insulin actions in obesity and type 2 diabetes: key mouse models of insulin signaling pathway. Cell Metab. 2017;25(4):797‐810.
- 30. Yu J, Akishita M, Eto M, et al. Src kinase-mediates androgen receptor-dependent non-genomic activation of signaling cascade leading to endothelial nitric oxide synthase. Biochem Biophys Res Commun. 2012;424(3):538‐543.
- 31. Tortorella E, Giantulli S, Sciarra A, Silvestri I. AR and PI3K/AKT in prostate cancer: a tale of two interconnected pathways. Int J Mol Sci. 2023;24(3):2046.
- 32. Sasako T, Umehara T, Soeda K, et al. Deletion of skeletal muscle Akt1/2 causes osteosarcopenia and reduces lifespan in mice. Nat Commun. 2022;13(1):5655.
- 33. Hosoi T, Yakabe M, Sasakawa H, et al. Sarcopenia phenotype and impaired muscle function in male mice with fast-twitch muscle-specific knockout of the androgen receptor. Proc Natl Acad Sci U S A. 2023;120(4):e2218032120.
- 34. Campisi J, Kapahi P, Lithgow GJ, Melov S, Newman JC, Verdin E. From discoveries in ageing research to therapeutics for healthy ageing. Nature. 2019;571(7764):183‐192.
- 35. Ackerman CM, Lowe LP, Lee H, et al. Ethnic variation in allele distribution of the androgen receptor (AR) (CAG)n repeat. J Androl. 2012;33(2):210‐215.