Abstract
Understanding phenotypic sex differences has long been a goal of biology from both a medical and evolutionary perspective. Although much attention has been paid to mean differences in phenotype between the sexes, little is known about sex differences in phenotypic variability. To gain insight into sex differences in interindividual variability at the molecular level, we analyzed RNA-seq data from 43 tissues from the Genotype-Tissue Expression project (GTEx). Within each tissue, we identified genes that show sex differences in gene expression variability. We found that these sex-differentially variable (SDV) genes are associated with various important biological functions, including sex hormone response, immune response, and other signaling pathways. By analyzing single-cell RNA sequencing data collected from breast epithelial cells, we found that genes with sex differences in gene expression variability in breast tissue tend to be expressed in a cell-type-specific manner. We looked for an association between SDV expression and Graves’ disease, a well-known heavily female-biased disease, and found a significant enrichment of Graves’ associated genes among genes with higher variability in females in thyroid tissue. This suggests a possible role for SDV expression in sex-biased disease. We then examined the evolutionary constraints acting on genes with sex differences in variability and found that they exhibit evidence of increased selective constraint. Through analysis of sex-biased eQTL data, we found evidence that SDV expression may have a genetic basis. Finally, we propose a simple evolutionary model for the emergence of SDV expression from sex-specific constraints.
Significance Statement.
Prior research has revealed that genes with sex differences in expression may possess interesting evolutionary and biological properties. Despite that, most studies concerning phenotypic sex differences have focused on sex differences in mean phenotypic values. Although largely overlooked in the study of sex differences, phenotypic variability is a concept central to quantitative, population, and human genetics. In this work, we examine sex differences in gene expression variability across human tissues. We find that genes with sex differences in expression variability are enriched in important biological and disease-related pathways. Additionally, we find that these genes appear to be evolving under increased evolutionary constraints. Our results highlight the importance of examining sex differences in gene expression variability in fundamental biological, disease, and evolutionary contexts.
Introduction
Identifying, characterizing, and understanding phenotypic differences between the sexes has long been a major goal of biology. A very wide variety of traits in humans show significant sex differences, including morphological, behavioral, metabolic, immune, and disease traits (1–4). These trait differences are likely in part due to sex differences in gene expression and regulation (5–7). As a phenotype, gene expression is both relatively straightforward to measure and highly quantitative. As a result, many studies have examined differences in mean gene expression levels between the sexes in many species (8–11, 5, 12, 6). Sex-biased expression (SBE) is thought to arise from sex-specific genetic architectures, which are ultimately established by sex-determination systems initiated during early development (7). In addition to underlying sexually dimorphic phenotypes, genes with SBE have been found to possess many interesting evolutionary and genomic properties (6). For example, SBE has been associated with increased rates of evolution at both the coding and expression levels in Drosophila and other animals (10, 13, 6, 14). In humans, genes with SBE have been associated with reduced selective constraint (9, 5). Additionally, genes with SBE have been shown to be distributed across the genome in a nonrandom fashion with respect to the X chromosome—thereby linking the evolution of sex-biased genes to the unique evolutionary forces experienced by the X chromosome (10, 12, 15).
Relative to sex differences in phenotypic means, sex differences in phenotypic variabilities have been less thoroughly studied. In the biomedical sciences, there has been a long-standing assumption that females must generally be more phenotypically variable due to the menstrual or estrous cycles; however, there has been little empirical evidence to support this assumption (16). In fact, a meta-analysis of microarray studies in mice and humans revealed that males might have slightly higher expression variability (17), and a meta-analysis of various traits in mice indicated that sex differences in phenotypic variability are trait-dependent (18). Despite growing interest in understanding sex differences in phenotypic variability, the biological and evolutionary properties of genes with sex differences in gene expression variability have not, to our knowledge, been explored in humans. Here, we define genes with sex-differentially variable (SDV) expression as genes showing significant differences in interindividual gene expression variability between the sexes. From a medical standpoint, genes with SDV expression may underlie diseases with sex differences in prevalence or outcome. For example, one could imagine a disease partially caused by the aberrant expression of a certain gene (akin to the liability threshold model; (19)). If one sex has more variable expression of the gene in a normal healthy population, this may help explain a higher prevalence of that disease in that sex. Additionally, since more variable expression has been associated with reduced effectiveness of drug targets, it is possible that SDV expression may account for sex differences in drug effectiveness, which are known to exist (20, 21).
Understanding the evolutionary properties of these genes may give us novel insight into the evolution of sex differences. For example, genes with sex differences in expression variability may evolve under sex-specific constraints (e.g., purifying or stabilizing selection) and may be of particular functional importance for a given sex. Here, we examine the biological properties and evolutionary constraints associated with SDV expression across 43 human tissues.
Results
Identification and characterization of genes with sex differences in gene expression variability
We obtained RNA-seq count data for 43 tissues from the Genotype-Tissue Expression (GTEx) project (version 8). Since we sought to identify genes with sex differences in gene expression variability, we first compared several statistical methods using power simulations. Specifically, we compared three tests: the Fligner–Killeen test, Levene’s test, and a test based on negative binomial generalized additive models for location, scale, and shape (GAMLSS) (22). Since RNA-seq count data is known to follow an approximately negative binomial distribution (23, 24), we performed power simulations by drawing random samples from negative binomial distributions with identical means but different variances by varying the overdispersion term (see “Methods”). At each parameter combination, 1,000 pairs of samples were drawn, and tests were performed at an alpha of 0.05. We found that GAMLSS had the highest power across a range of parameters, and this was especially true when testing for differences in variance between samples of unequal size (Figure S1A–D). GAMLSS and Levene’s test maintained an adequate false positive rate (FPR) across a range of parameters. Beyond increased power, GAMLSS offers other advantages over the other two tests: GAMLSS negative binomial models allow for the independent parametrization of both the mean and the non-Poisson variation (overdispersion) terms as a function of variables and covariates of interest, including library size (25, 22).
When considering the intersample variation of gene expression, it is natural to consider the coefficient of variation: the standard deviation divided by the mean. This provides a mean-normalized statistic for variability. Notably, it is thought that the Poisson component of the coefficient of variation in RNA-seq data is due to technical variation, while the overdispersion component of the coefficient of variation contains the true biological variation (26). The overdispersion parameter fit by GAMLSS can, therefore, be considered the square of the biological coefficient of variation (25, 26). Observed overdispersion values and effect sizes in GTEx data were similar to the simulated values, as determined by models fit using GAMLSS (Figure S1E).
To determine the effect of sex on gene expression variation, we fit three negative binomial generalized additive models using the R library “gamlss” for each expressed gene in each tissue (22). To reduce the risk of confounding due to unequal sampling across various populations, we limited our analysis to European–American individuals. Significance of the effect of sex on overdispersion and mean expression level was determined through a pair of likelihood-ratio tests (see “Methods”). All models included covariates for age, RNA integrity number (RIN), ischemic time, and the first five genotype principal components (to control for population structure) for both mean and overdispersion terms. We initially considered genes to be SDV if the effect of sex on overdispersion was significant at FDR < 0.05. Since we observed that outliers dramatically influenced estimates of variation in some genes, as a secondary test, we performed randomization tests by randomly permuting the sex labels on genes with initially significant coefficients for the effect of sex on overdispersion and refitting the GAMLSS models (see “Methods;” Fig. 1A). Coefficients for the effect of sex on overdispersion were saved after each permutation. We used 1,000 rounds of permutation to generate empirical null distributions for the effect of sex on overdispersion. Initially significant genes with FDR-adjusted empirical P-values greater than 0.05 in the permutation test were no longer considered to be SDV (see “Methods”).
Across 43 tissues, we identified 3,895 unique genes with SDV expression in at least one tissue out of a total of 22,054 tested genes (Fig. 1A and B). We were able to identify SDV genes in 42 out of the 43 tested tissues (Fig. 1B). Examples of two genes with SDV expression in the pituitary gland and breast tissue are shown in Fig. 1(C). A total of 2,792 unique genes showed higher variability in males in at least one tissue, while 1,382 genes showed higher variability in females in at least one tissue. This imbalance between the sexes was largely driven by breast tissue. Surprisingly, 279 genes had higher variability in males in one tissue and higher variability in females in at least one other tissue. The vast majority (3,347) of SDV genes only showed SDV expression in one tissue, while 548 genes showed SDV expression in multiple tissues.
Breast tissue had the largest number of SDV genes, with 2,049 genes showing significantly higher variation in males and 125 genes showing significantly higher variation in females (Fig. 1B). Apart from a few tissues, SDV genes were not consistently expressed at higher or lower levels than non-SDV genes across tissues. This was true when examining both counts per million (CPM) and transcripts per million (TPM), suggesting that our discovery of SDV genes was not significantly biased toward genes of high or low expression (Figure S2).
Given that differences in variation are generally difficult to detect between groups, we performed a replication analysis using data from breast tissue to help validate our statistical approach. First, we randomly split individuals into discovery and replication sets—70% of males and females were placed in the discovery set, while the rest were placed in the replication set. We identified genes with sex differences in overdispersion in the discovery set using the combined likelihood-ratio test and permutation approach described above (both at FDR < 0.05). These genes were then tested in the replication set: genes with P-values for the overdispersion term less than 0.05 in both the likelihood-ratio test and permutation test were considered “replicated” if the sex with higher variability in the discovery set was also more variable in the replication set. We observed a replication rate of 21.9% (Figure S3). This replication rate is limited by reduced power due to the relatively small size of the replication set. To determine the significance of this replication rate we generated an empirical null distribution under the null hypothesis that our statistical approach was simply randomly sampling genes. Therefore, we randomly sampled genes from the discovery set and calculated the replication rate for each sample using the replication set. The median replication rate under the null hypothesis was found to be 8.4%, and out of 10,000 random samples none had a replication rate higher than 11.3% (Figure S3). This null replication rate should not be considered a type 1 error rate. Since we are randomly sampling genes from the set of all expressed genes, we are also capturing some true SDV genes in each sample. One would expect the replication rate under random sampling to be higher than 5%, even if the FDR is properly controlled at 5%.
Since our GAMLSS framework allowed us to simultaneously test for sex differences in both mean and overdispersion, we examined the relationship between SDV genes and genes with SBE. In several tissues, a majority of SDV genes also displayed SBE (Figure S4). Although a minority of SDV genes in breast tissue, 466 genes with significant sex differences in both mean and overdispersion showed discordance, with higher overdispersion but lower mean expression in one sex versus the other (Figure S4). Specifically, 387 of those genes simultaneously had significantly higher overdispersion in males and significantly higher mean expression in females.
Given that genes with SBE had been shown to be overrepresented on the X chromosome in a variety of tissues and species (10, 12, 15), we wondered if SDV genes were similarly enriched on the X chromosome. However, we found no consistent enrichment of SDV genes on the X chromosome; the enrichment patterns were highly tissue-dependent (Figure S5).
To identify biological processes associated with SDV gene expression, we performed an overrepresentation analysis using gene sets obtained from the hallmark gene set collection of MSigDB (27, 28). The hallmark gene set collection consists of nonredundant and refined gene sets associated with various biological processes and states (27). The hallmark gene set collection is ideal for our analysis since we identified few SDV genes in most tissues—a larger, more redundant, and less refined collection would introduce more noise while increasing the multiple testing burden for our overrepresentation analysis. We first performed an overrepresentation analysis at the tissue level and aggregated results across tissues using an empirical approach based on Fisher’s combined probability method (Fig. 2A and B; see “Methods”). Following the meta-analysis, we found that genes with higher variation in males were overrepresented in sets related to estrogen and androgen response, immune response, and cancer-associated pathways (Fig. 2A). Genes with increased variation in females were similarly overrepresented in sets related to immune response, estrogen response, and cancer-associated pathways (Fig. 2A). Notably, there was overlap in multiple sets between the two sexes even though the lists of SDV genes did not show much overlap. At the single-tissue level, genes with higher variation in males in breast tissue were overrepresented in sets related to estrogen response and immune response (Fig. 2B). This suggests that genes of particular biological importance to one sex (genes related to estrogen response in female breast tissue) may show higher variation in the opposite sex. To identify transcription factors whose targets are overrepresented in the set of SDV genes we downloaded the GTRD dataset from MSigDB (27, 28). The GTRD collection contains sets of transcription factors and their predicted binding targets (within the promoter region) (29). We first performed an overrepresentation analysis at the individual tissue level, followed by a meta-analysis across tissues using the same modified Fisher’s combined probability approach mentioned above (see “Methods”). Following the meta-analysis, we found that genes with higher variability in females were significantly enriched for targets of MAML1 and GLI1 (Fig. 2C). Genes with higher variability in males were significantly enriched for targets of KMT2D. GLI1 is a transcription factor that is part of the Hedgehog signaling pathway, which is involved in differentiation during development. However, in fully differentiated tissues, GLI1 can act as an oncogene and has been implicated in a wide variety of cancers, including glioblastoma, breast cancer, and colon cancer (30–32). MAML1 is a transcription factor involved in Notch signaling and is a transcriptional coactivator of Notch receptors (33). Additionally, MAML1 has been shown to interact cooperatively with GLI1 to regulate Hedgehog signaling (34). KMT2D is a broadly expressed histone methyltransferase that is believed to function as a tumor suppressor and is frequently mutated in a wide variety of cancers (35).
Single-cell RNA-seq data from breast epithelial cells show that genes with SDV expression tend to be expressed in a cell type-specific manner
To get further biological insight into genes with SDV expression in breast tissue, which had the most SDV genes identified out of any tissue by far, we analyzed a previously published single-cell RNA-seq dataset (36). With count data from individual 4 (female), we applied a standard single-cell RNA-seq pipeline using the “R” library “Seurat” (37) (see “Methods”). Our analysis recovered the three major cell types found in the breast epithelium: type 1 and type 2 luminal cells, along with basal cells (36) (Figure S6). To determine the cell-type specificity of each gene, we combined raw count data from each cell to generate a pseudo-bulk count matrix for each cell type and calculated counts per million (CPM) values for each gene. We then applied a standard measure of tissue specificity, , to the log-transformed CPM values across the three cell types (see “Methods”).
We found that genes exhibiting SDV expression were expressed in a significantly more cell-type-specific manner than non-SDV genes (P = 1.9e-41, Mann–Whitney U test; Fig. 2D). If there exist significant sex differences in interindividual cell-type composition variation, such cell-type-specific expression could lead to sex differences in expression variability in bulk tissue samples. Overall, these results suggest that SDV expression may be partly due to sex differences in interindividual cell-type composition variation.
SDV genes in the thyroid with higher variability in females are associated with Graves’ disease
Given the prevalence of sex-biased diseases (38), we wondered if genes with SDV expression could be associated with a classically sex-biased disease. Hence, we decided to examine Graves’ disease: a common autoimmune disease of the thyroid, which results in hyperthyroidism (39). The incidence of Graves’ disease is almost 10-fold higher in females than in males (39). We obtained GWAS summary data from the NHGRI-EBI GWAS catalog compiled from five studies on the genetics of Graves’ disease in various populations (40–45). Of the 36 genes implicated in Graves’ disease which were expressed in thyroid tissue, six displayed SDV expression with higher variation in females. This corresponds to a significant 9-fold enrichment of Graves’ associated genes among SDV genes with higher variability in females (P = 3.5e-5, hypergeometric test; Fig. 2E). Although many of the genes implicated in Graves’ disease are expressed in immune cells, a characteristic of Graves’ disease is the infiltration of immune cells into the thyroid (46, 39, 47). Overall, this analysis suggests a role for SDV expression in sex-biased diseases.
Genes with SDV expression show evidence of increased selective constraint acting on gene expression and sequence
Given that genes with SBE have been shown to be associated with relaxed selective constraint at both the sequence and expression levels, we asked whether SDV genes would similarly be associated with reduced constraint (5, 9). Stabilizing selection acting on gene expression is thought to reduce cis-regulatory variation and reduce the minor allele frequency (MAF) of cis-regulatory variants (48, 49). Therefore, if SDV expression is associated with reduced selective constraint, one would expect an enrichment of cis-regulatory variation in SDV genes. To determine whether SDV genes were enriched or depleted for cis-regulatory variation, we obtained expression quantitative trait loci (eQTL) summary data for each tissue from GTEx (v8). For each tissue with at least 20 autosomal SDV genes, we tested for enrichment of genes with significant eQTLs (eGenes, q-value < 0.1) within that tissue’s set of SDV genes. Only genes that were tested in both studies in a given tissue were considered. Contrary to expectation, we observed a depletion of eGenes within the set of SDV genes in most tissues, and this depletion was significant in breast tissue (Fig. 3A). We also tested whether the median fold enrichment of eGenes across the 20 tested tissues (median fold enrichment = 0.93) was lower than expected by chance. To this end, we randomly permuted the SDV labels in each tissue (genes were either SDV or not SDV in each tissue) and calculated a median fold enrichment across all tissues. In total, 10,000 permutations were performed and for each round of permutation, the median fold enrichment of eGenes in the set of SDV genes was recorded. We found that the observed median fold enrichment of eGenes among SDV genes was significantly lower than expected by chance suggesting that expression levels of SDV genes are evolving under elevated constraint (P = 1.3e-03, permutation test; Fig. 3B).
We examined if the observed depletion of cis-regulatory variation in SDV genes in breast tissue could be explained by their elevated cell-type specificity. For example, one could imagine that most expression variation (across bulk samples) in cell-type specific genes is driven by sample-to-sample differences in cell-type composition and not by cis-regulatory genetic factors. Therefore, one might expect to observe a depletion of eQTLs in cell-type-specific genes. Upon examination, we found no significant relationship between cell-type specificity and the presence or absence of a significant eQTL (P = 0.99, Mann–Whitney U test; Figure S7).
Although our analysis of eQTL data provided some evidence that expression levels of SDV genes are evolving under increased constraint, a depletion of cis-regulatory variation might also be expected if the gene expression variation observed in SDV genes was due to environmental factors and/or trans-regulatory genetic variation to a larger extent than in non-SDV genes. We, therefore, sought to quantify selective constraint using a complementary—and more direct—approach. To this end, we used 20-species phastCons scores to look for further evidence that the cis-regulatory regions of SDV genes were evolutionary constrained (50). In particular, we examined the sequence conservation in the proximal promoter regions of genes—100 bp upstream from the transcription start sites (TSSs). The phastCons score of a nucleotide can be interpreted as the probability that the nucleotide resides in a conserved element (50). We found that genes with SDV expression in at least one tissue showed significantly elevated phastCons scores compared to genes that did not exhibit SDV expression in any tissues (P = 4.1e-05, Mann–Whitney U test; Fig. 3C). Only genes expressed in all tissues with at least 20 SDV genes were considered to remove bias, i.e., a gene is more likely to be SDV if it is expressed in more tissues, and genes expressed in many tissues tend to be more evolutionarily constrained (51).
Similarly, we also examined the LINSIGHT scores of SDV genes. LINSIGHT scores combine information from intraspecific genetic polymorphism, interspecific genetic divergence, and functional genomic data to provide a probability that a mutation at a given base pair will have deleterious fitness consequences (52). We found that genes with SDV expression in at least one tissue showed significantly higher mean LINSIGHT scores within their proximal promoter regions than genes without SDV expression (P = 1.8e-02, Mann–Whitney U test).
Finally, we asked whether the increased constraint acting on the expression levels of SDV genes would be reflected in other metrics of dosage sensitivity. To do this, we compared LOEUF scores in SDV genes and non-SDV genes. LOEUF (loss-of-function observed/expected upper bound fraction) is a score that reflects the tolerance of a protein to loss-of-function mutations (51). The scores range from 0 to 2, with higher scores indicating higher tolerance to loss-of-function mutations and therefore reduced selective constraint. We only considered genes expressed in all tissues with at least 20 SDV genes and found that genes with SDV in at least one tissue had significantly reduced LOEUF scores compared to genes without SDV expression in any tissue (P = 4.2e-10, Mann–Whitney U test; Fig. 3E), corroborating that SDV genes tend to be more constrained at the expression level.
SDV genes in breast tissue show evidence of sex-specific genetic regulation
To determine if SDV genes showed evidence of sex-specific genetic regulation, we downloaded sex-biased eQTL (sbQTL) data from GTEx (12). We restricted our analysis to breast tissue, given that it was found to have both the largest number of SDV genes (in our study) and by far the largest number of sbQTLs of any tissue (12). To test if genes with sbQTLs were enriched for SDV genes, we performed a gene set enrichment analysis (GSEA). By only considering genes tested in both analyses and ranking genes according to their corrected -log10(P-values) in the sbQTL analysis, we found that SDV genes were significantly enriched at lower P-values (P = 1.2e-2, permutation test; Figure S8). Additionally, we found that all 13 SDV genes with higher variation in males and significant sbQTLs (FDR < 0.25, the threshold used in (12)) had eQTLs with larger effect sizes in males than females. This represents a significant enrichment of genes with larger eQTL effect sizes in males relative to the set of all genes with significant sbQTLs (P = 7.4e-4, hypergeometric test). These results indicate that SDV genes are enriched for sex-specific genetic architectures regulating gene expression and suggest that SDV expression may have an underlying genetic component.
Discussion
In this work, we examined a largely unappreciated aspect of transcriptomic sex differences: genes with sex differences in interindividual expression variation. In particular, we focused on genes that showed sex differences in interindividual expression overdispersion. We termed these genes SDV genes. Across 43 tested tissues, we found that roughly 18% of expressed genes showed SDV patterns of expression in at least one tissue (Fig. 1B). Our results were undoubtedly affected by the unequal number of samples in different tissues—we had more power to detect SDV genes in some tissues than others. Nevertheless, we identified more SDV genes in breast tissue than in any other tissue. Issues of sample size aside, from a physiological perspective, this seems logical since the breast is the only tissue tested known to have functional significance in only one sex: the production of milk is essentially female-specific. Other studies have found evidence that low interindividual transcriptional variation in a gene is indicative of increased constraint acting on the gene (53). Since the vast majority (94%) of SDV genes identified in breast tissue showed higher expression variability in males than in females, this is consistent with the idea that expression levels should be more functionally constrained in female breast tissue than in male breast tissue.
To gain biological insight into the nature of SDV genes, we performed a gene set overrepresentation analysis followed by meta-analysis. We found that SDV genes were enriched for genes from sets related to sex hormone response, immune response, and various signaling and cancer-associated pathways (Fig. 2A and B). Additionally, we found that SDV genes were enriched for the targets of three transcription factors implicated in cancer (Fig. 2C). The overrepresentation of SDV genes in gene sets related to various biological pathways and transcription factors indicates that SDV expression may be caused by SDV activity at the pathway level. In breast tissue, the overrepresentation of genes with high variability in males in sets related to estrogen response suggests that genes of particular biological importance to one sex may show higher variation in the opposite sex.
In order to further examine the biological significance of SDV genes in breast tissue, we analyzed previously published single-cell RNA sequencing data of breast epithelial tissue. We found that SDV genes (identified in the GTEx data) were expressed in a more cell-type-specific manner (Fig. 2D). These results indicate that SDV genes may be more likely to be involved in cell-type-specific functions. Additionally, this suggests that sex differences in gene expression variability may be partially driven by sex differences in cellular composition variability across individuals.
To examine a possible link between SDV expression and disease, we analyzed summary GWAS data from a classically female-biased disease: Graves’ disease. We found a significant enrichment of Graves’ associated genes in the set of SDV genes with higher variability in females in healthy thyroid tissue (Fig. 2E). In the future, the role of SDV expression in sex-biased diseases should be thoroughly investigated by examining expression in both healthy and diseased tissue.
We found that SDV genes tended to be depleted for cis-regulatory variation overall (Fig. 3A and B). Additionally, we showed that SDV genes show significantly increased constraint in both proximal promoter and protein-coding sequences. These results are consistent with increased constraint acting on the expression levels of SDV genes and suggest that SDV genes may be evolving under increased sex-specific constraint acting on gene expression.
We propose that SDV expression can arise through sex-specific increases in stabilizing selection acting on gene expression. To test this, we performed a forward Monte Carlo simulation of the evolution of a gene evolving under sex-specific stabilizing selection (Fig. 4). Our model assumes that the expression depends on the additive contributions of a shared genetic locus (this locus contributes equally to both sexes), two sex-specific loci (each locus only contributes to expression in one sex), and noise (to model environmental effects; see “Methods”). Therefore, an assumption of the model is some degree of sex-specific genetic architecture for SDV genes, which is supported by empirical evidence (Figure S8). The three loci were in linkage equilibrium. Increased stabilizing selection in one sex results in a difference in interindividual phenotypic variances between the two sexes—with the sex experiencing increased selection having relatively reduced variation (Fig. 4A and B). Additionally, increased selection in one sex results in reduced genetic variation in the set of shared genetic loci—consistent with our observations of reduced cis-regulatory variation in SDV genes (Fig. 4C). These results are qualitatively consistent with predictions from quantitative genetics theory. Within a single generation, stronger stabilizing selection in one sex—for example, females—would lead to reduced additive genetic variance in females compared to males. Assuming that phenotypic variance is the sum of additive genetic and environmental variances, and that environmental variances are the same in both sexes, it implies that phenotypic variance will be lower in females than in males. Additionally, increased stabilizing selection in females will decrease the additive genetic covariance between the sexes to a larger extent than identical stabilizing selection in both sexes. This corresponds to our observation of a reduction in the variance of the shared genetic locus under conditions of increased selection in females (Fig. 4C). A more detailed description of the theoretical predictions is presented in the Appendix.
Although sex-specific selection resulting in decreased expression variance in the sex with increased selection is one possible explanation for SDV expression, it is not the only possibility. Alternatively, SDV expression, in particular genes with higher variability in females, could be a result of hormonal fluctuations as part of the menstrual cycle. The overrepresentation of estrogen response genes among female SDV genes supports this as a possible explanation (Fig. 2A). These genes may also evolve under increased selective constraint in females. Both explanations for SDV expression suggest sex-specific or sex-biased functions for SDV genes, which could help explain the observation that a majority of SDV genes also exhibit SBE expression in a few tissues (Figure S4).
Overall, our results suggest that genes with sex differences in interindividual gene expression variability are possibly involved in sex-specific biological functions and may be evolving under sex-specific constraints acting on gene expression. These results highlight the possible importance of SDV genes in a functional and evolutionary sense, but future work needs to be done to properly understand the role of SDV genes in human health.
Methods
Power simulations
Random samples were generated from negative binomial distributions using the “R” function “rnegbin” from the library “MASS.” To simulate the effect of “sex” on the overdispersion parameter, samples were drawn from two negative binomial distributions with equal mean () parameters and different overdispersion (, or equivalently in the “MASS” documentation) parameters. Similarly, to determine FPRs, samples were drawn from identical distributions. To restrict the search space, was fixed at 1,500 for all simulations, and the size of both samples combined was fixed at 400. These parameters were typical of the gene-expression levels and sample sizes found in GTEx data. Moreover, varying the mean parameter only marginally affected power calculations. For the unbalanced group sizes simulations, the size of one group was 320 while the size of the other was 80. Both the baseline overdispersion parameter () and the difference in overdispersion between groups () were varied over the course of the simulation experiments. The variances of the two samples were compared using the Fligner–Killeen test (“fk.test” function from the “R” library “npsm”), Levene’s test (“levene.test” function from the “R” library “lawstat”), and a likelihood-ratio test implemented using negative binomial GAMLSS models. The more complex GAMLSS model parametrized the mean and the overdispersion parameters as functions of sex, and , while the simplified model parametrized only the mean in terms of sex with the overdispersion being modeled as constant. The P-value for the significance of the sex overdispersion term was saved. At each parameter combination 1,000 sample pairs were drawn and P-values from each of the three tests were saved. Power was calculated as the fraction of 1,000 simulations where the P-value fell below an alpha of 0.05. FPR was calculated as the fraction of 1,000 simulations where the P-value fell below and alpha of 0.05 when the two underlying distributions being sampled for a sample pair were identical. It was observed that power for Levene’s test markedly improved when log-transforming the sampled count data, while the Fligner–Killeen test showed largely unaltered power: hence samples were transformed as ln(counts + 1) before testing with Levene’s test and the Fligner–Killeen test. Raw samples were used for the GAMLSS likelihood-ratio test.
Identification of SDV genes
For each tissue, SDV genes were identified separately using RNA-seq count data downloaded from GTEx-v8 (dbGAP accession number # phs000424.v8.p2). To reduce confounding due to population structure, only European Americans were considered. First, CPM values were calculated using edgeR and genes with no expression (CPM = 0) in at least one individual were removed. To identify individuals that were global expression outliers, dimensionality reduction was then performed on log2 transformed CPM values using principal component analysis (PCA) with the “R” function “prcomp.” Since outliers can greatly affect estimates of variation, we removed individuals whose global expression profiles deviated significantly from other individuals in a given tissue. Individuals whose global expression patterns were more than 1.5 interquartile ranges higher than the third quartile or lower than the first quartile along any principal component explaining more than 5% of the total variance in the dataset were considered outliers and removed from further analysis.
Following global outlier removal three negative binomial models were fit to the count data (one gene at a time) using GAMLSS. The first and most complex model (model 1) parametrized the mean and non-Poisson variation terms of the negative binomial distribution (and , respectively) as and , where represents RNA integrity number and the five PCs represent genotype principal components obtained from GTEx. The offset term, to account for library size was calculated as , where the normalization factors were calculated using the “calcNormFactors” function from edgeR using the “upperquartile” method. Model 2 had the same form as model 1 for, however, for the formula for the non-Poisson variation was . Model 3 had the same formula as model 1 for , however, the formula for the mean was . To test for the significance of the effect of sex on non-Poisson variation () a likelihood ratio test was performed comparing model 1 and model 2. To test for the significance of the effect of sex on mean expression () a likelihood ratio test was performed comparing model 1 and model 3. The coefficients corresponding to the sex terms for both and ( and ) from model 1 were recorded for each gene. After every gene was tested in a given tissue P-values for and were corrected independently using the Benjamini–Hochberg (BH) procedure using the “R” function “p.adjust.”
Unsurprisingly, we noticed that individual expression outliers at the gene level dramatically affected our estimates of and . To account for this and to generally validate the robustness of our models we performed randomization tests for all genes with a BH corrected < 0.05. For each gene, the sex labels were randomly permuted, model 1 was fit to the randomized data, and values were recorded. We performed 1,000 permutations for every gene to generate an empirical null distribution for . An empirical P-value was calculated as the fraction of random permutations where > . These empirical P-values were then corrected using the BH procedure. In the end genes were considered SDV if the original BH corrected P-values () were less than 0.05 and the BH corrected empirical P-values were less than 0.05.
Hallmark overrepresentation analysis
Gene set annotations for collection “H” (hallmark gene sets) were downloaded from MSigDB using the “R” function “msigdbr” from the package “msigdbr.” Only tissues with at least 20 male or female high variability genes were considered. Enrichment of gene sets for male high variability and female high variability genes was calculated separately. Enrichment P-values for each gene set in each tissue were calculated using a hypergeometric test (using the “R” function “phyper”). For each gene set a meta-analysis was performed across all tested tissues using a variation of Fisher’s method. This approach was implemented as follows. For each gene set a test statistic was calculated by taking −2 times the sum of the log P-values in each tested tissue. In an ordinary implementation of Fisher’s method this test statistic would have a chi-squared distribution under the null hypothesis. However, given that the underlying P-values are based on a discrete test (the hypergeometric test) a core assumption of Fisher’s method is violated: namely that for an individual tissue P-values should be uniformly distributed under the null hypothesis. To overcome this, we generated empirical null distributions for the test statistic described above using a permutation approach: within each tissue we permuted the labels of each gene (not SDV, female high variability, and male high variability) and calculated a P-value for the label of interest (female high variability or male high variability) using a hypergeometric test. This permutation procedure was done for every tissue tested—finally resulting in a null value for the sum of log P-values test statistic described above. For each gene set 10,000 such permutations were performed to generate an empirical null distribution for the test statistic. The empirical P-values for each gene set were calculated by finding the fraction of permutations resulting in a test statistic larger than the one observed. These combined P-values across gene sets were corrected using the BH procedure and gene sets with corrected P-values < 0.05 were considered significant. Gene sets that were highly tissue/developmentally specific were not tested in the meta-analysis. The seven removed hallmark gene sets were pancreas beta cells, adipogenesis, myogenesis, spermatogenesis, bile acid metabolism, heme metabolism, and xenobiotic metabolism.
Transcription factor overrepresentation analysis
The analysis was performed in a fashion identical to the hallmark enrichment analysis except that the gene sets from collection “C3” subcollection “GTRD” were used.
Single-cell RNA sequencing analysis
The count matrix for individual 4 was downloaded from GSE113197 (36). All clustering and marker identification analysis was performed using the “R” library “Seurat” version 3.2.3. Cells which had fewer than 500 and more than 6,000 expressed features were removed. Cells with more than 10% of their transcripts coming from mitochondrial genes were removed. The counts were normalized using the function “SCTransform.” Both dimensionality reduction (using the function “RunUMAP”) and clustering (function: “FindClusters”) were performed using the first 10 principal components of the normalized counts. L1, L2, and basal cells were identified based on their expression of markers “KRT18,” “SPLI,” and “KRT14.” Clusters expressing “KRT18” and “SPLI” were deemed to be L1 cells, clusters only expressing “KRT18” were deemed to be L2 cells, and clusters expressing “KRT14” were taken to be basal cells.
To calculate the cell-type specificity of each gene raw counts were aggregated within each of the three cell types, generated three pseudo-bulk count vectors (one for each cell type). For each cell-type, CPMs were calculated by multiplying the pseudo-bulk count vector by 1 million and dividing by the sum of the vector. The CPM values were then log2 transformed. Cell-type specificity was calculated using the standard metric for tissue specificity , where is the vector representing a given gene’s expression across the three cell-types divided by the maximum expression of that gene in any of the three cell-types.
Graves’ disease analysis
Summary GWAS statistics were obtained from the NHGRI-EBI GWAS catalog for study EFO_0004237, downloaded on 08/04/2022 (40).
eQTL analysis
Summary statistics for single-tissue eQTL mapping performed in European–American individuals were downloaded from the GTEx-v8 website. Genes with at least one eQTL with q-value < 0.1 were considered to be eGenes. A two-sided Fisher’s exact test, as implemented in the “fisher.test” function, was used to test for the enrichment/depletion of eGenes among SDV genes in each tissue. Only tissues with at least 20 autosomal SDV genes were considered. Only autosomal genes with expression high enough to be tested in both the original eQTL study and our study were considered. To test if the median fold enrichment of eGenes across tissues was significant we randomly permuted the SDV labels in each tissue and calculated a median fold enrichment across all tissues. In total 10,000 permutations were performed and for each round of permutation the median fold enrichment of eGenes across the 20 tissues was recorded. The empirical P-value was calculated as the fraction of all permutations where the fold enrichment was at least as low as the observed fold enrichment.
PhastCons analysis
Genome-wide phastCons scores (in hg38 coordinates) generated from a 20 species alignment were downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phastCons20way/. TSS for all annotated genes were downloaded from BioMart using the “R” package biomaRt (54, 55). The most distal TSS was used for each gene. Only genes expressed in all tissues containing at least 20 SDV genes were tested (20 tissues overall). A gene was considered “SDV” if it displayed “SDV” expression in at least 1 of the 20 tested tissues, otherwise it was considered to be “not SDV.” Requiring genes to be expressed in all 20 tissues was done to remove bias, i.e., a gene is more likely to be SDV if it is expressed in more tissues and genes expressed in many tissues tend to be more evolutionarily constrained (51). Further increasing the number of tissues (by decreasing the minimum number of SDV genes allowed in each tissue) reduces power by decreasing the intersection of genes expressed across all tissues. Mean phastCons scores were obtained for each gene in a region 100 bp upstream from their TSSs.
LINSIGHT analysis
Data was downloaded from the Siepel Laboratory website: https://siepellab.labsites.cshl.edu/software/. The data was processed and analyzed in the same manner as the phastCons data following conversion to hg38 coordinates.
Loss of function analysis
Loss of function metrics were downloaded from https://gnomad.broadinstitute.org/downloads. In particular the v2.1.1 loss of function metrics by gene were used. The analysis was similar to the analysis for PhastCons score: only genes expressed across all tissues with at least 20 SDV genes were considered. LOEUF scores were compared between “SDV” and “not SDV” genes.
Sex-specific stabilizing selection simulation
The simulation assumed a diploid, diecious, and randomly mating population of 500 individuals. In each individual, the phenotype (expression) was modeled as the additive contribution of three genetic loci and environmental noise. In total, one of the genetic loci had equal effect in both sexes while each of the two other genetic loci only had effect in one sex or the other. The allelic effects of each genetic locus in each individual were initialized as draws from a standard normal distribution (mean = 0 and standard deviation = 1). The environmental component was also taken to be a random number drawn from a standard normal distribution. Similarly, mutation resulted in the addition of a number drawn from a standard normal distribution to the allelic effect and the mutation rate was set to 5e-3 per generation at each genetic locus in each individual. The fitness of each individual was taken to be , where “s” represented the strength of selection, “G” was the total sum of genetic allelic effects (from the shared and sex-specific locus), and “E” was the environmental effect. For the simulations with equal selection in both sexes “s” was 0.005, while in simulation with sex-specific selection “s” was 0.005 in males and 0.05 in females. Selection only affected viability and had no impact on fecundity. An individual survived to reproduce if a random draw from a uniform distribution on [0, 1] was less than or equal to the individual’s fitness; therefore, the fitness represents the probability of an individual surviving to reproduce. In each generation 100 males and 100 females were chosen and paired at random with replacement. Each pair produced five offspring and the sex of each offspring was determined randomly. The simulations were run for a total of 2,000 generations. Every 10 generations, the variance of the phenotype (across individuals in the population) was recorded along with the variance of the allelic effect of the shared genetic component. For both the equal selection and sex-specific selection models 30 independent simulations were run.
GSEA analysis of sbQTL data
Sex-biased QTL (sbQTL) data from breast tissue was downloaded from GTEx v8. Genes were only considered if they were tested in our analysis and in the sbQTL analysis. Genes were ranked according to corrected -log10(P-values) where the (Bonferroni) correction was made to account for multiple SNPs tested per gene. GSEA analysis was performed using the “R” function “fgsea” from the library “fgsea” (56).
Supplementary Material
ACKNOWLEDGEMENTS
We thank Zhao lab members for discussions about the project, Jason Mezey and Christina Leslie for helpful discussions during early stages of the project, Mike Love for statistical advice, and Andy Clark for the suggestion of power analysis. We thank Matt Hahn and Ziyue Gao for their comments and suggestions for a recent version of the manuscript. We thank Jason Banfelder and Bala Jayaraman from The Rockefeller University High Performance Computing (HPC) Center for their dedicated work in these extraordinary times. This manuscript was previously posted to bioRxiv: doi: https://doi.org/10.1101/2022.08.09.503366.
Appendix A Theory behind sex-specific stabilizing selection
The evolution of a single quantitative trait in males and females can be predicted from the 2-by-2 additive genetic variance–covariance matrix
(A1) |
where the diagonal terms represent the additive genetic variances in males and females, while the identical off-diagonal terms represent the additive genetic covariance between the sexes (57, 58). In the absence of directional selection, the change in the genetic variance–covariance matrix due to stabilizing selection over the course of a generation is (57, 59)
(A2) |
where is a matrix describing the strength of stabilizing selection within each sex
(A3) |
For stabilizing selection to occur the diagonal terms must be negative with decreasing values indicating stronger stabilizing selection. Substituting (A3) and (A1) into (A2) yields
(A4) |
By assuming that initially the trait has identical genetic variance in males and females (which is assumed at the start of our simulation): we can use the simplification . This leads to the following equations for and :
(A5a) |
(A5b) |
In the case of stronger stabilizing selection in females: . Additionally, theoretical and empirical evidence suggests that generally the magnitude of intrasex genetic variance is larger than the magnitude of intersex genetic covariance: (57). Therefore, (A5a) and (A5b) yields: . Following the assumptions made in our simulation (namely that the phenotypic variance in each sex is just the sum of the environmental variances and the additive genetic variances (G), and that the environmental variance is the same in both sexes) leads to the result that phenotypic variance will decrease to a larger extent in females than in males over the course of a generation. This is what we observed (however, over multiple generations) in our simulation (Fig. 4A and B).
To further relate our simulation results to the theory described above we can formulate the phenotypes and their variances from the simulation as follows:
(A6a) |
(A6b) |
and are the phenotypes in males and females respectively. , , and are the genetic contributions of the shared genetic locus, the male specific genetic locus, and the female specific genetic locus. The environmental contribution to the phenotypes is . From (A6a) and (A6b), since all the phenotypic components are independent (so their covariances are 0), the phenotypic variances are
(12) |
(A7a) |
(A7b) |
and the phenotypic covariances are
(A7c) |
From (A7a), (A7b), and (A7c) it becomes apparent that corresponds to , corresponds to , and B corresponds to . From the off-diagonal terms in Equation (A4) it is evident that increasing the strength of selection in females (decreasing ) makes more negative—which is consistent with the observed relative reduction in the genetic variance in the shared locus () (Fig. 4C).
Notes
Competing Interest: The authors declare no competing interest.
Contributor Information
Samuel Khodursky, Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA.
Caroline S Jiang, Department of Biostatistics, The Rockefeller University, New York, NY 10065, USA.
Eric B Zheng, Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA.
Roger Vaughan, Department of Biostatistics, The Rockefeller University, New York, NY 10065, USA.
Daniel R Schrider, Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA.
Li Zhao, Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA.
Funding
L.Z. was supported by the National Institutes of Health (NIH) MIRA R35GM133780, the Robertson Foundation, a Rita Allen Foundation Scholar Program, and a Vallee Scholar Program (VS-2020–35), a Monique Weill-Caulier Career Scientist Award, an Allen Distinguished Investigator Award through the Paul G. Allen Frontiers Group, and an Alfred P. Sloan Research Fellowship (FG-2018–10627). C.S.J. was supported by UL1 TR001866 from the National Center for Advancing Translational Sciences (NCATS), NIH Clinical and Translational Science Award (CTSA) program. E.B.Z. was supported by a Medical Scientist Training Program grant from the National Institute of General Medical Sciences of the NIH under award number: T32GM007739 to the Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program.
Authors’ Contributions
S.K. and L.Z. conceived the project. S.K. performed the analysis with input from C.S.J. and R.V. concerning statistical matters. D.R.S. provided input for the population genetics simulation, appendix, and other evolutionary analyses. E.B.Z provided input for the population genetics simulation and code review. L.Z. supervised the project. S.K. and L.Z. wrote the manuscript with input from all authors.
Data Availability
The code generated for this work has been deposited on GitHub: https://github.com/LiZhaoLab/SDV.
References
- 1. Chella Krishnan K, Mehrabian M, Lusis AJ. 2018. Sex differences in metabolism and cardiometabolic disorders. Curr Opin Lipidol. 29: 404–410. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Choleris E, Galea LAM, Sohrabji F, Frick KM. 2018. Sex differences in the brain: implications for behavioral and biomedical research. Neurosci Biobehav Rev. 85: 126–145. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Klein SL, Flanagan KL. 2016. Sex differences in immune responses. Nat Rev Immunol. 16: 626–638. [DOI] [PubMed] [Google Scholar]
- 4. Ober C, Loisel DA, Gilad Y. 2008. Sex-specific genetic architecture of human disease. Nat Rev Genet. 9: 911–922. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Naqvi S, et al. 2019. Conservation, acquisition, and functional impact of sex-biased gene expression in mammals. Science. 365: eaaw7317. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Parsch J, Ellegren H. 2013. The evolutionary causes and consequences of sex-biased gene expression. Nat Rev Genet. 14: 83–87. [DOI] [PubMed] [Google Scholar]
- 7. Williams TM, Carroll SB. 2009. Genetic and molecular insights into the development and evolution of sexual dimorphism. Nat Rev Genet. 10: 797–804. [DOI] [PubMed] [Google Scholar]
- 8. Catalán A, Hutter S, Parsch J. 2012. Population and sex differences in Drosophila melanogaster brain gene expression. BMC Genomics. 13: 654. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Gershoni M, Pietrokovski S. 2017. The landscape of sex-differential transcriptome and its consequent selection in human adults. BMC Biol. 15: 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Khodursky S, Svetec N, Durkin SM, Zhao L. 2020. The evolution of sex-biased gene expression in the Drosophila brain. Genome Res. 30: 874–884. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Lopes-Ramos CM, et al. 2020. Sex differences in gene expression and regulatory networks across 29 human tissues. Cell Rep. 31: 107795. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Oliva M, et al. 2020. The impact of sex on gene expression across human tissues. Science. 369: eaba3066. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Meisel RP, Malone JH, Clark AG. 2012. Faster-X evolution of gene expression in Drosophila. PLos Genet. 8: e1003013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Ranz JM, Castillo-Davis CI, Meiklejohn CD, Hartl DL. 2003. Sex-dependent gene expression and evolution of the Drosophila transcriptome. Science. 300: 1742–1745. [DOI] [PubMed] [Google Scholar]
- 15. Parisi M, et al. 2003. Paucity of genes on the Drosophila X chromosome showing male-biased expression. Science. 299: 697–700. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Beery AK, Zucker I. 2011. Sex bias in neuroscience and biomedical research. Neurosci Biobehav Rev. 35: 565–572. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Itoh Y, Arnold AP. 2015. Are females more variable than males in gene expression? Meta-analysis of microarray datasets. Biol Sex Differ. 6: 1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Zajitschek SR, et al. 2020. Sexual dimorphism in trait variability and its eco-evolutionary and statistical implications. Elife. 9: e63170. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Falconer DS. 1967. The inheritance of liability to diseases with variable age of onset, with particular reference to diabetes mellitus. Ann Hum Genet. 31: 1–20. [DOI] [PubMed] [Google Scholar]
- 20. Simonovsky E, Schuster R, Yeger-Lotem E. 2019. Large-scale analysis of human gene expression variability associates highly variable drug targets with lower drug effectiveness and safety. Bioinformatics. 35: 3028–3037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Soldin OP, Mattison DR. 2009. Sex differences in pharmacokinetics and pharmacodynamics. Clin Pharmacokinet. 48: 143–157. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Rigby RA, Stasinopoulos DM. 2005. Generalized additive models for location, scale and shape. J R Stat Soc Ser C. 54: 507–554. [Google Scholar]
- 23. Robinson MD, Smyth GK. 2007. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics. 23: 2881–2887. [DOI] [PubMed] [Google Scholar]
- 24. Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 26: 139–140. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. de Jong TV, Moshkin YM, Guryev V. 2019. Gene expression variability: the other dimension in transcriptome analysis. Physiol Genomics. 51: 145–158. [DOI] [PubMed] [Google Scholar]
- 26. McCarthy DJ, Chen Y, Smyth GK. 2012. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 40: 4288–4297. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Liberzon A, et al. 2015. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1: 417–425. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Subramanian A, et al. 2005. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 102: 15545–15550. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Kolmykov S, et al. 2021. GTRD: an integrated view of transcription regulation. Nucleic Acids Res. 49: D104–D111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Avery JT, Zhang R, Boohaker RJ. 2021. GLI1: a therapeutic target for cancer. Front Oncol. 11: 673154. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Clement V, Sanchez P, de Tribolet N, Radovanovic I, Ruiz i Altaba A. 2007. HEDGEHOG-GLI1 signaling regulates human glioma growth, cancer stem cell self-renewal, and tumorigenicity. Curr Biol. 17: 165–172. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Sun Y, et al. 2014. Estrogen promotes stemness and invasiveness of ER-positive breast cancer cells through Gli1 activation. Mol Cancer. 13: 137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Wu L, et al. 2000. MAML1, a human homologue of Drosophila mastermind, is a transcriptional co-activator for NOTCH receptors. Nat Genet. 26: 484–489. [DOI] [PubMed] [Google Scholar]
- 34. Quaranta R, et al. 2017. Maml1 acts cooperatively with Gli proteins to regulate sonic hedgehog signaling pathway. Cell Death Dis. 8: e2942–e2942. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Rao RC, Dou Y. 2015. Hijacked in cancer: the KMT2 (MLL) family of methyltransferases. Nat Rev Cancer. 15: 334–346. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Nguyen QH, et al. 2018. Profiling human breast epithelial cells using single cell RNA sequencing identifies cell diversity. Nat Commun. 9: 2028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Hao Y, et al. 2020. Integrated analysis of multimodal single-cell data. Cell. 184: 3573–3587.e29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Mauvais-Jarvis F, et al. 2020. Sex and gender: modifiers of health, disease, and medicine. Lancet. 396: 565–582. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. McIver B, Morris JC. 1998. The pathogenesis of graves’ disease. Endocrinol Metab Clin North Am. 27: 73–89. [DOI] [PubMed] [Google Scholar]
- 40. Buniello A, et al. 2019. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47: D1005–D1012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Chu X, et al. 2011. A genome-wide association study identifies two new risk loci for Graves’ disease. Nat Genet. 43: 897–901. [DOI] [PubMed] [Google Scholar]
- 42. Cooper JD, et al. 2012. Seven newly identified loci for autoimmune thyroid disease. Hum Mol Genet. 21: 5202–5208. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Nakabayashi K, et al. 2011. Identification of independent risk loci for Graves’ disease within the MHC in the Japanese population. J Hum Genet. 56: 772–778. [DOI] [PubMed] [Google Scholar]
- 44. Sakaue S, et al. 2021. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet. 53: 1415–1424. [DOI] [PubMed] [Google Scholar]
- 45. Zhao S-X, et al. 2013. Robust evidence for five new Graves’ disease risk loci from a staged genome-wide association analysis. Hum Mol Genet. 22: 3347–3362. [DOI] [PubMed] [Google Scholar]
- 46. Dayan CM, et al. 1991. Autoantigen recognition by thyroid-infiltrating T cells in Graves disease. Proc Natl Acad Sci. 88: 7415–7419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Totterman TH, Andersson LC, Häyry P. 1979. Evidence for thyroid antigen-reactive t lymphocytes infiltrating the thyroid gland in graves'disease. Clin Endocrinol. 11: 59–68. [DOI] [PubMed] [Google Scholar]
- 48. Glassberg EC, Gao Z, Harpak A, Lan X, Pritchard JK. 2019. Evidence for weak selective constraint on human gene expression. Genetics. 211: 757 LP–757772. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Josephs EB, Lee YW, Stinchcombe JR, Wright SI. 2015. Association mapping reveals the role of purifying selection in the maintenance of genomic variation in gene expression. Proc Natl Acad Sci. 112: 15390 LP–15315395. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50. Siepel A, et al. 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51. Karczewski KJ, et al. 2020. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 581: 434–443. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52. Huang Y-F, Gulko B, Siepel A. 2017. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet. 49: 618–624. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. Fair BJ, et al. 2020. Gene expression variability in human and chimpanzee populations share common determinants. Elife. 9: e59929. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Durinck S, et al. 2005. BioMart and bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 21: 3439–3440. [DOI] [PubMed] [Google Scholar]
- 55. Durinck S, Spellman PT, Birney E, Huber W. 2009. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc. 4: 1184–1191. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56. Korotkevich G, et al. 2021. Fast gene set enrichment analysis. bioRxiv. 60012. [Google Scholar]
- 57. Barker BS, Phillips PC, Arnold SJ. 2010. A test of the conjecture that G-matrices are more stable than B-matrices. Evolution. 64: 2601–2613. [DOI] [PubMed] [Google Scholar]
- 58. Lande R. 1980. Sexual dimorphism, sexual selection, and adaptation in polygenic characters. Evolution. 34: 292. [DOI] [PubMed] [Google Scholar]
- 59. Phillips PC, Arnold SJ. 1989. Visualizing multivariate selection. Evolution. 43: 1209–1222. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The code generated for this work has been deposited on GitHub: https://github.com/LiZhaoLab/SDV.