Abstract
Despite extensive sex differences in human complex traits and disease, the male and female genomes differ only in the sex chromosomes. This implies that most sex-differentiated traits are the result of differences in the expression of genes that are common to both sexes. While sex differences in gene expression have been observed in a range of different tissues, the biological mechanisms for tissue-specific sex differences (TSSDs) in gene expression are not well understood. A total of 30 640 autosomal and 1021 X-linked transcripts were tested for heterogeneity in sex difference effect sizes in n = 617 individuals across 40 tissue types in Genotype–Tissue Expression (GTEx). This identified 65 autosomal and 66 X-linked TSSD transcripts (corresponding to unique genes) at a stringent significance threshold. Results for X-linked TSSD transcripts showed mainly concordant direction of sex differences across tissues and replicate previous findings. Autosomal TSSD transcripts had mainly discordant direction of sex differences across tissues. The top cis-expression quantitative trait loci (eQTLs) across tissues for autosomal TSSD transcripts are located a similar distance away from the nearest androgen and estrogen binding motifs and the nearest enhancer, as compared to cis-eQTLs for transcripts with stable sex differences in gene expression across tissue types. Enhancer regions that overlap top cis-eQTLs for TSSD transcripts, however, were found to be more dispersed across tissues. These observations suggest that androgen and estrogen regulatory elements in a cis region may play a common role in sex differences in gene expression, but TSSD in gene expression may additionally be due to causal variants located in tissue-specific enhancer regions.
Introduction
The difference between sexes has become an increasingly important focus in the study of complex traits and disease phenotypes (1). Males and females differ not only in appearance and behavior but also in risk, incidence, prevalence, severity and age-at-onset of many diseases including autoimmune disease (2), cancer (3,4), cardiovascular disease (5) and neurological and psychiatric disorders (6–9). Despite these extensive differences, males and females share the vast majority of their genome, differing only in the sex chromosomes (10). This implies that most sex-differentiated traits, or traits that exhibit mean or variance differences between males and females, are the result of differences in the expression of genes that are common to both sexes (11). Indeed, sex differences in gene expression have been observed across a range of different tissues including liver, heart, and brain, and have been shown to have some tissue-dependent control (i.e. tissue-specific sex differences, TSSDs), in both humans and other species, which is indicated by low overlap of sex-differentially expressed genes across tissues (11–13). In these studies, gene expression is tested for mean differences between the sexes to identify sets of sex-differentially expressed genes within each tissue. These sets of genes are then compared across tissues, where genes that are not found at the intersection are deemed TSSD (14–19). This analytical approach is limited in that, first, there is no formal statistical testing when comparing sex-differentially expressed genes across tissues; second, there is no consideration for differences in the direction and magnitude of sex-differentiated expression across tissue types; and finally, failure to account for repeated observations from the same individual increases the rates of false-positive and false-negative associations.
Sex differences in gene expression have different underlying mechanisms on the autosomes and sex chromosomes. In mammals, females have two copies of the X chromosome, while males have only one (and a Y chromosome). Mechanisms have evolved to balance allele dosage differences in X-linked genes between the sexes, where one of the X chromosomes in females is randomly inactivated. However, 15% to 23% of X-linked genes have been shown to escape X-inactivation (20), leading to X-linked gene expression variability across the sexes. For example, inactivated X-linked genes in the non-pseudoautosomal region (non-PAR) of the X chromosome are expected to show no difference in expression between the sexes, while X-linked genes in the non-PAR that escape X chromosome inactivation are expected to have higher expression in females compared to males. Widespread sex differences in autosomal gene expression have led to the assumption of a sex-specific genetic architecture. Many studies have aimed to identify sex-specific genetic control of autosomal gene expression (21–24); however, the evidence for differential genetic control of gene expression across the sexes is largely inconsistent. Indeed, our previous work has shown that, on average, males and females share a common autosomal genetic control of gene expression in whole blood (25). This is consistent with sex differences in gene expression being due to differences in the regulatory genome across the sexes (10,11) and may be due to differential binding of sex hormones affecting the expression of nearby genes.
Sex differences in human autosomal gene expression may be developed secondary to that of sex-determining gene expression through the influence of sex-specific hormones. Here, a regulatory cascade initiated by the expression of genes on the sex-determining regions of the sex chromosomes may initiate differential expression in many mediator genes, which then regulate expression of downstream genes through interactions with sex-specific hormones, such as estrogen and androgen. This type of regulatory cascade may then affect low-order complex traits, such as metabolism (26) and DNA methylation (27), or higher-order complex traits, such as disease susceptibility (1,10,11). For example, estrogen may be involved in the development and progression of systemic lupus erythematosus (SLE), an autoimmune disease mostly observed in females, where female SLE patients have been shown to have increased levels of estrogen and active estrogen metabolites in serum (28,29), and increased risk of developing SLE in post-menopausal women who have undergone hormone replacement therapy (30). Further, the precise action of disease-associated genes on disease pathophysiology typically acts in a tissue-specific manner (31,32). One possible mechanism for (tissue-specific) sex differences in human gene expression is the interaction of sex-specific hormones with (tissue specific) regulatory elements that are located in a cis region of the target gene. These genes can have non-sex-specific transcription factor binding sites that act in combination with androgen and estrogen binding sites to interact with sex-specific hormones and produce sex-specific gene regulation (33,34). This type of regulatory mechanism has been observed in experimental animal models for candidate genes [e.g. Slp expression in mice (35,36)].
In this study, we use data from the Genotype-Tissue Expression Project (GTEx) and the Roadmap Epigenome Project to investigate TSSD in human gene expression and elucidate the underlying biological mechanisms.
Results
The GTEx dataset
RNA-seq data from the Genotype-Tissue Expression project (GTEx v7 release) was available in n = 617 deceased human donors across 43 non-diseased tissue types. Sample sizes per tissue ranged from n = 80 in brain (substantia nigra) to n = 491 in muscle (skeletal) with a mean of n = 223 across the 43 tissue types, and the proportion of females within each tissue ranged from 27% females in minor salivary gland to 45% females in adrenal gland, with a mean of 36% females across all 43 tissue types (Supplementary Material, Fig. S1). The number of autosomal transcripts identified as expressed in each tissue ranged from 19 432 in whole blood to 25 038 in pituitary, with a mean of 22 837 across all tissues; similarly, the number of expressed X-linked transcripts ranged from 627 in whole blood to 867 in pituitary (mean of 771) (see Supplementary Material, Table S1). We identified breast (mammary), pituitary and minor salivary glands as tissue outliers, where the adjustment of PEER (probabilistic estimation of expression residuals) factors to correct for latent experimental confounders (37–39) significantly influenced sex difference effect sizes, or the regression coefficient from a linear regression of gene expression on sex, for a large proportion of transcripts. These tissues were therefore excluded from subsequent analyses (see Supplementary Material, Supplementary Notes 1.1 and 1.2 for full details).
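The definition of a sex difference effect size as a regression coefficient can be made concrete with a minimal sketch. It assumes sex is coded 0 for males and 1 for females (so that a positive coefficient means higher expression in females, matching the sign convention in the figure captions), uses made-up expression values, and omits the PEER-factor adjustment described above. The function and variable names are illustrative only.

```python
from statistics import mean

def sex_effect_size(expression, is_female):
    """Ordinary least-squares slope from regressing expression on a 0/1 sex indicator."""
    x_bar = mean(is_female)
    y_bar = mean(expression)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(is_female, expression))
    sxx = sum((x - x_bar) ** 2 for x in is_female)
    return sxy / sxx

# Toy, made-up expression values for one transcript in one tissue.
expression = [5.1, 4.8, 5.4, 6.0, 6.3, 5.9, 5.0]
is_female = [0, 0, 0, 1, 1, 1, 0]  # assumed coding: 0 = male, 1 = female

beta = sex_effect_size(expression, is_female)

# With a binary predictor and no covariates, the slope equals the
# difference between the female and male group means.
female_mean = mean(y for y, s in zip(expression, is_female) if s == 1)
male_mean = mean(y for y, s in zip(expression, is_female) if s == 0)
```

Without covariates the coefficient is simply the female mean minus the male mean; once covariates such as PEER factors are added, it becomes an adjusted difference, which is why the choice of adjustment can change the estimated effect sizes.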
TSSDs in gene expression
Widespread sex-differentiated expression in the GTEx dataset was first confirmed by identifying a total of 353 (262 autosomal and 91 X-linked) unique transcripts that showed significant mean differences in expression between the sexes in at least one of the 40 tissue types using a within-tissue Bonferroni-corrected significance threshold (see Supplementary Material, Supplementary Notes 1.3 and Fig. S2). We then tested for TSSD in gene expression for 30 640 autosomal and 1021 X-linked transcripts (corresponding to 30 572 autosomal and 1021 X-linked genes) that were expressed in at least two tissue types (15 025 autosomal and 495 X-linked transcripts were expressed in all 40 tissue types) in a mixed linear regression model framework. The equality of sex difference effect sizes for each transcript across tissue types was tested with a sex-by-tissue interaction term while controlling for repeated observations from the same individual (see Materials and Methods). Using a mixed linear model to explicitly test for TSSD is more robust than approaches looking for (lack of) overlap in significant sex differences across tissues. It avoids false-positive results caused by lack of power due to low sample size in individual tissues and false-negative results for transcripts that show sex differences in all tissues but of unequal size (see Supplementary Material, Supplementary Notes 1.4). A total of 65 autosomal and 66 X-linked transcripts (each corresponding to a unique gene) showed a significant sex-by-tissue interaction at a Bonferroni-corrected significance threshold of PTSSD ≤ 1.58 × 10−6, indicating heterogeneity in sex difference effect size across tissue types. These 131 (65 autosomal and 66 X-linked) transcripts are hereafter referred to as TSSD transcripts. We observed enrichment for TSSD transcripts on the X chromosome compared to the autosomes (66/1021 X-linked versus 65/30 640 autosomal TSSD transcripts; Fisher’s exact, P = 1.20 × 10−62).
Exclusion of 170 X chromosome transcripts known to escape X inactivation (40) did not influence our result (18/851 X-linked versus 65/30 640 autosomal TSSD transcripts; Fisher’s exact, P = 7.12 × 10−12). Further, we found that X-linked TSSD transcripts were more often concordant in direction for male/female sex differences across the tissues compared to autosomal TSSD transcripts (33/66 X-linked versus 12/65 autosomal concordant TSSD transcripts; Fisher’s exact, P = 2.01 × 10−4). Among the transcripts concordant in direction for male/female sex differences, X-linked TSSD transcripts more often showed female-bias expression compared to autosomal TSSD transcripts (26/33 X-linked versus 2/12 autosomal concordant TSSD transcripts; Fisher’s exact P = 2.68 × 10−4). These results are consistent with sex differences being driven by X inactivation. The remaining TSSD transcripts showed discordant direction of sex differences in at least one of the expressed tissues. To test for chromosomal enrichment of TSSD on the autosome, we calculated the mean TSSD F statistic across all transcripts in each chromosome and compared the observed test statistic to what is expected by chance by circular genomic permutation (see Materials and Methods) (41). We did not observe enrichment for TSSD on any chromosome after accounting for multiple testing (Padjusted > 0.83).
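The Bonferroni threshold and the X-versus-autosome enrichment test reported above can be illustrated with a short standard-library sketch. It assumes the threshold is 0.05 divided by the total number of transcripts tested, and that the enrichment test is a two-sided Fisher's exact test that sums the probabilities of all tables no more likely than the observed one; the text states neither choice explicitly, so the P-value computed here is not guaranteed to match the reported figure exactly. The function name is illustrative.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    k_min, k_max = max(0, col1 - row2), min(col1, row1)

    def weight(k):
        # Unnormalised hypergeometric probability, kept as an exact integer.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    tail = sum(w for w in (weight(k) for k in range(k_min, k_max + 1)) if w <= observed)
    return float(Fraction(tail, comb(row1 + row2, col1)))

n_autosomal, n_x = 30640, 1021

# Assumed form of the Bonferroni correction: alpha / number of transcripts tested.
bonferroni = 0.05 / (n_autosomal + n_x)

# Rows: X-linked, autosomal. Columns: TSSD, not TSSD.
p_enrich = fisher_exact_two_sided(66, n_x - 66, 65, n_autosomal - 65)
```

Exact integer arithmetic is used because probabilities this small are easy to lose to floating-point underflow when the binomial coefficients are multiplied directly as floats.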
Sensitivity analysis of the 131 (65 autosomal and 66 X-linked) TSSD transcripts included removal of individual tissues from the dataset in turn, which altered the results for 10 (7 autosomal and 3 X-linked) TSSD transcripts, where the effect of the interaction term became non-significant (PTSSD > 0.01) upon removal of a single tissue from the analysis. This observation is consistent with extreme TSSD in gene expression, where a transcript exhibits sex differences in a single tissue alone. The SALL1 (Spalt Like Transcription Factor 1) gene on chromosome 16, for example, showed significant heterogeneity in sex difference effect sizes across 29 tissue types with PTSSD = 1.76 × 10−11. The Supplementary Material, Figure S3, shows mean transcripts per million (TPM) expression across tissues of 8.00 (SD = 11.58) and 7.91 (SD = 11.18) in males and females, respectively, with the highest mean TPM expression across individuals shown in thyroid tissue. Figure 1 shows that TSSD may be driven by the sex differences in adipose (visceral omentum) tissue, where SALL1 shows higher expression in females compared to males (P = 4.94 × 10−17), with the majority of the remaining tissues having 95% confidence intervals that overlap the null of no difference in expression between the sexes. Indeed, the exclusion of adipose (visceral omentum) tissue from the analysis removed the detected heterogeneity in sex difference effect sizes (PTSSD = 0.80), indicating that TSSD in this transcript is being driven by adipose (visceral omentum) tissue alone. Full details of the identified TSSD transcripts are given in the Supplementary Material, Table S2.
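The logic of the leave-one-tissue-out sensitivity analysis can be sketched with a simpler heterogeneity statistic. The example below uses Cochran's Q on per-tissue effect sizes and standard errors, which is a deliberate substitution and not the sex-by-tissue interaction test from the mixed linear model used in this study; in particular, it ignores repeated observations from the same donor. All effect sizes, standard errors and tissue labels are made up for illustration.

```python
def cochran_q(effects, std_errors):
    """Cochran's Q: inverse-variance weighted spread of effects around their pooled mean."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))

def leave_one_out(tissues, effects, std_errors):
    """Recompute Q with each tissue removed in turn."""
    result = {}
    for i, tissue in enumerate(tissues):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_errors = std_errors[:i] + std_errors[i + 1:]
        result[tissue] = cochran_q(kept_effects, kept_errors)
    return result

# Made-up per-tissue sex difference effect sizes: one tissue carries the signal.
tissues = ["adipose_visceral", "lung", "thyroid", "liver", "whole_blood"]
effects = [0.90, 0.02, -0.03, 0.01, 0.00]
std_errors = [0.10, 0.10, 0.10, 0.10, 0.10]

q_full = cochran_q(effects, std_errors)
q_without = leave_one_out(tissues, effects, std_errors)

# The tissue whose removal leaves the least heterogeneity is the candidate driver.
driver = min(q_without, key=q_without.get)
```

When the heterogeneity collapses after removing exactly one tissue, as in this toy example, the pattern corresponds to what the text calls extreme TSSD.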
Figure 1.
SALL1 on chromosome 16 showed evidence for extreme TSSD in gene expression in adipose (visceral omentum) tissue. SALL1 showed significant heterogeneity in sex difference effect sizes across 29 tissue types with PTSSD = 1.76 × 10−11. A negative sex difference effect size indicates that males have higher expression for the transcript compared to females, and a positive sex difference effect size indicates that females have higher expression for the transcript compared to males. Exclusion of adipose (visceral omentum) tissue from analysis removed the detected heterogeneity (PTSSD = 0.80), indicating that TSSD in this transcript is being driven by one tissue alone. Bars are 95% confidence intervals.
We found that our TSSD results are robust to adjustment for potential confounders (e.g. cohort, age, reported race, body mass index (BMI) and death classification) and are not affected by the unequal male/female within-tissue sample sizes (Supplementary Material, Fig. S1). Full details of sensitivity analyses are in Supplementary Notes 1.5.
TSSD in X-linked gene expression
We observed enrichment for X-linked TSSD transcripts in the PAR of the X chromosome (15/33 PAR versus 51/990 non-PAR TSSD transcripts; Fisher’s exact, P = 3.91 × 10−11) and among genes known to escape X-inactivation (48/170 escape/variable versus 9/392 inactive TSSD transcripts; Fisher’s exact, P = 4.87 × 10−19). The Supplementary Material, Figure S4, shows the sex difference effect size distribution of TSSD transcripts on the X chromosome. Consistent with (40), the majority of non-PAR TSSD transcripts have higher expression in females compared to males across the expressed tissues, while the majority of PAR TSSD transcripts have higher expression in males compared to females across the expressed tissues. Among the 66 X-linked TSSD transcripts is XIST, a gene that is expressed exclusively from the X-inactivation center of the inactive X chromosome, and plays an essential role in X-inactivation (42). XIST had concordant direction of sex differences across tissues, with higher expression in females compared to males, but showed significant heterogeneity in sex difference effect sizes across 40 tissue types (PTSSD = 1.63 × 10−109) (Supplementary Material, Fig. S5). While the statistical model used to detect TSSD will not detect heterogeneity purely due to differences in sample size or levels of expression across tissues, we performed sensitivity analysis to confirm no underlying bias. We found that sex difference effect sizes for XIST showed no significant correlation with sample size (Pearson correlation coefficient of r = 0.19; 95% CI, –0.13 to 0.48) or with mean TPM expression in males (Pearson correlation coefficient of r = 0.06; 95% CI, –0.25 to 0.37) and females (Pearson correlation coefficient of r = 0.07; 95% CI, –0.25 to 0.37) in each tissue (Supplementary Material, Figs S6 and S7). We observed incomplete and variable escape from X-inactivation in the PAR of the X chromosome across tissue types.
For example, CD99 showed concordant direction of sex differences with higher expression in males compared to females but significant heterogeneity in sex difference effect sizes across all 40 tissue types (PTSSD = 2.19 × 10−193). These results indicate that X chromosome inactivation is relatively stable across tissue type, with some evidence for variable or incomplete escape from X inactivation, also described in (40).
Tukiainen et al. demonstrated that the X-linked KAL1 gene has biallelic expression exclusively in lung tissue, providing strong evidence for tissue-specific escape from X-inactivation (40). The Supplementary Material, Figure S8, illustrates the mean TPM expression for KAL1 across 38 expressed tissue types. As shown, mean TPM expression across tissues was 3.92 (SD = 3.39) and 4.26 (SD = 4.44) in males and females, respectively, with the highest mean TPM expression across individuals shown in female lung tissue. Indeed, in our analysis, KAL1 showed evidence for TSSD in gene expression (PTSSD = 8.84 × 10−8; Fig. 2), with discordant direction of sex difference effect sizes across all expressed tissues. In particular, we observed higher expression in females compared to males in lung tissue, and the exclusion of lung tissue from the analysis removed the detected heterogeneity in sex difference effect sizes across tissue types (PTSSD = 0.38), indicating that lung tissue alone is driving the TSSD.
Figure 2.
The X-linked KAL1 shows evidence for extreme TSSD in gene expression in lung tissue. KAL1 showed significant heterogeneity in sex difference effect sizes across 38 tissue types with PTSSD = 8.84 × 10−8. A negative sex difference effect size indicates that males have higher expression for the transcript compared to females, and a positive sex difference effect size indicates that females have higher expression for the transcript compared to males. Exclusion of lung tissue from analysis removed the detected heterogeneity (PTSSD = 0.38), indicating that TSSD in this transcript is being driven by lung tissue alone. These results are in agreement with observations made in (40). Bars are 95% confidence intervals.
TSSD in autosomal gene expression
Figure 3 shows the sex difference effect size distribution of autosomal TSSD transcripts. As shown, a total of 10 TSSD transcripts are concordant in the male-bias direction, two show concordance in the female-bias direction and the remaining autosomal TSSD transcripts are discordant across tissue types. We hypothesized that the detected heterogeneity in sex difference effect sizes in the 65 autosomal TSSD transcripts is due to the interaction of sex-specific hormones, androgen and estrogen; tissue-specific enhancers located in a cis region of the target gene; and other regulatory elements in the region, as has been demonstrated for specific examples in mice (35,36). To investigate this, we identified 49/65 autosomal TSSD transcripts with cis-expression quantitative trait loci (eQTLs) (PeQTL < 5 × 10−8) in at least one tissue (mean of 12 tissues) within a 1 Mb window of the target transcript using cis-eQTL summary-level data from the 40 tissue types downloaded from the GTEx Portal (https://gtexportal.org/home/datasets, accessed 20 November 2017), and a control set of 65 autosomal transcripts with stable sex differences (SSDs), of which 39 had cis-eQTLs in a mean of 4 tissues. SSD transcripts are transcripts that show differences in gene expression between the sexes but are concordant in sex difference effect sizes across tissues. These were defined as transcripts with Psex < 1 × 10−4, Ptissue > 0.05 and PTSSD > 0.05 in the mixed linear regression model (see Materials and Methods).
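The TSSD and SSD definitions reduce to simple threshold rules on the three P-values from the mixed linear regression model. The sketch below encodes them, assuming the TSSD threshold is the Bonferroni-corrected value reported earlier (0.05 divided by the 31 661 transcripts tested). The function and argument names are hypothetical, and the selection of the 65 control SSD transcripts may have involved further steps not covered here.

```python
# Bonferroni-corrected TSSD threshold, about 1.58e-6 (assumed form: alpha / tests).
TSSD_THRESHOLD = 0.05 / (30640 + 1021)

def classify(p_sex, p_tissue, p_tssd):
    """Label a transcript from its sex, tissue and sex-by-tissue interaction P-values."""
    if p_tssd <= TSSD_THRESHOLD:
        return "TSSD"      # heterogeneous sex difference across tissues
    if p_sex < 1e-4 and p_tissue > 0.05 and p_tssd > 0.05:
        return "SSD"       # sex difference present and stable across tissues
    return "neither"

# Illustrative inputs; only the first interaction P-value is taken from the text (SALL1).
labels = [
    classify(p_sex=0.5, p_tissue=0.5, p_tssd=1.76e-11),
    classify(p_sex=1e-6, p_tissue=0.3, p_tssd=0.4),
    classify(p_sex=0.2, p_tissue=0.3, p_tssd=0.4),
]
```

Transcripts with an interaction P-value between the two cut-offs fall into neither set, which keeps the TSSD and SSD groups clearly separated for the comparison that follows.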
Figure 3.
Sex difference effect size distribution for 65 autosomal TSSD transcripts across 40 tissue types. Blue indicates a negative sex difference effect size, where males have higher expression for the transcript compared to females. Red indicates a positive sex difference effect size, where females have higher expression for the transcript compared to males. A total of 10 TSSD transcripts are concordant in the male-bias direction, two show concordance in the female-bias direction and the remaining 53 autosomal TSSD transcripts are discordant across tissue types. The effect of the interaction term for seven autosomal TSSD transcripts (indicated in red text) became non-significant (PTSSD > 0.01) upon removal of a single tissue (indicated by a star) from the analysis. Transcripts are ordered by chromosome and by position on the chromosome. White crosses indicate missing data.
To understand the cis regulatory landscape for TSSD versus SSD transcripts, we first asked whether the top cis-eQTLs across tissues for the set of transcripts were in or near estrogen receptor (ER) or androgen receptor (AR) binding motifs. We found that none of the top cis-eQTLs for the TSSD or SSD transcripts were in ER or AR binding motifs, and no significant difference was observed in the distance between the top cis-eQTLs and the nearest AR (mean 173 kb for TSSD transcripts versus 106 kb for SSD transcripts; t statistic = −1.34, df = 84.86, P = 0.18) or ER (mean 215 kb for TSSD transcripts versus 132 kb for SSD transcripts; t-statistic = −0.53, df = 85.58, P = 0.59) binding motif. Next, we asked whether the top cis-eQTLs were located in enhancer regions across 127 tissue types from the Roadmap Epigenome Project (43) (see Supplementary Material, Table S3). A total of 17 and 11 top cis-eQTLs for TSSD and SSD transcripts, respectively, were found in enhancer regions across a median of 10 tissue types. We tested for homogeneity of variances for the number of tissue types the enhancer regions are found in as an indication of the degree of tissue specificity. A small variance or a variance of zero indicates that the top cis-eQTLs are in enhancer regions that are relatively stable across the tissue types; that is, the enhancer regions are consistently found in the same genomic position across the 127 Roadmap Epigenome tissue types. Alternatively, a large variance indicates that the top cis-eQTLs are in enhancer regions that are more dispersed across the 127 tissue types and represents more tissue-specificity. 
As illustrated in Figure 4, we found a nominally significant difference in variances, where top cis-eQTLs for TSSD transcripts were in enhancer regions that were more dispersed across the 127 tissue types, as compared to the top cis-eQTLs for SSD transcripts (variance of 929.93 for TSSD cis-eQTLs versus variance of 32.09 for SSD cis-eQTLs; Fligner–Killeen test, P = 0.02). Top cis-eQTLs for TSSD and SSD transcripts that were not in enhancer regions were found to be a similar distance away from the nearest enhancer (3.09 kb for TSSD transcripts versus 4.45 kb for SSD transcripts; t-statistic = −0.51, df = 53.80, P = 0.62).
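The two kinds of comparison used above can be sketched on toy data. The fractional degrees of freedom reported for the distance comparisons suggest an unequal-variance (Welch) t-test, which is an inference and not something the text states; the sketch computes that statistic along with the sample variances that summarize dispersion across tissue types. It does not implement the Fligner–Killeen test itself, for which a library routine such as `scipy.stats.fligner` would be the usual choice. All numbers below are made up.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Toy distances (kb) from top cis-eQTLs to the nearest enhancer.
tssd_kb = [3.1, 0.8, 5.2, 2.4, 4.0]
ssd_kb = [4.5, 6.1, 2.2, 5.0]
t_stat, dof = welch_t(tssd_kb, ssd_kb)

# Toy counts of tissue types (out of 127) in which each overlapping enhancer is found.
tssd_counts = [1, 2, 3, 10, 45, 80, 110]
ssd_counts = [8, 9, 10, 11, 12, 14]
tssd_var, ssd_var = variance(tssd_counts), variance(ssd_counts)
```

A larger variance in the tissue counts corresponds to enhancers that range from highly tissue-specific to nearly ubiquitous, which is the pattern reported for the TSSD cis-eQTLs.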
Figure 4.
A total of 17 and 11 top cis-eQTLs for TSSD and SSD transcripts, respectively, were found in enhancer regions across a median of 10 Roadmap Epigenome tissue types. We tested for homogeneity of variances for the number of tissue types the enhancer regions are found in as an indication of the degree of tissue specificity. We found a nominally significant difference in variances, indicating that the top cis-eQTLs for TSSD transcripts were in enhancer regions that were more dispersed, indicating more tissue specificity, across the 127 Roadmap Epigenome tissue types as compared to the top cis-eQTLs for SSD transcripts (variance of 929.93 for TSSD cis-eQTLs versus variance of 32.09 for SSD cis-eQTLs, Fligner–Killeen test P = 0.02).
We illustrate our results with a detailed example. RP4-610C12.4 on chromosome 20 was among the top autosomal TSSD transcripts that showed concordant but significant heterogeneity in sex difference effect sizes across 31 tissue types, with higher expression in males compared to females (PTSSD = 4.58 × 10−30; Supplementary Material, Fig. S9). The top cis-eQTL for RP4-610C12.4, single nucleotide polymorphism (SNP) rs2424800, is located inside the transcript and is associated with increased expression in brain (caudate basal ganglia) tissue (PeQTL = 7.59 × 10−14). We examined the cis region around SNP rs2424800 and found that the closest AR and ER binding motifs were 119 kb and 638 kb downstream from the SNP, respectively. Interestingly, SNP rs2424800 was found to be in a weak enhancer region exclusive to brain (hippocampus) and, indeed, was found to be nominally associated with increased expression of RP4-610C12.4 in brain (hippocampus) tissue (PeQTL = 6 × 10−8) (Fig. 5). Notably, RP4-610C12.4 had cis-eQTLs in 13 other tissue types, where SNP rs2424800 was a shared cis-eQTL (PeQTL < 5 × 10−8) for seven other brain tissues as well as lung and esophagus (muscularis). We hypothesize that male-bias expression of RP4-610C12.4 across the 31 tissue types is achieved through an interaction between the nearby AR binding motif and the corresponding sex-specific hormone, with brain-specific expression due to an interaction between the top cis-eQTL and the enhancer region in the target tissue (Supplementary Material, Fig. S10). Additional variation across tissues may be caused by additional cis-eQTLs with tissue-specific effects.
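Finding the closest binding motif to a SNP, as done for rs2424800 above, amounts to a nearest-neighbour lookup on sorted genomic coordinates. The sketch below assumes motif positions on one chromosome are given as a sorted list of start coordinates and treats a higher coordinate as downstream; the coordinates are invented so that the nearest motif lies 119 kb away, mirroring the AR example, and are not real positions.

```python
from bisect import bisect_left

def nearest_feature(position, sorted_starts):
    """Return (signed_offset, start) for the feature start closest to position.

    A positive offset means the feature lies at a higher coordinate.
    Raises ValueError if sorted_starts is empty.
    """
    i = bisect_left(sorted_starts, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_starts[max(0, i - 1): i + 1]
    best = min(candidates, key=lambda start: abs(start - position))
    return best - position, best

# Hypothetical coordinates (bp) on a single chromosome.
snp_position = 1_000_000
motif_starts = [400_000, 1_119_000, 1_900_000]

offset, motif = nearest_feature(snp_position, motif_starts)
```

Binary search keeps each lookup logarithmic in the number of motifs, which matters when the same query is repeated for the top cis-eQTL of every transcript.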
Figure 5.
The top cis-eQTL (blue star) for RP4-610C12.4, SNP rs2424800, is located inside the transcript and is associated with increased expression in brain (caudate basal ganglia) tissue (PeQTL = 7.59 × 10−14). Examining the cis region around SNP rs2424800 showed that the closest androgen receptor (AR) binding motif (brown oval) is 119 kb downstream from the SNP. SNP rs2424800 was found to be in a weak enhancer region exclusive to brain (hippocampus). SNP rs2424800 is nominally associated with increased expression of RP4-610C12.4 in brain (hippocampus) tissue (PeQTL = 6 × 10−8).
Autosomal TSSD transcripts and complex traits and disease
As described above, KAL1 on the X chromosome showed evidence for TSSD in gene expression, which was driven by lung tissue alone. Indeed, a differential expression analysis of lung function showed approximately 1.50 times lower expression of KAL1 in patients with severe versus mild idiopathic interstitial pneumonias (44). To understand the role of TSSD transcripts in complex traits and disease further, we performed a qualitative assessment of our results by downloading the Genome-wide association study (GWAS) Catalog (https://www.ebi.ac.uk/gwas/docs/file-downloads, accessed 1 March 2019) (45) and mapping our 65 autosomal TSSD transcripts to genes identified in previous GWAS. A total of 18 TSSD transcripts were found to overlap with 22 complex traits and diseases, with the top results, intelligence and blood protein level traits, each mapping to four TSSD transcripts (Supplementary Material, Fig. S11). The TSSD transcript HOXA11 (Homeobox A11) on chromosome 7, for example, was previously implicated in studies examining waist-to-hip ratio (adjusted for BMI). HOXA11 showed significant heterogeneity in sex difference effect sizes, with discordant direction of sex differences across 13 tissue types (PTSSD = 1.46 × 10−6). Exclusion of adipose (visceral omentum) tissue from the analysis removed the detected heterogeneity in sex difference effect sizes (PTSSD = 0.24), indicating that TSSD in this transcript is being driven by adipose (visceral omentum) tissue alone.
Discussion
The biological mechanisms responsible for TSSD in human gene expression and how these patterns of expression interact and converge to affect downstream phenotypic differences between the sexes such as differences in risk, incidence, prevalence, severity and age-at-onset of disease are not yet fully understood. The usual analytical approach used to examine TSSD in gene expression in humans (14–16) and other species (17–19) identifies sex-differentially expressed genes within tissues and compares these gene sets across tissues. Genes not at the intersection of these gene sets are deemed TSSD. Differences in the statistical power in each tissue can lead to high rates of false-positive and false-negative associations using this approach. Further, no study has explored potential regulatory mechanisms responsible for TSSD in autosomal gene expression in humans. We address these limitations by formally testing differences in the magnitude and direction of sex difference effect sizes by modeling gene expression across 40 tissue types simultaneously and testing for heterogeneity in sex difference effect sizes. We identified 65 autosomal and 66 X-linked TSSD transcripts, with the majority of autosomal TSSD transcripts showing discordant direction of sex difference effect sizes across tissue types.
Results from our X-linked analysis broadly confirm previous observations that X chromosome inactivation is relatively stable across tissue type, with some evidence for variable or incomplete escape from X inactivation (40). In particular, we found enrichment for TSSD in the PAR of the X chromosome and among transcripts known to escape X-inactivation, where TSSD transcripts in the non-PAR have higher expression in females compared to males and TSSD transcripts in the PAR have higher expression in males compared to females across the expressed tissues. Tissue-specific escape from X-inactivation has been observed in previous studies in mouse (46) and humans (20,40,47), where a number of genes show variability in escape from X-inactivation between tissues or individual cells within a tissue. Overall, the results from our analysis of TSSD in X-linked gene expression demonstrates the validity of our analytical approach in that we are able to replicate previous findings and provide additional insights into X-linked TSSD expression. For example, we found statistical evidence for heterogeneity in sex difference effect sizes for the X-linked KAL1 gene, and the detected heterogeneity in sex difference effect sizes is driven by lung tissue alone. This result is in agreement with evidence for tissue-specific escape from X-inactivation observed in (40).
We hypothesized that autosomal TSSD in gene expression may be due to the interaction of androgen and estrogen binding sites, tissue-specific enhancers and other regulatory elements located in a cis region of the target gene. This type of mechanism has been demonstrated in experimental mouse models for the male-specific expression of the Slp gene across tissues, which is a product of interactions between androgen and a nearby androgen-responsive enhancer (36). In our analysis, we compared the cis regulatory landscape for TSSD transcripts to a set of control SSD transcripts. We found that the top cis-eQTLs for TSSD versus SSD transcripts were located a similar distance away from the nearest androgen and estrogen binding motifs and the nearest enhancer. Enhancer regions that overlap the top cis-eQTLs for TSSD transcripts, however, were found to be more dispersed across 127 tissue types from the Roadmap Epigenome Project compared to the top cis-eQTLs for SSD transcripts, indicating that the top cis-eQTLs for TSSD transcripts were more often in tissue-specific enhancer regions. Together, these observations suggest that the androgen and estrogen regulatory elements that were located in a cis region may play a common role in the sex-differential expression of these sets of transcripts, but the divergence toward tissue-specific gene expression for TSSD transcripts may be due to the non-random distribution of causal variants in tissue-specific enhancer regions. Thus, similar to the mechanism observed in (36), TSSD transcripts may be a product of the interaction between tissue-specific enhancers and AR and ER binding sites in a cis region of the target gene. As an illustrative example of these results, we showed an extreme case where the top cis-eQTL for a TSSD transcript was found in a brain-specific enhancer region. We hypothesized that the male-biased expression that was observed across 31 tissue types is achieved through an interaction between a nearby androgen binding motif and the corresponding sex-specific hormone, with brain-specific expression due to an interaction between the top cis-eQTL and the enhancer region in the target tissue. We attributed additional variation across tissues to additional cis-eQTLs with tissue-specific effects (48); however, this was not comprehensively explored due to limitations of the data.
Finally, to understand the role of TSSD transcripts in complex traits and disease, we performed a qualitative assessment of our results by mapping the TSSD transcripts to genes identified in previous GWAS. Results from this analysis showed some evidence for a biological role of TSSD transcripts in complex traits and disease in relevant tissue types; however, a more comprehensive analysis is required to fully appreciate the implications.
There are a few notable limitations to our study. First, although the GTEx dataset used in this analysis is the largest gene expression dataset covering a wide range of tissues in many individuals, the power to detect TSSD in gene expression is limited. As discussed in the Supplementary Notes, this limitation in power is primarily due to potentially over-adjusting our model to account for latent experimental confounding. Thus, the results from our analysis are highly conservative since we chose to prioritize the control of the false-positive rate over TSSD transcript discovery. The TSSD transcripts identified in this analysis therefore have relatively large heterogeneity in sex difference effect sizes across tissues, which suggests that larger and more appropriately designed studies may identify additional TSSD transcripts with more subtle effects. Second, we have implicitly assumed that the set of genetic variants (i.e. cis-eQTLs) identified in our analysis that reside in tissue-specific enhancer regions are causal variants that directly affect the expression of a target gene. While the true causal variant is not known (only tagged by SNPs in linkage disequilibrium), our analysis demonstrates that these sets of genetic variants that putatively affect gene expression are non-randomly distributed in the cis-regulatory region. Further, we do not explicitly test whether variation in gene transcription is a direct consequence of the interaction of AR and ER binding sites and these tissue-specific enhancer regions. Nevertheless, we provide a set of genetic loci and target genes for functional genomic follow-up studies in order to explore this further.
Conclusions
This analysis identifies transcripts that show heterogeneity in sex difference effect sizes across tissue types and points to a possible biological mechanism for autosomal tissue-specific sex difference gene regulation in humans.
Materials and Methods
The GTEx dataset
All individuals were deceased human donors. Informed consent was obtained for all individuals via next-of-kin consent for the collection and biobanking of de-identified, non-diseased tissue samples (49). We restricted our analyses to 43 tissue types from n = 617 individuals for which a sex covariate was available and the within-tissue sample size was greater than 80. The fully processed, normalized and filtered RNA-seq GTEx v7 data were downloaded from the GTEx Portal (https://www.gtexportal.org/home/datasets) along with corresponding covariate files. Additional covariates [e.g. cohort, age, body-mass-index, self-reported race and death classification (four-point Hardy Scale)] were obtained from dbGaP (Accession phs000424.v7.p2) for sensitivity analysis. Briefly, for each tissue, gene expression normalization included filtering for transcripts with >0.1 TPM in at least 20% of samples and ≥6 raw read counts in at least 20% of samples, normalization between samples using TMM implemented in edgeR (50), and inverse quantile normalization across samples. Sample outliers were identified and excluded using a correlation-based statistic described in (51), and samples with fewer than 10 million mapped reads were excluded. Further details can be found in the GTEx documentation page (https://www.gtexportal.org/home/documentationPage). The number of individuals and expressed transcripts per tissue type can be found in Supplementary Material, Table S1.
Sex differences in gene expression
Differences in gene expression intensities across the sexes were examined in a linear regression framework where gene expression was modeled as a linear function of sex for each tissue, with males coded as 1 and females coded as 2. The model was adjusted for three genotyping principal components (PCs) and a total of 15 PEER factors for sample sizes n < 150, 30 PEER factors for sample sizes 150 ≤ n < 250, 45 PEER factors for sample sizes 250 ≤ n < 350 and 60 PEER factors for sample sizes n ≥ 350 in order to remove the effect of known and latent experimental confounders (37–39). Sex difference effect size was defined as the regression coefficient from the linear regression of gene expression on sex, where a negative value indicates that males have higher expression for the transcript compared to females, and a positive value indicates that females have higher expression for the transcript compared to males. We used the t-statistic to assess significance and calculated a P-value by comparing the test statistic to the t-distribution. Transcripts were deemed sex-differentially expressed if the P-value satisfied the within-tissue Bonferroni-corrected significance threshold, which corrects for the number of transcripts tested (see Supplementary Material, Table S1). We compared the sample size of each tissue to the number of sex-differentially expressed transcripts identified across tissue types and used the Cook’s distance test statistic to identify outliers using criteria described below.
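The covariate rules above can be written down compactly. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the 0.05 family-wise level for the within-tissue threshold is an assumption carried over from the TSSD threshold reported later in the Methods.

```python
def n_peer_factors(n_samples: int) -> int:
    """Number of PEER factors adjusted for, given the tissue sample size."""
    if n_samples < 150:
        return 15
    if n_samples < 250:
        return 30
    if n_samples < 350:
        return 45
    return 60


def direction(beta_sex: float) -> str:
    """Interpret the sex coefficient under the coding male = 1, female = 2."""
    if beta_sex < 0:
        return "male-biased"    # males have higher expression
    if beta_sex > 0:
        return "female-biased"  # females have higher expression
    return "no difference"


def within_tissue_threshold(n_transcripts: int, alpha: float = 0.05) -> float:
    """Bonferroni threshold for one tissue (alpha = 0.05 is an assumption)."""
    return alpha / n_transcripts
```

Because females are coded with the larger value, the sign of the coefficient reads directly as the direction of the sex difference.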
To identify tissues that are highly influenced by the adjustment of PEER factors, we compared the sex difference effect size for each transcript before and after adjusting for all available PEER factors in each tissue, while controlling for three genotyping PCs. For each transcript, we used Cook’s distance to identify tissue outliers, which we defined as tissues with Cook’s distance test statistic greater than the median across all tissues plus four times the interquartile range. We recorded the proportion of transcripts that identify each of the 43 tissue types as an outlier.
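As a rough illustration of the outlier rule, the sketch below flags tissues whose Cook's distance exceeds the median plus four times the interquartile range. It assumes the Cook's distances have already been computed for one transcript; the function name and the quartile convention (Python's "inclusive" method) are assumptions, since the source does not specify how quartiles were estimated.

```python
import statistics


def outlier_tissues(cooks_d: dict[str, float]) -> list[str]:
    """Return tissues with Cook's distance > median + 4 * IQR across tissues."""
    values = list(cooks_d.values())
    # Quartile convention is an assumption; other conventions shift the cutoff slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    cutoff = statistics.median(values) + 4 * (q3 - q1)
    return [tissue for tissue, d in cooks_d.items() if d > cutoff]


# Hypothetical Cook's distances for a single transcript.
example = {"a": 0.010, "b": 0.020, "c": 0.015, "d": 0.012, "e": 0.900}
flagged = outlier_tissues(example)
```

Repeating this per transcript and tallying how often each tissue is flagged gives the proportion described in the text.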
TSSDs in gene expression
TSSDs in gene expression were examined with a mixed linear regression model implemented in the lmer function in R 3.5.0 to test for heterogeneity in sex difference effect sizes for each transcript across tissue types. For each of the 30 640 autosomal and 1021 X-linked transcripts that were expressed in at least two tissue types, gene expression was modeled as
$$y = \mu + \beta_1\,\mathrm{sex} + \beta_2\,\mathrm{tissue} + \beta_3\,(\mathrm{sex} \times \mathrm{tissue}) + \gamma + e \qquad (1)$$
where y is a n × 1 vector of residual gene expression levels corrected for genotyping PCs and PEER factors as described above; μ is the mean expression levels; β1 and β2 are the regression coefficients for the fixed sex (with males coded as 1 and females coded as 2) and tissue type covariates, respectively; β3 is the regression coefficient for the fixed interaction effect between sex and tissue type; $\gamma \sim N(0, \sigma_\gamma^2)$ is a random intercept to account for repeated observations from the same individual; and $e \sim N(0, \sigma_e^2)$ is the residual. We used the F statistic to assess significance of the sex-by-tissue interaction term and calculated the corresponding TSSD P-value using the lmerTest package in R (52). Transcripts were deemed TSSD if the sex-by-tissue interaction term satisfied the Bonferroni corrected significance threshold of PTSSD ≤ 1.58 × 10⁻⁶ (i.e. 0.05/31 661), which corrects for the number of transcripts tested. Transcripts with SSDs are transcripts that show differences in gene expression between the sexes, but not across tissues. These were defined as those with Psex < 1 × 10⁻⁴, Ptissue > 0.05 and PTSSD > 0.05 in Eq. (1).
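The significance threshold and the TSSD/SSD definitions can be checked with a few lines of arithmetic. This is a sketch with hypothetical function names; it takes the three P-values from Eq. (1) as given and does not fit the mixed model itself.

```python
N_AUTOSOMAL = 30_640
N_X_LINKED = 1_021


def tssd_threshold(n_tests: int = N_AUTOSOMAL + N_X_LINKED, alpha: float = 0.05) -> float:
    """Bonferroni threshold across all tested transcripts."""
    return alpha / n_tests


def classify(p_sex: float, p_tissue: float, p_tssd: float) -> str:
    """Label a transcript using the definitions given in the text."""
    if p_tssd <= tssd_threshold():
        return "TSSD"
    if p_sex < 1e-4 and p_tissue > 0.05 and p_tssd > 0.05:
        return "SSD"
    return "neither"


threshold = tssd_threshold()  # 0.05 / 31 661, about 1.58e-6
```

Transcripts that fall between the two definitions (for example, a nominally significant interaction that misses the Bonferroni threshold) are labeled "neither" here, which is a choice made for this sketch.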
Chromosomal enrichment for TSSD transcripts was determined by calculating the mean TSSD F statistic across all transcripts in each of the autosomal chromosomes. To assess significance, we used circular genomic permutations to obtain a distribution of mean TSSD F statistics for each chromosome (41). To do this, we ordered the 30 640 autosomal transcripts that are expressed in at least two tissue types according to their genomic position: first by chromosome, then by position on the chromosome. We considered the genome to be circular, ordered from chromosome 1 to chromosome 22 and restarting at chromosome 1 again. For each chromosome i, a random position, j, on the circular genome (i.e. a random number j between 1 and 30 640) was then chosen. We then rotated the first TSSD F statistic to position j, with all other TSSD F statistics rotating by the same amount to the corresponding position on the genome, thus preserving the relative positions of the transcripts. We then calculated the mean TSSD F statistic across all transcripts in chromosome i. This was repeated 100 000 times. We calculated a P-value for each chromosome as the proportion of mean TSSD F statistics from the 100 000 permutations that were greater than or equal to the observed mean TSSD F statistic.
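The circular permutation procedure can be sketched as follows. This is an illustrative, unoptimized version with hypothetical names; it assumes the F statistics and chromosome labels are supplied in genomic order, and it uses a 0-based shift in place of the 1-based position j.

```python
import random


def circular_permutation_pvalue(f_stats, chrom_labels, chrom, n_perm=100_000, seed=0):
    """Return (observed mean F, permutation P-value) for one chromosome."""
    n = len(f_stats)
    idx = [i for i, c in enumerate(chrom_labels) if c == chrom]
    observed = sum(f_stats[i] for i in idx) / len(idx)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shift = rng.randrange(n)  # rotate every statistic by the same offset
        # After rotation, position i holds the statistic that started at i - shift.
        perm_mean = sum(f_stats[(i - shift) % n] for i in idx) / len(idx)
        if perm_mean >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy genome: two transcripts on chromosome 1, four on chromosome 2.
obs, p = circular_permutation_pvalue(
    [10, 10, 1, 1, 1, 1], [1, 1, 2, 2, 2, 2], chrom=1, n_perm=2000
)
```

Because the rotation keeps neighbouring transcripts together, local correlation between adjacent F statistics is preserved under the null, which is the motivation for using a circular rather than a fully random permutation.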
We performed several sensitivity analyses to assess the robustness of our TSSD results. First, for each of the identified TSSD transcripts, we removed individual tissues from the dataset, in turn, to determine if the effect of the interaction term became non-significant (PTSSD > 0.01) upon removal of a single tissue from the analysis. Second, we adjusted Eq. (1) for potential confounders including cohort, which indicated whether the participant was a postmortem donor, an organ donor or a surgical donor; age, in years; race, as reported by the donor, family/next of kin or medical record abstraction; BMI; and death classification on the four-point Hardy Scale. Further details on covariate definitions can be found on dbGaP (Accession phs000424.v7.p2). Third, to determine if the unequal proportions of males and females influenced our results (Supplementary Material, Fig. S1), we performed a cross-validation analysis for each of the identified TSSD transcripts, where male samples were randomly sampled to match the sample size of females within each tissue. TSSD was assessed in the same way as described above. This was performed 10 times and the mean TSSD F statistic was calculated across the 10 replicates and compared to the observed TSSD F statistic in the original analysis.
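The third sensitivity analysis amounts to downsampling males within each tissue. A minimal sketch is given below; the record layout `(sample_id, tissue, sex)` and the function name are hypothetical, and keeping all males when a tissue has fewer males than females is an assumption not stated in the source.

```python
import random


def downsample_males(samples, seed=0):
    """samples: iterable of (sample_id, tissue, sex), with sex 1 = male, 2 = female."""
    rng = random.Random(seed)
    by_tissue = {}
    for sample_id, tissue, sex in samples:
        by_tissue.setdefault(tissue, {1: [], 2: []})[sex].append(sample_id)
    kept = []
    for tissue, groups in by_tissue.items():
        males, females = groups[1], groups[2]
        k = min(len(males), len(females))  # match the female sample size
        kept += [(sid, tissue, 1) for sid in rng.sample(males, k)]
        kept += [(sid, tissue, 2) for sid in females]
    return kept


toy = [(f"m{i}", "liver", 1) for i in range(5)] + [(f"f{i}", "liver", 2) for i in range(2)]
balanced = downsample_males(toy)
```

Running this with 10 different seeds and refitting Eq. (1) on each balanced subset would give the replicate F statistics that are averaged in the text.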
To determine enrichment of X-linked TSSD transcripts in the PAR of the X chromosome, we applied a Fisher’s exact test on a 2 × 2 contingency table, composed of two factors: whether the X-linked transcript is TSSD and whether the transcript is located in the PAR (PAR1, start: 60 001 bp, end: 2 699 520 bp; PAR2, start: 154 931 044 bp, end: 155 260 560 bp) of the X chromosome using the hg19 annotation. To determine enrichment of X-linked TSSD transcripts escaping X-inactivation, we downloaded annotations from (40). A total of 683 X-linked transcripts were available, where transcripts were classified as escape (82 transcripts), variable escape (89 transcripts), inactive (392 transcripts) or unknown (120 transcripts). We applied a Fisher’s exact test on a 2 × 2 contingency table composed of whether the X-linked transcript is TSSD and whether the transcript is annotated to escape X-inactivation or is inactive.
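For illustration, the PAR annotation and the 2 × 2 test can be sketched with the standard library alone. The function names are hypothetical, and the two-sided P-value definition (summing all tables no more probable than the observed one) is an assumption, since the source does not state the alternative hypothesis used.

```python
from math import comb

# PAR coordinates on the X chromosome (hg19), as given in the text.
PAR_REGIONS = [(60_001, 2_699_520), (154_931_044, 155_260_560)]


def in_par(position: int) -> bool:
    """True if a base pair position on chromosome X lies in PAR1 or PAR2."""
    return any(start <= position <= end for start, end in PAR_REGIONS)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

Here the rows would be TSSD versus non-TSSD transcripts and the columns PAR versus non-PAR (or escape versus inactive) transcripts.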
Investigating the cis-regulatory region of TSSD and SSD transcripts
Coordinates of AR binding motifs were obtained from JASPAR (53,54) and ER binding motifs from (33). We converted the genomic coordinates for the ER binding motifs from hg17 to hg19 using the LiftOver tool on the UCSC genome browser (55). Summary single-tissue cis-eQTL data were downloaded from the GTEx Portal (https://gtexportal.org/home/datasets, accessed 20 November 2017). Imputed signal tracks for histone ChIP-seq, DNase-seq, RNA-seq and methylation for 127 tissue types (56) were downloaded from the Roadmap Epigenomics Project (http://egg2.wustl.edu/roadmap/web_portal/imputed.html, accessed 3 February 2017) (43). These 127 tissue types are shown in the Supplementary Material, Table S3.
To prioritize genetic variants that affect the expression of the TSSD and SSD transcripts, we identified top cis-eQTLs among all 40 tissue types for each of the transcripts. We determined the distance from the base pair position of the top cis-eQTLs to the center base pair position [i.e. (end position + start position)/2] of the nearest AR and ER binding motif on the same chromosome and asked whether the top cis-eQTLs were located inside (i.e. within the start and end positions of) the motif. For top cis-eQTLs not inside AR or ER binding motifs, we calculated the distance to the nearest AR or ER binding motif and tested for mean difference between TSSD versus SSD top cis-eQTLs. We applied a log transformation before using a t-test to determine statistical significance, which was defined as P < 0.05. Next, we used data from the Roadmap Epigenome Project (43) to determine the distance of the top cis-eQTLs to the center of the nearest enhancer, which is defined in (56) as active enhancer 1, EnhA1; active enhancer 2, EnhA2; active enhancer flank, EnhAF; weak enhancer 1, EnhW1; weak enhancer 2, EnhW2; and enhancer acetylation, EnhAc, across the 127 Roadmap Epigenome tissue types (see Supplementary Material, Table S3). Similar to the previous analysis, for each of the 127 Roadmap Epigenome tissue types, we determined the distance from the base pair position of the top cis-eQTLs to the center base pair position [i.e. (end position + start position)/2] of the nearest enhancer on the same chromosome and asked whether the top cis-eQTLs were located inside (i.e. within the start and end positions of) the enhancer. For top cis-eQTLs located inside an enhancer, we counted the total number of Roadmap Epigenome tissue types the enhancer region was found in.
We tested for homogeneity of variances for the number of tissue types the enhancer region was found in for TSSD top cis-eQTLs and SSD top cis-eQTLs using the Fligner–Killeen test, a non-parametric test that is robust against departures from normality. A small variance or a variance of zero indicates that the top cis-eQTLs are in enhancer regions that are relatively stable across the tissue types, while a large variance indicates that enhancer regions are more dispersed across tissue types and represents more tissue specificity. Statistical significance was defined as P < 0.05. For top cis-eQTLs not located inside an enhancer region, we determined the nearest enhancer across the 127 Roadmap Epigenome tissue types. We tested for mean difference in distances using a t-test between TSSD top cis-eQTLs and the SSD top cis-eQTLs after applying a log transformation. Statistical significance was defined as P < 0.05.
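The distance and overlap calculations described above can be sketched as follows. The function names and the in-memory representation (lists of `(start, end)` intervals on a single chromosome, and a dictionary of such lists keyed by tissue) are assumptions made for illustration; the statistical tests themselves are not reproduced here.

```python
import bisect


def nearest_feature(pos, features):
    """Distance from pos to the nearest feature center, and whether pos lies inside a feature.

    features: non-empty list of (start, end) intervals on the same chromosome.
    """
    centers = sorted((start + end) / 2 for start, end in features)
    i = bisect.bisect_left(centers, pos)
    # Only the centers immediately left and right of pos can be the nearest.
    candidates = centers[max(0, i - 1): i + 1]
    distance = min(abs(pos - c) for c in candidates)
    inside = any(start <= pos <= end for start, end in features)
    return distance, inside


def n_tissues_with_enhancer(pos, enhancers_by_tissue):
    """Number of tissue types in which pos falls inside an annotated enhancer."""
    return sum(
        any(start <= pos <= end for start, end in regions)
        for regions in enhancers_by_tissue.values()
    )
```

A top cis-eQTL that sits in an enhancer in only one or two of the 127 tissue types would yield a small count, and the spread of these counts across eQTLs is the quantity compared between TSSD and SSD transcripts.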
Autosomal TSSD transcripts and complex disease
To understand the role of TSSD transcripts in complex traits and disease, we performed a qualitative assessment of our results by downloading the GWAS Catalog from (https://www.ebi.ac.uk/gwas/docs/file-downloads, accessed 1 March 2019) (45). A total of 107 784 entries were available but were filtered down to 65 093 entries with reported P-value for the strongest SNP risk allele of P < 5 × 10⁻⁸. Autosomal TSSD transcripts were then mapped to trait labels using the Entrez Gene ID (i.e. ‘SNP GENE IDS’ column). We combined trait labels, ‘Educational attainment (MTAG)’, ‘Educational attainment (years of education)’, ‘Highest math class taken’, ‘Highest math class taken (MTAG)’, ‘Self-reported math ability’, ‘Self-reported math ability (MTAG)’ into an ‘Intelligence’ category; ‘Monocyte count’, ‘Monocyte percentage of white cells’, ‘White blood cell count’ into a ‘Blood cell counts’ category; and ‘Waist circumference adjusted for body mass index’, ‘Waist-to-hip ratio adjusted for BMI’, ‘Waist-to-hip ratio adjusted for BMI (adjusted for smoking behaviour)’, ‘Waist-to-hip ratio adjusted for BMI x sex x age interaction (4df test)’, ‘Waist-to-hip ratio adjusted for body mass index’ into a ‘Waist-to-hip ratio adjusted for body mass index’ category.
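The filtering and label-merging steps can be sketched as below. The trait groupings are taken from the text; the entry layout (a dictionary with a `p_value` key) and the function names are hypothetical and do not reflect the actual GWAS Catalog column names.

```python
TRAIT_GROUPS = {
    "Intelligence": [
        "Educational attainment (MTAG)",
        "Educational attainment (years of education)",
        "Highest math class taken",
        "Highest math class taken (MTAG)",
        "Self-reported math ability",
        "Self-reported math ability (MTAG)",
    ],
    "Blood cell counts": [
        "Monocyte count",
        "Monocyte percentage of white cells",
        "White blood cell count",
    ],
    "Waist-to-hip ratio adjusted for body mass index": [
        "Waist circumference adjusted for body mass index",
        "Waist-to-hip ratio adjusted for BMI",
        "Waist-to-hip ratio adjusted for BMI (adjusted for smoking behaviour)",
        "Waist-to-hip ratio adjusted for BMI x sex x age interaction (4df test)",
        "Waist-to-hip ratio adjusted for body mass index",
    ],
}

# Invert the grouping so each original label maps to its merged category.
LABEL_TO_CATEGORY = {
    label: category for category, labels in TRAIT_GROUPS.items() for label in labels
}


def trait_category(label: str) -> str:
    """Merged category for a trait label; unlisted labels are kept unchanged."""
    return LABEL_TO_CATEGORY.get(label, label)


def genome_wide_significant(entries, threshold=5e-8):
    """Keep entries whose reported P-value is below the threshold."""
    return [entry for entry in entries if entry["p_value"] < threshold]
```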
Availability of data
The data used for the analyses described in this manuscript were obtained from the GTEx Portal (https://www.gtexportal.org/home/datasets) and dbGaP (Accession phs000424.v7.p2) and Roadmap Epigenomics Project (http://egg2.wustl.edu/roadmap/web_portal/imputed.html).
Conflict of Interest statement. The authors declare no competing financial interests.
Funding
The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by National Cancer Institute (NCI), National Human Genome Research Institute (NHGRI), National Heart, Lung, and Blood Institute (NHLBI), National Institute on Drug Abuse (NIDA), National Institute of Mental Health (NIMH), and National Institute of Neurological Disorders and Stroke (NINDS); National Health and Medical Research Council (NHMRC) (program grant 1113400; Fellowship Scheme 1078037 to P.M.V. and 1078399 to A.F.M.); Australian Research Council (ARC) (Discovery grant DP160101343; Future Fellowship FT180100186 to J.Y.). Research reported in this publication was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under Award Number P01GM099568. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Supplementary Material
References
- 1. Khramtsova E.A., Davis L.K. and Stranger B.E. (2019) The role of sex in the genomics of human complex traits. Nat. Rev. Genet., 20, 173–190.
- 2. Ngo S.T., Steyn F.J. and McCombe P.A. (2014) Gender differences in autoimmune disease. Front. Neuroendocrinol., 35, 347–369.
- 3. Naugler W.E., Sakurai T., Kim S., Maeda S., Kim K., Elsharkawy A.M. and Karin M. (2007) Gender disparity in liver cancer due to sex differences in MyD88-dependent IL-6 production. Science, 317, 121–124.
- 4. Cohn B.A., Wingard D.L., Cirillo P.M., Cohen R.D., Reynolds P. and Kaplan G.A. (1996) Differences in lung cancer risk between men and women: examination of the evidence. J. Natl. Cancer Inst., 88, 1867–1868.
- 5. Lerner D.J. and Kannel W.B. (1986) Patterns of coronary heart disease morbidity and mortality in the sexes: a 26-year follow-up of the Framingham population. Am. Heart J., 111, 383–390.
- 6. Davies W. (2014) Sex differences in attention deficit hyperactivity disorder: candidate genetic and endocrine mechanisms. Front. Neuroendocrinol., 35, 331–346.
- 7. Li R. and Singh M. (2014) Sex differences in cognitive impairment and Alzheimer’s disease. Front. Neuroendocrinol., 35, 385–403.
- 8. Gillies G.E., Pienaar I.S., Vohra S. and Qamhawi Z. (2014) Sex differences in Parkinson’s disease. Front. Neuroendocrinol., 35, 370–384.
- 9. Schaafsma S.M. and Pfaff D.W. (2014) Etiologies underlying sex differences in autism spectrum disorders. Front. Neuroendocrinol., 35, 255–271.
- 10. Ober C., Loisel D.A. and Gilad Y. (2008) Sex-specific genetic architecture of human disease. Nat. Rev. Genet., 9, 911–922.
- 11. Ellegren H. and Parsch J. (2007) The evolution of sex-biased genes and sex-biased gene expression. Nat. Rev. Genet., 8, 689–698.
- 12. Parsch J. and Ellegren H. (2013) The evolutionary causes and consequences of sex-biased gene expression. Nat. Rev. Genet., 14, 83–87.
- 13. Rinn J.L. and Snyder M. (2005) Sexual dimorphism in mammalian gene expression. Trends Genet., 21, 298–305.
- 14. Melé M., Ferreira P.G., Reverter F., DeLuca D.S., Monlong J., Sammeth M., Young T.R., Goldmann J.M., Pervouchine D.D., Sullivan T.J. et al. (2015) The human transcriptome across tissues and individuals. Science, 348, 660–665.
- 15. Mayne B.T., Bianco-Miotto T., Buckberry S., Breen J., Clifton V., Shoubridge C. and Roberts C.T. (2016) Large scale gene expression meta-analysis reveals tissue-specific, sex-biased gene expression in humans. Front. Genet., 7, 183.
- 16. Gershoni M. and Pietrokovski S. (2017) The landscape of sex-differential transcriptome and its consequent selection in human adults. BMC Biol., 15, 7.
- 17. Seo M., Caetano-Anolles K., Rodriguez-Zas S., Ka S., Jeong J.Y., Park S., Kim M.J., Nho W.G., Cho S., Kim H. et al. (2016) Comprehensive identification of sexually dimorphic genes in diverse cattle tissues using RNA-seq. BMC Genomics, 17, 81.
- 18. Yang X., Schadt E.E., Wang S., Wang H., Arnold A.P., Ingram-Drake L., Drake T.A. and Lusis A.J. (2006) Tissue-specific expression and regulation of sexually dimorphic genes in mice. Genome Res., 16, 995–1004.
- 19. Huby R.D.J., Glaves P. and Jackson R. (2014) The incidence of sexually dimorphic gene expression varies greatly between tissues in the rat. PLoS One, 9, e115792.
- 20. Carrel L. and Willard H.F. (2005) X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature, 434, 400–404.
- 21. Kukurba K.R., Parsana P., Balliu B., Smith K.S., Zappala Z., Knowles D.A., Favé M.J., Davis J.R., Li X., Zhu X., Potash J.B. et al. (2016) Impact of the X chromosome and sex on regulatory variation. Genome Res., 26, 768–777.
- 22. Trabzuni D., Ramasamy A., Imran S., Walker R., Smith C., Weale M.E., Hardy J., Ryten M. and North American Brain Expression Consortium (2013) Widespread sex differences in gene expression and splicing in the adult human brain. Nat. Commun., 4, 2771.
- 23. Yao C., Joehanes R., Johnson A.D., Huan T., Esko T., Ying S., Freedman J.E., Murabito J., Lunetta K.L., Metspalu A. et al. (2014) Sex- and age-interacting eQTLs in human complex diseases. Hum. Mol. Genet., 23, 1947–1956.
- 24. Dimas A.S., Nica A.C., Montgomery S.B., Stranger B.E., Raj T., Buil A., Giger T., Lappalainen T., Gutierrez-Arcelus M., MuTHER Consortium et al. (2012) Sex-biased genetic effects on gene regulation in humans. Genome Res., 22, 2368–2375.
- 25. Kassam I., Lloyd-Jones L., Holloway A., Small K.S., Zeng B., Bakshi A., Metspalu A., Gibson G., Spector T.D., Esko T. et al. (2016) Autosomal genetic control of human gene expression does not differ across the sexes. Genome Biol., 17, 248.
- 26. Hartiala J.A., Tang W.H.W., Wang Z., Crow A.L., Stewart A.F.R., Roberts R., McPherson R., Erdmann J., Willenborg C., Hazen S.L. et al. (2016) Genome-wide association study and targeted metabolomics identifies sex-specific association of CPS1 with coronary artery disease. Nat. Commun., 7, 10558.
- 27. Ammerpohl O., Bens S., Appari M., Werner R., Korn B., Drop S.L.S., Verheijen F., van der Zwan Y., Bunch T., Hughes I. et al. (2013) Androgen receptor function links human sexual dimorphism to DNA methylation. PLoS One, 8, e73288.
- 28. Lahita R.G., Bradlow L., Fishman J. and Kunkel H.G. (1982) Estrogen metabolism in systemic lupus erythematosus: patients and family members. Arthritis Rheum., 25, 843–846.
- 29. Folomeev M., Dougados M., Beaune J., Kouyoumdjian J.C., Nahoul K., Amor B. and Alekberova Z. (1992) Plasma sex hormones and aromatase activity in tissues of patients with systemic lupus erythematosus. Lupus, 1, 191–195.
- 30. Sánchez-Guerrero J., Liang M.H., Karlson E.W., Hunter D.J. and Colditz G.A. (1995) Postmenopausal estrogen therapy and the risk for developing systemic lupus erythematosus. Ann. Intern. Med., 122, 430–433.
- 31. Lage K., Hansen N.T., Karlberg E.O., Eklund A.C., Roque F.S., Donahoe P.K., Szallasi Z., Jensen T.S. and Brunak S. (2008) A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes. Proc. Natl. Acad. Sci. U. S. A., 105, 20870–20875.
- 32. Greene C.S., Krishnan A., Wong A.K., Ricciotti E., Zelaya R.A., Himmelstein D.S., Zhang R., Hartmann B.M., Zaslavsky E., Sealfon S.C. et al. (2015) Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet., 47, 569–576.
- 33. Carroll J.S., Meyer C.A., Song J., Li W., Geistlinger T.R., Eeckhoute J., Brodsky A.S., Keeton E.K., Fertuck K.C., Hall G.F. et al. (2006) Genome-wide analysis of estrogen receptor binding sites. Nat. Genet., 38, 1289–1297.
- 34. Cheng Y., Yu P., Duan X., Liu C., Xu S., Chen Y., Tan Y., Qiang Y., Shen J. and Tao Z. (2015) Genome-wide analysis of androgen receptor binding sites in prostate cancer cells. Exp. Ther. Med., 9, 2319–2324.
- 35. Georgatsou E., Bourgarel P. and Meo T. (1993) Male-specific expression of mouse sex-limited protein requires growth hormone, not testosterone. Proc. Natl. Acad. Sci. U. S. A., 90, 3626–3630.
- 36. Nelson S.A. and Robins D.M. (1997) Two distinct mechanisms elicit androgen-dependent expression of the mouse sex-limited protein gene. Mol. Endocrinol., 11, 460–469.
- 37. Stegle O., Parts L., Piipari M., Winn J. and Durbin R. (2012) Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc., 7, 500–507.
- 38. Stegle O., Parts L., Durbin R. and Winn J. (2010) A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput. Biol., 6, e1000770.
- 39. Parts L., Stegle O., Winn J. and Durbin R. (2011) Joint genetic analysis of gene expression data with inferred cellular phenotypes. PLoS Genet., 7, e1001276.
- 40. Tukiainen T., Villani A.C., Yen A., Rivas M.A., Marshall J.L., Satija R., Aguirre M., Gauthier L., Fleharty M., Kirby A. et al. (2017) Landscape of X chromosome inactivation across human tissues. Nature, 550, 244–248.
- 41. Cabrera C.P., Navarro P., Huffman J.E., Wright A.F., Hayward C., Campbell H., Wilson J.F., Rudan I., Hastie N.D., Vitart V. et al. (2012) Uncovering networks from genome-wide association studies via circular genomic permutation. G3 (Bethesda), 2, 1067–1075.
- 42. Herzing L.B., Romer J.T., Horn J.M. and Ashworth A. (1997) Xist has properties of the X-chromosome inactivation centre. Nature, 386, 272–275.
- 43. Roadmap Epigenomics Consortium, Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J. et al. (2015) Integrative analysis of 111 reference human epigenomes. Nature, 518, 317–330.
- 44. Steele M.P., Luna L.G., Coldren C.D., Murphy E., Hennessy C.E., Heinz D., Evans C.M., Groshong S., Cool C., Cosgrove G.P. et al. (2015) Relationship between gene expression and lung function in idiopathic interstitial pneumonias. BMC Genomics, 16, 869.
- 45. Buniello A., MacArthur J.A.L., Cerezo M., Harris L.W., Hayhurst J., Malangone C., McMahon A., Morales J., Mountjoy E., Sollis E. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics. Nucleic Acids Res., 47, D1005–D1012.
- 46. Berletch J.B., Ma W., Yang F., Shendure J., Noble W.S., Disteche C.M. and Deng X. (2015) Escape from X inactivation varies in mouse tissues. PLoS Genet., 11, e1005079.
- 47. Carrel L. and Willard H.F. (1999) Heterogeneous gene expression from the inactive X chromosome: an X-linked gene that escapes X inactivation in some human cell lines but is inactivated in others. Proc. Natl. Acad. Sci. U. S. A., 96, 7364–7369.
- 48. Fu J., Wolfs M.G.M., Deelen P., Westra H.J., Fehrmann R.S.N., Te Meerman G.J., Buurman W.A., Rensen S.S.M., Groen H.J.M., Weersma R.K. et al. (2012) Unraveling the regulatory mechanisms underlying tissue-dependent genetic variation of gene expression. PLoS Genet., 8, e1002431.
- 49. GTEx Consortium (2017) Genetic effects on gene expression across human tissues. Nature, 550, 204–213.
- 50. Robinson M.D. and Oshlack A. (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol., 11, R25.
- 51. Wright F.A., Sullivan P.F., Brooks A.I., Zou F., Sun W., Xia K., Madar V., Jansen R., Chung W., Zhou Y.H. et al. (2014) Heritability and genomics of gene expression in peripheral blood. Nat. Genet., 46, 430–437.
- 52. Kuznetsova A., Brockhoff P.B. and Christensen R.H.B. (2017) lmerTest package: tests in linear mixed effects models. J. Stat. Software, 82, 1–26.
- 53. Mathelier A., Zhao X., Zhang A.W., Parcy F., Worsley-Hunt R., Arenillas D.J., Buchman S., Chen C.Y., Chou A., Ienasescu H. et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res., 42, D142–D147.
- 54. Sandelin A., Alkema W., Engström P., Wasserman W.W. and Lenhard B. (2004) JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res., 32, D91–D94.
- 55. Hinrichs A.S., Karolchik D., Baertsch R., Barber G.P., Bejerano G., Clawson H., Diekhans M., Furey T.S., Harte R.A., Hsu F. et al. (2006) The UCSC genome browser database: update 2006. Nucleic Acids Res., 34, D590–D598.
- 56. Ernst J. and Kellis M. (2015) Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nat. Biotechnol., 33, 364–376.