Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2009 Oct 6.
Published in final edited form as: In Silico Biol. 2009;9(3):55–63.

Gene set enrichment analyses revealed differences in gene expression patterns between males and females

Wei Zhang 1, R Stephanie Huang 1, Shiwei Duan 1, M Eileen Dolan 1,2,3,*
PMCID: PMC2757739  NIHMSID: NIHMS101229  PMID: 19795565

Abstract

Men and women differ not only in their physical attributes and reproductive functions but also in many other characteristics, including the risks for some diseases as well as response to certain therapeutic treatments. Though genetically-identical for autosomal chromosomes, males and females could have gender-specific transcriptional or translational regulation, leading to differential mRNAs or protein products for some genes. To illustrate the gender-specific differences in mRNA-level expression, we compared gene expression patterns between males and females using a whole-genome microarray dataset on the unrelated HapMap lymphoblastoid cell lines derived from individuals of European (58 individuals) and African (59 individuals) ancestry. We applied the Gene Set Enrichment Analysis to identify any overrepresented predefined gene sets in either men or women. Distinct patterns of upregulation and downregulation of certain chromosomal regions and other gene sets such as targets for certain microRNAs and transcription factors were identified in males or females, suggesting their potential roles in defining the gender-specific phenotypes. Gender-specific patterns of gene expression also appeared to be different between these two populations.

Keywords: gene expression, gender difference, HapMap, lymphoblastoid cell lines, gene set enrichment

Introduction

Men and women are different physically and physiologically in their reproductive systems, hormones, muscle mass and bone structure [1]. When it comes to health issues, gender has also been shown to play an important role. For example, besides gender-specific diseases (e.g. prostate cancer or ovarian cancer), males and females have different susceptibility to many common diseases such as diabetes, heart attack and certain cancers [24]. Males and females may also respond differently to the same therapeutic treatment [5].

In the past few years, gene expression as a quantitative phenotype has been shown to be varied within the same population [6, 7] and among geographically-separated populations [8]. For males and females, although the genetic information carried in autosomes is identical, there could be gender-specific transcription or translation of genes through mechanisms not based on DNA sequence variation. Differences in gene expression, in turn could contribute to the health disparities between males and females. For example, women have more CYP3A [9], a key phase I metabolism enzyme, which modulates the effectiveness of drugs in women. Using a public microarray expression dataset (~4,000 expressed genes), we previously identified a handful of genes (<0.5%) with gender-specific expression in a panel of lymphoblastoid cell lines (LCLs) derived from individuals of European ancestry from Utah, USA, collected by Centre d' Etude du Polymorphisme Humain (CEPH) [10], suggesting that gender differences in gene expression may be modest for the majority of genes at the mRNA level. Therefore, in addition to using a conventional statistical approach (e.g. t-test) to identify individual gender-specific genes, in the present work, we applied the Gene Set Enrichment Analysis (GSEA) [11, 12], which is a knowledge-based approach for interpreting genome-wide expression profiles [13]. GSEA allowed us to identify gender-specific gene sets using the Affymetrix Human Exon 1.0ST array (exon array) dataset that we previously generated on the HapMap (http://www.hapmap.org/) [14] CEU (CEPH people from Utah, USA) and YRI (Yoruba people from Ibadan, Nigeria) samples [15]. We also compared the gender-specific gene sets between the two populations.

Methods

Gene expression data

We previously generated a dataset of gene expression using the Affymetrix GeneChip Human Exon 1.0 ST array on 87 CEU (CEPH people from Utah, USA) and 89 YRI (Yoruba people from Ibadan, Nigeria) samples (Gene Expression Omnibus, GEO accession: GSE7851, http://www.ncbi.nlm.nih.gov/projects/geo/). The details for cell line preparation, RNA isolation, chip hybridization and signal normalization/summarization were described in Zhang et al. [15]. The same set of 9,156 transcript clusters deemed to be reliably expressed [15] was used in this study. To avoid ambiguity and chip cross-hybridization, our analysis set was further limited to the 8,443 transcript clusters with unique gene identity annotations retrieved from the Affymetrix NetAffx website (http://www.affymetrix.com/analysis/index.affx) (NCBI build 36). The unrelated HapMap samples (58 CEU parents: 29 males and 29 females; 59 YRI parents: 29 males and 30 females) were used in further analyses.

Identifying gender-specific genes individually

Two-tail t-test assuming unequal variance was used to compare log2-transformed expression levels of each gene between males and females in each population separately. The nominal P-values were adjusted by the Benjamini-Hochberg (B-H) correction for false discovery rate (FDR) [16] (Padjusted < 0.05).

Gene set enrichment analyses

The GSEA (GSEA Molecular Signatures Database, released December 14, 2007; http://www.broad.mit.edu/gsea/) [11, 12] was used to evaluate the differences in expression patterns between males and females in the unrelated CEU and YRI samples, separately. The GSEA method focuses on gene sets, which are defined as groups of genes that share common biological function, chromosomal location or regulation. The goal of GSEA is to determine whether the members of a gene set S are randomly distributed throughout the entire reference gene list L or primarily found at the top or bottom [11]. We expect that gender-specific gene sets will tend to show a non-random distribution. The GSEA software features more than 3,300 predefined genes sets under four categories: positional gene sets (C1), motif gene sets (C2), curated gene sets (C3) and computational gene sets (C4) [11]. In this study, we tested three categories: C1, C2 and C3. Particularly, only sets of human genes (min. size = 10, max. size = 200) were included in our analyses (permutation number = 1,000). The QVALUE-estimated [17] FDR of 25% was then used to select overrepresented gene sets.

Results

Identifying individual gender-specific genes

At Padjusted < 0.05, eight chromosome X-linked genes were differentially expressed between males and females in the CEU samples. In contrast, 12 gender-specific genes (11 chromosome X-linked and 1 on chromosome 16) were identified in the YRI samples (Fig. 1). Using a looser cutoff (Padjusted < 0.50), three more chromosome X-linked genes (EIF2S3, UBE1, RPS4X) were identified in the CEU samples, while four more chromosome X-linked (PCTK1, ARD1A, FUNDC1, STS) and two more autosomal genes (NUCB2, LOC401127) could be identified in the YRI samples (Supplemental Table 1).

Figure 1.

Figure 1

Differentially expressed genes between males and females (Padjusted < 0.05). Eight chromosome X-linked genes were differentially expressed in both populations. Four more genes (3 chromosome X-linked and 1 on chromosome 16) were also differentially expressed in the YRI samples. CEU: CEPH people form Utah, USA; YRI: Yoruba people form Ibadan, Nigeria.

Gene set enrichment analyses

Tab. 1 lists the summary of enriched gene sets. Among the 206 positional gene sets (C1), CHRXP11 and CHRXP22 of chromosome X were enriched with higher expression in both the CEU and YRI females (FDR < 25%). In addition, the CEU females had three more enriched cytobands on autosomes. No enriched cytobands were found in the YRI males, while 16 cytobands were enriched in the CEU males with higher expression. Supplemental Table 2 lists the enriched gender-specific cytobands. Among the 754 motif gene sets (C2), no enriched motifs were found in the CEU males or females. In contrast, two motifs including one for microRNA mir-524 and mir-525 targets were enriched in the YRI males; 15 motifs including transcription factor binding sites and two microRNA targets were enriched in the YRI females. Supplemental Table 3 lists the enriched gender-specific motifs in the YRI samples. Among the 731 curated gene sets (C3), no enriched curated gene sets were found in the male samples. One gene set, "DISTECHE_XINACTIVATED_GENES", was commonly enriched in the females of both populations. In addition, 16 more curated gene sets were enriched in the YRI females. Supplemental Table 4 lists the enriched curated gene sets in females.

Table 1.

Summary of enriched gender-specific gene sets (FDR<25%).

Males Females
Population C1 C2 C3 C1 C2 C3
CEU 16 0 0 5 0 1
YRI 0 2 0 2 15 17

C1: Positional gene sets

C2: Motif gene sets

C3: Curated gene sets

Discussion

Gene-expression variation may play a significant role in gender-specific health disparities, probably through upregulating or downregulating genes within physiological pathways. Using a conventional approach (e.g. t-test), the proportion of gender-specific genes appears to be small (<0.5%) in some recent whole-genome studies on LCLs [10] and peripheral leukocytes [18], suggesting that the transcriptional differences could be modest for the majority of genes between genders. Our analysis using the exon array data [15] also supported this observation. Of the >8,400 genes, eight and twelve genes were gender-specific in the CEU and YRI samples, separately (Padjusted < 0.05) (Fig. 1). Among them, only MT1E (metallothionein 1E) in the YRI samples is on an autosome, while the remaining genes are all chromosome X-linked, probably due to the escape of X-inactivation [19]. The overexpression of MT1E in the YRI females (P = 3.81 × 10−5, Padjusted = 0.029) could be an interesting finding, because it has been shown that metallothionein (MT) can be down-regulated by estrogen [20]. Furthermore, MT protein overexpression is frequently encountered in a number of human primary tumors including renal cell carcinoma (RCC) [21] and has been shown to be correlated with disease progression. Therefore, the better survival rate found in female RCC patients could have a molecular basis due to the differences in gene expression.

Still, one drawback associated with individual-gene analysis is that the stringent statistical cutoff for adjusting thousands of multiple tests may result in missing biologically meaningful genes. Since a biological process often affects sets of genes simultaneously, the contribution of an individual gene can be modest, but the combined effect may be synergistic or additive. Therefore, we applied the GSEA method [11] which focuses on the predefined gene sets to evaluate the gender-specific expression patterns.

The positional category C1 is comprised of gene sets corresponding to each cytoband that has at least one gene. These gene sets are helpful in identifying effects related to chromosomal deletions or amplifications, dosage compensation, epigenetic silencing and other regional effects. At a more stringent significance level (Padjusted < 0.10), in both the CEU and YRI samples, two X-linked cytobands: CHRXP11 and CHRXP22 were found overexpressed in females (Supplemental Table 2). Though the enrichment of chromosome X-linked genes suggests the validity of the approach and dataset, it could be due to the escape of X-inactivation, which has been shown to be extensive (~35% X-linked genes) in humans [19]. In addition, three autosomal cytobands: CHR1P36, CHR2Q37 and CHR7P13 were found downregulated in the CEU females. If clustered by gene families, the common gene family among the three cytobands was transcription factors (21 genes on CHR1P36, 4 genes on CHR2Q37 and 1 gene on CHR7P13), suggesting their roles in regulating gender-specific gene expression.

The motif category C2 is comprised of gene sets that represent known or likely regulatory elements in promoters and 3'- UTRs (untranslated regions) [22]. These gene sets make it possible to link changes in a microarray experiment to a conserved, putative cis-regulatory element [11]. For the CEU samples, there were no enriched C2 gene sets in either males or females (Supplemental Table 3). For the YRI males, the genes of the gene set "CDPCR3", defined by genes with promoter regions containing the motif that matches the annotation for CUTL1 (cut-like homeobox 1), a transcription factor, were overexpressed in males. Another overexpressed gene set in the YRI males is comprised of genes targeted by mir-524 and mir-525. The majority of the genes in these two gene sets have transcription factor activities (e.g. 21 of 57 genes in the "CDPCR3" set), indicating the regulatory roles of these particular transcription factors in the overall gender-specific phenotypic differences. In contrast, some gene sets targeted by different transcription factors and microRNAs were also overexpressed in the YRI females, though further functional validation of these genes are necessary to clarify their biological functions.

The GSEA curated gene sets were collected from various sources such as Gene Ontology [23], and knowledge of domain experts [11]. For both populations, no gene sets were overrepresented in males (Supplemental Table 4). Not surprisingly, the set containing genes that escape X-inactivation "DISTECHE_XINACTIVATED_GENES" was overrepresented in the female samples. In addition, many cancer or disease-related gene sets were enriched in the YRI females. The most significant overrepresented gene set "CANCER_NEOPLASTIC_META_UP" (Padjusted = 0.03) is comprised of 67 genes commonly upregulated in cancer [24]. Our observation that gender-specific expression patterns might also differ between the two populations could provide new insights into the relationship between gene expression and ancestry-related health disparities [25]. Though our findings suggest the possible mechanisms responsible for phenotypic gender differences (e.g. through certain transcription factors and microRNAs) and provide a list of overrepresented or underrepresented gene sets, a systems biology approach will be necessary to illustrate the complete picture of the interplay of these gene sets or pathways in the future.

One limitation of this study is that the LCLs represent only one tissue type, while many pharmacology- or health-related genes have distinct tissue-specific expression patterns (e.g. UGT1A isoforms [26]). Therefore, comparing expression patterns in other tissues could provide a more comprehensive view of gender differences and could be more relevant to specific diseases. Notably, because the CEU samples were collected and transformed decades earlier than the YRI samples, the bias due to cell line collection time could be a potential confounding factor when interpreting the results of population differences. In addition, there are some limitations of the GSEA algorithm such as those proposed in Dinu et al. [27]. We expect that with the advances of the GSEA tools and our knowledge of gene function, novel enriched gene sets between genders may be discovered in the near future.

Supplementary Material

Supplemental tables

Acknowledgements

The authors wish to thank Dr. Nancy Cox for helpful discussion. This Pharmacogenetics of Anticancer Agents Research Group study was supported by NIH/NIGMS grant GM61393 and NIH/NCI Breast SPORE P50 CA125183. Data deposits associated with this research are supported by GM61374. The authors are grateful to T. A. Clark, T. X. Chen, A. C. Schweitzer and J. E. Blume (Expression Research, Affymetrix Laboratory, Affymetrix Inc., Santa Clara, CA 94706) for their contribution in generating the Exon Array data.

References

  • 1.Federman DD. The biology of human sex differences. N. Engl. J. Med. 2006;354:1507–1514. doi: 10.1056/NEJMra052529. [DOI] [PubMed] [Google Scholar]
  • 2.Kaminsky Z, Wang SC, Petronis A. Complex disease, gender and epigenetics. Ann. Med. 2006;38:530–544. doi: 10.1080/07853890600989211. [DOI] [PubMed] [Google Scholar]
  • 3.Rivera MP, Stover DE. Gender and lung cancer. Clin. Chest Med. 2004;25:391–400. doi: 10.1016/j.ccm.2004.01.006. [DOI] [PubMed] [Google Scholar]
  • 4.Hensler P. Viewpoint: Gender differences in heart disease. Circulation. 2006;114:f170–f171. [PubMed] [Google Scholar]
  • 5.Huang RS, Kistner EO, Bleibel WK, Shukla SJ, Dolan ME. Effect of population and gender on chemotherapeutic agent-induced cytotoxicity. Mol. Cancer Ther. 2007;6:31–36. doi: 10.1158/1535-7163.MCT-06-0591. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS, Cheung VG. Genetic analysis of genome-wide variation in human gene expression. Nature. 2004;430:743–747. doi: 10.1038/nature02797. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Duan S, Huang RS, Zhang W, Bleibel WK, Roe CA, Clark TA, Chen TX, Schweitzer AC, Blume JE, Cox NJ, Dolan ME. Genetic architecture of transcript-level variation in humans. Am. J. Hum. Genet. 2008;82:1101–1113. doi: 10.1016/j.ajhg.2008.03.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Zhang W, Ratain MJ, Dolan ME. The HapMap resource is providing new insights into ourselves and its application to pharmacogenomics. Bioinform. Biol. Insights. 2008;2:15–23. doi: 10.4137/bbi.s455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Paine MF, Ludington SS, Chen M-L, Stewart PW, Huang S-M, Watkins PB. Do men and women differ in proximal small intestinal CYP3A or P-glycoprotein expression? Drug Metab. Dispos. 2005;33:426–433. doi: 10.1124/dmd.104.002469. [DOI] [PubMed] [Google Scholar]
  • 10.Zhang W, Bleibel WK, Roe CA, Cox NJ, Dolan ME. Gender-specific differences in expression in human lymphoblastoid cell lines. Pharmacogenet. Genomics. 2007;17:447–450. doi: 10.1097/FPC.0b013e3280121ffe. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstråle M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003;34:267–273. doi: 10.1038/ng1180. [DOI] [PubMed] [Google Scholar]
  • 13.Nam D, Kim S-Y. Gene-set approach for expression pattern analysis. Brief. Bioinform. 2008;9:189–197. doi: 10.1093/bib/bbn001. [DOI] [PubMed] [Google Scholar]
  • 14.Hui J, Hung L-H, Heiner M, Schreiner S, Neumüller N, Reither G, Haas SA, Bindereif A. Intronic CA-repeat and CA-rich elements: a new class of regulators of mammalian alternative splicing. EMBO J. 2005;24:1988–1998. doi: 10.1038/sj.emboj.7600677. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zhang W, Duan S, Kistner EO, Bleibel WK, Huang RS, Clark TA, Chen TX, Schweitzer AC, Blume JE, Cox NJ, Dolan ME. Evaluation of genetic variation contributing to differences in gene expression between populations. Am. J. Hum. Genet. 2008;82:631–640. doi: 10.1016/j.ajhg.2007.12.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B. 1995;57:289–300. [Google Scholar]
  • 17.Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Idaghdour Y, Storey JD, Jadallah SJ, Gibson G. A genome-wide gene expression signature of environmental geography in leukocytes of Moroccan Amazighs. PLoS Genet. 2008;4:e1000052. doi: 10.1371/journal.pgen.1000052. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Carrel L, Willard HF. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature. 2005;434:400–404. doi: 10.1038/nature03479. [DOI] [PubMed] [Google Scholar]
  • 20.Calaf GM, Roy D. Human drug metabolism genes in parathion-and estrogen-treated breast cells. Int. J. Mol. Med. 2007;20:875–881. doi: 10.3892/ijmm.20.6.875. [DOI] [PubMed] [Google Scholar]
  • 21.Nguyen A, Jing Z, Mahoney PS, Davis R, Sikka SC, Agrawal KC, Abdel-Mageed AB. In vivo gene expression profile analysis of metallothionein in renal cell carcinoma. Cancer Lett. 2000;160:133–140. doi: 10.1016/s0304-3835(00)00534-6. [DOI] [PubMed] [Google Scholar]
  • 22.Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature. 2005;434:338–345. doi: 10.1038/nature03441. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Natl. Acad. Sci. USA. 2004;101:9309–9314. doi: 10.1073/pnas.0401994101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Zhang W, Dolan ME. Ancestry-related differences in gene expression: findings may enhance understanding of health disparities between populations. Pharmacogenomics. 2008;9:489–492. doi: 10.2217/14622416.9.5.489. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Zhang W, Liu W, Innocenti F, Ratain MJ. Searching for tissue-specific expression pattern-linked nucleotides of UGT1A isoforms. PLoS ONE. 2007;2:e396. doi: 10.1371/journal.pone.0000396. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Dinu I, Potter JD, Mueller T, Liu Q, Adewale AJ, Jhangri GS, Einecke G, Famulski KS, Halloran P, Yasui Y. Improving gene set analysis of microarray data by SAM-GS. BMC Bioinformatics. 2007;8:242. doi: 10.1186/1471-2105-8-242. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental tables

RESOURCES