Abstract
Background
By enabling the study of genome-wide expression patterns in healthy and diseased tissues across a wide range of pathophysiological conditions, DNA microarrays have provided unique insights into complex diseases. However, the high dimensionality of microarray data makes the interpretation of heterogeneous gene expression studies inherently difficult.
Results
In a large-scale analysis of more than 40 microarray studies encompassing ~2400 mammalian tissue samples, we identified a common theme across heterogeneous microarray studies: a robust, genome-wide inverse regulation of metabolic and cell signaling pathways. Upregulation of cell signaling pathways was invariably accompanied by downregulation of metabolic transcriptional activity, and vice versa. Several findings suggest that this characteristic gene expression pattern represents a new principle of mammalian transcriptional regulation. First, this coordinated transcriptional pattern occurred in a wide variety of physiological and pathophysiological conditions and was identified in all 20 human and animal tissue types examined. Second, the differences in metabolic gene expression predicted the magnitude of differences in signaling and all other pathways; that is, tissue samples with similar expression levels of metabolic transcripts showed no differences in gene expression in any other pathway. Third, this transcriptional pattern predicted a profound effect on the proteome, evident in differences in the structure, stability and post-translational modification of proteins belonging to signaling and metabolic pathways, respectively.
Conclusions
Our data suggest that in a wide range of physiological and pathophysiological conditions, gene expression changes exhibit a recurring pattern along a transcriptional axis, characterized by an inverse regulation of major metabolic and cell signaling pathways. Given its widespread occurrence and its predicted effects on protein structure, protein stability and post-translational modifications, we propose a new principle for transcriptional regulation in mammalian biology.
Background
Transcriptional profiling with DNA microarrays allows the simultaneous quantitative analysis of tens of thousands of transcripts in a single experiment. Applied to healthy and diseased tissues across a wide range of pathophysiological conditions, DNA microarrays have provided unique insights into complex disease patterns. However, the high dimensionality of microarray data makes the interpretation of heterogeneous gene expression studies inherently difficult. One of the main challenges in microarray analysis is to identify common underlying biological themes by integrating multiple similar experiments. A frequent approach is to extract the genes shared by the resulting gene lists and then subject them to enrichment analysis by grouping them into pathways.
In a previous study examining failing and non-diseased dog hearts, we observed an intriguing reciprocal transcriptional regulation of selected cell signaling and metabolic processes [1]. To extend this initial observation beyond myocardial tissue and selected pathways, we used a systems biology approach based on KEGG pathways (Kyoto Encyclopedia of Genes and Genomes [2]) in a large collection of ~2400 mammalian tissue samples derived from more than 20 diseased and non-diseased tissues. As a result, we identified a robust genome-wide reciprocal regulation of metabolic and cell signaling pathways which was present across all 20 different tissues examined.
Results
We examined gene expression patterns across 20 large microarray datasets of different human tissues. In each tissue type, we used Significance Analysis of Microarrays (SAM) [3] to compare the 10 samples with the highest vs. the 10 samples with the lowest expression of transcripts belonging to the KEGG oxidative phosphorylation (OXPHOS) pathway. The differentially expressed genes were then grouped into KEGG pathways and depicted as a heat map in which KEGG pathways were sorted by their similarity to OXPHOS expression. A highly coordinated transcriptional response pattern became apparent: all major metabolic pathways were positively correlated with OXPHOS expression, while cell signaling pathways were inversely correlated with OXPHOS (Figures 1A, B, and Additional Files 1A-1C; detailed study and sample characteristics are listed in Additional Files 2 and 3). Moreover, using serial comparisons of large microarray datasets of human colon, myocardial, bladder, leukocyte and breast cancer samples, we found that the total number of differentially expressed genes declined monotonically as tissue samples with decreasing differences in OXPHOS expression were compared with each other (Figures 2A and 2B). Finally, tissue samples with similar expression levels of metabolic transcripts did not show any differences in gene expression (Figure 2B, comparisons 8-10); that is, the differences in metabolic gene expression predict the magnitude of differences in signaling and all other pathways. Thus, the highly coordinated genome-wide transcriptional response observed in gene expression datasets of both malignant and non-malignant tissue shapes both the pattern (Figures 1A and 1B) and the magnitude (Figure 2B) of the observed gene expression changes.
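The stratification step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the per-sample OXPHOS score is taken here as the mean expression of the OXPHOS genes (the text does not specify how transcripts were aggregated), `expr` is a hypothetical dictionary mapping gene symbols to per-sample log-scale expression values, and the fudge factor `s0` is fixed rather than tuned. `sam_d` computes the relative difference statistic of Tusher et al. [3]; the permutation-based FDR estimation that SAM uses to call genes significant is omitted.

```python
import math
from statistics import mean


def oxphos_score(expr, oxphos_genes):
    """Per-sample score: mean expression over the OXPHOS genes present in `expr`.
    Assumption: simple mean of log-scale values; the study does not specify this."""
    genes = [g for g in oxphos_genes if g in expr]
    if not genes:
        raise ValueError("no OXPHOS genes found in expression data")
    n_samples = len(expr[genes[0]])
    return [mean(expr[g][j] for g in genes) for j in range(n_samples)]


def split_extremes(scores, k=10):
    """Return (high, low): indices of the k highest- and k lowest-scoring samples."""
    if 2 * k > len(scores):
        raise ValueError("need at least 2*k samples so the groups do not overlap")
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    return order[-k:], order[:k]


def sam_d(x_high, x_low, s0=0.1):
    """SAM relative difference d = (mean_high - mean_low) / (s + s0), where s is the
    pooled standard error of Tusher et al.; s0 is a fixed, hypothetical value here."""
    n1, n2 = len(x_high), len(x_low)
    m1, m2 = mean(x_high), mean(x_low)
    ss = sum((v - m1) ** 2 for v in x_high) + sum((v - m2) ** 2 for v in x_low)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    return (m1 - m2) / (math.sqrt(a * ss) + s0)


def compare_extremes(expr, oxphos_genes, k=10, s0=0.1):
    """d statistic for every gene, comparing the k highest vs. k lowest OXPHOS samples."""
    high, low = split_extremes(oxphos_score(expr, oxphos_genes), k)
    return {
        gene: sam_d([vals[j] for j in high], [vals[j] for j in low], s0)
        for gene, vals in expr.items()
    }
```

With real data, `compare_extremes(expr, oxphos_genes)` yields one d value per gene; genes with negative d are lower in the high-OXPHOS group, which is the direction the text reports for signaling pathways.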
To test the hypothesis that the majority of gene expression changes invariably occur along the metabolic-signal transduction axis, we examined gene expression patterns in diverse pathophysiological processes, such as malignant growth, heart failure of ischemic and non-ischemic origin, atrial fibrillation, ageing, liver cirrhosis, psoriasis, diabetes, malaria and inflammatory bowel disease (a complete list of the datasets is given in Additional File 2). We defined the net direction of regulation of a KEGG pathway as the number of its up- minus down-regulated genes, expressed as a percentage of the total number of regulated genes within a study. Compared across all human and animal microarray studies, the net regulation of the MAPK and OXPHOS pathways was negatively correlated (Figure 3C), whereas the TCA-cycle and OXPHOS pathways, as well as the JAK-STAT and MAPK pathways, were positively correlated (Figures 3A and 3D, respectively). Remarkably, this tight co-regulation extended beyond KEGG pathways with metabolic and signaling functions, as evident in the positive correlation of OXPHOS with proteasomal transcripts (Figure 3B) and with the KEGG pathways "protein export", "cell cycle" and "ubiquitin-mediated proteolysis" (Figure 4B). In contrast, "calcium-mediated signaling" and structural components important for cell-cell contact (e.g. "cell adhesion molecules", "tight junctions", "gap junctions", "adherens junctions") were negatively correlated with OXPHOS (Figure 4B; the complete list is given in Additional Files 1A-1C). Taken together, these data suggest that in a wide range of physiological and pathophysiological conditions, gene expression changes are not random, but instead exhibit a recurring pattern along a transcriptional axis characterized by an inverse regulation of major metabolic and cell signaling pathways (Figure 4A).
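The net-regulation measure defined above, and its correlation between two pathways across studies, can be sketched in a few lines. This is an illustrative sketch only: each study is represented as a hypothetical dictionary holding the sets of up- and down-regulated gene symbols, pathways are plain sets of gene symbols (for example exported from KEGG), and the Pearson coefficient is computed directly to keep the example dependency-free.

```python
import math


def net_regulation(up, down, pathway):
    """(up - down) pathway genes as a percentage of all regulated genes in the study.
    Hypothetical input format: `up`, `down`, `pathway` are sets of gene symbols."""
    total = len(up) + len(down)
    if total == 0:
        raise ValueError("study has no regulated genes")
    return 100.0 * (len(up & pathway) - len(down & pathway)) / total


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cross_study_correlation(studies, pathway_a, pathway_b):
    """Correlate the net regulation of two pathways across a list of studies."""
    xs = [net_regulation(s["up"], s["down"], pathway_a) for s in studies]
    ys = [net_regulation(s["up"], s["down"], pathway_b) for s in studies]
    return pearson(xs, ys)
```

A negative value from `cross_study_correlation(studies, mapk_genes, oxphos_genes)` would correspond to the inverse MAPK/OXPHOS relationship described in the text.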
Importantly, transcriptional changes along this axis accounted for >80% of the transcriptional alterations across all datasets, as defined by the proportion of KEGG pathways showing a statistically significant Pearson correlation with the OXPHOS pathway (p < 0.05).
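One way to compute the proportion of KEGG pathways significantly correlated with OXPHOS is sketched below. The text reports Pearson correlations with p < 0.05 but does not state how the p-values were obtained; to stay within the Python standard library, this sketch swaps in a permutation test for the p-value, which is an assumption rather than the authors' procedure. `net` is a hypothetical dictionary mapping pathway names to their per-study net-regulation values, all listed in the same study order.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y.
    Assumption: a permutation test stands in for whatever test the study used."""
    rng = random.Random(seed)
    r_obs = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= r_obs:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return (hits + 1) / (n_perm + 1)


def fraction_correlated(net, reference="OXPHOS", alpha=0.05, n_perm=999, seed=0):
    """Fraction (and names) of pathways whose net regulation correlates with the reference."""
    ref = net[reference]
    others = [p for p in net if p != reference]
    hits = [p for p in others
            if permutation_pvalue(ref, net[p], n_perm, seed) < alpha]
    return len(hits) / len(others), hits
```

Because a permutation p-value cannot fall below 1/(n_perm + 1), `n_perm` should be large enough that this floor sits well under `alpha`, especially if a multiple-testing correction is applied across pathways.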
The significance of this transcriptional pattern is highlighted by its predicted impact on the proteome. First, significant differences in protein structure were noted between proteins of metabolic vs. signaling pathways. Intrinsically unstructured proteins (IUPs) lack a rigid 3D structure and possess an increased exposed surface area, facilitating interaction with multiple targets [4,5]. These and other properties are ideal for proteins that mediate signaling and transcription and coordinate regulatory events, where binding to multiple partners in high-specificity/low-affinity interactions is paramount [5]. Consistent with this, intrinsic disorder occurs at a disproportionately higher frequency in proteins belonging to cell signaling pathways than in those of metabolic pathways (Figure 5). Second, post-translational modifications such as phosphorylation can affect the abundance or half-life of certain IUPs [6,7], and computational studies using phosphorylation site-prediction methods have suggested that unstructured regions are enriched for sites that can be post-translationally modified [8]. We analyzed the predicted occurrence of mucin-type O-glycosylation (O-GalNAc), N-glycosylation, SUMOylation (Small Ubiquitin-like Modifier) and 212 kinase phosphorylation sites and found that these post-translational modification sites were significantly enriched in signaling compared with metabolic pathways (Figures 6A-F). Of note, differences in tyrosine phosphorylation sites between metabolic and signaling pathways were less pronounced than differences in serine/threonine phosphorylation sites, the latter being significantly enriched in signaling pathways (Figure 6F). Overall, this suggests that proteins of signaling pathways are not only the source but also a preferred target of post-translational modification, which may be an important mechanism for fine-tuning their function and possibly also for controlling their availability.
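The comparison of post-translational modification sites between signaling and metabolic proteins relies on nonparametric rank tests (see Methods). A minimal standard-library sketch of the Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation is shown below. It assumes each protein has been summarized by a single number, for example a hypothetical count of predicted sites per 100 residues, and it omits the tie correction of the variance, so it is only approximate when many values are tied. It is not a reimplementation of StatView.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Wilcoxon rank-sum / Mann-Whitney U test of sample a vs. sample b.
    Returns (U for a, z score, two-sided p) from the normal approximation,
    without tie correction (assumption made for brevity)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p
```

A positive z from `rank_sum_test(signaling_values, metabolic_values)` indicates that signaling proteins tend to rank higher, i.e. carry more predicted sites.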
Discussion
Cells respond to changes in their environment with a coordinated transcriptional response. In a meta-analysis of more than 40 diverse microarray studies spanning different microarray platforms (long and short oligonucleotide arrays, cDNA and bead microarrays) and different normalization methods (MAS5, RMA, GC-RMA, VSN, LOWESS), we demonstrate a robust interaction between gene expression in signaling and metabolic pathways. While metabolic pathways were positively correlated with each other, they were negatively correlated with signal transduction pathways. Several findings suggest that this characteristic gene expression pattern represents a novel paradigm for mammalian transcriptional regulation. First, this coordinated transcriptional pattern occurred in a wide variety of physiological and pathophysiological conditions and was identified in all 20 tissue types examined. Importantly, it occurred independently of the proliferative potential of the underlying tissue: the inverse regulation of metabolism and signal transduction was observed in terminally differentiated organs like brain and heart, but also in more rapidly dividing malignant tumors. Second, and most strikingly, these changes in steady-state mRNA levels predict a profound effect on the proteome, as KEGG cell signaling pathways are characterized by a higher proportion of IUPs than metabolic and biosynthetic pathways. The lack of a rigid 3D structure in IUPs is thought to provide several functional advantages, including conformational flexibility to interact with multiple targets, increased interaction surface area, and accessible post-translational modification sites [4,5]. These and other properties are ideal for proteins that mediate signaling and transcription and coordinate regulatory events, where binding to multiple partners and high-specificity/low-affinity interactions play a crucial role [5].
The critical role of IUPs in signaling is further supported by the finding that eukaryotic proteomes, characterized by their rich interaction networks, are highly enriched in IUPs compared with prokaryotic proteomes [9]. Increased IUP levels have been associated with perturbed cellular signaling in a wide range of pathological conditions such as cancer, diabetes and neurodegenerative diseases; thus, intracellular levels of IUPs need to be tightly controlled [10]. Gsponer et al. demonstrated that IUPs as a class have a significantly shorter half-life and lower abundance than highly structured proteins in both unicellular and multicellular organisms, suggesting an evolutionarily conserved pattern [10]. Consistent with the role of the proteasome as an ATP-consuming proteolytic system [11], gene expression of proteasomal degradation pathways was positively correlated with metabolic pathways (Figures 3B and 4B). In addition to D- and KEN-boxes, ubiquitin-proteasome-dependent degradation is mediated by the N-end rule and PEST-mediated degradation pathways. Consistent with the shorter protein half-life of IUPs compared with structured proteins [10], recent studies have found IUPs to contain a significantly greater fraction of PEST motifs (regions rich in proline, glutamic acid, serine and threonine), while no differences were noted for the N-end rule pathway [10,12]. Importantly, the 20S proteasome can distinguish intrinsically unstructured proteins from other proteins, as it digests IUPs under conditions in which native, and even molten globule, states of structured proteins are resistant to degradation [13]. In line with this finding, it has been suggested that the 20S proteasome degradation assay provides a powerful system for the operational definition of IUPs [13].
Although protein degradation is not determined by a single characteristic but is a multifactorial process with large protein-to-protein variation [14], it is tempting to speculate that an increased abundance of proteins belonging to metabolic pathways contributes to the down-regulation of signaling pathways through concurrent up-regulation of proteasomal degradation pathways.
Conclusions
In summary, proteins in signaling and metabolic pathways have fundamentally different properties, ranging from inversely regulated transcriptional patterns (Figures 1 and 3) and differences in the abundance and stability of the respective mRNAs to underlying differences in translational rate, protein abundance and protein stability [10]. Additionally, profound differences in post-translational modifications exist between signaling and metabolic pathways, as evident in differences in predicted SUMOylation, mucin-type O-glycosylation, N-glycosylation and serine/threonine phosphorylation sites (Figure 6). Ultimately, this novel transcriptional pattern provides a unifying concept for the interpretation of heterogeneous and multi-dimensional microarray datasets, as the dynamic interaction between cellular signaling and metabolic pathways shapes both the number (Figure 2B) and the pattern (Figures 1, 3 and 4) of the observed gene expression changes. Given the widespread occurrence of this transcriptional pattern and the predicted differences in IUPs, protein stability and post-translational modifications, we propose the reciprocal relationship between metabolic and signaling pathways as a new canonical principle for transcriptional regulation in mammalian biology.
Study Limitations
In the present study, we noted a striking and robust reciprocal correlation of transcriptional changes between metabolic and signaling pathways. Importantly, correlation does not prove causation. Therefore, we cannot determine whether transcriptional changes in metabolic activity precede changes in signaling pathways or vice versa. While this study was centered on pathway analysis, future studies will need to identify individual genes or hub nodes that connect metabolic and signaling pathways. In addition, the role of upstream and downstream regulatory events, e.g. transcription factors, miRNAs, splicing, 3' end termination and/or mRNA stability, needs to be examined.
Future studies will need to address the role of this transcriptional pattern in various disease processes. While the association of IUPs with various diseases might suggest that down-regulation of metabolism and up-regulation of signaling pathways is a common theme across a wide range of disease processes, we found that this generalization does not hold universally. This could be related to different baseline levels of OXPHOS activity in various tissues and cancer specimens and/or to differences in tissue handling. Clearly, future studies need to address whether this transcriptional pattern will help refine the distinction between diseased and non-diseased tissue samples.
Methods
Gene Expression Data
Public datasets were obtained from the Gene Expression Omnibus (GEO) database [15]. A detailed summary of all datasets used in the present meta-analysis is given in Additional File 2. The datasets were selected according to the following criteria: (1) whole-genome coverage of the microarray platform (≥ 20,000 transcripts; the only exception was the comparison between human adult and fetal hearts, for which whole-genome microarray datasets were not publicly available); (2) quality of the normalization procedure, i.e. comparable levels of mean signal intensity and variance of signal intensity across experimental groups; (3) at least 50 samples for non-myocardial tissue datasets; and (4) more than ten non-failing samples for human myocardial datasets.
Statistical Analysis
To determine differentially expressed genes, unpaired two-class Significance Analysis of Microarrays (SAM) was used [3]. Differences in gene expression were regarded as statistically significant at a false discovery rate (FDR) of q < 0.05. Functional annotation of differentially expressed genes was based on the KEGG pathways database. Overrepresentation of specific KEGG pathways in a gene set was analyzed statistically with the Database for Annotation, Visualization and Integrated Discovery (DAVID) [16]. The net regulation of a pathway was defined as the number of up- minus down-regulated transcripts of a KEGG pathway, expressed as a percentage of the total number of regulated genes within a study. Clustering of the expression of KEGG pathways and phosphorylation sites was performed using Genesis [17].
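Pathway over-representation of the kind reported by DAVID can be illustrated with a one-sided hypergeometric (Fisher's exact) test. As we understand it, DAVID's EASE score is a conservative variant of Fisher's exact test that removes one gene from the overlap; the `ease` flag below mimics that adjustment and is an assumption, not DAVID's code. The gene list, pathway and background are hypothetical sets of gene symbols.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    k = max(k, 0)
    top = min(successes, draws)
    numerator = sum(comb(successes, i) * comb(population - successes, draws - i)
                    for i in range(k, top + 1))
    return numerator / comb(population, draws)


def pathway_enrichment(gene_list, pathway, background, ease=False):
    """One-sided over-representation p-value of `pathway` in `gene_list`.
    ease=True removes one gene from the overlap, approximating the EASE
    adjustment (assumption; not DAVID's own implementation)."""
    genes = set(gene_list) & background
    path = pathway & background
    overlap = len(genes & path)
    if ease:
        overlap -= 1
    return hypergeom_upper_tail(overlap, len(background), len(path), len(genes))
```

Restricting both the gene list and the pathway to the same background (for example, all transcripts on the array) matters: a background that is too large makes every pathway look enriched.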
Batch prediction of long disordered regions was carried out using the IUPforest-L software, which is based on the Moreau-Broto autocorrelation function of amino acid indices (AAIs) and other physicochemical features of the primary sequences [18]. Nonparametric rank tests (Kolmogorov-Smirnov and Wilcoxon) implemented in StatView (SAS Institute Inc., NC, USA) were used to determine the statistical significance of differences in the distribution of IUPs between metabolic and signaling pathways. Batch prediction of N-glycosylation, mucin-type O-glycosylation, SUMOylation and protein kinase phosphorylation sites was carried out using NetNGlyc 1.0 http://www.cbs.dtu.dk/services/NetNGlyc, NetOGlyc 3.1 [19], SUMOsp 2.0 [20] and GPS 2.1 [21], respectively.
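The Kolmogorov-Smirnov comparison of disorder between pathway classes can be illustrated with the two-sample statistic D, the largest vertical distance between the two empirical distribution functions. The sketch below assumes each protein is summarized by a hypothetical fraction of residues predicted to lie in long disordered regions. The p-value uses the asymptotic Kolmogorov distribution without small-sample correction, so it is only a rough guide for small groups and is not how StatView computes it.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic D = max |F_a(x) - F_b(x)| over the pooled data points."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    for x in set(a) | set(b):
        # empirical CDF value = fraction of observations <= x
        d = max(d, abs(bisect_right(a, x) / n - bisect_right(b, x) / m))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Asymptotic two-sided p-value Q(lam) = 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2),
    with lam = sqrt(n*m/(n+m)) * d (no small-sample correction; assumption)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.3:
        # the series converges slowly here and the p-value is essentially 1
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))
```

Because the empirical CDFs are step functions, evaluating them only at the pooled observations is enough to find the supremum.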
Abbreviations
IUP: Intrinsically Unstructured Proteins; KEGG: Kyoto Encyclopedia of Genes and Genomes; OXPHOS: Oxidative Phosphorylation; SAM: Significance Analysis of Microarrays; DAVID: Database for Annotation, Visualization and Integrated Discovery; GEO: Gene Expression Omnibus.
Authors' contributions
ASB conceived the study, carried out the experiments and drafted the manuscript. AK and CC provided assistance with the bioinformatic and statistical analysis, respectively; KBM and TPC participated in study design. GFT conceived the study and drafted the manuscript. All authors read and approved the final manuscript.
Supplementary Material
Contributor Information
Andreas S Barth, Email: abarth3@jhmi.edu.
Ami Kumordzie, Email: aek0049@jhu.edu.
Carlo Colantuoni, Email: ccolantu@jhsph.edu.
Kenneth B Margulies, Email: ken.margulies@uphs.upenn.edu.
Thomas P Cappola, Email: thomas.cappola@uphs.upenn.edu.
Gordon F Tomaselli, Email: gtomase1@jhmi.edu.
Acknowledgements
The work was supported by NIH P01 HL077180, HL072488, R33 HL087345 and RC1HL099892 to G.F.T., R01 AG17022 to K.B.M., R01 HL088577 and R21 HL092379 to T.P.C., and NIH T32 HL007227 to A.S.B. G.F.T. is the Michel Mirowski M.D. Professor of Cardiology.
References
- Barth AS, Aiba T, Halperin V, DiSilvestre D, Chakir K, Colantuoni C, Tunin RS, Dimaano VL, Yu W, Abraham TP, et al. Cardiac resynchronization therapy corrects dyssynchrony-induced regional gene expression changes on a genomic level. Circ Cardiovasc Genet. 2009;2(4):371–378. doi: 10.1161/CIRCGENETICS.108.832345.
- Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28(1):27–30. doi: 10.1093/nar/28.1.27.
- Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001;98(9):5116–5121. doi: 10.1073/pnas.091062498.
- Kriwacki RW, Hengst L, Tennant L, Reed SI, Wright PE. Structural studies of p21Waf1/Cip1/Sdi1 in the free and Cdk2-bound state: conformational disorder mediates binding diversity. Proc Natl Acad Sci USA. 1996;93(21):11504–11509. doi: 10.1073/pnas.93.21.11504.
- Uversky VN, Oldfield CJ, Dunker AK. Intrinsically disordered proteins in human diseases: introducing the D2 concept. Annu Rev Biophys. 2008;37:215–246. doi: 10.1146/annurev.biophys.37.032807.125924.
- Chu I, Sun J, Arnaout A, Kahn H, Hanna W, Narod S, Sun P, Tan CK, Hengst L, Slingerland J. p27 phosphorylation by Src regulates inhibition of cyclin E-Cdk2. Cell. 2007;128(2):281–294. doi: 10.1016/j.cell.2006.11.049.
- Grimmler M, Wang Y, Mund T, Cilensek Z, Keidel EM, Waddell MB, Jakel H, Kullmann M, Kriwacki RW, Hengst L. Cdk-inhibitory activity and stability of p27Kip1 are directly regulated by oncogenic tyrosine kinases. Cell. 2007;128(2):269–280. doi: 10.1016/j.cell.2006.11.047.
- Iakoucheva LM, Radivojac P, Brown CJ, O'Connor TR, Sikes JG, Obradovic Z, Dunker AK. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004;32(3):1037–1049. doi: 10.1093/nar/gkh253.
- Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol. 2004;337(3):635–645. doi: 10.1016/j.jmb.2004.02.002.
- Gsponer J, Futschik ME, Teichmann SA, Babu MM. Tight regulation of unstructured proteins: from transcript synthesis to protein degradation. Science. 2008;322(5906):1365–1368. doi: 10.1126/science.1163581.
- Hershko A, Ciechanover A, Rose IA. Resolution of the ATP-dependent proteolytic system from reticulocytes: a component that interacts with ATP. Proc Natl Acad Sci USA. 1979;76(7):3107–3110. doi: 10.1073/pnas.76.7.3107.
- Singh GP, Ganapathi M, Sandhu KS, Dash D. Intrinsic unstructuredness and abundance of PEST motifs in eukaryotic proteomes. Proteins. 2006;62(2):309–315. doi: 10.1002/prot.20746.
- Tsvetkov P, Asher G, Paz A, Reuven N, Sussman JL, Silman I, Shaul Y. Operational definition of intrinsically unstructured protein sequences based on susceptibility to the 20S proteasome. Proteins. 2008;70(4):1357–1366. doi: 10.1002/prot.21614.
- Tompa P, Prilusky J, Silman I, Sussman JL. Structural disorder serves as a weak signal for intracellular protein degradation. Proteins. 2008;71(2):903–909. doi: 10.1002/prot.21773.
- Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30(1):207–210. doi: 10.1093/nar/30.1.207.
- Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003;4(5):P3. doi: 10.1186/gb-2003-4-5-p3.
- Sturn A, Quackenbush J, Trajanoski Z. Genesis: cluster analysis of microarray data. Bioinformatics. 2002;18(1):207–208. doi: 10.1093/bioinformatics/18.1.207.
- Han P, Zhang X, Norton RS, Feng ZP. Large-scale prediction of long disordered regions in proteins using random forests. BMC Bioinformatics. 2009;10:8. doi: 10.1186/1471-2105-10-8.
- Julenius K, Molgaard A, Gupta R, Brunak S. Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology. 2005;15(2):153–164. doi: 10.1093/glycob/cwh151.
- Ren J, Gao X, Jin C, Zhu M, Wang X, Shaw A, Wen L, Yao X, Xue Y. Systematic study of protein sumoylation: Development of a site-specific predictor of SUMOsp 2.0. Proteomics. 2009;9(12):3409–3412. doi: 10.1002/pmic.200800646.
- Xue Y, Ren J, Gao X, Jin C, Wen L, Yao X. GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy. Mol Cell Proteomics. 2008;7(9):1598–1608. doi: 10.1074/mcp.M700574-MCP200.